Your Genetic Genealogist: GEDmatch


Wednesday, December 3, 2014

The Folly of Using Small Segments as Proof in Genealogical Research

Responsible genealogists adhere to high standards of proof in their research, in the evidence that they present and in the conclusions they reach. I strongly believe that genetic genealogists should as well. When we make claims that are not supported by sound science, then we undermine the credibility of our field.

Experience has demonstrated to me that there is great folly in claiming small segments can be used as proof (yes, even supporting) in genealogical research. When I use the term "small segments" in this article, I am referring to unphased "matching" segments under 5 centiMorgans and I am addressing their use in matching, not admixture. A few genetic genealogists have argued that there are certain instances when small segments are not only helpful in our genealogical research, but reliable. I strongly disagree.

One of the many problems with utilizing small segments is that, in general, people tend to see evidence that supports their theories and reject evidence that does not. Because the nature of small segments is so random, as I will demonstrate, it is possible that an individual will see patterns where none exist in reality, such as in a cluster of tiny, meaningless "matching" segments. This also holds true for admixture analysis.

Blaine Bettinger has already written a great blog post explaining the work that has been done on this issue, along with some of his own comparisons, so I am going to concentrate on the multi-generational data to which I have access. Angie Bush has kindly allowed me access to her family's extensive data, although she is unable to collaborate on this post because she is on a genealogy cruise. (Thanks, Angie!)

All of these examples are the first ones I looked at, so they are randomly chosen and not selected with bias. There is a huge amount of analysis that can still be performed on this data set. Since GEDmatch was down when I wrote this, I concentrated on Family Tree DNA data. When I am able to access GEDmatch again, I will add to my analysis.

First, let's look at this simple chart of my data compared to James, a confirmed paternal fourth cousin, and then my father's data compared to that same cousin. As you can see, both my father and I have one substantial matching segment with James on Chromosome 4 (in purple). Some would argue that because we have one longer matching segment, the small matching segments reported are more valid and thus can be more responsibly attributed to our known common ancestor.

Notice the segments highlighted in red in my chart. Those are all segments that were reported to be matching between me and James that do not show up as matches with my father. So, right off the bat, we can eliminate eight segments of what some might claim is supporting evidence of the known relationship with James. That is two-thirds (66.7%) of the segments under 5 cM, which is in line with what was found in the 23andMe study.
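For readers who want to run the same parent-filter check on their own exports, here is a minimal sketch in Python. The segment coordinates are invented for illustration (they are not my data), and `Segment`, `overlaps` and `unsupported_by_parent` are hypothetical names. The only logic it encodes is the one used above: a small segment shared with a paternal cousin that has no overlapping segment in the father's comparison cannot be credited to the paternal relationship.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    chrom: int
    start: int   # base-pair position where the reported match begins
    end: int     # base-pair position where the reported match ends
    cm: float    # reported length in centiMorgans

def overlaps(a: Segment, b: Segment) -> bool:
    """True when two segments share any stretch of the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def unsupported_by_parent(child, parent, max_cm=5.0):
    """Small child segments with no overlapping segment in the parent's comparison."""
    small = [s for s in child if s.cm < max_cm]
    return [s for s in small if not any(overlaps(s, p) for p in parent)]

# Invented example: child vs. cousin, and father vs. the same cousin.
child_vs_cousin = [
    Segment(4, 10_000_000, 32_000_000, 18.2),  # the one substantial segment
    Segment(2, 5_000_000, 6_200_000, 1.4),
    Segment(7, 40_000_000, 41_500_000, 2.1),
    Segment(9, 80_000_000, 81_000_000, 1.1),
]
father_vs_cousin = [
    Segment(4, 10_500_000, 32_000_000, 17.6),
    Segment(2, 5_100_000, 6_200_000, 1.3),
]

rejected = unsupported_by_parent(child_vs_cousin, father_vs_cousin)
small_total = sum(1 for s in child_vs_cousin if s.cm < 5.0)
print(f"{len(rejected)} of {small_total} small segments are not supported by the parent")
```

A segment that survives this filter is not thereby proven to be genuine; the check can only rule segments out.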

Since I have no reason to believe that I inherited those segments from my mother, they are likely pseudo-segments. Pseudo-segments are spliced together by jumping between alleles from mom and dad, impersonating a matching stretch of DNA where one does not exist. The inability to distinguish these from authentic matching segments is a limitation of our current technology. Could they have actually come from my mother, you might be asking? My mother does not match James at the Family Tree DNA thresholds and I can't check GEDmatch to be sure, but there are no known common origins between them. (I am checking with James to see if he is willing to allow me to make that comparison for my next post.) Regardless, this analysis clearly disproves that the red segments are a result of the known paternal relationship. As such, there should be no argument to the conclusion that the majority of the small segments in this randomly chosen example cannot function as supporting evidence of the primary relationship in any way.
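To make the idea of a pseudo-segment concrete, here is a toy sketch in Python. Everything in it is invented: a ten-SNP stretch, two alleles per SNP, and a deliberately simplified matching rule (two unphased genotypes "match" if they have any allele in common). Real matching algorithms add length and SNP-count thresholds and error tolerance, so treat this only as an illustration of how zig-zagging between the maternal and paternal copies can imitate a continuous match.

```python
def shares_allele(g1: str, g2: str) -> bool:
    """Unphased comparison: genotypes 'match' if they have any allele in common."""
    return bool(set(g1) & set(g2))

def longest_run(flags) -> int:
    """Length of the longest unbroken run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

# Invented haplotypes: one string per parental copy, one letter per SNP.
child_from_mom = "AAAAAGGGGG"
child_from_dad = "GGGGGAAAAA"
cousin_hap_1 = "AGAGAGAGAG"
cousin_hap_2 = "AGAGAGAGAG"  # the cousin is homozygous at every SNP in this toy stretch

# What a chip reports: an unordered pair of alleles at each SNP.
child_genotypes = [m + d for m, d in zip(child_from_mom, child_from_dad)]
cousin_genotypes = [a + b for a, b in zip(cousin_hap_1, cousin_hap_2)]

unphased = longest_run(
    shares_allele(c, k) for c, k in zip(child_genotypes, cousin_genotypes)
)
phased = max(
    longest_run(c == k for c, k in zip(child_hap, cousin_hap))
    for child_hap in (child_from_mom, child_from_dad)
    for cousin_hap in (cousin_hap_1, cousin_hap_2)
)

print("longest unphased run:", unphased)  # all 10 SNPs appear to match
print("longest phased run:", phased)      # no single parental copy matches more than 2 in a row
```

In this toy case the unphased comparison reports a match across the whole stretch even though neither of the child's actual chromosomes resembles the cousin's for more than two SNPs in a row.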

Next, look at the green segments. In this case, it appears that I inherited those from my father, but if you look closely, they are actually longer for me than for my father. This means that they are, at least partially, false positives or pseudo-segments. Incidentally, the one substantial matching segment we have in common (purple) is also reported to be a bit longer for me than for my father, which illustrates that it is questionable to rely too heavily on what appear to be exact assignments. In my list of matching segments, only the pink segments on chromosomes 2 and 3 are left as potentially fully identical by descent (IBD) segments. Some will say that the fact that they persist from parent to child makes them more reliable indicators of a genealogical relationship. Perhaps, but there is no proof that the pink segments weren't originally pseudo-segments interpreted as a match by the technology in my father's data and then passed to me through recombination of his two chromosomes. Does that sound far-fetched? Well, let's see by looking at multi-generational data.

Please bear with me because this is going to take a while. This chart is the matching DNA between Brynne and a known Bush cousin from her mother's father's father's branch of the family. The common ancestors are Frederick Bush and Martha White, so you can see that the expected path of inheritance for matching DNA between Brynne and this cousin is:
Brynne >> Angie >> Grandpa >> Great Grandpa

Here we are looking at the threshold set at 5 cM. Brynne's data compared to the Bush cousin is on the left and the comparison of her mother Angie to this same cousin is on the right.

This is her grandfather's (left) and great grandfather's (right) DNA compared to the same cousin.

These are nicely consistent with all of Brynne's matching segments being inherited from her great grandfather, as would be expected.

Now, let's look at the same comparisons with the threshold lowered to 1 centiMorgan.

Brynne and Angie:

Grandfather and great grandfather:

As you can see, things got very messy at this level. We have all kinds of problems and inconsistencies with the data now. Let's look at just a few.

Chromosome 11:

As you can see, Brynne has three small segments (under 5 cM) in common with her known Bush cousin on Chromosome 11. One is lost as we move to her mother Angie's comparison, but two persist. So, if the theory is correct that a small segment that persists over two generations is more likely to be identical by descent (IBD), or attributable to the known common ancestor, then the two remaining ones should be IBD. However, look what happens: another is lost when we move the next generation back in time toward the common ancestor with the known cousin, and then finally all three have disappeared by the time we get to the great grandfather. This is the opposite of what we should be seeing. Could these last two segments be attributable to another common ancestor on Brynne's grandmother's and great grandmother's branches of her tree? Possibly, but if so, that still doesn't support the claim that small segments help to prove the primary relationship responsible for the large matching segments. In fact, it refutes it because it demonstrates that even in families with no known pedigree collapse, such as this one, there still may be small segments inherited from distant common ancestors.

We saw other problems too. In some cases, like on Chromosomes 3 and 6, segments disappear at one generation and seemingly reappear at the next. That tells us one of two things: that coincidences happen and/or that the technology is not reliably picking up these small segments. Neither scenario instills confidence in genealogical conclusions based on small segment analysis.

Chromosome 3: Grandpa was "skipped" and the segment was almost three times larger in the most recent generation which is opposite of what we would expect to see if it was identical by descent.

Chromosome 6: Mom was "skipped". Notice the high number of SNPs (again many more in the most recent generation), which makes it seem less likely that it was simply missed by the technology.

Taken at face value, these examples would lend credence to the myth that DNA can skip a generation, which we all know to be untrue.

Most importantly, in this entire comparison, NOT ONE of Brynne's small segments shared with her known Bush cousin persisted consistently through all four generations on the path back to the known common ancestor.
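If you have multi-generational data of your own, the persistence test is easy to automate. The sketch below is only an illustration: the coordinates are invented to mirror the Chromosome 11 pattern described above (three segments, then two, then one, then none), and `generations_reached` is a hypothetical helper. It counts how many consecutive generations on the expected path show an overlapping segment with the same cousin.

```python
PATH = ["Brynne", "Angie", "Grandpa", "Great Grandpa"]

# Each person's small segments shared with the same cousin, as
# (chromosome, start_mb, end_mb). All coordinates are invented.
comparisons = {
    "Brynne": [(11, 5.0, 6.5), (11, 40.0, 41.2), (11, 98.0, 99.5)],
    "Angie": [(11, 5.0, 6.4), (11, 40.1, 41.2)],
    "Grandpa": [(11, 40.1, 41.0)],
    "Great Grandpa": [],
}

def overlaps(a, b) -> bool:
    """True when two segments share any stretch of the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def generations_reached(segment, comparisons, path) -> int:
    """Count consecutive generations, starting with the first person on the
    path, whose comparison contains a segment overlapping this one."""
    reached = 0
    for person in path:
        if not any(overlaps(segment, other) for other in comparisons[person]):
            break
        reached += 1
    return reached

results = [generations_reached(s, comparisons, PATH) for s in comparisons["Brynne"]]
print(results)  # [2, 3, 1]: none of the three reaches all four generations
```

Because the count stops at the first gap, a segment that "skips" a generation and reappears later, as on Chromosomes 3 and 6, is reported as having failed at the skipped generation.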

When going through this data, I saw so many examples that fly in the face of the belief that small segments can, in any way, be reliable indicators of a genealogical relationship that I couldn't even begin to cover them all here. Since GEDmatch was down while I was writing this, I was unable to do some of the comparisons I had planned, so perhaps I will do that at a later time.

In the meantime, since I read a lot of comments over the last few days that people feel comfortable mapping small segments to their known ancestors using comparisons of their close relatives, I decided to see if that, at least, could stand up to analysis. Let's look at Brynne compared to her maternal grandparents.

We can see her DNA mapped to her grandfather in orange and her grandmother in blue. It is quite clean at the 5 cM threshold on the left, with almost no overlap, as we would expect. However, when you drop the threshold to 1 cM, you can start to see issues on the right. Look at Chromosome 1, for example. There are three small segments from the grandparents that are directly in opposition to the obvious inheritance pattern. You can also see it on chromosomes 3, 5, 6, 10, 12 and 14. If you only had one of the grandparents tested, you would map those small segments to the wrong grandparent and, thus, be "barking up the wrong" branch of the tree.

Brynne's DNA mapped to her maternal grandparents

Let's look more closely at Brynne's Chromosome 14 and the inheritance from her maternal grandparents through to her great grandfather Bush.

The pink in the image below is the comparison with her mother, Angie. Of course, they share across the length of the chromosome. Then, you can see, in green, the DNA she shares with her maternal grandfather and, in blue, the DNA she shares with her great grandfather from the same line. It appears that she has one long segment from her grandfather and then one small one that she inherited from her great grandfather through her grandfather. You would feel pretty safe mapping that small blue/green segment to her great grandfather, right? There is only one problem...the orange is the DNA she inherited from her maternal grandmother! That small segment falls right where the DNA she inherited from her mother came from her maternal grandmother, not her grandfather! She couldn't have inherited DNA from both her maternal grandmother and her maternal grandfather on that spot, so the small segment must be a false positive even though it persisted over multiple generations.

You can see similar problems on Chromosomes 1, 5 and 6.

Remember she can't inherit DNA in the same spot from both grandma and grandpa.

Pink = mother, green = grandfather, blue = great grandfather (father of grandfather), orange = grandmother.

All three of these chromosomes show small segments that fall in sections inherited from the opposite side of the family, proving they are false positives. Look at the colorful pile-up on Chromosome 6. Some of these segments are almost 5 cM!
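The "can't inherit the same spot from both grandma and grandpa" rule lends itself to a simple automated check. In this hypothetical sketch (invented Chromosome 14 coordinates, made-up function names), any small segment attributed to the grandfather's line that falls inside a large segment shared with the grandmother is flagged as a likely false positive.

```python
def overlaps(a, b) -> bool:
    """Segments are (chromosome, start_mb, end_mb) tuples."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def conflicting_segments(small_from_one_side, large_from_other_side):
    """Small segments that sit inside territory already mapped to the other grandparent."""
    return [
        small
        for small in small_from_one_side
        if any(overlaps(small, large) for large in large_from_other_side)
    ]

# Invented layout: the second part of the chromosome came from the grandmother.
grandmother_large = [(14, 60.0, 107.0)]
# Small segments reported as shared with the grandfather and great grandfather.
grandfather_line_small = [(14, 30.0, 31.0), (14, 75.0, 77.0)]

flagged = conflicting_segments(grandfather_line_small, grandmother_large)
print(flagged)  # [(14, 75.0, 77.0)]
```

This check needs both grandparents of a pair to be tested. With only one of them, the conflict is invisible, which is exactly the trap described above.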

There is so much more to say about the use of small segments in genealogical research and a huge amount of data to explore, but I will stop here for today. I think that these few examples should give any genetic genealogist who believes that small segments can, in any way, support genealogical theories serious pause for thought.

In a later article, we will examine the assertion that small segments can prove useful as "population specific" guides and if there is any support for the recent ancient genome comparison analysis. The fact that these segments are not consistently inherited certainly calls that type of analysis into question as well.

I encourage those of you with access to multi-generational data to perform a similar analysis and let us know what you find. The more data, the better!

[Note: In the future, I believe that we will be able to utilize smaller segments in our research and even assign them to specific ancestors through chromosome mapping, but this will only be possible when technology has advanced considerably and we are using higher resolution autosomal DNA testing and much improved phasing engines. The exception is Tim Janzen who is attempting to do so now through highly technical and advanced work. He is phasing his data through testing and comparison of large numbers of known relatives, many more than the vast majority of genealogists will ever test. To my knowledge, he has never claimed to have used small segments to break down any genealogical brick walls or to have proven anything in that regard, even as supporting evidence.]

Saturday, December 7, 2013

23andMe Releases a Sample of Their New V4 File: First Look and Analysis

23andMe has released access to a sample file of their new v4 chip. This is a 100% custom chip with hand-selected SNPs. In a departure from the other two companies offering autosomal DNA testing to the genealogy community, they are now using the Illumina iSelect chip as the foundation instead of the Illumina OmniExpress chip. On the new v4 chip, the total number of autosomal DNA and X DNA SNPs has decreased substantially, while the total number of mtDNA SNPs and Y-SNPs has increased.

Since I heard about this new chip, I have been concerned about the impact it will have on the genetic genealogy community and, particularly, on the compatibility with 3rd party uploads and tools. If 23andMe customers are unable to take advantage of the extremely beneficial opportunity to upload their data into Family Tree DNA's Family Finder database and to make use of the wonderful tools at GEDmatch, that would be a huge loss for all of us. It would also be a shame if the usefulness of the Y-SNPs and mtDNA SNPs tested by 23andMe is reduced for our community. To try to determine if this will be the case, Dr. Tim Janzen has helped me to analyze the new file overall and Larry Vick has specifically analyzed the Y-SNPs.

The v4 chip currently has just over 602,000 total SNPs versus 967,000 total SNPs on the v3 chip. This chip is not as robust as the chip previously used by 23andMe or the chips used by Family Tree DNA and AncestryDNA for the autosomal and X-SNPs. This is of great concern, but 23andMe has stated that they plan to impute a large number of SNP allele values from our results (see comments at http://blog.23andme.com/news/23andmes-new-custom-chip/), so hopefully Family Tree DNA and citizen scientists will be able to do the same to extract the most utility and compatibility from this platform. I am continually surprised and amazed by the resourcefulness of our community, so I remain hopeful.

This change in platforms was intended to help the company ramp up their processing capacity in conjunction with their massive marketing campaign to acquire one million customers and beyond. (They can run 24 samples on each v4 chip at once instead of 8 on each of the v3 chips.) Especially in light of 23andMe's recent decision to provide ancestry-related interpretation and raw data exclusively, the v4 chip does not appear to be a beneficial development. In fact, it may result in additional loss of sales. Currently, the genetic genealogy and the DNAAdoption communities generally recommend first testing at 23andMe and then transferring the raw data into FTDNA's Family Finder database in order to be in two databases at a reduced price. If it turns out that FTDNA is unable to continue to accept 23andMe transfers, these recommendations will likely change. Loss of compatibility with GEDmatch would also have a very detrimental effect on the utility for those using the data for genealogical and admixture research.

The following analysis is in depth and intended for those who want the specific details of the changes. Instructions to download the v4 file will follow the analysis.

First let's look at the Y-SNPs.

Larry Vick tells us:
I compared my Y-SNP file I downloaded on 14 Aug 2012 to the Y-SNP file I downloaded today to see if my current file has any changes. There weren't any. I then compared the (v4) file for Greg MENDEL that I downloaded today to a v3 file I downloaded for a friend (CRL) on 21 Mar 2013.

There are 2,329 Y-SNPs in the Greg MENDEL v4 file. CRL had 1,766 Y-SNPs in his v3 file. So the v4 file has 563 more SNPs than CRL's v3 file. Looking at the SNPs, there were 446 in CRL's v3 file that were not in the MENDEL v4 file. There were 1,009 in the MENDEL v4 file that weren't in the CRL v3 file. Of the 446 in the CRL v3 file that weren't in the MENDEL v4 file, 314 were no calls.

Of the 1,009 in the MENDEL v4 file that weren't in CRL's v3 file, 295 were no calls. When I compared the 1,009 in the MENDEL v4 file to my file I downloaded today, 494 were not in my file (although they could have been in past downloads but were removed prior to today's download). I have SNPs from v1, v2, and v3. Of the 515 that were in my file, 99 were no calls.

I compared the 494 that were not in my file to Adriano's file in the Y-Chromosome Comparison Project, and 268 were in his file. All but one of those 268 had i prefixes for the reference sequence number.

I then compared the 226 that weren't in Adriano's file to the ISOGG Y-SNP Compendium by position number (build 37), and 200 were in the ISOGG file. I created a list of those 200 with the first note field. The 26 that weren't in the ISOGG file included the one SNP with an rs number (rs5603911). The MENDEL file had 46 no calls for the 200 SNPs in the ISOGG file. All but one of the 226 that weren't in Adriano's file had i prefix reference sequence numbers.

Now let's look at the overall composition of the new v4 chip as compared to the other platforms.

Tim reviewed the v4 file and compared it to v2, v3, and FF files. The following are some general statistics on which he based his analysis.

atDNA SNPs
v2:
561,846 atDNA SNPs in 23andMe v2 data in a 2009 download.
515 i atDNA SNPs in 23andMe v2 data in a 2009 download.
556,787 atDNA SNPs in 23andMe v2 data in a fresh download.
758 i atDNA SNPs in 23andMe v2 data in a fresh download.

v3:
930,381 atDNA SNPs in 23andMe v3 data in a fresh download.
7,455 i atDNA SNPs in 23andMe v3 data in a fresh download.

v4:
577,382 atDNA SNPs in 23andMe v4 data.
41,855 i atDNA SNPs in 23andMe v4 data.

Family Finder:
708,092 atDNA SNPs in Family Finder data in general, but a fresh download only had 707,269 SNPs in it.

Y-SNPs
v2:
1880 Y SNPs in 23andMe v2 data in a fresh download. Of these 213 are i SNPs.

v3:
1766 Y SNPs in 23andMe v3 data in a fresh download. Of these 232 are i SNPs.

v4:
2329 Y SNPs in 23andMe v4 data. Of these 526 are i SNPs.

X-SNPs
v2:
13,876 X SNPs in 23andMe v2 data in a 2009 download. Of these 19 are i SNPs.
13,828 X SNPs in 23andMe v2 data in a fresh download. Of these 96 are i SNPs.

v3:
26,007 X SNPs in 23andMe v3 data in a fresh download. Of these 1006 are i SNPs.

v4:
19,487 X SNPs in 23andMe v4 data in a fresh download. Of these 4227 are i SNPs.

Family Finder:
18,022 X SNPs in Family Finder build 37 data in a fresh download.

mtDNA
v2:
2019 mtDNA SNPs in 23andMe v2 data in a fresh download. Of these 1572 are i SNPs.

v3:
2459 mtDNA SNPs in 23andMe v3 data in a fresh download. Of these 2016 are i SNPs.

v4:
3154 mtDNA SNPs in 23andMe v4 data. Of these 2681 are i SNPs.

Here are his comparisons between the various platforms.

atDNA:

453,854 atDNA SNPs in 23andMe v4 data are also found in 23andMe v2 data in a 2009 download. Of these SNPs, 419 are i SNPs.

453,357 atDNA SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download. Of these SNPs, 546 are i SNPs.

509,630 atDNA SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download. Of these SNPs, 6153 are i SNPs.

304,864 atDNA SNPs in 23andMe v4 data that have rs numbers are also found in Family Finder in a fresh download.

I then checked the 41,855 i atDNA SNPs in 23andMe v4 data and checked for matching positions in the Family Finder data. I found that there were 2556 i atDNA SNPs in 23andMe v4 data that had matching positions in the Family Finder data.

Assuming that all of those matching positions correspond with the same SNP in the Family Finder data, there are a maximum of 307,420 atDNA SNPs in 23andMe v4 data that are also found in Family Finder in a fresh download.
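For anyone who wants to check the arithmetic, the figures above combine as follows. The percentages are derived here from Tim's counts and are not numbers he reported; the variable names are just for illustration.

```python
rs_overlap = 304_864          # v4 SNPs with rs numbers also found in Family Finder
i_position_matches = 2_556    # v4 "i" SNPs whose positions match Family Finder positions
family_finder_total = 708_092
v4_total = 577_382

# Upper bound: assumes every position match really is the same SNP.
max_overlap = rs_overlap + i_position_matches

share_of_family_finder = max_overlap / family_finder_total
share_of_v4 = max_overlap / v4_total

print(f"maximum overlap: {max_overlap:,}")
print(f"share of Family Finder atDNA SNPs: {share_of_family_finder:.1%}")
print(f"share of v4 atDNA SNPs: {share_of_v4:.1%}")
```

In other words, at most about 43% of the Family Finder atDNA SNPs, and about 53% of the v4 atDNA SNPs, are available for a cross-platform comparison.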

Y-SNPs:

979 Y DNA SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download.

1320 Y DNA SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download.

This means that there are 1009 Y SNPs found on the v4 chip that aren’t found on the v3 chip.

There are 563 more Y SNPs in v4 data than in v3 data.
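These Y-SNP figures follow directly from the two totals and the shared count, and the same subtraction works for any pair of chip versions. A small sketch, with a hypothetical helper name:

```python
def compare_counts(total_old: int, total_new: int, shared: int) -> dict:
    """Derive the 'only in' counts and net change from two totals and their overlap."""
    return {
        "only_in_old": total_old - shared,
        "only_in_new": total_new - shared,
        "net_change": total_new - total_old,
    }

# Y-SNPs: v3 total, v4 total, and the number found in both.
y_snps = compare_counts(total_old=1766, total_new=2329, shared=1320)
print(y_snps)  # {'only_in_old': 446, 'only_in_new': 1009, 'net_change': 563}
```

The 446 and 1,009 agree with the counts Larry found by comparing the files directly.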

X-SNPs:

11,070 X SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download.

11,009 X SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a 2009 download.

14,437 X SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download.

7,513 X DNA SNPs in 23andMe v4 data that are also found in Family Finder in a fresh download.

mtDNA:

1698 mtDNA SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download.

2208 mtDNA SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download.

This means that there are 946 mtDNA SNPs found on the v4 chip that aren’t found on the v3 chip.

Tim's preliminary conclusions
The fact that there are only 307,420 atDNA SNPs in 23andMe v4 data that are also found in Family Finder is highly concerning. The specificity of matches when comparing v4 data to FF or AncestryDNA data will be significantly reduced in projects such as my Mennonite autosomal project. atDNA SNP coverage for the overlapping SNPs between v4 data compared to FF or AncestryDNA data will only be about 44 SNPs per cM. I don’t know if FTDNA and GEDmatch will be able to allow imports of 23andMe v4 data. The fact that there are about 130,000 more atDNA SNPs in a Family Finder dataset than in v4 and the fact that v4 data won’t be readily uploadable to GEDmatch is forcing me to rethink 23andMe as my primary testing lab for distant relatives.

Third Parties and Download
My sincere hope is that Family Tree DNA and GEDmatch are able to adjust their systems to work with this new data from 23andMe. As soon as I hear anything, I will be sure to report it.

If you are interested in analyzing the file yourself and you have a 23andMe account, these are the directions for downloading the sample v4 file from 23andMe:

Enable the Mendel family in your account here:
https://www.23andme.com/user/edit/examples/

Then select Greg Mendel's raw data file from the drop-down list here:
https://www.23andme.com/you/download/

This file will probably still change slightly as they complete their validation process, but it should be pretty close to what we will start to see for new customers at 23andMe in the coming weeks.
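Once you have the file, a few lines of Python will reproduce the kind of counts shown above. This is a rough sketch, not the method Tim or Larry used. It assumes the usual 23andMe raw data layout (comment lines starting with "#", then tab-separated rsid, chromosome, position and genotype, with chromosomes labeled 1-22, X, Y and MT, and no-calls written as "--"). The sample lines and the file name in the usage note are invented.

```python
from collections import Counter

def classify(chrom: str) -> str:
    """Group a chromosome label into atDNA, X, Y or mtDNA."""
    if chrom.isdigit():
        return "atDNA"
    return {"X": "X", "Y": "Y", "MT": "mtDNA"}.get(chrom, "other")

def count_snps(lines):
    """Return (totals, i_snps, no_calls) Counters keyed by DNA type."""
    totals, i_snps, no_calls = Counter(), Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip anything that is not a full data row
        rsid, chrom, _position, genotype = fields[:4]
        kind = classify(chrom)
        totals[kind] += 1
        if rsid.startswith("i"):
            i_snps[kind] += 1   # 23andMe's internal "i" identifiers
        if "-" in genotype:
            no_calls[kind] += 1
    return totals, i_snps, no_calls

# Invented sample rows standing in for a real download.
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t82154\tAA",
    "i0000001\t1\t100000\tAG",
    "rs0000002\tX\t200000\tA",
    "i0000002\tY\t300000\t--",
    "i0000003\tMT\t400\tG",
]
totals, i_snps, no_calls = count_snps(sample)
print(dict(totals), dict(i_snps), dict(no_calls))
```

To run it on a real download, pass an open file instead of the sample list, for example `with open("genome_Greg_Mendel.txt") as handle: count_snps(handle)`, substituting whatever name your downloaded file has.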

Please let me know what you think. I am especially interested to hear analysis from the citizen scientists and the creators of our community's third party features.

Monday, March 25, 2013

AncestryDNA, Raw Data and RootsTech

Tim Janzen and I discussing the AncestryDNA features at RootsTech with AncestryDNA staff

Since RootsTech there has been lots of discussion regarding the features that AncestryDNA is and is not planning to offer their customers. I will address the many questions that I have received about the meetings in which I participated at the show, but first let's review:

Raw Data Downloads
On Thursday, AncestryDNA fulfilled their promise to allow customers to download their raw data. As Dr. Ken Chahine had assured me back in November, the file is not encrypted and is compatible with third party tools.

I sent my file to a number of third party providers:

After working with it a bit, John Olson announced on the site that he expects that GEDmatch will be accepting AncestryDNA uploads in about two weeks.
David Pike told me that he has updated his tools to work with the AncestryDNA files.
Leon Kull has reportedly updated his HIR search site to work with them as well.
Dr. Ann Turner has created an Excel macro to convert the AncestryDNA files to 23andMe format.

At the "Ask the Expert" Genetic Genealogy panel that I moderated at RootsTech on Saturday:

Bennett Greenspan told the audience that Family Tree DNA will be accepting AncestryDNA transfers into Family Finder starting on May 1st.
Dr. Catherine Ball confirmed that the raw data file is not phased and that they are delivering it as they receive it from the chip manufacturer Illumina. She also confirmed what Dr. Ann Turner had already discovered - the data labeled as "Chromosome 25" is from the PAR region. Further, the "Chromosome 23" label refers to the X chromosome data and "Chromosome 24" refers to the Y chromosome.

Additional notes:

Unlike Family Tree DNA, AncestryDNA is not removing any SNPs from the data - medically relevant or not.
The overlap between AncestryDNA's raw data file and 23andMe's should be around 690,000 SNPs because both are using the same Illumina OmniExpress Plus base chip. The ~10,000 SNP difference can be accounted for by a different set of poorly performing probes and test SNPs. Family Tree DNA's should have a similar overlap for the same reasons.
There is no mitochondrial DNA included in the raw data file because it is not included on the Illumina chip that they are using. (23andMe adds the mtDNA SNPs).
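For those curious what a format conversion involves, here is a minimal Python sketch. It is not Dr. Turner's macro, and several details are assumptions on my part: that the AncestryDNA file is tab-separated with the columns rsid, chromosome, position, allele1 and allele2, that no-calls appear as "0", and that the target is a 23andMe-style line with a single genotype column. The numeric labels 23 and 24 are relabeled as described above; labeling 25 as "XY" borrows the PLINK naming for the pseudoautosomal region and may not be what any particular tool expects.

```python
# Numeric chromosome labels in the AncestryDNA file and their assumed replacements.
CHROM_LABELS = {"23": "X", "24": "Y", "25": "XY"}

def convert_line(line: str):
    """Return a 23andMe-style line, or None for comments, headers and malformed rows."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5 or fields[0] == "rsid":
        return None  # column header row or unexpected layout
    rsid, chrom, position, allele1, allele2 = fields
    # Assumption: "0" marks a no-call; write it as "--" in the output.
    genotype = "--" if "0" in (allele1, allele2) else allele1 + allele2
    return "\t".join([rsid, CHROM_LABELS.get(chrom, chrom), position, genotype])

# Invented rows for illustration.
rows = [
    "#AncestryDNA raw data download",
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs0000001\t1\t82154\tA\tG",
    "rs0000002\t23\t200000\tC\tC",
    "rs0000003\t25\t150000\t0\t0",
]
converted = [out for out in map(convert_line, rows) if out is not None]
print("\n".join(converted))
```

Check the result against the upload requirements of whichever third party tool you plan to use before relying on it.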

Search Function
As I expected from earlier conversations with AncestryDNA, a search function is next on the list. Kenny Freestone, Product Manager for AncestryDNA, discussed it in his presentation under the heading "What's Next". Although it is already in the works, Kenny could not provide a firm timeline for its availability when I asked.

We will be able to filter our list of matches by surname, location and username. As anyone who has worked with their AncestryDNA matches knows, this is sorely needed. There is no doubt that the many requests from customers pushed this up their list of priorities.

Genetic Ethnicity Update
Later this year, AncestryDNA will be updating their Genetic Ethnicity feature. They will provide more granularity in Europe and West Africa. We can also expect more accurate breakdowns. A number of AncestryDNA personnel acknowledged to me over the weekend that certain "ethnicities" (e.g., Scandinavian) are overestimated for many customers. However, they also emphasized that much of the perceived problem with their admixture analysis stems from the question of "where and when". What they mean by this is that it is very difficult (and sometimes impossible) to pinpoint where specific DNA signatures were at an exact time in history.

As I always remind my readers, this portion of the science has a long way to go and will improve with more data and time. On the "more data" front, during her speech at the AncestryDNA luncheon on Friday, Dr. Ball was reportedly requesting that genealogists who know that all eight of their great grandparents were born in the same place share this information with AncestryDNA. This seems to imply that, like 23andMe has successfully done, AncestryDNA plans to use customer data to improve their predictions. They are also starting to work on incorporating the coveted SMGF collection into their admixture analysis, which should improve it greatly.

The good news is that AncestryDNA customers don't have to wait for this update to gain more insight into their ancestral origins. Now that AncestryDNA has made the raw data available, customers will be able to upload their raw data file to the various third party sites to try out the admixture calculators and/or send it to Dr. McDonald for his very highly regarded analysis.

Matching
AncestryDNA is currently working on an algorithm to improve matching for endogamous populations, specifically Ashkenazi Jews.

As I reported in November, the minimum threshold for matching is 5 megabase pairs. This was reconfirmed in a conversation I had with Dr. Ball on Friday. I also learned that there is no minimum SNP requirement. We discussed the possibility of AncestryDNA switching to centiMorgan measurements in the future.

Price
The test is now $99 for everyone - subscribers and non-subscribers. This was likely in response to 23andMe's recent price drop. Having attracted well over 120,000 customers in less than a year in business, AncestryDNA is proving to be an important player in this field. This new policy to attract subscribers and non-subscribers alike will only improve their market share.

International Customers
It does not appear that AncestryDNA has plans to offer their test to international customers in the near future, instead choosing to focus on the U.S. market for now.

Matching Segment Data and Chromosome Browser
On Friday at RootsTech, Dr. Tim Janzen and I sat down for a meeting with AncestryDNA management. Among others, we were joined by Dr. Ken Chahine, Senior Vice President and General Manager of DNA, and Dr. Catherine Ball, Vice President of Genomics and BioInformatics. (Dave Dowell also attended a portion of the meeting.) I found them to be very receptive to hearing our requests and the reasons behind them. At no time did they state that they had decided not to build a chromosome browser or release matching segment data to their customers in the future. Dr. Ball did express some privacy concerns, but was open to hearing ideas of how this could be addressed.

Tim Janzen explains his feelings while Ken Chahine looks on

During the meeting, Tim very emphatically explained his feelings on the need for matching segment data (above) and I resorted to begging (below)... {hehe}

Catherine Ball, Ken Chahine, Tim Janzen, me, Dave Dowell and Steve Baloglu

On Saturday, after attending Kenny Freestone's presentation, four advanced genetic genealogists approached him to discuss the chromosome browser issue. In addition to myself, Tim Janzen, Angie Bush, and Nathan Machula were present for the conversation. Kenny didn't have much to say and mostly listened to the arguments that we presented covering why we feel that it is essential that AncestryDNA offer the matching segment data behind their relative predictions. At no time did he state that AncestryDNA would not offer a chromosome browser or that the delay in doing so was because AncestryDNA didn't think that their customers could understand it. He did, however, confirm that it was not a top priority at this time. He also said that he personally reads all of the requests sent through the feedback button, so if you want them to reassess their priorities, then be sure and let them know.

Tim emphasized that both 23andMe and Family Tree DNA included a chromosome browser feature at the launch of their autosomal DNA product and wondered aloud why AncestryDNA had not done so as well. I explained to Kenny (as well as in my meeting with management) that, as genealogists, we expect conclusions to be evidence based. It is not in line with this principle to simply be told that a certain common ancestor is responsible for a DNA match and be expected to take AncestryDNA's word for it. Where is the proof? Since Kenny had shown a chart during his presentation of his ancestral lines that he claimed were genetically confirmed by AncestryDNA matches, I also pointed out the fact that those lines that he had shaded in weren't really confirmed without the actual genetic data to support that claim. To illustrate, I laid out my experience as follows:

On my AncestryDNA account, I was happy to find a shaky leaf hint a few weeks ago.

Upon reviewing the match, I noted that the common ancestor was through my mother's side. I was initially excited to see that I had inherited DNA from my 7th great grandparents on paper, Joseph Denison and Prudence Miner.

The only problem is that this match doesn't appear anywhere on my mother's 47 pages of matches. Do you know what this means? It means that I must have inherited the DNA responsible for the match through my father's side. Since all DNA inherited through my mother's line must come through her, AncestryDNA has identified the wrong common ancestor as the source of the DNA shared between LGB and me. A fluke of the algorithms...? Perhaps. Let's look at some more of my matches.

Once again, as you can see, the common ancestor identified by AncestryDNA is on my mother's side. A thorough search of my mother's matches shows that, once again, this person is not reported as a match to my mother. From this, we can only reach the conclusion that the DNA responsible for this match comes through my father's side - not my mother's. The common ancestor that I share with "Baerion" must be beyond a brick wall in her family tree or on my paternal side. In general, I have had more success filling in the branches of the maternal side of my family tree than the paternal side, so this is certainly possible.

Just to demonstrate that this isn't an isolated occurrence, here is another one:

This match doesn't appear on my mother's match list either! So, out of my ten matches that have shaky leaves attached, three of them apparently have common ancestors wrongly identified as the source of our matching DNA. Do you see the problem here? Does AncestryDNA? If this match were, instead, at 23andMe or Family Tree DNA, I could check the DNA segment that we share and compare it to my other matches and/or my chromosome map. This would provide additional information and/or evidence to help me determine through which of my ancestral lines this segment of DNA was inherited. Might there be other explanations for these discrepancies? It is certainly possible, but without the underlying genetic data, it is impossible to say.
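The cross-check described above, comparing each shaky leaf hint against a tested parent's match list, can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a feature of AncestryDNA. The names `match_3`, `match_4`, `match_5` and `someone_else` are placeholders, and the lists stand in for match pages that have to be searched by hand.

```python
def hints_not_through_parent(child_hints, parent_matches, side="maternal"):
    """Return hinted matches on one side of the tree that the tested
    parent on that side does not share.

    child_hints maps a match name to the side of the tree on which the
    hinted common ancestor sits ("maternal" or "paternal").
    """
    parent_set = set(parent_matches)
    return sorted(
        name
        for name, hinted_side in child_hints.items()
        if hinted_side == side and name not in parent_set
    )


# Hypothetical data: only "LGB" and "Baerion" are names from the post.
my_hints = {
    "LGB": "maternal",
    "Baerion": "maternal",
    "match_3": "maternal",  # placeholder for the third example
    "match_4": "maternal",  # placeholder: a hint the mother does share
    "match_5": "paternal",  # placeholder: cannot be checked against the mother
}
mother_matches = ["match_4", "someone_else"]

suspect = hints_not_through_parent(my_hints, mother_matches)
print(suspect)
```

A hint that lands in `suspect` is not proven wrong by this check alone. It only flags that the shared DNA cannot have come through the tested parent, which is as far as one can go without segment data.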

I am in the fortunate position to have tested my mother at AncestryDNA in addition to myself, so I can clearly see there is an issue. What about all of those people who have not tested a parent and are blindly accepting AncestryDNA's shared ancestor hints because they don't know otherwise? Isn't that kind of like copying someone's tree and just taking their word for it that it is correct with no sources or evidence attached? For now, those of us who do understand the finer points of autosomal DNA matching will have to do our best to convince our matches to upload to Gedmatch so they can see for themselves what they are missing.

As much as I, too, am disappointed that AncestryDNA has not yet provided the matching segment data, it is clear to me that the reasons behind this decision are far more complex than what others may claim is an attempt to dumb down the product because Ancestry.com thinks its customers are stupid. From my many conversations with Ken Chahine and others from AncestryDNA over the past year, I have come to appreciate that working within the framework of this 1.6 billion dollar corporation comes with its own set of challenges.

The Future
Tim Sullivan, CEO, has made it clear that Ancestry.com is committed to the DNA business and Ken Chahine has always been upfront with me and come through with his promises. So, I am going to give them the benefit of the doubt. From our very first conversation, I have advocated for the genetic genealogy community and looked out for our best interests and I won't stop doing so. I believe that they will do the right thing for their customers and the genetic genealogy community eventually. It may not happen as quickly as we would all like (yesterday!), but they are not the big bad wolf and I think it does us all a disservice to continually paint their intentions in a negative light. We are in early days yet. Let's give them a break.

Friday, July 20, 2012

Known Relative Studies at FTDNA: Third Cousin Comparison and More Random atDNA Inheritance

I don't write about Family Finder very often for my known relative series since most of my close relatives have tested at 23andMe. Fortunately, my Travis third cousin recently decided to take the Family Finder test at Family Tree DNA. As a result, I have a new third cousin comparison to report.

Our only (known) common ancestors are our great great grandparents Abraham and Ruth (Stolebarger) Travis, so any matching DNA that we share is inherited from the Travises. Abraham's father Asa and Ruth's parents John and Sarah are two of my major genealogical brick walls, so it is really interesting to be able to isolate DNA that came from those lines.

My Travis third cousin and I share 45.96 total cM of DNA with a longest block of 14.84 cM. This is on the low end for third cousins and the most likely relationship suggested by FTDNA is actually fourth cousins. Only about 25 cM comes from segments longer than 5 cM, so just including those in my calculations (to keep it consistent with my 23andMe comparisons), that means we only share about .37% of our total DNA. Since third cousins are expected to share about .781% of DNA, this is a bit low, but it is in line with my other third cousin comparisons so far (averaging .39%). That's random atDNA inheritance for you!
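For readers who want to see where figures like .781% come from, here is a minimal sketch. It assumes full cousins (two shared ancestors) and average inheritance: first cousins share 1/8 of their DNA, each additional cousin degree divides that by four, and each "removed" divides it by two. The total of 6770 cM used as the denominator is an assumption, chosen because it is roughly what reproduces the .37% figure from about 25 cM. The post itself does not state the denominator.

```python
def expected_shared_fraction(cousin_degree, times_removed=0):
    """Average fraction of autosomal DNA shared by full cousins."""
    return 0.5 ** (2 * cousin_degree + 1 + times_removed)


def observed_percent(shared_cm, total_cm):
    """Shared centiMorgans expressed as a percentage of the total."""
    return 100.0 * shared_cm / total_cm


# Assumption: total genome length in cM used as the denominator.
ASSUMED_TOTAL_CM = 6770.0

third = 100.0 * expected_shared_fraction(3)
fourth = 100.0 * expected_shared_fraction(4)
observed = observed_percent(25.0, ASSUMED_TOTAL_CM)

print(f"expected 3rd cousin: {third:.3f}%")
print(f"expected 4th cousin: {fourth:.3f}%")
print(f"observed (segments over 5 cM): {observed:.2f}%")
```

The observed value sits between the third and fourth cousin averages, which is why a prediction of fourth cousins for a true third cousin is not surprising.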

Family Tree DNA offers a unique perspective on these comparisons, so I will share how this match looks on the different settings that are possible on their Chromosome Browser tool. In the chart below, the blue bars represent my twenty-two autosomal chromosome pairs. The orange bars are the sections where my Travis cousin and I have stretches of matching DNA. This chart is displaying the lowest setting in order to show all matching segments over 1 cM. Many of these are probably false positives, but it is still interesting to be able to see them.

My third cousin comparison showing all matching segments over 1 cM

The next image is set to show only matching segments over 3 cM. As you should be able to see, our main matches are on Chromosomes 2 and 14. The only other match that didn't drop off is the one on Chromosome 6. This match falls under the threshold of what you would see at 23andMe, so over there we wouldn't have known about it at all. Remember, this could be a "false positive" since a pretty large percentage of segments this size prove to be, but I will reserve judgment until I am able to compare it to my chromosome map (when it is more fully developed) to confirm if this segment falls in an area that I can positively attribute to my Travis ancestral line. That should help determine whether this is an authentic matching segment or not.

My third cousin comparison showing all matching segments over 3 cM

The final chart shows only the two largest matching segments. You can see them signified by the orange on Chromosomes 2 and 14. When you scroll over these spots, a window will open (as above) describing the exact starting and ending points of the matching segment(s).

Third cousin comparison showing only matching segments over 5 cM

To summarize, I compared a known third cousin to myself to identify the portions of our DNA that match each other. Due to the sizes, we can be confident that two of the matching segments are authentically inherited from our common ancestors. What this means is that I can now attribute those larger segments as originating with Abe and Ruth Travis.

Abe and Ruth Travis

If my Travis cousin were to upload his data to Gedmatch, I could compare him to my mother, sisters and some of my other cousins who have tested at 23andMe. It would be very interesting to see the variety of inheritance patterns. Hopefully, I will be able to do so in the future.

On another note, I actually have four matches on Family Finder who are predicted to be more closely related to me than my Travis third cousin. I have not found a common ancestor with any of these people mainly because, with the exception of one, they do not respond to my emails. It does make you wonder what would happen if everyone responded with great family trees, ready for comparison.

This should give some hope to those of you who are struggling to confirm common ancestors with your matches. We tend to focus on the larger matches, of course, but some of these seemingly lesser matches could still be quite significant. As I have often emphasized, autosomal DNA inheritance starts getting pretty unpredictable at about the third cousin level. This comparison is a good example of that because we wouldn't usually expect a .37% match to have such a (relatively) recent connection. So, take a closer look at your match lists and give it another go. You just might be surprised by what you find!

*Another third cousin comparison: I found my third cousin today at 23andMe!*

Wednesday, December 21, 2011

23andMe changes terms for expired PGS subscription customers

As my readers well know, I have long been an outspoken and dedicated advocate of 23andMe. I'm sure that many of you have become customers after reading my posts. I am very disappointed to have to report that today, for the first time, I was hesitant to recommend 23andMe to a person who contacted me for advice on DNA testing.

Apparently, 23andMe changed their terms of service for v3 customers without notice. [Update - 23andMe states that the TOS were not changed, only the FAQs.] According to the new FAQ section under Personal Genome Service, customers who allow their subscription to lapse after the original commitment of 12 months will NOT have access to their Relative Finder matches, Health Reports and all other features that "rely on your genetic data". The new section reads:

If you cancel your subscription, you will no longer have access to the items listed below:

Access to hundreds of comprehensive reports that interpret your genetic data
Continual updates to those reports, based on the latest research discoveries
Ability to share and compare results with friends and family
Tools to discover new relatives and learn about your ancestry

We retain your raw genetic data within your 23andMe account, allowing you to download it at any time, even after you cancel your subscription.
You may reinstate your subscription at any time in the future.
If you cancel, you will be unable to share and compare results with friends and family.
If you cancel, people whom you are sharing genomes with will be unable to view your results or compare results with you. It will be the equivalent of you not having shared in the first place.
Canceling your subscription means you no longer have access to features that rely on your genetic data. Canceling has no effect on features that do not rely on your genetic data, such as user-to-user messaging, 23andMe Community, Family Health History, Research Surveys or Research Snippets.
We encourage you to continue using these features, even if you decide not to subscribe to the Personal Genome Service.

This is a direct contradiction to what I was told directly by a company representative (with the understanding that I would publish it) back in November 2010. Additionally, the old FAQ were clear that customers would retain access to their existing Relative Finder, but would not receive NEW matches after they let their subscription lapse. I published this info on my blog and this encouraged some of my readers to buy under these terms. From my post of November 24, 2010 (note: the following link is dead),
"...according to this new FAQ, it should be noted that new customers will no longer receive updates, including new Relative Finder matches, if they cancel their PGS subscription, however they will still have access to any existing reports, matches, features and their raw data."

The Terms of Service under Acceptance of Terms states,
You can accept the TOS by (1) clicking to accept or agree to the TOS, where this option is made available to you by 23andMe for any Service; or by (2) actually using the Services. In this case, you acknowledge and agree that 23andMe will treat your use of the Services as acceptance of the TOS from that point onwards. In addition, when using particular 23andMe Services, you shall be subject to any guidelines or rules applicable to such services that may be posted from time to time...

Apparently, by continuing to use their services, we have all agreed to this change even though we were never made aware of it.

I am hoping that 23andMe will rethink this new position. As is apparent on their Community Forums, even their staunchest supporters (who have been responsible for thousands of sales as well as spending endless hours hand holding new customers in the absence of good customer support) are rethinking their allegiance to 23andMe. Perhaps, this has all been in reaction to FTDNA's announcement that they will soon be allowing uploads of 23andMe v3 raw data to their Family Finder platform. If so, this action is only serving to alienate those who have shown loyalty and support to 23andMe and would have continued to do so. At least they are still allowing raw data downloads, since migrating to FTDNA and/or GEDmatch.com will be the best option for those who do not wish to pay a subscription fee for life.

This is terrible news for those of us who have spent an inordinate amount of time trying to make the most of the Relative Finder feature. I am very glad that I did not upgrade any of my v2 accounts, but I am concerned about the vast amount of information still waiting to be discovered that will soon be lost with my matches who choose not to renew. This decision on the part of 23andMe is honestly bewildering to me. Let's hope 23andMe soon makes a public statement clarifying their intentions.

[Update - 23andMe is willing to listen to our concerns. A company rep just posted this in the forums: Again, we are sorry for our poor communication about the subscription changes. We make mistakes, we're human- it's in our DNA. We want to assure you we are listening and we want to hear more. We've created a space for you to post your specific concerns so that we can be sure that you have a voice as we discuss these changes moving forward- http://bit.ly/uk6xqk
This form helps us more efficiently share your input with teams across the company.
I encourage all of my readers to let your voice be heard!]

[[Update - Please sign the new petition addressing this issue: http://www.yourgeneticgenealogist.com/2011/12/petition-asking-23andme-to-reconsider.html]]

Monday, October 31, 2011

Investigating a Long-Held Genealogical Theory Using DNA Evidence - Purdy

About a decade ago when I started my family history research in earnest, I was assisted by an extremely talented genealogist by the name of Karin Corbeil. I was amazed at her generosity and the amount of time and care that she invested in researching my family line. At the time, I couldn't understand why someone would volunteer to research a family that wasn't even their own and ask for nothing in return. That was before I came to know the community of genealogists and witnessed this kind of generosity over and over again.

Karin's husband descends from Nancy Purdy (1797-1878) and I descend from Daniel Purdy (1817-1897). Daniel is one of my most recent brickwalls. Karin had long theorized that both Nancy and Daniel were grandchildren of Joseph Purdy (c.1736-1818) whose family is found in 1803 in Haldimand Township, Ontario, Canada. Joseph Purdy was the son of Obadiah Purdy and Phoebe Underhill. If her theory is correct, this would make her husband and me sixth cousins.

To all of our benefit, Karin has now ventured into the world of genetic genealogy and, with a mind like hers, she is extremely well suited to the endeavor. While scanning her husband's Relative Finder matches on 23andMe.com, she came across an anonymous match with familiar ancestral surnames listed (not Purdy). Due to her extensive work on the Purdy family trees, with only four surnames to go on, Karin was able to make an educated guess as to who the person was behind 23andMe's shield of anonymity! She then asked me if I also had this individual on my list of matches. I didn't, but I did find him in my paternal aunt's and uncle's Relative Finder by searching on one of the listed surnames. Since he hadn't answered her initial invitation through 23andMe, Karin tracked down an email address and sent a message to the suspected person behind the match. Sure enough, she was correct and, even better, he was willing to "share genomes" with both Karin and me to determine more details about our match.

Karin's husband and this individual, who are 4th cousins through their Purdy ancestral lines, share .35% of their DNA and were correctly predicted to be 4th cousins by 23andMe. Interestingly, this Purdy Cousin also matches my aunt and uncle, who would be his 5th cousins once removed if Karin's theory is correct, on .14% of their DNA and was predicted to be a 5th cousin. My sisters and I, who would be his sixth cousins, do not show a match with him at all on 23andMe.

Purdy Cousin compared to me, my uncle (green) and my aunt (light blue)

We also investigated our matches on Gedmatch.com to determine if there might be some smaller matches that fell just below 23andMe's matching threshold. Since my father was tested at FTDNA instead of 23andMe, I was also able to compare his DNA to that of our Purdy Cousin using Gedmatch's tools.

On Gedmatch the following matching stretches of DNA were detected with our Purdy Cousin:

With Karin's Husband:
Chr  Start Location  End Location  Centimorgans (cM)  SNPs
1    88,186,177      94,337,025    5.6                2,240
1    94,337,035      114,454,559   19.6               7,219

With Me:
Chr  Start Location  End Location  Centimorgans (cM)  SNPs
1    5,314,984       7,050,643     3.6                503
8    136,313,206     138,665,940   3.9                554
9    127,867,126     130,999,928   4.4                747
12   24,589,880      26,489,079    3.3                689
12   114,422,485     116,042,182   3.4                516

With My Uncle:
Chr  Start Location  End Location  Centimorgans (cM)  SNPs
1    206,237,168     215,353,110   12.3               2,045

With My Dad:
Chr  Start Location  End Location  Centimorgans (cM)  SNPs
1    5,044,926       7,029,550     4.0                624
3    29,876,878      31,804,154    3.0                587
10   73,415,605      78,397,973    3.7                902
16   53,357,205      55,155,137    3.4                576
16   78,611,267      80,033,433    3.5                571

With My Aunt:
Chr  Start Location  End Location  Centimorgans (cM)  SNPs
1    206,246,175     215,354,364   12.3               3,487
19   8,636,038       11,088,559    5.6                809

Many of these segments are too small to be meaningful, especially the matches that I show on Chromosomes #8, 9 and 12 since they are not detected in my father's DNA. However, when looking at this data as a whole, it is apparent that there truly is a familial relationship between these individuals. Although we cannot be absolutely certain that the shared DNA that has been detected between them is from our Purdy ancestral lines, I feel comfortable that this data combined with Karin's many years of solid research shows very strong evidence in support of this conclusion. Karin writes that this is "... a huge breakthrough in the genealogical community for Purdy researchers. For many years now, I and a number of family researchers and historians have been trying to connect Nancy Purdy (wife of Harnden Eddy) to other Purdys in Ontario, Canada." In time and with more Purdy descendants in the database, I believe that we will be able to determine beyond a reasonable doubt if this DNA is indeed inherited from our presumed common ancestor Joseph Purdy.
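The reasoning about which of my small segments to discount can be sketched as a simple overlap test, using the Gedmatch figures listed above. Since the Purdy line is on my paternal side, any segment I truly share with the Purdy Cousin should also show up, at least in part, in my father's comparison. The helper names `overlaps` and `split_by_parent` are illustrative only and are not part of any Gedmatch tool. An overlap is a necessary condition here, not proof that the segment is genuine.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "chrom start end cm")


def overlaps(a, b):
    """True when two segments sit on the same chromosome and share positions."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def split_by_parent(child_segments, parent_segments):
    """Separate a child's segments into those the parent also shows and the rest."""
    supported, unsupported = [], []
    for seg in child_segments:
        if any(overlaps(seg, p) for p in parent_segments):
            supported.append(seg)
        else:
            unsupported.append(seg)
    return supported, unsupported


# Segments shared with the Purdy Cousin, copied from the tables in the post.
me = [
    Segment(1, 5_314_984, 7_050_643, 3.6),
    Segment(8, 136_313_206, 138_665_940, 3.9),
    Segment(9, 127_867_126, 130_999_928, 4.4),
    Segment(12, 24_589_880, 26_489_079, 3.3),
    Segment(12, 114_422_485, 116_042_182, 3.4),
]
dad = [
    Segment(1, 5_044_926, 7_029_550, 4.0),
    Segment(3, 29_876_878, 31_804_154, 3.0),
    Segment(10, 73_415_605, 78_397_973, 3.7),
    Segment(16, 53_357_205, 55_155_137, 3.4),
    Segment(16, 78_611_267, 80_033_433, 3.5),
]

supported, unsupported = split_by_parent(me, dad)
print("also in Dad:", [s.chrom for s in supported])
print("not in Dad: ", [s.chrom for s in unsupported])
```

Only the Chromosome 1 segment survives the check. The segments on Chromosomes 8, 9 and 12 have no counterpart in my father's comparison, which is why they are the ones least likely to be meaningful.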

On a side note, I recently discovered that Karin is adopted and searching for her birth family, which is astounding considering her genealogical prowess! She was born as Carol Lee Foley on July 22, 1945 in Brooklyn, New York. If anyone has any information that might help Karin find her birth family, please contact me and I will pass it on to her or visit her new blog. I am quite confident that with her research skills and the newly added valuable tool of DNA to her toolbox, Karin will soon solve this mystery. The family that she finds will be very lucky to have her. I wish her and all adoptees success in their searches. Everyone deserves to know who they are and, in this era of personal genomics, it is finally becoming possible.

[Update - Karin Corbeil has reunited with her birth family, in part, thanks to DNA testing.]

Tuesday, May 31, 2011

FTDNA's Family Finder: My New Illumina Chip Results and Investigating a "Close" Match

I finally received my updated Family Finder results with the new Illumina chip from FTDNA. There was a problem with my sample, so it took much longer than expected. I have been excitedly investigating my new and improved match list.

With FTDNA's original Affymetrix chip I had 54 total Family Finder matches. I now have 59 total matches with the new Illumina chip (including some new customers). Between the two chips, there are only 22 matches in common and 32 matches from my old list are gone completely. On the Affy chip I had three matches under the "Close to Immediate" match filter. I now have four using that filter on the Illumina chip. One is the same. One is a brand new participant. One was previously listed as 4th to distant and one was previously listed as 5th to distant. Of the old Affy "Close and Immediate" matches, one is now listed as a 4th to distant and one is completely gone from my match list.

I was surprised and happy to see the brand new "close" match. This is my first 3rd cousin prediction on Family Finder (range 2nd - 4th). We have 2 blocks of DNA in common. The one on Chromosome 5 is 17.67 cM and on Chromosome 18 we share 5.63 cM. My match agreed to upload her raw data file to GEDmatch, so I could compare her to my mother's 23andMe file. This allowed me to determine that they match on longer stretches of the same spots: Chr 5 = 21.9 cM and Chr 18 = 9.4 cM. I share ~.315% of DNA with her and my mother shares ~.42%. Since a 3rd cousin will share on average .781% and a 4th cousin .195%, this one appears to fall somewhere around a 3rd cousin once removed. (See expected percentages here.)
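As a rough illustration of how the observed percentages line up against the averages, the sketch below picks the closest candidate relationship for each figure. It assumes full cousins and average inheritance, and it compares on a ratio (logarithmic) scale because the expected values halve with each step. That scale is a choice made for this example, not a method taken from FTDNA or GEDmatch.

```python
import math


def expected_percent(cousin_degree, times_removed=0):
    """Average percent of autosomal DNA shared by full cousins."""
    return 100.0 * 0.5 ** (2 * cousin_degree + 1 + times_removed)


CANDIDATES = {
    "3rd cousin": expected_percent(3),
    "3rd cousin once removed": expected_percent(3, 1),
    "4th cousin": expected_percent(4),
}


def closest_relationship(observed_pct):
    """Name of the candidate whose average is nearest on a ratio scale."""
    return min(
        CANDIDATES,
        key=lambda name: abs(math.log(observed_pct / CANDIDATES[name])),
    )


mine = closest_relationship(0.315)    # my share with the new match
mothers = closest_relationship(0.42)  # my mother's share with her

print(mine)
print(mothers)
```

Both figures land nearest the 3rd cousin once removed average of about .39%. Given how random inheritance is at this distance, that is a hint about the amount of shared DNA, not a statement of the actual relationship.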

Yellow bars are where I match my new predicted 3rd cousin

We confirmed that we do not have any second or third great grandparents in common, so the match must be further back than predicted. We haven't been able to figure out our exact connection, but we have narrowed down our match to a very specific area and branch of our trees. Her fifth great grandmother, "Catherine" married Johannes (John) Long. Some believe Catherine was a Kinard, but she has long been rumored to have been a Roderick. If so, this would mean that she is related to my third great grandmother Susannah Roderick, since all in the area with that surname were descended from the same immigrant Rothrock family. Catherine is the right age to be the sister of Susannah's father Daniel Roderick. If this is correct, then their parents Johann George Rothrock and Elizabeth Roemig would be my match's sixth great grandparents and my fifth great grandparents.

My match also theorized that Susannah's mother Elizabeth Long is the sister of Johannes Long who married Catherine. If indeed Johannes Long and Elizabeth Long are siblings, then their parents (unknown) are her 7th great grandparents and my 5th. This would make us double 6th cousins once and twice removed.
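The cousin labels follow mechanically from how many generations each of us stands below the shared couple. Here is a minimal sketch, taking the "nth great grandparent" figures exactly as they are given in the two theories above. The function names are illustrative only.

```python
def generations_up(great_count):
    """Generations from a person to their nth great grandparents.

    Grandparents are 2 generations up, great grandparents are 3,
    so nth great grandparents are n + 2.
    """
    return great_count + 2


def cousinship(great_a, great_b):
    """Return (cousin degree, times removed) for two descendants of one couple."""
    gen_a = generations_up(great_a)
    gen_b = generations_up(great_b)
    degree = min(gen_a, gen_b) - 1
    removed = abs(gen_a - gen_b)
    return degree, removed


# Rothrock theory: my 5th great grandparents, my match's 6th.
rothrock = cousinship(5, 6)
# Long theory: my 5th great grandparents, my match's 7th (as stated above).
long_line = cousinship(5, 7)

print("Rothrock line: cousin degree %d, removed %d" % rothrock)
print("Long line:     cousin degree %d, removed %d" % long_line)
```

With those inputs the two lines come out as 6th cousins once removed and 6th cousins twice removed, which is the double relationship described above.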

This double cousinship could account for the surprising amount of DNA that we have in common. Presently, this is just a theory but, certainly, our close match lends credence to Catherine, indeed, being a Roderick/Rothrock. These Pennsylvania Dutch families all lived in Bucks and Berks Counties, Pennsylvania, intermarried and traveled together to (Ross County) Ohio and Illinois. They may have even known each other back in Germany. Although I am not totally convinced that we have determined our exact connection, I am confident that we have pinpointed the correct family branches and geography.

There is also a possibility that we have a connection through Susannah Roderick's husband, my brickwall Asa Travis. Since these families intermarried frequently, it is conceivable that his parents were also related to one of the Pennsylvania Dutch families traveling together. There is little doubt that multiple common ancestors and, likely, cousin intermarriages account for our "close" match. I will be keeping watch for overlapping matches that may help us to narrow down the surnames involved in these connections.

[Disclosure - My company StudioINTV has an existing production agreement with FTDNA that has no bearing on the opinions I express. I also receive a small commission from FTDNA on non-sale orders through my affiliate link, which I use to fund DNA tests. I receive no other compensation in relation to any of the companies or products referenced in my blog.]