Your Genetic Genealogist: Ken Chahine Answers My Questions and Reveals Behind-the-Scenes Information about AncestryDNA

Monday, November 5, 2012

I recently had a long and enlightening conversation with Dr. Ken Chahine, General Manager of AncestryDNA. Due to the lack of information from reliable sources regarding the details of what is going on behind the scenes at AncestryDNA, there has been considerable speculation among those interested in the product. As it turns out, much of this has been incorrect.

AncestryDNA's Matching Threshold
First, mostly due to the large number of matches, it has been widely speculated that AncestryDNA is allowing for a much lower threshold than either 23andMe or Family Tree DNA, reporting matches based on as little as two cM. In reality, Ken tells me that AncestryDNA has been using a 5 Mb cutoff [Mb = mega base pairs = 1,000,000 base pairs] for reporting matches in their lowest category - "very low confidence". He explains how they came to this decision and what AncestryDNA sees as the benefits to their customers:

AncestryDNA, we believe, is the only service that phases the genotyping data and has validated the matching algorithm with large pedigrees. That leads to two important differences. First, it allowed us to test various segment cutoffs from 5-10 Mb* with and without a proprietary filter that preferentially removes incorrect matches. We've initially selected the 5 Mb cutoff with the filter as providing the best balance between false negative (true matches that we fail to call a match) and false positives (false matches that we call true matches). Second, it allowed us to make a better cousinship prediction. For example, our data suggest that most relationships that are theoretically predicted to be third cousins are really fourth cousins or deeper. Therefore, a fourth cousin match at AncestryDNA, we believe, is a third cousin match at other services.
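To make the trade-off Ken describes concrete, here is a minimal sketch in Python. The candidate segments and their true/false labels are invented for illustration, and the function name count_errors is hypothetical. It does not model AncestryDNA's proprietary filter or their actual data. It only shows how raising a length cutoff trades false positives for false negatives.

```python
# Hypothetical candidate segments: (length in Mb, is it a true match?).
# These values are invented for illustration only.
CANDIDATES = [
    (3.0, False), (4.0, False), (5.5, True), (6.0, False),
    (7.0, True), (8.5, True), (9.0, False), (12.0, True),
]


def count_errors(candidates, cutoff_mb):
    """Return (false_positives, false_negatives) for a given length cutoff.

    A segment is reported as a match when its length is >= cutoff_mb.
    """
    false_pos = sum(
        1 for length, is_true in candidates
        if length >= cutoff_mb and not is_true
    )
    false_neg = sum(
        1 for length, is_true in candidates
        if length < cutoff_mb and is_true
    )
    return false_pos, false_neg


if __name__ == "__main__":
    for cutoff in (5, 7, 10):
        fp, fn = count_errors(CANDIDATES, cutoff)
        print(f"cutoff {cutoff} Mb: {fp} false positives, {fn} false negatives")
```

With this toy data, a low cutoff reports every true match but lets some false ones through, while a high cutoff does the reverse. A filter that removes incorrect matches would, in principle, let a company keep the low cutoff without paying the full false-positive price.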

Ken's assertion that AncestryDNA is using a more conservative prediction calculation does appear to be in agreement with what I and many of my colleagues have observed. Time will tell if it is indeed more accurate. The filter aimed at reducing the number of IBS (Identical By State) matches sounds like a promising addition. When we have the ability to examine the raw data we should be able to reach conclusions about how effective the filter is at fulfilling its purpose.

Mb vs cM - What does it mean to us?
As you may know, the centimorgan, rather than the mega base pair, is the unit of measurement used by Family Tree DNA's Family Finder, and also primarily by 23andMe, for matching autosomal DNA segments. So, how does this 5 Mb threshold compare to the 5.5 cM* threshold used by Family Tree DNA (*edited from 7.7 cM after I was sent this) and the 7 cM threshold used by 23andMe in their Relative Finder feature? The National Institutes of Health website tells us that in human genetics, "one centimorgan is equivalent, on average, to one million base pairs" or 1 Mb. Genome.gov agrees, "Generally, one centimorgan equals about 1 million base pairs." However, in reviewing my Ancestry Finder download at 23andMe, which lists the length of segments both in Mb and in cM, I came to the conclusion that, unfortunately, it isn't that simple - at least for our purposes. In some cases, the numerical value in Mb was larger than in cM for the same segment, but in other cases it was smaller. I copied portions of my Ancestry Finder download to demonstrate examples. If you have tested at 23andMe, take a look at your own file to get a feel for the comparison.

The first chart shows the respective values when the Mb value was 11 and the second when the cM value was 11:

Notice the wide range of values in both directions. It appears to be impossible to find a direct correlation between segment lengths measured in Mb and cM. In some cases, the two values weren't even close:

The number of base pairs that a centimorgan corresponds to varies so widely because the two units measure different things. When the distance along a chromosome is measured in mega base pairs (Mb), the value strictly reflects how many millions of base pairs there are in a matching segment. When the distance is expressed in centimorgans (cM), it measures the frequency, or chance, of recombination expected within that segment. Some portions of the genome are expected to recombine more often than others, so sometimes a segment of 1 Mb has a relatively good chance of remaining intact and sometimes it does not.
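For readers who like to see the idea worked through, here is a rough sketch in Python of how a genetic map turns physical positions (Mb) into genetic distance (cM). The map values below are invented for illustration and are not from any real chromosome, and the names TOY_MAP, mb_to_cm and segment_length_cm are hypothetical. Linear interpolation between map points is an assumption made here for simplicity, not a description of any company's method.

```python
from bisect import bisect_right

# Hypothetical genetic map for one chromosome: (position in Mb, cumulative cM).
# The numbers are invented to show regions with different recombination rates.
TOY_MAP = [
    (0.0, 0.0),
    (10.0, 25.0),  # "hot" region: 2.5 cM per Mb
    (20.0, 30.0),  # "cold" region: 0.5 cM per Mb
    (30.0, 40.0),  # average region: 1 cM per Mb
]


def mb_to_cm(pos_mb, genetic_map=TOY_MAP):
    """Convert a physical position in Mb to cumulative cM by linear interpolation."""
    positions = [p for p, _ in genetic_map]
    # Clamp positions that fall outside the map to its end points.
    if pos_mb <= positions[0]:
        return genetic_map[0][1]
    if pos_mb >= positions[-1]:
        return genetic_map[-1][1]
    i = bisect_right(positions, pos_mb)
    (p0, c0), (p1, c1) = genetic_map[i - 1], genetic_map[i]
    return c0 + (c1 - c0) * (pos_mb - p0) / (p1 - p0)


def segment_length_cm(start_mb, end_mb, genetic_map=TOY_MAP):
    """Genetic length in cM of the segment from start_mb to end_mb."""
    return mb_to_cm(end_mb, genetic_map) - mb_to_cm(start_mb, genetic_map)


if __name__ == "__main__":
    # Three segments, each exactly 5 Mb long, in different regions of the toy map.
    for start, end in [(2.0, 7.0), (12.0, 17.0), (22.0, 27.0)]:
        print(f"{start}-{end} Mb: {segment_length_cm(start, end):.1f} cM")
```

In this toy map, three segments of identical physical length (5 Mb) come out at 12.5 cM, 2.5 cM and 5.0 cM. That is the same effect seen in the Ancestry Finder download: a cutoff in Mb and a cutoff in cM will not select the same set of segments.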

Different Predictions and More Matches
This difference between AncestryDNA's way of calculating the length of segments and that of the other two companies may explain, in part, the reason that some of us are seeing the same matches at AncestryDNA as we have at the other two companies, with very different predictions. The fact that AncestryDNA is using a phasing engine before running the matching algorithms will also account for some of the reported discrepancies. When asked why AncestryDNA is, on average, returning more matches than the other two companies, Ken offered one possible explanation. He said that it may be a result of the AncestryDNA database containing primarily customers with deep roots in the United States and, in many cases, descending from large Colonial New England families.

Adding International Customers to the Database
This discussion prompted me to inquire as to when AncestryDNA plans to offer the test to international customers. Ken said that it is certainly "on the radar", but they do not have an estimate of when this will happen yet. He explained some of the reasons for this:

1. Demand is still high within the United States and they are "processing samples as fast as we can right now".
2. The privacy laws in Europe are, in some cases, different from those in the US, so addressing them will take additional time.
3. They will need to work out logistical issues.

He emphasized that Ancestry.com is a large company, which necessitates significant forethought and planning before taking action.

Uploadable Raw Data
During our conversation, Ken also addressed the questions surrounding the format the raw data will be presented in as well as the much-hoped-for matching segment data.

"We will be providing raw data download in early 2013. We have not made any formal decision on segment data. We understand that it is important to some of our customers and are taking it into serious consideration."

When I asked him whether the raw data would be formatted in such a way that will be compatible with uploading to third party sites such as Gedmatch.com, he assured me that it would. Fortunately, this puts to rest all of our speculation that the "related security enhancements" Ken referred to in his keynote address at the Consumer Genetics Conference last month would interfere with the data's usability. When I inquired further about the future availability of segment data, he said that he cannot promise anything in that regard, but was open to discussing what presentation formats of that data might be acceptable to genetic genealogists.

Admixture and Reasonable Assumptions
Some have also interpreted Ken's statement (reported by Esquire), that some customers are using their own knowledge to make reasonable assumptions that are leading to incorrect conclusions, to mean that AncestryDNA is using an altogether different method of determining our matches than the other two companies offering autosomal DNA tests for genealogy. He explained to me that he was actually referring to a common assumption: because autosomal DNA matching is only applicable to relatively recent ancestry, many customers assume that the admixture results also reflect that time period and should match what we know of our ancestral origins from our family trees. He emphasized that AncestryDNA's "Genetic Ethnicity" feature (like any admixture tool) is not looking at the large segments used for relative matching, but rather is examining much smaller blocks and single markers that are ancestrally informative. Therefore, some of this admixture is very old - offering a glimpse much further back in time than our known family trees. He offered reassurance to those who feel that this portion of the test is not yet as accurate as it should be (me included):

AncestryDNA is data-driven. Our team of scientists are constantly analyzing the data looking for ways to improve the ethnicity and matching prediction algorithms. The science, and hence the customer experience, is only going to improve with time.

At some point, the Sorenson data will likely be incorporated into the AncestryDNA test, which should improve the admixture predictions tremendously.

Even the CEO is working with his matches!
In closing, it was very nice to hear that "everyone from the CEO down is working with their matches" at Ancestry.com. This should lead to a management team that is educated about what we as genetic genealogists are trying to accomplish and how best to do it. As a result, I look forward to improved tools and results at AncestryDNA in the future.

I want to thank Ken Chahine and Stephen Baloglu for their recent efforts to shed some light on aspects of the AncestryDNA test and clear up some of the confusion. As I told both of them, the more transparency that AncestryDNA can offer to the genetic genealogy community, the more satisfied we will all be with the product. According to Ken, it is likely that more details will be revealed soon. That is a very good thing because as I was writing this, I thought of many more questions for him!

19 comments:

Anonymous, November 5, 2012 at 3:17 PM
FWIW, my results from Ancestry.com DNA are downright ridiculous. I've had FTDNA done. I've had 23andMe sequencing done. I've got a well-worked out genealogy. Ancestry.com seems to think that my ancestors are from Scandinavia, Italy, and Turkey: three locations for which there is precisely zero other evidence. Meanwhile they totally miss the UK, German, and Swiss ancestry that everything else points toward.

My confidence in Ancestry.com's DNA capabilities is very low at this point.

(note: this is only for the Genetic Ethnicity reporting. They have managed to get my Y DNA sequence correct. It matches the rest.)
Kelly, November 5, 2012 at 3:37 PM
Excellent info as always. Many thanks.
Kelly
Anonymous, November 5, 2012 at 7:56 PM
Your assertion is glib and you should educate your readers with a hyperlink to more details versus leaving us with vagueness.

"Some portions of the genome are expected to recombine more often than others, therefore sometimes a segment of 1 Mb has a relatively good chance of remaining intact and sometimes it does not."

So, are you saying this scenario is true when trying to ID true IBD descent segments over true IBS segments:

Scenario 1. As a general rule, some segments between Chr 1 and Chr 22 recombine more robustly and often? For example, Chr 2 often recombines less overall than the other chromosomes.

If there are other scenarios ... please share with us.
Anonymous, November 5, 2012 at 8:14 PM
CeCe, you let AncestryDNA and Ken Chahine off far too easy about their delay and foot-dragging in releasing raw data files, just as Geno 2.0 does, just as FTDNA does, just as 23andMe does.

Early 2013 could be up until June 20, 2013, or about 8 months from now, and then they will come up with the same lame excuse for delaying it again.

"We will be providing raw data download in early 2013. We have not made any formal decision on segment data. We understand that it is important to some of our customers and are taking it into serious consideration."

If you were a "true and respected" consumer advocate for Genetic Genealogy customers, you should have advised Ken that you were going to propose that customers take their business someplace else, essentially boycotting AncestryDNA.

There is ample reporting (videos) from government meetings where Ken is opposed to releasing raw data to consumers for one reason or another. Fact-check what I am saying and draw your own conclusions.

In the end, I say boycott AncestryDNA until they release raw data that "belongs" to customers.
DJKnox66, November 5, 2012 at 10:09 PM
Hi CeCe - I have mixed feelings on how to respond to your blog. On one hand, you are putting yourself out there and it is so easy to criticize people who do that... so I applaud your efforts. On the other hand, I think this entire industry is full of "smoke & mirrors", not just at Ancestry but also at the other commercial players including FTDNA. So I think there are many who feel that those who get the chance to interface with executive representatives of these commercial providers should really be a little more critical. Ken's answers do appear vague and not very helpful because few commitments are being offered. For example, not offering this test to international clients is a HUGE BLUNDER... not everyone is interested in being limited to Colonial England relative results. For any genetic genealogy to be successful for Americans (let alone others), the database desperately needs foreign samples.

Anyway, good luck with making everyone happy lol!
Tim Janzen, November 5, 2012 at 10:53 PM
Dear Anonymous,
There is a map at http://genome.cshlp.org/content/suppl/2009/09/22/gr.092676.109.DC1/gr-supplement.pdf that shows the variation by chromosome, with the cMs plotted against the Mbs. Note that the slope of each curve varies considerably depending on the location on any one chromosome that you are looking at. You might want to review that graph so that you can see which sections of the genome recombine more readily than others do.
Sincerely,
Tim Janzen
dad, November 6, 2012 at 5:09 AM
Dear Cece,

Great work, thank you for this information. Let me ask the practical question on autosomal DNA and the Ancestry "confidence" scale. The great majority of the matches I see at the "Very Low Confidence" level seem totally unconnected. However, a few, maybe 5%, seem incredibly informative -- exactly the expected family, time, etc. How should I understand this? Are such "VL confidence" hits something I should value in making decisions about which conjecture is right, etc.?

So another phrasing is this: is the VL confidence a statement that the match is probably wrong or if it picks out the right families (which seems extremely a priori improbable) these specific VLC matches are probably meaningful?!

Cheers, Dave Drabold
Debbie Kennett, November 6, 2012 at 8:25 AM
Thank you as always for your excellent blog post and for keeping us all informed. Perhaps those people who have criticised you, and especially those who are hiding under the cloak of anonymity, might like to write their own blog posts instead. They might then appreciate what a wonderful job you are doing and how much time it takes. I still wonder about Ancestry's very low confidence matches. I match one person at Ancestry who has also tested at 23andMe. He does not show up at 23andMe, even with the low threshold for the Ancestry Finder tool.
Unknown, November 6, 2012 at 9:39 AM
I admit right off that I'm a genetic genealogy newbie. I had the opportunity to participate in the AncestryDNA beta and was upset and confused when I showed no Scandinavian in my admixture. Based on my pedigree I am 1/8 Norwegian. Since receiving my beta test results, I received an AncestryDNA invite and tested my grandmother who is 1/2 Norwegian. Thankfully her results showed 43% Scandinavian and 7% Finnish/Volga-Ural which is much more what I would expect. Her test restored some of my confidence in the AncestryDNA results.

My understanding is that each individual gets 50% of their DNA from each parent but it is a random selection. So by my basic understanding I could only hope to at best have gotten 1/8 of my DNA from my full-Norwegian great-grandmother. With the randomization at each generation, I assume I somehow did not get any recognizable Scandinavian segments and the other European gene segments won out in the random assignment in the generations. I do look forward to the raw data download to do my own comparison of what segments came from which lines. I also accept the fact that the admixture results are a moving target as the sampling improves on the ethnicity baselines and such.

Thank you CeCe for your objective report and continuing efforts on behalf of the community. I'm sure every company has its flaws and presents itself in the best light possible when interviewed or making public statements. I do not begrudge them for that, nor do I begrudge you for not being a surrogate for those who would wish to constantly be picky and negative until every demand is satisfied. There is a healthy balance that I seem to sense you working towards and I'm grateful for the competition in this industry which will hopefully bring us all better products, services, and tools.
Anonymous, December 16, 2012 at 3:17 PM
Cece,

I wanted to ask you a question since you are knowledgeable about DNA testing. I took the AncestryDNA test and my ethnicity results are 80% British Isles, 12% Eastern European, 6% Finnish/Volga Ural, & 2% Unknown. I have several matches where we have common ancestors on paper that are Dutch and lived in the New Netherlands settlement. However, my matches & I do not have any Central European showing up in our ethnicity results. Does that mean that neither of us received DNA from these Dutch ancestors and that we must be related in a different way as well? Or does the test look at different areas of our DNA for the ethnicity test than it does to match us to other members? Would it be possible that we are related to each other through the Dutch ancestors? Thank you for your time!

Faith
Unknown, August 6, 2013 at 5:39 PM
What determines skin color? I came up out of Africa? with black skin and I feel a kinship with all Black People, red and yellow, black and white.
How did I become White with Blond hair and blue eyes?
Neanderthal? and not really "out of Africa"?
Just wondering?
I have tested Ancestry.com DNA, autosomal, and FTDNA, R1b1a2-M269, L23 (M343, L-268, Geno Project conversion), and autosomal (Ancestry results: 56% Scandinavian, 35% Central European, 7% British Isles and 2% uncertain).
I joined a project, with FT and resulted in no matches in the entire database. Geno Project is very informative as to migration patterns, etc. but all is in "Beta" stage, I think? and that means infancy as to scientific research.
I have engaged in research of my lines for 35 years plus and I find that my paper genealogy is closely matched to my testing results.
Overall, I am very happy with the results and understand their limitations at present.
I do know that as databases grow, and more research is performed that I will become more and more informed and educated so that the results of testing will be more coherent.
Good work to 23andMe, Ancestry.com, and all of the testing companies who strive to inform people as to "Who am I and where do I come from?"
Richard Bittle
