Admixture and the "Scandinavian" Question
I have been researching the apparent overestimation of the Scandinavian component in AncestryDNA's admixture tool for some time. Like Blaine Bettinger of The Genetic Genealogist, I have been contacted by many people who were surprised by a significant Scandinavian prediction in their "Genetic Ethnicity" results that conflicts with their known family history. I have also seen the same confusion expressed on many blogs by testers whose results didn't reflect what they know of their family trees. My own results in this area were a bit surprising as well. There are too many examples of this to simply ignore or to explain away as non-paternity events (NPEs) or faulty genealogies. Blaine Bettinger has already done a thorough job in his recent blog post (Problems with AncestryDNA's Genetic Ethnicity Prediction?) of explaining why our admixture results may legitimately not align with our known genealogies, so I won't go too deeply into that at the moment. I highly suggest you read his post for background if you haven't already.
To understand the perceived problem, let's look at my results as an example:
[Image: My "Genetic Ethnicity" Prediction]
Although I know that roughly 12.5% of my DNA should trace back to Scandinavia through my Norwegian great grandmother Fredrikka Herstad, I did not expect a 57% Scandinavian admixture prediction. Of further concern, I am 25% Finnish on paper, yet AncestryDNA's Genetic Ethnicity feature predicts me as only 7% Finnish. Someone unfamiliar with Finnish DNA might assume that my Finnish DNA is simply bleeding into the Scandinavian percentage; however, Finnish DNA is quite distinctive and clusters separately from the Scandinavian countries (and all other populations). Although I may have a small amount of Swedish DNA from early migration to Finland, I am quite sure that it is not a significant amount, based on my documented genealogy and especially on my extensive research at both 23andMe and Family Tree DNA, where I have myriad Finnish "cousins" and very, very few Swedish ones. One could argue that these "ethnicity" predictions reach further back in time than autosomal DNA cousin matching, but I have seen how my Finnish DNA clusters using other admixture tools and am quite confident in my assessment.
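The "on paper" percentages above follow from simple halving: on average you inherit 1/2 of your DNA from each parent, 1/4 from each grandparent, 1/8 from each great grandparent, and so on. Here is a minimal Python sketch of that arithmetic. The function names and ancestor labels are hypothetical, and the result is only an expected average, not what any one person actually inherits.

```python
from fractions import Fraction


def expected_fraction(generations_back: int) -> Fraction:
    """Expected share of autosomal DNA from ONE ancestor.

    generations_back: 1 = parent, 2 = grandparent, 3 = great grandparent, ...
    Each generation halves the expected contribution. This is an average only;
    the actual amount varies because inheritance is random.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return Fraction(1, 2 ** generations_back)


def expected_percent(ancestors: dict[str, int]) -> float:
    """Expected percentage contributed by a group of ancestors.

    ancestors maps a (hypothetical) label to that ancestor's generation depth.
    """
    total = sum((expected_fraction(g) for g in ancestors.values()), Fraction(0))
    return float(total * 100)


# One Norwegian great grandmother (three generations back):
print(expected_percent({"norwegian_great_grandmother": 3}))  # 12.5
# 25% on paper is the same expected share as one fully Finnish grandparent:
print(expected_percent({"finnish_grandparent": 2}))  # 25.0
```

Exact fractions are used so that the halving never picks up floating-point rounding along the way.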
Fortunately, AncestryDNA's algorithm has already improved, at least as far as my personal admixture results are concerned. When I first received my beta results, they showed no British Isles component at all, which definitely gave me pause since my great grandfather George Allen was 100% British. Since then, my percentages have changed, as AncestryDNA warned us they might. My current 28% British Isles is much closer to what I would expect, even before taking my substantial Colonial American ancestry into account. I expect their admixture algorithms will continue to improve.
My mother's case is even clearer. Below is her "Genetic Ethnicity" prediction. Although she is a full 50% Finnish, she is predicted to carry only 23% Finnish DNA. The 62% Scandinavian is especially surprising, since my mother has no known Scandinavian ancestry. She does, however, have significant Colonial American ancestry, predominantly originating from the British Isles, yet her results show no British Isles component at all. The Central European component fits well with her known Pennsylvania Dutch great grandmother Ruth Stoalabarger. From this it appears that all of my mother's British Isles ancestry and about half of her Finnish ancestry are somehow being interpreted as "Scandinavian".
[Image: My mother's "Genetic Ethnicity" Prediction]
I spoke to Ken Chahine, General Manager for AncestryDNA, on Thursday to get an informed perspective on what might be happening rather than jumping to conclusions. Ken was aware of the controversy surrounding the admixture results, but told me that he feels more confident about their predictions since talking to Sir Walter Bodmer and Peter Donnelly of the "People of the British Isles" Wellcome Trust project at the University of Oxford. He said that they shared information with the AncestryDNA team suggesting that they, too, are finding a much larger Scandinavian component in the British Isles than expected. Some of their findings can be seen in the recent article here. According to Ken, these discoveries support the notion that the British Isles was a true melting pot long before the United States earned that moniker, and further suggest that there was a great deal of migration between the British Isles and Norway, Sweden and Denmark. Ken feels that these conclusions support AncestryDNA's findings of significant Scandinavian admixture in many people, like me, who would expect a more substantial British Isles component. I still wonder, though, whether the "Scandinavian" label is a bit of a misnomer. The question is: if the DNA has been in the British Isles for thousands of years, is it still "Scandinavian"? It is impossible to fully address this issue without access to the underlying genetic data on which AncestryDNA's analysis is based.
Further addressing this issue, Ken wanted to remind all of us that our ancestors' DNA can be diluted pretty quickly. Because each parent passes down only a random half of his or her DNA, the share we inherit from any particular ancestor varies by chance, so we cannot expect our admixture results to mirror our family tree exactly. A good explanation of this can be found in Blaine's blog post Everyone Has Two Family Trees - A Genealogical Tree and a Genetic Tree. This effect really only starts to come into play five or more generations deep in our family trees and shouldn't have a substantial impact until we reach beyond the great great great grandparent level.
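To get a feel for how that dilution behaves, here is a toy Python model. It is a deliberately crude assumption, not how AncestryDNA or anyone else models inheritance: each parent contributes a fixed number of equal "chunks" (50 here, a made-up figure), and each chunk from the relevant parent traces back to one particular ancestor independently with probability 0.5 ** (g - 1). Real chromosomes are inherited in linked segments, so the numbers only illustrate the trend. The average stays at the expected halving, while the spread and the chance of inheriting nothing at all from that ancestor grow with depth.

```python
import math


def survival_stats(generations_back: int, chunks_per_parent: int = 50) -> dict[str, float]:
    """Toy model of how much DNA you carry from ONE ancestor.

    HYPOTHETICAL simplification: each parent contributes `chunks_per_parent`
    equal chunks, and each chunk from the relevant parent traces back to this
    particular ancestor independently with probability q = 0.5 ** (g - 1).
    Real chromosomes are inherited in linked segments, so treat the output as
    a trend, not as real estimates.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    n = chunks_per_parent
    q = 0.5 ** (generations_back - 1)
    total = 2 * n                                 # chunks in the whole (diploid) genome
    mean = n * q / total                          # equals 0.5 ** generations_back
    sd = math.sqrt(n * q * (1 - q)) / total       # binomial spread of the fraction
    p_none = (1 - q) ** n                         # chance of inheriting no chunks at all
    return {"mean": mean, "sd": sd, "p_none": p_none}


for g in range(1, 9):
    s = survival_stats(g)
    print(f"{g} generations back: expected {s['mean']:.2%}, "
          f"sd {s['sd']:.2%}, P(no DNA) {s['p_none']:.3f}")
```

Note that at one generation back the model gives exactly 50% with no spread, as it should for a parent. Under these made-up settings the chance of inheriting nothing from an ancestor is negligible for great grandparents and only becomes noticeable from around five generations back, which is in line with the rule of thumb above but is not evidence for it.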
Throughout our discussion, Ken emphasized that the exact science of admixture prediction is a "very tough problem to solve". I think we can all agree with him on this point. Anyone who has worked with admixture predictions knows that all of these tools are still rough and will benefit greatly from the increasing availability of reference samples. Ken told me that AncestryDNA now has seven PhDs on staff working on this "problem", all with computational and mathematics expertise. Especially intriguing is the AncestryDNA team's emphasis on identifying new Ancestry Informative Markers (AIMs), with the ultimate goal of identifying rare alleles (those with a frequency below 1% in the population), rather than focusing on long stretches of DNA.
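For readers new to the term, an Ancestry Informative Marker is a marker whose allele frequency differs sharply between reference populations. One simple, commonly used measure is the frequency difference (often written δ) between two populations. The post doesn't say which criteria AncestryDNA actually uses, so the sketch below is only an illustration; the marker names, frequencies and the 0.3 cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str          # hypothetical marker label
    freq_a: float      # allele frequency in reference population A
    freq_b: float      # allele frequency in reference population B


def delta(snp: Snp) -> float:
    """Absolute allele-frequency difference between the two populations."""
    return abs(snp.freq_a - snp.freq_b)


def pick_aims(snps: list[Snp], min_delta: float = 0.3) -> list[Snp]:
    """Keep markers whose frequency gap is at least min_delta, best first."""
    informative = [s for s in snps if delta(s) >= min_delta]
    return sorted(informative, key=delta, reverse=True)


# Made-up frequencies for illustration only.
panel = [
    Snp("marker_1", 0.62, 0.58),  # nearly identical: tells us little
    Snp("marker_2", 0.90, 0.20),  # large gap: a good AIM candidate
    Snp("marker_3", 0.45, 0.10),
]
print([s.name for s in pick_aims(panel)])  # ['marker_2', 'marker_3']
```

When two reference populations look alike at most markers, as neighboring regions often do, few markers pass a filter like this, which is one plausible reason that separating closely related populations is so hard.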
"Finding these rare alleles will completely transform genetic genealogy." - Ken Chahine
I have to admit that Ken did make me feel a lot better about the direction AncestryDNA is taking with its new autosomal DNA test. He talked at great length about their goal of identifying rare alleles from specific areas of the world. Ken's opinion is that the Illumina chip currently used by all three companies is not optimal for admixture analysis because the chosen SNPs are too common. According to Ken, the SNPs selected for the Illumina chips are ones found in at least 5% of the population, and they are primarily geared toward health-related genes. In contrast, he explained that the real goal should be to find alleles that are RARE in the general population (less than 1%), eventually enabling us to trace these alleles to specific geographic areas and thus pinpoint the ancestral origins of those who carry them. (Obviously, full sequencing will provide the eventual solution.) Ken said his team is pursuing means of discovery other than chip technology. I was impressed to hear that the AncestryDNA team searches for these new AIMs "every single day".
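The distinction Ken draws is usually expressed as a minor allele frequency (MAF): the frequency of the less common allele at a SNP. Here is a small Python sketch that computes MAF from genotypes and buckets it with the 1% and 5% cutoffs mentioned above. It assumes Ken's "found in at least 5% of the population" refers to allele frequency rather than to the share of people carrying the allele, which the interview doesn't make clear; the function names and the cohort are hypothetical.

```python
def minor_allele_frequency(genotypes: list[int]) -> float:
    """MAF from diploid genotypes coded as 0, 1 or 2 copies of one allele."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    freq = sum(genotypes) / (2 * len(genotypes))  # frequency of the counted allele
    return min(freq, 1 - freq)                     # minor allele = the rarer one


def frequency_class(maf: float, rare_below: float = 0.01, common_from: float = 0.05) -> str:
    """Bucket a marker using the 1% and 5% cutoffs mentioned in the interview."""
    if maf < rare_below:
        return "rare"
    if maf >= common_from:
        return "common"
    return "low-frequency"


# 100 hypothetical people; only one carries a single copy of the allele.
cohort = [1] + [0] * 99
maf = minor_allele_frequency(cohort)
print(maf, frequency_class(maf))  # 0.005 rare
```

Each person carries two copies of every autosomal SNP, which is why the allele count is divided by twice the number of people.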
Other Points of Note
1. AncestryDNA updates its algorithms every day. It is a "living" tool, changing constantly with new input. With more plentiful, unique samples and better techniques, the admixture and IBD predictions will continually improve.
2. The AncestryDNA team is trying to discover how shared segments relate to where the common ancestor sits in our family trees. (I have also wondered whether there are any identifiable patterns embedded in "random" autosomal DNA inheritance.)
3. They are only starting to utilize the Sorenson data now. Their current admixture tool is based on public data.
4. All data is 100% phased before analysis; that is their first step at AncestryDNA. Ken said that it never occurred to him not to phase the data and that he was surprised the other companies were not doing so as well. He explained that phasing makes the computational requirements enormous, and that they evaluated three or four different phasing engines and found them all roughly comparable.
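"Phasing" means sorting each of your genotypes into two haplotypes: the allele you got from one parent and the allele you got from the other. The phasing engines AncestryDNA evaluated are statistical tools (the post doesn't name them) that work without parental samples. The much simpler parent-child ("duo") case below is only meant to show what a phased result looks like. The function names and genotypes are hypothetical. Sites where child and parent are both heterozygous can't be resolved this way and are marked "?"; filling in those gaps from population-wide haplotype patterns is the kind of work that makes statistical phasing computationally demanding.

```python
from typing import Optional

Genotype = tuple[str, str]  # unordered pair of alleles at one SNP


def phase_site(child: Genotype, parent: Genotype) -> Optional[tuple[str, str]]:
    """Return (allele from this parent, allele from the other parent).

    Returns None when both orderings are possible (ambiguous site) and raises
    ValueError when the genotypes break Mendelian inheritance.
    """
    a, b = child
    options = set()
    if a in parent:
        options.add((a, b))  # parent gave a, other parent gave b
    if b in parent:
        options.add((b, a))  # parent gave b, other parent gave a
    if not options:
        raise ValueError(f"child {child} is inconsistent with parent {parent}")
    return options.pop() if len(options) == 1 else None


def phase_duo(child: list[Genotype], parent: list[Genotype]) -> tuple[str, str]:
    """Split a child's genotypes into two haplotype strings ('?' = unresolved)."""
    if len(child) != len(parent):
        raise ValueError("child and parent must cover the same SNPs")
    from_parent, from_other = [], []
    for c, p in zip(child, parent):
        phased = phase_site(c, p)
        first, second = phased if phased else ("?", "?")
        from_parent.append(first)
        from_other.append(second)
    return "".join(from_parent), "".join(from_other)


child = [("A", "G"), ("C", "C"), ("A", "G"), ("T", "C")]
parent = [("A", "A"), ("C", "T"), ("A", "G"), ("C", "C")]
print(phase_duo(child, parent))  # ('AC?C', 'GC?T')
```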
The Future
There are still no answers about whether or when AncestryDNA will provide the underlying genetic data to its customers, i.e., specific matching segments or downloadable raw data, but Ken assured me that they are working on it. He said that they are still in the decision-making process and are currently evaluating the best way to deliver this data. He firmly dispelled the rumors I had heard that AncestryDNA had decided not to provide raw data to customers. It is essential that this happens, because blind faith has no place in a scientific environment such as this. Both Family Tree DNA and 23andMe release the genetic data, in line with their stated belief that all personal genetic information is the property of the individual, thus allowing for outside analysis and intellectual challenge. How can anyone disagree with this?
This conversation with Ken was an exhilarating one with regard to what is in store for us genetic genealogists. He has a promising vision for the future of genetic genealogy. The potential for strides in the field is mind-blowing, with a company pouring these kinds of resources into its advancement and working toward a well-defined, singular goal. This raises some important questions: What will happen to the competitors if they are not able to match this level of financial commitment? Will AncestryDNA commit to an "open access" model that enables the citizen scientist to share in the new discoveries made, following in the footsteps of Family Tree DNA and 23andMe? Will the increased competition in this sector lead to leaps forward for all? Time will tell, but I can guarantee that the coming months will be very exciting ones in the world of genetic genealogy.
[10/26/12: This test is now out of Beta, so you can order it here.]
[5/19/13: I have noticed that many people who are looking for information about the AncestrybyDNA test are coming here. This is not the same company or test. For more on this, please read this post.]
Disclosure: I received both of the tests discussed above complimentary from AncestryDNA during their early beta testing phase.