Saturday, June 30, 2012

An Important Update on SMGF from Dr. Tim Janzen


Dr. Tim Janzen generously shared some very important information with me regarding the future of Sorenson Molecular Genealogy Foundation (SMGF). With his kind permission, I have decided to share this timely update with my readers in Tim's own words, as follows:
 

Since early May 2012 it has been officially known that Ancestry.com is purchasing the genetic genealogy company GeneTree. At that same time it was announced that Ancestry.com had also acquired the DNA related assets from the Sorenson Molecular Genealogy Foundation. (See this post for background regarding that announcement.) Exactly how this would impact the SMGF has been quite unclear since the announcement was made. The origin of the SMGF dates back to 1999 when the billionaire James Sorenson and the geneticist Scott Woodward established the foundation.

The vision of the SMGF and the promises it made to the genetic genealogy community are summarized here. I will quote some of the content from that web page as follows:  

"The Sorenson Molecular Genealogy Foundation is a non-profit organization dedicated to building the world's foremost collection of DNA and corresponding genealogical information. The Foundation is a world leader in DNA research with direct application to genealogy. SMGF was inspired by discussions in 1999 between philanthropist James LeVoy Sorenson and BYU Professor Scott Woodward about using DNA in genealogy. Since that time, SMGF has collected more than 100,000 DNA samples, together with four-generation pedigree charts, from volunteers in more than 100 countries around the world. Y-chromosome DNA results and pedigree charts are available for searching in the Sorenson Database. Y-DNA results and pedigrees help you trace your direct paternal line. Mitochondrial DNA results and pedigree charts are available for searching in the Sorenson Database. Mitochondrial DNA results and pedigrees help you trace your direct maternal line. The Foundation is also conducting research on autosomal DNA, and plans to release the Sorenson Autosomal Database in the near future. Autosomal DNA results and pedigrees can help you trace all family lines. We invite you to participate in the SMGF project and search the Sorenson Database. Help grow the genetic family tree, one branch at a time." 

The SMGF was heavily funded by James Sorenson until his death in 2008. After that time the SMGF appears to have significantly slowed its testing of the samples sent to the foundation and the staff has also been reduced substantially. At its peak the SMGF employed about 40 full-time genealogists working on entering the information contributed on pedigree charts into its integrated genealogy database, which had 2,694,224 unique ancestors in it at the time the database was last updated in March 2011. The SMGF stopped offering free kits from its web site over 2 years ago. Additional information about the SMGF may be found here.

On June 2, 2012, Steve Perkins blogged that he queried the SMGF about the status of their databases. The response he received from the SMGF may be found on his blog here.

In that correspondence, SMGF indicated that they would not be releasing the autosomal DNA data that they have accumulated since the foundation started. As of August 2009 the SMGF had tested 78,568 samples for an average of 68.5 autosomal STRs and had tested 73,394 samples for 13 X chromosome STRs. This information has never been made public. 

I have been heavily involved in collecting samples for the SMGF on a volunteer basis since 2006. I have either directly or indirectly collected about 2000 samples and associated pedigree charts that I have sent to the SMGF. About 85% of those samples were from people of Low German Mennonite ancestry. The Y chromosome data from those Mennonite samples is available here. I have been looking forward to seeing the release of the SMGF's autosomal and X chromosome databases for many years. Now it would appear that this data won't be released anytime in the near future.

Due to concerns I had about the future of the SMGF, I e-mailed my contact Ali Nelson there on May 31. I received her response on June 5 as below:

"Due to Ancestry.com's acquisition of GeneTree and SMGF, we are no longer updating the database. That will be up to Ancestry to decide how and when to do updates and I don't have any more information than that. SMGF is also not running any more samples - everything now belongs to Ancestry so it is up to them to run the remaining samples. We are winding things down here in the office and actually will not exist for much longer so I don't see any reason for you to send us an updated copy of the Grandma database. All of our assets now belong to Ancestry.com and they are making all of the decisions from here on out. You can contact Ancestry.com directly and hopefully they can give you more details. They have an email address devoted to questions regarding SMGF so you can try contacting them there first - it is smgfsupport@ancestry.com."

On June 21 Ali Nelson sent additional information as follows: 

"In regard to the SMGF website, it is still going to remain functional.  In fact, we will be doing an update in the next week or two so a lot of new samples will go online."  

Ali also said that the SMGF will be officially closing its doors on or about July 13.

The information provided by Ali Nelson in her responses is disappointing in a number of ways, as follows:
1. She confirmed that SMGF is closing its doors and will not be operating much longer.
2. The SMGF's Y chromosome and mitochondrial DNA databases will be updated within the next several weeks for the very last time.  When and if Ancestry.com decides to do additional updates to the SMGF databases remains to be seen.
3. It is unclear when, if ever, additional testing of samples currently at the SMGF will be carried out.  There are thousands of samples at the SMGF for which no Y chromosome or mitochondrial DNA results have been released.

On the positive side, the SMGF will soon be releasing new data for many people whose data hasn't previously been released. It is also good that the SMGF DNA samples are still safe and sound and are under the watchful care of a large relatively stable company. Hopefully Ancestry.com will provide funds for testing the samples for which no Y chromosome or mitochondrial DNA results are available and will also update the SMGF Y chromosome and mitochondrial DNA databases from time to time. I would also be grateful if Ancestry.com eventually releases the SMGF's autosomal and X chromosome databases. While it is true that the use of large autosomal SNP arrays has largely supplanted the testing of autosomal STRs, the release of the SMGF's autosomal STR database would be of significant interest and benefit to the genetic genealogy community. In my opinion the demise of the SMGF is a significant loss to the entire genetic genealogy community. The SMGF had an altruistic vision to create an integrated genealogy database with correlated genetic data that would be accessible to the general public. I remain hopeful that something good will come from this and that Ancestry.com will pick up the mission where the SMGF left off.


[*Update 7/11/12 - An email from SMGF]

Tim Janzen is a family practice physician in Portland, Oregon and a highly respected genetic genealogist. He serves as one of six 23andMe Ancestry Ambassadors as well as on the ISOGG Y-DNA Haplogroup Tree committee. He is a leading researcher in the use of autosomal DNA for genealogical purposes.

Tuesday, June 26, 2012

My Review of AncestryDNA's Admixture Tool and a Glimpse into the Future of Genetic Genealogy

I have waited to review my experience with my personal AncestryDNA Beta test until I got a good grasp of how the process is working.  So far, I can say that I think there is a lot of promise in AncestryDNA's approach to autosomal DNA analysis, but there are also things that have caused some confusion and concern among the community. I will address just one component of the test today and follow up with another post to keep this review from becoming unwieldy.

Admixture and the "Scandinavian" Question 

I have been researching the apparent overestimation of the Scandinavian component in AncestryDNA's admixture tool for some time. Like Blaine Bettinger of The Genetic Genealogist, I have been contacted by many people who were surprised by a significant prediction of Scandinavian in their "Genetic Ethnicity" results that is in conflict with their known family history. I have also read the same confusion expressed on many blogs when the testers' results didn't reflect what they know of their family tree. My own results in this area were a bit surprising as well. There are too many examples of this to simply ignore it or to explain it away by non-paternity events (NPEs) or bad genealogies. Blaine Bettinger has already done a thorough job on his recent blog post (Problems with AncestryDNA's Genetic Ethnicity Prediction?) explaining why our admixture results legitimately may not align with our known genealogies, so I won't go too deeply into that at the moment. I highly suggest you read his post for background if you haven't already.

To understand the perceived problem, let's look at my results as an example:

My "Genetic Ethnicity" Prediction - Click to enlarge

Although I know that approximately 12.5% of my DNA originated from Scandinavia courtesy of my Norwegian great grandmother Fredrikka Herstad, I did not expect the 57% Scandinavian admixture prediction. Of further concern, I am 25% Finnish on paper but am predicted to be only 7% Finnish by AncestryDNA's Genetic Ethnicity feature. Someone who doesn't know much about Finnish DNA may assume that my Finnish DNA is simply bleeding into the Scandinavian percentage; however, Finnish DNA is quite unique and clusters distinctly apart from the Scandinavian countries (and all others). Although I may have a small amount of Swedish DNA from early migration to Finland, I am quite sure that it is not a significant amount based on my documented genealogy and, especially, my extensive research at both 23andMe and Family Tree DNA where I have myriads of Finnish "cousins" and very, very few Swedish ones. One could argue that these "ethnicity" predictions reach further back than autosomal DNA cousin matching, but I have seen how my Finnish DNA clusters using other admixture tools and am quite confident in my assessment.
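For readers who like to see the arithmetic, here is a minimal sketch of the comparison above. It assumes the usual rule of thumb that, on average, each generation halves the share of DNA inherited from a given ancestor, so a great grandparent (three generations back) contributes about 1/8. The variable names are mine and purely illustrative; the percentages are the ones quoted in this post.

```python
# Rule of thumb (an average, not a guarantee): an ancestor n generations
# back contributes about 1 / 2**n of your autosomal DNA.
great_grandparent_share = 100 * 0.5 ** 3  # three generations back -> 12.5%

# "Paper" expectations from my genealogy versus AncestryDNA's predictions.
paper = {"Scandinavian": great_grandparent_share, "Finnish": 25.0}
predicted = {"Scandinavian": 57.0, "Finnish": 7.0}

# Gap between prediction and paper expectation, in percentage points.
gaps = {region: predicted[region] - paper[region] for region in paper}

for region, gap in gaps.items():
    print(f"{region}: paper {paper[region]:.1f}%, "
          f"predicted {predicted[region]:.1f}%, gap {gap:+.1f} points")
```

The Scandinavian prediction overshoots the paper figure and the Finnish prediction undershoots it, which is the mismatch discussed in this section.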

Fortunately, there has already been some improvement in AncestryDNA's algorithm as to how it affects my personal admixture results. When I first received my beta results they revealed no British Isles component at all, which definitely gave me pause since my great grandfather George Allen was 100% British. Since then, my percentages have changed, as AncestryDNA warned us they might. My current 28% British Isles is much closer to what I would expect, without taking into account my substantial Colonial American ancestry. I expect their admixture algorithms will continue to improve.

My mother's case is even clearer. Below is her "Genetic Ethnicity" prediction.  Although she is a full 50% Finnish, she is only predicted to possess 23% Finnish DNA. The 62% Scandinavian is especially surprising since my mother has no known Scandinavian ancestry, but she does have significant Colonial American ancestry, predominantly originating from the British Isles, of which her results show none. The Central European component fits well with her known Pennsylvania Dutch great grandmother Ruth Stoalabarger. From this it appears that all of my mother's British Isles and about half of her Finnish is somehow being interpreted as "Scandinavian".

My mother's "Genetic Ethnicity" Prediction - Click to Enlarge

I spoke to Ken Chahine, General Manager for AncestryDNA, on Thursday to try to get an educated perspective on what might be happening rather than jumping to conclusions. Ken was aware of the controversy surrounding the admixture results, but told me that he feels more confident about their predictions since talking to Sir Walter Bodmer and Peter Donnelly of the "People of the British Isles" Wellcome Trust project at the University of Oxford.  He said that they shared information with the AncestryDNA team suggesting that they too are finding a much larger Scandinavian component in the British Isles than expected. Some of their findings can be seen in the recent article here. According to Ken, these discoveries support the notion that the British Isles was a true melting pot long before the United States earned that moniker and further suggests that there was lots of migration between the British Isles and Norway, Sweden and Denmark. Ken feels that these conclusions support AncestryDNA's findings of significant Scandinavian admixture in many who would expect a more substantial British Isles component, like myself. I still have to wonder though if the "Scandinavian" label is a bit of a misnomer. I guess the question is: If the DNA has been in the British Isles for thousands of years, is it still "Scandinavian"? It is impossible to fully address this issue without access to the underlying genetic data on which AncestryDNA's analysis is based.

Further addressing this issue, Ken wanted to remind all of us that our ancestors' DNA can be diluted pretty quickly. Genetic drift comes into play, so we cannot expect our Family Tree to exactly resemble our admixture results. A good explanation of this can be found in Blaine's blog post Everyone Has Two Family Trees - A Genealogical Tree and a Genetic Tree. This concept really only starts to come into play five or more generations deep in our family trees and shouldn't have a substantial effect until we reach deeper than the great great great grandparent level.
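To illustrate how quickly an individual ancestor's contribution shrinks, here is a small sketch that lists, for each generation, how many ancestors you have and the average share of your autosomal DNA that each one contributes. These are averages only; because inheritance is random, the actual amount received from any one distant ancestor can be noticeably higher or lower. The function names are hypothetical and exist only for this example.

```python
def ancestors_at(generation: int) -> int:
    """Number of ancestors in a given generation (parents = generation 1)."""
    return 2 ** generation


def average_share_percent(generation: int) -> float:
    """Average percentage of autosomal DNA from one ancestor in that generation."""
    return 100 / 2 ** generation


labels = [
    "parents",
    "grandparents",
    "great grandparents",
    "great great grandparents",
    "great great great grandparents",
    "4x great grandparents",
]

for generation, label in enumerate(labels, start=1):
    print(f"{label}: {ancestors_at(generation)} ancestors, "
          f"about {average_share_percent(generation):.3f}% each")
```

At the great great great grandparent level there are 32 ancestors contributing only about 3% each on average, so the random variation in inheritance becomes large relative to each ancestor's expected share.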

Throughout our discussion, Ken emphasized that the exact science of admixture prediction is a "very tough problem to solve". I think we can all agree with him on this point. Anyone who has worked with admixture predictions knows that all of these tools are still rough and will benefit greatly from the increasing availability of reference samples. Ken told me that AncestryDNA now has seven PhDs on staff working on this "problem" - all with computational and mathematics expertise. Especially intriguing is the AncestryDNA team's emphasis on identifying new Ancestry Informative Markers (AIMs) with the goal of ultimately identifying rare alleles (frequency of less than 1% of the population), rather than focusing on long stretches of DNA.

 "Finding these rare alleles will completely transform genetic genealogy." - Ken Chahine

I have to admit that Ken did make me feel a lot better about the direction that AncestryDNA is going with their new autosomal DNA test. He talked at great length about their goal to identify rare alleles from specific areas of the world. Ken's opinion is that the current Illumina chip in use by all three of the companies is not optimal for admixture analysis because the chosen SNPs are too common. According to Ken, for inclusion on the Illumina chips, SNPs are chosen that can be found in at least 5% of the population and are primarily geared toward health-related genes. In contrast, he explained that the real goal should be to find alleles that are RARE and not found in the general population (less than 1%), eventually enabling us to identify these alleles as originating in specific geographic areas and thus pinpointing the ancestral origins of those who possess this marker. (Obviously, full sequencing will provide the eventual solution.)  Ken said his team seeks to use alternative means of discovery (other than chip technology). I was impressed to hear that the AncestryDNA team researches "every single day" for these new AIMs.
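As a rough illustration of the frequency thresholds Ken described, the sketch below sorts markers into "rare" (under 1% of the population) and "common" (5% or more, the cutoff Ken cited for the chip SNPs). The marker names and frequencies are invented for this example and do not come from AncestryDNA or Illumina; this is only my sketch of the idea, not their method.

```python
RARE_MAX = 0.01    # "rare" per the discussion above: under 1% of the population
COMMON_MIN = 0.05  # chip SNPs, according to Ken: found in at least 5%


def classify(frequency: float) -> str:
    """Label an allele by its population frequency (a fraction from 0 to 1)."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if frequency < RARE_MAX:
        return "rare"
    if frequency >= COMMON_MIN:
        return "common"
    return "intermediate"


# Hypothetical markers and frequencies, for illustration only.
markers = {"marker_A": 0.004, "marker_B": 0.03, "marker_C": 0.22}
labels = {name: classify(freq) for name, freq in markers.items()}

for name, label in labels.items():
    print(f"{name}: {markers[name]:.1%} of the population -> {label}")
```

A rare marker is only useful for pinpointing origins if it is also concentrated in one geographic area, which is the part that requires the reference samples and research Ken talked about.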

Other Points of Note

1. Every day AncestryDNA is updating their algorithms. It is a "living" tool, ever-changing with new input daily. With more plentiful, unique samples and better techniques the admixture and IBD predictions will continually improve.

2. The AncestryDNA team is trying to discover how shared segments relate to where the common ancestor is on our family trees. (I have also wondered if there are any identifiable patterns imbedded in "random" autosomal DNA inheritance.)

3. They are only starting to utilize the Sorenson data now. Their current admixture tool is based on public data.

4. All data is 100% phased before analysis. That is their first step at AncestryDNA. Ken said that it never occurred to him not to phase the data and that he was surprised that the other companies were not doing this as well. He explained that the computational requirements are enormous due to the phasing and that they looked at three or four different phasing engines - and found that all were roughly comparable.

The Future

There are still no answers regarding whether/when AncestryDNA will be providing the underlying genetic data to their customers, i.e. - specific matching segments or downloadable raw data, but Ken assured me that they are working on it. He said that they are still in the decision-making process and are currently evaluating the best way to deliver this data. He strongly dispelled the rumors that I had heard that AncestryDNA has decided that they will not provide the raw data to customers. It is indeed essential that this happens because blind faith has no place in a scientific environment such as this. Both Family Tree DNA and 23andMe release the genetic data, in line with their stated belief that all personal genetic information is the property of the individual, thus allowing for outside analysis and intellectual challenge. How can anyone disagree with this?

This conversation with Ken was an exhilarating one in regard to what is in store for us genetic genealogists. He possesses a promising vision for the future of genetic genealogy. The potential for strides in genetic genealogy is mind-blowing with a company pouring these kinds of resources into its advancement and working toward a well-defined, singular goal. This raises certain important questions: What will happen to the competitors if they are not able to match this level of financial commitment? Will AncestryDNA commit to an "open access" model that enables the citizen scientist to share in the new discoveries made, following in the footsteps of Family Tree DNA and 23andMe? Will the increased competition in this sector lead to leaps forward for all? Time will tell, but I can guarantee that the coming months will be very exciting ones in the world of genetic genealogy.

[10/26/12: This test is now out of Beta, so you can order it here.]
[5/19/13: I have noticed that many people who are looking for information about the AncestrybyDNA test are coming here. This is not the same company or test. For more on this, please read this post.]

Disclosure: I received both of the tests discussed above complimentary from AncestryDNA during their early beta testing phase.

Saturday, June 9, 2012

Today at Jamboree - ISOGG Meeting and 23andMe Kit Giveaway

Please come join us at the International Society Of Genetic Genealogy meeting at 5pm. Alice Fairhurst will moderate the panel which will consist of Bennett Greenspan from Family Tree DNA, Mike Macpherson from 23andMe, Katherine Borges from ISOGG and myself. We will be taking questions and discussing updates in the wonderful world of genetic genealogy.

23andMe will be giving away one free Personal Genome Service (DNA kit) every hour on the hour from noon to 4:00 pm. To enter stop by the booth and either drop off a business card or fill out your name, email address and phone number (pens and paper handy).  You do not have to be present at the time your name is called to win.

Friday, June 8, 2012

23andMe Announces Beta Testing of New Ancestry Features at Jamboree


I know that many of you have been waiting for this, so I am happy to be able to report that 23andMe is announcing several exciting new Ancestry-related features at the Southern California Genealogy Jamboree today in Burbank. All four of these new features will be in beta testing by customers this summer.

The screen shots below are approximations and may not be representative of the appearance of the final product since they are still in development. The names of the features also may change based on the feedback provided during beta, but these summaries should give you an idea of what to expect from the enhanced features for customers interested in their ancestry.

Ancestry Painting Update 

Currently 23andMe's Ancestry Painting displays three major world regions (European, African and Asian). This feature is being updated to offer more detailed results based on approximately 20 world regions, drawn from both customer data and academic reference populations. I don't know a lot of details yet regarding the specific populations that will be identified, but there will be approximately six European sub-regions, and the Native American category is based on South and Central American populations.

The image below shows the real data of Dr. Henry Louis Gates, Jr., host of "Finding Your Roots". Although this image does not necessarily reflect the detail that Ancestry Painting v2 will eventually offer, it does refine his European DNA to Northern and Southern European ancestry and, importantly, reveals that the portion of his DNA formerly described as Asian is actually Native American. The image is simply a mock-up and not what the final version will look like.

Population Percentages from Dr. Henry Gates' Ancestry Paintings v1 versus v2

I know that many of you will be thrilled to hear that the new Ancestry Painting will include X-Chromosome Painting. I have seen some of the raw data behind this feature and it is pretty mind-blowing. I want to emphasize that the new Ancestry Painting will not be static and will be periodically updated and expanded as the database grows.

My Ancestry Page

This new feature will be a huge improvement for genealogists and, especially, those new to DNA testing. It will present the highlights from each individual's ancestry summarized on one page, encouraging engagement and increasing the ease with which customers can access the ancestry related labs and tools. Information presented on this page will include things like Ancestry Painting details, match totals, haplogroups, top Ancestry Finder countries and links to educational resources.

"My Ancestry Page" - click to enlarge to see details

This screen shot is only an example, so I wanted to point out to those of you who are detail-oriented that some of the numbers are not representative of a real account (i.e. - Tiffany shows over 4,000 matches).

I expect that having all of this information available at a glance will help to increase participation in Ancestry-related features at 23andMe by capturing customers' attention and drawing them in to further explore their ancestry.

Relative Finder Explorer Map 

This feature illustrates from where your 23andMe Relative Finder matches originate. Details that are added to profiles will be automatically included to show where your matches are living now and also known locations of ancestors. You can filter by the predicted cousin level and have the option to cluster matches or to see them individually. It summarizes your top locations and allows you to zero in on specific regions.

My map currently shows 295 people out of 884 matches, so that seems to be roughly representative of how many people have added location details to their profiles. I have had a chance to play around with this feature and it is highly interactive. It is based on Google Maps, so you can drag it around and zoom in and out. I am including several screen shots below to demonstrate the functionality. (Click to enlarge any of them.)

Example of Relative Explorer Map with Clustering On

My Relative Explorer Map with Clustering Off

My Relative Finder Map with Clustering On

Zooming in on my RF Map

I can already see some interesting clusters on my map that are worthy of investigation. It is very helpful to see all of this information summarized in one place because it encourages me to focus on one specific region at a time and delve deeper where indicated.

Family Tree 

This long-awaited and often-requested feature is finally being introduced. You can manually add details for individuals that include places, life events, traits, skills and fun facts, but don't worry, this feature will be GEDCOM upload enabled, so you genealogists will not have to retype your entire family tree. I expect that genealogists will have loads of feedback during beta for this feature.

Example of 23andMe's New Family Tree

I think these new and improved features demonstrate that 23andMe is committed to its genealogy customers and is willing to invest the resources necessary to compete in this fast-growing industry. They have listened to the community's requests and are working toward a very competitive genetic genealogy product.

Individuals interested in being beta testers for these new features can sign up at the 23andMe booth (#706/707) at Jamboree starting today. (I'm confident that there will be other opportunities to participate for those of you not attending.)  For more information on the new features stop by the booth and/or attend the ISOGG meeting Saturday at 5:00 pm where I am participating on the panel with Mike Macpherson from 23andMe who can answer questions. I hope to see you there!