In November 2011 the genetic testing company 23andMe rolled out a pilot project to sequence the exomes of project participants, as a complement to its existing test that analyzes approximately 1,000,000 SNPs. The exome project is detailed here. The cost of participating in this project is $999. 23andMe is using an Agilent exome capture system and an Illumina HiSeq 2000 machine to sequence the exome at approximately 80x coverage. The exome comprises about 1.5% of the entire human genome, or approximately 50,000,000 base pairs, and includes all of the coding portions of the genes that are used to make proteins. 23andMe customers who enrolled in the project had to submit additional DNA in the form of saliva samples. The earliest participants received their special exome DNA collection kits in December 2011, and the data for those who have returned their kits is now starting to come back from the lab to 23andMe's headquarters. One of 23andMe's bioinformatics specialists, Brian Naughton, is heading up the exome project. Brian has sent two e-mails to participants with project updates, one in January and the most recent on February 17. No customers have yet received their exome results, but it seems probable that at least some participants will receive their results within the next several months. Brian estimates that the data file containing all of the variants (SNPs, insertions, and deletions) will be approximately 10-20 MB.
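As a rough check on these figures, the exome size and the total number of bases read at 80x coverage can be estimated in a few lines of Python. This is only a back-of-the-envelope sketch: the haploid genome size of about 3.1 billion base pairs is an assumption not stated above, and the calculation ignores capture inefficiency and uneven coverage.

```python
def exome_estimates(genome_bp: float, exome_fraction: float, coverage: float):
    """Return (exome size in bp, total bases sequenced at the given mean coverage)."""
    exome_bp = genome_bp * exome_fraction
    return exome_bp, exome_bp * coverage


# Assumed haploid genome size of ~3.1 billion bp; 1.5% and 80x come from the text.
exome_bp, sequenced_bp = exome_estimates(3.1e9, 0.015, 80)
print(f"Exome: ~{exome_bp / 1e6:.1f} million bp")
print(f"Sequenced at 80x: ~{sequenced_bp / 1e9:.2f} billion bp")
```

With these inputs the exome works out to about 46.5 million base pairs, in line with the roughly 50,000,000 quoted above, and 80x coverage implies on the order of 3.7 billion sequenced bases per participant.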
So how is exome testing going to help the typical genetic genealogist? It is still somewhat early to say for sure, but exome testing will undoubtedly reveal autosomal SNPs and other variants (insertions and deletions) that are unique to specific family lines. Some of these variants will have arisen within the past one or two generations, while others will have arisen hundreds or even thousands of years ago. Once we can link specific SNPs to specific ancestors, these autosomal SNPs will likely help us trace individual ancestral lines. After a SNP has been linked to a particular ancestor, essentially anyone who carries that SNP can be identified as a descendant of the ancestor in whom the mutation first occurred (unless the same mutation has arisen independently more than once in human history). Each autosomal SNP found in the exome should therefore be cataloged, and an attempt should be made to determine which ancestor it was inherited from. It seems highly probable that autosomal SNPs will be used in conjunction with matching half-identical regions (HIRs) to help map segments of chromosomes to specific ancestors. Autosomal SNPs could also be used to map very small segments of DNA (under one to two cM in length) that are too small to be identified by the usual mapping procedure for larger segments.
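The cataloging idea can be illustrated with a short Python sketch. This is a hypothetical example, not 23andMe's method or any published procedure: each tested relative is assumed to have a known most recent common ancestral couple with the tester and a list of HIRs shared with the tester, and one of the tester's variants is tentatively credited to that couple only when the relative carries it inside a shared HIR. The function names, data layout, ancestor labels, and positions are all made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout (an assumption, not a 23andMe file format):
Variant = Tuple[str, int, str]   # (chromosome, position, alternate allele)
Segment = Tuple[str, int, int]   # (chromosome, start, end) of an HIR shared with the tester


def in_shared_segment(variant: Variant, segments: List[Segment]) -> bool:
    """Return True if the variant's position falls inside any shared HIR."""
    chrom, pos, _ = variant
    return any(c == chrom and start <= pos <= end for c, start, end in segments)


def attribute_variants(tester_variants: Set[Variant],
                       relatives: Dict[str, dict]) -> Dict[Variant, str]:
    """Tentatively assign each of the tester's variants to an ancestral couple.

    A relative counts as a supporting carrier only if they carry the variant
    AND it lies inside an HIR they share with the tester.
    """
    result: Dict[Variant, str] = {}
    for variant in sorted(tester_variants):
        candidates = {
            info["ancestor"]
            for info in relatives.values()
            if variant in info["variants"]
            and in_shared_segment(variant, info["segments"])
        }
        if len(candidates) == 1:
            result[variant] = next(iter(candidates))
        elif not candidates:
            result[variant] = "unassigned"  # no supporting carrier found yet
        else:
            result[variant] = "ambiguous: " + ", ".join(sorted(candidates))
    return result


# Made-up example data for illustration only.
tester = {("1", 1_500_000, "A"), ("2", 2_000_000, "T"), ("3", 3_000_000, "G")}
relatives = {
    "cousin_a": {"ancestor": "Smith/Jones couple",
                 "variants": {("1", 1_500_000, "A")},
                 "segments": [("1", 1_000_000, 5_000_000)]},
    "cousin_b": {"ancestor": "Brown/Miller couple",
                 "variants": {("1", 1_500_000, "A"), ("2", 2_000_000, "T")},
                 "segments": [("2", 1_000_000, 3_000_000)]},
}

for variant, source in attribute_variants(tester, relatives).items():
    print(variant, "->", source)
```

Here the chromosome 1 variant is credited only to the Smith/Jones line, because cousin_b carries it outside any segment shared with the tester, and the chromosome 3 variant stays unassigned until another carrier turns up. Requiring the variant to fall within a shared HIR is one way to guard against coincidental matches such as recurrent mutations.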
Exome testing is simply the first step toward complete genome sequencing. It seems likely that 23andMe will at some point begin offering complete genome sequencing. Relatively few people in the genetic genealogy community have had their complete genomes sequenced so far. However, within the next 5 years it will be commonplace for genetic genealogists to do complete genome sequencing. Ideally, autosomal DNA data will be pooled and matched with pedigree charts so that DNA segments can be accurately linked to specific ancestors. It will be exciting to watch this unfold in the coming years and to see how exome testing and complete genome testing impact genealogical research!
Tim Janzen is a family practice physician in Portland, Oregon and a long-time genetic genealogist. He serves as one of six 23andMe Ancestry Ambassadors and as a member of the ISOGG Y-DNA Haplogroup Tree committee. He is a leading researcher in autosomal DNA for genealogy.