Today we have some news that is incredibly exciting
for our citizen scientists and for all those who are interested in determining
their ancestral origins through DNA testing.
National Geographic is entering the next phase of their
Genographic Project in partnership with Family Tree DNA and the genetic genealogy community.
Continuing to move toward their goal of mapping the pattern of human genetics,
they are introducing the new GenoChip 2.0. This chip is specifically designed
for ancestry testing and includes SNPs from autosomal DNA, X-DNA, Y-DNA
and mtDNA. The design of the new chip was a collaborative effort between Eran Elhaik of
Johns Hopkins, Spencer Wells of National Geographic, Family Tree DNA and
Illumina. The testing will be done at FTDNA in Houston.
Dr. Wells explained that "off-the-shelf chips are not
good for studying ancestry" for the simple reason that they are skewed in
favor of medically relevant SNPs and are not focused on detailed inclusion of
the sex chromosomes and the mtDNA. As a result, this team started from scratch
choosing SNPs for the Illumina iSelect HD chip platform one and a half years
ago. The resulting chip includes approximately 146,000 SNPs, avoiding all known
medically relevant markers and exclusively concentrating on ancestry
informative ones. This new chip will be used for both the research and the
public participation component of the project.
The new funding structure for the project will be announced in September.
[Caution ahead: Some of the following is quite advanced, so if you are new
to genetic genealogy, please skip over the unfamiliar portions. I am including
as much as I can from my notes for the more advanced in our community who may
want specific details.]
BASICS
The Geno 2.0 test will be offered for $199.95 with free
shipping within the US on the National Geographic site and requires only
a cheek swab. All resulting data will be downloadable. They will begin accepting pre-orders today for a fall shipping date (10/30/12). In the future, orders will also be accepted through the Family Tree DNA website (no date is set for this option). Although this is not a traditional relative finder matching tool and is not meant to replace Family Finder, it will cluster you with your closest genetic matches, and you will be able to send an anonymous email to correspond with them (not functional at launch). These circle clusters will demonstrate how you connect to people one thousand years ago.
Y-DNA SNPs
The chip includes just over 12,000 Y-DNA SNPs. Ten thousand
of these are completely unique and have “never been published before”. First, the team created probes for all of the 862
Y-SNPs from the current YCC 2010 Tree. Next, they contacted research centers
all over the world and asked them to provide a list of all the Y-SNPs that they
had data mined or discovered, including the L SNPs, the Z SNPs and “private
Hammer” SNPs, and created probes for those. Y-SNPs discovered by citizen
scientists were also included.
More details:
- Many new terminal branches will be gained and, according to
Bennett Greenspan, this will completely replace the deep clade test currently
offered by Family Tree DNA.
- Y-SNPs were vetted against Family Tree DNA’s “Walk Through the Y”
samples.
- 862 SNPs from YCC 2010 Tree vs. 6,153 SNPs on the New Tree
- About 200 SNPs from 2010 failed, with ~160 SNPs from 2010
unconfirmed
- Most failures were at roots
such as R, P, A2 and F. Many have synonymous SNPs.
- 115 SNPs from YCC 2010 Current R-Tree vs. 550 SNPs on the New
R-Tree, with ~200 more potential
- 31 SNPs from 2010 failed, with 25 more unconfirmed but in
progress
- Rebekah Canada wrote new and/or performed comprehensive rewrites
of 182 different Y-DNA stories based on approximately 1000 peer-reviewed publications and information
from the genetic genealogy community.
- New, updated Y haplogroup maps
mtDNA SNPs
The chip also includes over 3200 unique mtDNA SNPs. They
started by creating probes for the 3352 highest frequency mtDNA SNPs from
Family Tree DNA and GenBank. According to Elliott Greenspan, the level of difficulty
was greatly increased due to variability in mtDNA. It was necessary to create about
31,000 probes to cover all of the variation that can be found in the
surrounding flanking regions. Ultimately, they were able to detect about 3200
of those and, as a result, they can determine about 90% of the known
haplogroups at this point.
More details:
- All SNPs were vetted by running known samples.
- Rebekah Canada wrote new and/or performed comprehensive
rewrites of 248 different mtDNA stories based on ~1000 peer-reviewed
publications and information from the genetic genealogy community.
- New, updated mtDNA haplogroup maps
Autosomal and X-chromosomal SNPs
Over 130,000 autosomal SNPs and X-DNA SNPs were chosen:
- AIMs harvested from literature
- AIMs identified using two methods
- Contributed by Family Tree DNA
- Identified at Random
Ancestry Informative Markers (AIMs) are SNPs that show
substantial differences in allele frequency across population groups. Approximately
75,000 AIMs were chosen from approximately 450 populations around the world. About
half of these AIMs were collected from about two dozen published papers and the
rest were calculated from private and public datasets. Many of these population
datasets had not previously been studied for this purpose, so they used two algorithms to develop
new, never-before-used AIMs: infocalc by Rosenberg and a private algorithm
developed by Dr. Elhaik called “AIMsFinder” (PCA approach). Dr. Elhaik personally
collected over 300 population datasets, with genotype data ranging from thirty
thousand to over one million base pairs, and ran exhaustive pairwise
comparisons between difficult-to-distinguish populations to build a unique
database of AIMs.
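To make the idea of an AIM concrete, here is a minimal Python sketch that ranks SNPs by the absolute difference in allele frequency between two populations. This is only an illustration of the definition above: it is not infocalc and not AIMsFinder, and the function names, the 0.5 cutoff and the toy genotypes are all assumptions made up for the example.

```python
def allele_frequency(genotypes):
    """Frequency of the alternate allele; each genotype is 0, 1 or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def rank_aims(pop_a, pop_b, min_delta=0.5):
    """Return (snp_id, delta) pairs with |freq_a - freq_b| >= min_delta.

    pop_a and pop_b map a SNP id to a list of genotypes (hypothetical layout).
    Results are sorted by largest delta first, then by SNP id.
    """
    ranked = []
    for snp_id in pop_a.keys() & pop_b.keys():
        delta = abs(allele_frequency(pop_a[snp_id]) - allele_frequency(pop_b[snp_id]))
        if delta >= min_delta:
            ranked.append((snp_id, delta))
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Toy data, invented for illustration only.
pop_a = {"snp1": [2, 2, 1, 2], "snp2": [1, 0, 1, 1], "snp3": [0, 0, 0, 1]}
pop_b = {"snp1": [0, 0, 1, 0], "snp2": [1, 1, 0, 1], "snp3": [2, 2, 2, 1]}

print(rank_aims(pop_a, pop_b))
```

In this toy data, snp2 has the same frequency in both groups and is dropped, while snp1 and snp3 differ sharply and would be candidate AIMs under this simple measure.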
They also wanted to address the question of how much
interbreeding occurred between modern humans and ancient hominins. Once again,
they collected all relevant SNPs from existing literature on the subject and
included those on the chip. However, they wanted to go further, so they used a
novel approach. They identified regions in which modern humans and Neanderthals share
the derived allele while Denisovans and chimpanzees share the ancestral one, and then
repeated the exercise for alleles that are derived in Denisovans but not in Neanderthals or
chimpanzees. Ultimately, they collected about 30,000 such SNPs that they feel can
help identify interbreeding between ancient hominins and modern humans.
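For the more code-minded, the allele-sharing pattern described above can be sketched as a small Python function. This reflects my reading of my notes rather than the team's actual pipeline: it treats the chimpanzee allele as the ancestral state, and the function name and labels are made up for the example.

```python
def classify_site(modern, neanderthal, denisovan, chimp):
    """Label a site by which archaic genome shares the derived allele.

    Each argument is a single allele such as "A" or "G". The chimpanzee
    allele is assumed to be the ancestral state (an assumption of this sketch).
    """
    ancestral = chimp
    if modern == ancestral:
        return None  # modern humans carry the ancestral allele here
    if modern == neanderthal and denisovan == ancestral:
        return "neanderthal-like"  # derived allele shared with Neanderthal only
    if modern == denisovan and neanderthal == ancestral:
        return "denisovan-like"  # derived allele shared with Denisovan only
    return None  # shared with both archaics or with neither


print(classify_site("A", "A", "G", "G"))  # neanderthal-like
print(classify_site("T", "C", "T", "C"))  # denisovan-like
```

Sites where both archaic genomes carry the derived allele return None, because they cannot point to one archaic group over the other.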
The team also included SNPs from underrepresented
populations such as Paleo-Eskimos and Aboriginal Australians. What they call “control
SNPs” came from 7,500 random SNPs that have high frequency in HapMap
and the 1000 Genomes Project. They were included to facilitate future studies on
these SNPs and how they distribute in different populations. They excluded a
large number of SNPs that had high linkage disequilibrium (LD) in all
populations, except for those found in the Hunter Gatherer and Papuan
populations because these are of special interest for future studies. (An
interesting side note: when these high LD SNPs are removed from the
commercial platform chips, only about half of the total remains.) The team only
included SNPs that were confirmed by both HapMap and 1000 Genomes to reduce the
number of erroneous SNPs.
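As a rough picture of what removing high LD SNPs involves, the Python sketch below measures LD as the squared correlation (r²) between genotype vectors and greedily keeps a SNP only if it is not too correlated with one already kept. The team's actual procedure and threshold were not described, so the 0.8 cutoff, the greedy order and the function names are assumptions for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a SNP with no variation carries no LD signal
    return cov * cov / (var_x * var_y)


def prune_high_ld(snps, max_r2=0.8):
    """Greedy pruning: keep a SNP only if its r^2 with every kept SNP <= max_r2.

    snps is a list of (snp_id, genotypes) pairs in chip order (hypothetical layout).
    """
    kept = []
    for snp_id, genotypes in snps:
        if all(r_squared(genotypes, other) <= max_r2 for _, other in kept):
            kept.append((snp_id, genotypes))
    return [snp_id for snp_id, _ in kept]


# Toy data, invented for illustration: s2 duplicates s1 and is pruned.
snps = [
    ("s1", [0, 1, 2, 1, 0]),
    ("s2", [0, 1, 2, 1, 0]),
    ("s3", [2, 0, 1, 0, 2]),
]
print(prune_high_ld(snps))
```

A SNP that is almost perfectly predicted by a neighbor adds little ancestry information, which is consistent with the side note that roughly half of a commercial chip disappears after this kind of filtering.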
To ensure that the genetic results will not be used for
unethical purposes such as political ends, pharmaceutical ventures, etc., all
samples are anonymous, no medical or trait data is collected, and all SNPs are
non-coding and have no known function. In order to facilitate this process, the
team built a huge database of all SNPs that were known, suspected or
implied to have associations with disease or traits, so that these could be
kept off the chip. To avoid imputation, they
also removed high LD SNPs. They are confident that phenotype cannot be inferred.
More details:
- 23,962 Neanderthal SNPs
- 1,357 Denisovan SNPs
- 12,027 Aboriginal SNPs
- 10,159 Eskimo Saqqaq SNPs
- 998 Chimpanzee SNPs
- 975 X chromosome SNPs (the team is looking for more X chromosome
AIMs from citizen scientists)
- 76% of SNPs overlap with Illumina 660k array
- 55% of SNPs overlap with Illumina HumanOmni1-Quad and Express and Affy 6.0
- 40% of SNPs overlap with Affy 5.0 and Human Origins Chip
- GenoChip is enriched for Common Alleles
- Heat maps
Summary
All of this adds up to an unprecedented effort by National
Geographic and Family Tree DNA to move genetic genealogy in an innovative new
direction. This is a very exciting time for all of us citizen scientists since
it appears that there is increasing opportunity to contribute to this advancing
field and recognition for those who do.
This blog post is really just a start. There will be much more to report in the coming weeks, including a product review. So, be sure to check back!