Showing posts with label Y-DNA. Show all posts

Wednesday, February 19, 2014

BritainsDNA Chromo2 Y-SNP Results Spreadsheet

I received an email from Dr. Jim Wilson of BritainsDNA today which included a link to a spreadsheet with Chromo2 Y-SNP results. He has given me permission to publish it with his comments:

We have finally got round to releasing an anonymised dataset of ~2000 chromo2Y results. This is an excel sheet with ~14,200 SNP results for ~2000 random men using the chromo2 chip, so will be a goldmine for discovering further genealogical structure in European haplogroups. I think it will be of great interest to genetic genealogists and others who are interested in breaking down their haplogroups and subgroups. 

The link is:  
https://www.britainsdna.com/download/C2_2000_v2.zip
(Updated 2/24/14)

Thanks again, Jim!


Friday, July 5, 2013

A Second Cousin Adds to My Chromosome Map and Answers A Nagging Genealogical Question


I was so happy to receive results for a 23andMe kit that I had sent to my second cousin a few weeks ago. I haven't had much opportunity to work on my own research or add to my known cousin studies lately, so it was nice to get a result that not only put to rest a nagging doubt about my genealogy, but gave me a substantial amount of DNA to add to my chromosome map.

Willard and Blanche (Purdy) Moore

These new results were from a male second cousin of mine. Our common ancestors are our great grandparents Willard Moore and Blanche Purdy. Having him test killed two birds with one stone, so to speak. He is related to me on my father's direct paternal line, so he carries the Moore surname. Since my dad has so few Y-DNA matches and only one borderline 33/37 Moore surname match, I have been wanting to "walk up" my Moore ancestral line, testing as I go to make sure that all of my dad's relatives match as they should.

Through DNA testing, I have already confirmed that my dad and his brother were full siblings, as expected. The nagging doubt sprang from the fact that when I compared the 23andMe results for my female Moore second cousin, we shared much less DNA than expected for that relationship (1.17% versus 3.125%). That held true for comparisons against all of my relatives except for my paternal aunt, so I was looking forward to these new results to confirm that our Moore grandfathers were really full siblings.
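A rough sketch of where the 3.125% figure comes from: on average, the expected autosomal sharing between cousins halves with every additional generation separating them. The function name and its `full` flag are assumptions made for illustration, not anything 23andMe publishes, and actual sharing can fall well above or below these averages.

```python
def expected_shared_fraction(degree, removed=0, full=True):
    """Average fraction of autosomal DNA shared by cousins.

    degree:  1 for first cousins, 2 for second cousins, and so on.
    removed: number of generations "removed" (0 for same generation).
    full:    True if the cousins descend from an ancestral couple,
             False if they share only one ancestor (half cousins).
    """
    fraction = 0.5 ** (2 * degree + 1 + removed)
    return fraction if full else fraction / 2


if __name__ == "__main__":
    for label, degree, removed in [
        ("2nd cousin", 2, 0),
        ("3rd cousin", 3, 0),
        ("3rd cousin once removed", 3, 1),
    ]:
        pct = 100 * expected_shared_fraction(degree, removed)
        print(f"{label}: about {pct:.3f}% expected on average")
```

Under the same assumptions, half second cousins (grandparents who were half siblings) would average half of the full second cousin figure, which is why a low result such as 1.17% raised the question in the first place.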

Jack and Fred Moore

When I chose who to test with this kit, I looked for someone who was not only related on this line, but carried the Moore surname, so I could confirm that he shared the same usual I2b1 Y-DNA haplogroup subclade as my father. My Moore Cousin #2 fit the bill perfectly.

Fred and Jack with their father Willard

Today I received his results and not only does he carry a Y-chromosome with the I2b1 haplogroup, but he shares 3.84% of my autosomal DNA over 9 segments. Much of this was shared on different segments than my female second cousin on this line (they are first cousins), so it will really enhance my chromosome map. In the chart below, you can see the first Moore 2nd cousin that I compared myself to in the dark blue and the latest one in the light green.

I was surprised how different the DNA is that we share since they are (confirmed) first cousins. Notable is the huge 90 cM segment on Chromosome 2. I have a much smaller segment in common with Cousin #1 on that chromosome, so this new comparison will help to extend that segment to cover a significant amount of my paternal chromosome 2 in addition to adding a smaller segment toward the end of the chromosome.

This was a good reminder of how much autosomal DNA sharing can vary within the acceptable range for a relationship. This concept can also be demonstrated by comparing my sisters and myself against this male Moore second cousin. The amount of DNA that I share with him (in light blue) is approaching double what one of my sisters (in dark blue) shares with him.

Another interesting aspect of this comparison is that I now have a 2nd cousin, a 3rd cousin and a 3rd cousin once removed to compare from this Moore line.

Notice how quickly the amount of DNA shared drops from one level of relationship to another, especially the dramatic drop between 2nd and 3rd cousin. Of course, this is just an example and not necessarily indicative of the expected amount of sharing for these relationships.

Probably the best thing about testing this cousin is that I get to update my chromosome map!

My Chromosomes Mapped to My Ancestors

This result really inspires me to send those other kits that I have sitting on my desk out to additional cousins! So, who's next?

Thursday, April 18, 2013

Family Tree DNA's DNA Day Sale Starts Now!

***UPDATE - THIS SALE HAS BEEN EXTENDED THRU THURSDAY, APRIL 25th***

Just in from Family Tree DNA. (These are really great prices!)

FAMILY TREE DNA ANNOUNCES SPECIAL DNA DAY REDUCED PRICING
••••••••

LOW PRICES ON THESE AND MANY MORE:
Full Mitochondrial Sequence: $189
Family Finder:  $169
Y-DNA + Full Sequence: $358


We are pleased to announce our 2013 DNA DAY Promotion. While the special pricing features all the major tests, we’re placing particular emphasis on the Full Mitochondrial Sequence and Family Finder. We’ll offer Y-DNA upgrades during a Father’s Day sale and will give you those details at that time. By carefully choosing the sale options and limiting the length of the sale, we will be better able to focus our resources on processing the tests efficiently and avoiding delays in delivering results.

We are proud to announce we have successfully moved our mtDNA Full Sequencing line from Sanger DNA sequencing to what is called Next Generation Sequencing (NGS). This gives us much greater capacity to process tests, to reduce costs without sacrificing quality, and to ensure shorter turnaround times. We must run the entire sequence every time we process an mtDNA full sequence test, even for upgrades. However, in recognition of your prior investment- and National DNA Day – we’re offering our lowest price ever for the FMS and upgrades. Rather than the 8-10 weeks first generation sequencing required, we expect results to be completed within 5-6 weeks. This does depend on the number of orders received though. If their DNA is already at our lab, those who order first may expect even shorter turnaround times. For a limited time we will be selling the FMS for $189 and whether you’ve tested HVR1 or HVR1+2, you’ll be able to upgrade to the Full Sequence for just $129!

In addition, we are also lowering the Family Finder to $169 for this sale! Here is the list of all tests under the promotion:

Full MtDNA Sequence…. $189
Upgrades to FMS….$129
Y-DNA37 (new and add-on)…. $119
Y-DNA67 (new and add-on)…. $199
Y-DNA37 + Full MtDNA Sequence…. $308
Y-DNA12 + FF…. $218
Y-DNA37 + FF…. $288
Y-DNA67 + FF…. $368
Family Finder.... $169
Family Finder + Full MtDNA Sequence…. $358
SuperDNA….$388 (Y-67 + FMS)
Comprehensive DNA…. $557 (Y-67 + FMS + FF)


The sale will begin tonight, April 18th, at 6PM CDT and will conclude at 11:59PM CDT on Monday April 22nd. All orders must be placed and paid for by the end of the sale to receive the promotional price. There will be no need for a coupon - all prices will be automatically adjusted on the website.  Order here.

THANK YOU FOR YOUR CONTINUED SUPPORT
Bennett Greenspan
President
Family Tree DNA

All orders must be placed and paid for by 11:59PM on Monday April 22nd to receive the promotional rate.

Thursday, September 20, 2012

Let's All Start Using Terminal SNP Labels Instead of Y Haplogroup Subclade Names, Okay?


Is it just me or have the subclade names for Y-DNA just gotten out of control? I work with DNA all day long and I can't even keep up with all of the changes, so I have decided to start using the Terminal SNP labels exclusively. May I gently suggest that you do so also?

I frequently receive emails from otherwise well-informed people asking what their Y-DNA haplogroup subclade means, and it isn't their fault they are confused. You see, if they try to Google it, they are often unable to find information. If they try to locate academic papers on it, they are usually unsuccessful. Why is this? Well, the subclade name that they are given by their testing company may not be the same name that another testing company uses, or even the same as it was when they were first assigned it...and, quite likely, it isn't the same as the one on the most up-to-date tree at the International Society of Genetic Genealogy.

I have to admit that when R1b1b2 was changed to R1b1a2, I just started saying "R1b...whatever" when referring to it. Isn't it easier to just remember the defining SNP name R-M269?

For example, if you are R-L21+, then according to Family Tree DNA's Haplotree, you are R1b1a2a1a1b4, the ISOGG 2011 Haplogroup Tree's name for it. At 23andMe, you are R1b1b2a1a2f in agreement with the 2010 ISOGG Haplogroup Tree. If you tested in 2008, you might still think you are R1b1b2a1b6. On ISOGG's 2011 Haplogroup Tree, L21+ was R1b1a2a1a1b4, but on ISOGG's 2012 Haplotree, you are R1b1a2a1a1b3. Apparently, R1b1a2a1a1b4 now refers to L238/S182! I mean, really, how can anyone keep track? (Ah, for the days of the simple little R tree.) I don't know how our ISOGG Haplogroup Tree Committee* does it anymore! Apparently, the academics are getting tired of it too, and it's just going to get worse when results from Geno 2.0 start rolling in with LOTS of new SNPs and subclades being defined.

Take a look at the history of R-M222, the "Ui Niall Subclade", on just the ISOGG SNP Tree:
2007 = R1b1c7
2008 = R1b1b2a1b6b
2009 = R1b1b2a1a2f2
2010 = R1b1b2a1a2f2
2011 = R1b1a2a1a1b4b
2012 = R1b1a2a1a1b3a1a1
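The list above can be put into a small lookup to show how many names a single, unchanged SNP has carried. This is only an illustrative sketch: the subclade names are copied from the list, while the helper `terminal_snp_label` is a hypothetical name.

```python
# Subclade names for the SNP M222, copied from the ISOGG history above.
M222_NAMES_BY_YEAR = {
    2007: "R1b1c7",
    2008: "R1b1b2a1b6b",
    2009: "R1b1b2a1a2f2",
    2010: "R1b1b2a1a2f2",
    2011: "R1b1a2a1a1b4b",
    2012: "R1b1a2a1a1b3a1a1",
}


def terminal_snp_label(haplogroup_letter, snp):
    """Build a short label such as 'R-M222' (helper name is hypothetical)."""
    return f"{haplogroup_letter}-{snp}"


distinct_names = sorted(set(M222_NAMES_BY_YEAR.values()))
label = terminal_snp_label("R", "M222")
print(f"{label} has had {len(distinct_names)} different subclade names "
      f"in {len(M222_NAMES_BY_YEAR)} yearly trees")
```

The short label never changes, while the long name changed in every year listed except one.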

Reportedly, Geno 2.0 will define at least three new subclades beneath M222, but I hear it may be more. Do you think those subclade names might get even longer?

The R Haplogroup Tree is definitely the worst, but the problem is starting to affect other haplogroups too. At FTDNA, my dad is I2b1. Same at 23andMe. Sounds simple, right? Not anymore! The subclade name was recently changed to I2a2a on the ISOGG 2012 tree. I am so confused! This was one subclade name that I felt very comfortable with. I think I will just learn to call it I-M223 from now on. (I'll just ignore that his brother recently tested Z2062+, which isn't even on any of the trees yet!)

Actually, there is some rhyme or reason to these discrepancies, so let me share it for those of you who have no idea what I am writing about. FTDNA last updated their Y haplogroup tree in 2010 and 23andMe in 2011.*  So, they are going by the subclade names that were recognized at those times. In contrast, the ISOGG Haplogroup Tree has been updated over 60 times JUST THIS YEAR! Every time a new SNP is discovered that is upstream of a known SNP (which is happening faster and faster all the time), it has to be inserted into the tree, thus changing the subclade naming pattern. This is why it is so much simpler to just learn the Terminal SNP label.
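To see why inserting one upstream SNP ripples through every name below it, here is a toy sketch. It assumes a simplified naming rule in which each branch is named by its position, alternating digits and letters; that rule only mimics the general style of subclade names and is not the official YCC or ISOGG procedure. The SNP labels are invented.

```python
def assign_names(tree, name, names=None):
    """Name every branch by its position, alternating digits and letters.

    tree is a (snp_label, [child_trees]) tuple. This mimics the general
    style of hierarchical subclade names, not any official rule set.
    """
    if names is None:
        names = {}
    snp, children = tree
    names[snp] = name
    use_digits = name[-1].isalpha()  # "R" -> "R1", "R1" -> "R1a"
    for index, child in enumerate(children):
        suffix = str(index + 1) if use_digits else chr(ord("a") + index)
        assign_names(child, name + suffix, names)
    return names


# Hypothetical SNP labels, chosen only for illustration.
before = ("SNP-ROOT", [("SNP-A", [("SNP-B", [("SNP-C", [])])])])
# A newly discovered SNP-NEW sits upstream of SNP-B.
after = ("SNP-ROOT", [("SNP-A", [("SNP-NEW", [("SNP-B", [("SNP-C", [])])])])])

names_before = assign_names(before, "R")
names_after = assign_names(after, "R")
for snp in ("SNP-B", "SNP-C"):
    print(f"R-{snp}: {names_before[snp]} -> {names_after[snp]}")
```

In this toy tree the positional names of SNP-B and SNP-C both get longer after the insertion, while labels of the form R-SNP-B and R-SNP-C stay exactly the same.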

The ISOGG Haplogroup Tree is a tremendous resource that anyone who is doing Y-DNA research should be utilizing. It helps to keep things straight by giving the various names of the SNPs that are being used by different companies and labs. When two or more SNPs are identical, meaning that they are on the same place on the Y-haplogroup tree with the same mutation, ISOGG shows the names in a series separated by "/". For example, let's look at M173/P241/Page29. M173 comes from Peter Underhill's lab at Stanford; P241 comes from Michael Hammer's lab at the University of Arizona; and Page29 comes from the Page lab at the Whitehead Institute for Biomedical Research. They appear in academic publications with these names and ISOGG lets you know that they are identical SNPs. That way, if you are Googling or looking for academic papers about your SNP, you know to try the other names too.

Going back to my first example of R-L21+; Wikipedia states, "R1b1a2a1a1b4 (R-L21) is defined by the presence of the marker L21, also referred to as M529 and S145." The label L21 comes from Thomas Krahn's FTDNA lab in Houston, M529 comes from Peter Underhill's lab at Stanford and S145 comes from Jim Wilson's lab at University of Edinburgh. ISOGG shows the SNP as L21/M529/S145. The bottom line is that if you test L21, M529 or S145 at any company, your assignment is in an identical place on the Y-DNA tree, so the subclade name is not the significant factor, the SNP name is.
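The "/" convention lends itself to a simple alias lookup. The sketch below is an illustration only: the two entries are the ones quoted in the text, and the function names are hypothetical.

```python
def build_alias_index(entries):
    """Map every SNP name in 'A/B/C' style entries to the full entry."""
    index = {}
    for entry in entries:
        for name in entry.split("/"):
            index[name.strip()] = entry
    return index


# Both entries are quoted from the text above.
ALIASES = build_alias_index(["M173/P241/Page29", "L21/M529/S145"])


def same_snp(name_a, name_b):
    """True when both names are listed as the same position on the tree."""
    return name_a in ALIASES and ALIASES.get(name_a) == ALIASES.get(name_b)


print(same_snp("L21", "S145"))
print(same_snp("L21", "M173"))
```

A lookup like this makes it easy to search for papers under every published name of a SNP rather than only the one a testing company reports.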

Of course, with so many new SNPs being discovered and assigned to the tree, there will likely be a certain amount of continuing confusion among those of us doing Y-DNA research for the time being, but I hope you will all consider joining me in taking the next step forward in the evolution of Y-DNA research in genetic genealogy and stop trying to remember those mind-bending subclade names! And, while we're at it, let's give our ISOGG Y-Haplotree Committee a well-deserved virtual pat on the back too!



*History:
The Y Chromosome Consortium (YCC) is a cooperative association of geneticists, led by Dr. Michael Hammer, who first published the paper in 2002, "A Nomenclature System for the Tree of Human Y-Chromosomal Binary Haplogroups", introducing the modern haplogroup nomenclature of Y-DNA. The tree was subsequently revised in 2003 by Mark A. Jobling and Chris Tyler-Smith in another paper, "The human Y chromosome: an evolutionary marker comes of age". Next, Family Tree DNA created the 2005 Y-Chromosome Phylogenetic Tree, which was the first online tree and only available to their customers. Soon thereafter, ISOGG created the first public online tree in 2006.  Tatiana Karafet of Dr. Hammer's lab (and others) published a paper further refining the Y chromosome tree in 2008, "New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree". As a result, both FTDNA and ISOGG updated their trees at that time. Then in 2010, FTDNA came out with a YCC-sanctioned tree which was distributed at the FTDNA conference and, as a result, ISOGG promptly did a major update to stay in alignment with the YCC.  Since then, no updates have come from the YCC. Undaunted, the ISOGG Y Haplogroup Tree Committee has continued to add information as it becomes available from various sources and is now the most up-to-date source of this information.  In November 2011 at the FTDNA Project Administrator's Conference, Spencer Wells of National Geographic, Michael Hammer of the University of Arizona, Thomas Krahn and Bennett Greenspan of FTDNA and Alice Fairhurst of ISOGG, reportedly agreed to all stay in alignment with the most current Y-DNA nomenclature to the best of their abilities. As always, there is new research that has not yet become public. As it is released, ISOGG will again align its tree with the most current information and will continue to add updates as they become available. 
With the upcoming launch of Geno 2.0, the ISOGG Committee will have their work cut out for them! Current ISOGG members who work with the tree and deserve our great appreciation are: Coordinator: Alice Fairhurst. Design team: Tanmoy Bhattacharya, Tom Hutchison, Richard Kenyon, Doug McDonald. Content experts: Abdulaziz Ali, Whit Athey, Ray H. Banks, Katherine Hope Borges, Aaron R. Brown, Phil Goff, Gareth Henson, Tim Janzen, Bob May, Eugene Matyushonok, Lawrence Mayka, Charles Moore, Ana Oquendo Pabon, Marja Pirttivaara, David Reynolds, Bonnie Schrack, Vince Tilroe, Aaron Salles Torres, Steve Trangsrud, Ann Turner and David Wilson.

Wednesday, July 25, 2012

National Geographic and Family Tree DNA Announce Geno 2.0


Today we have some news that is incredibly exciting for our citizen scientists and for all those who are interested in determining their ancestral origins through DNA testing. 

National Geographic is entering the next phase of their Genographic Project in partnership with  Family Tree DNA and the genetic genealogy community. Continuing to move toward their goal of mapping the pattern of human genetics, they are introducing the new GenoChip 2.0. This chip is specifically designed for ancestry testing and includes SNPs from autosomal DNA, X-DNA, Y-DNA and mtDNA. The design of the new chip was a collaborative effort between Eran Elhaik of Johns Hopkins, Spencer Wells of National Geographic, Family Tree DNA and Illumina. The testing will be done at FTDNA in Houston.

Dr. Wells explained that "off-the-shelf chips are not good for studying ancestry" for the simple reason that they are skewed in favor of medically relevant SNPs and are not focused on detailed inclusion of the sex chromosomes and the mtDNA. As a result, this team started from scratch choosing SNPs for the Illumina iSelect HD chip platform one and a half years ago. The resulting chip includes approximately 146,000 SNPs, avoiding all known medically relevant markers and exclusively concentrating on ancestry informative ones. This new chip will be used for both the research and the public participation component of the project.

The new funding structure for the project will be announced in September.

[Caution ahead: Some of the following is quite advanced, so if you are new to genetic genealogy, please skip over the unfamiliar portions. I am including as much as I can from my notes for the more advanced in our community who may want specific details.]

BASICS
The Geno 2.0 test will be offered for $199.95 with free shipping within the US on the National Geographic site and will only require a cheek swab. All resulting data will be downloadable. They will begin accepting pre-orders today for a fall shipping date (10/30/12). In the future, orders will  also be accepted through the Family Tree DNA website (no date is set for this option). Although this is not a traditional relative finder matching tool and is not meant to replace Family Finder, it will cluster you to your closest genetic matches and you will be able to send an anonymous email to correspond with them (not functional at launch). These circle clusters will demonstrate how you connect to people one thousand years ago.


Y-DNA SNPs
The chip includes just over 12,000 Y-DNA SNPs. Ten thousand of these are completely unique and have “never been published before”.  First, the team created probes for all of the 862 Y-SNPs from the current YCC 2010 Tree. Next, they contacted research centers all over the world and asked them to provide a list of all the Y-SNPs that they had data mined or discovered, including the L SNPs, the Z SNPs and “private Hammer” SNPs, and created probes for those. Y-SNPs discovered by citizen scientists were also included.

More details:
- Many new terminal branches will be gained and, according to Bennett Greenspan, this will completely replace the deep clade test currently offered by Family Tree DNA.
- Y-SNPs were vetted against Family Tree DNA’s “Walk Through the Y” samples.
- 862 SNPs from YCC 2010 Tree vs. 6,153 SNPs on the New Tree
- About 200 SNPs from 2010 failed with ~160 SNPs from 2010 unconfirmed
- Most failures were at roots such as R, P, A2 and F. Many have synonymous SNPs.
- 115 SNPs from YCC 2010 Current R-Tree vs 550 SNPs on the New R-Tree with ~200 more potential
- 31 SNPs from 2010 failed with 25 more unconfirmed, but in progress
- Rebekah Canada wrote new and/or performed comprehensive rewrites of 182 different Y-DNA stories based on approximately 1000 peer reviewed publications and information from the genetic genealogy community.
- New, updated Y haplogroup maps



mtDNA SNPs
The chip also includes over 3200 unique mtDNA SNPs. They started by creating probes for the 3352 highest frequency mtDNA SNPs from Family Tree DNA and GenBank. According to Elliott Greenspan, the level of difficulty was greatly increased due to variability in mtDNA. It was necessary to create about 31,000 probes to cover all of the variation that can be found in the surrounding flanking regions. Ultimately, they were able to detect about 3200 of those and, as a result, they can determine about 90% of the known haplogroups at this point.  

More details:
- All SNPs were vetted by running known samples.
- Rebekah Canada wrote new and/or performed comprehensive rewrites of 248 different mtDNA stories based on ~1000 peer reviewed publications and information from the genetic genealogy community. 
- New, updated mtDNA haplogroup maps




Autosomal and X-chromosomal SNPs
 
Over 130,000 autosomal SNPs and X-DNA SNPs were chosen
- AIMs harvested from literature
- AIMs identified using two methods
- Contributed by Family Tree DNA
- Identified at Random
 
Ancestry Informative Markers (AIMs) are SNPs that show substantial differences in allele frequency across population groups. Approximately 75,000 AIMs were chosen from approximately 450 populations around the world. About half of these AIMs were collected from about two dozen published papers and the rest were calculated from private and public datasets. Many of these population datasets had not previously been studied for this purpose, so they used two algorithms to develop new and never before used AIMs: infocalc by Rosenberg and a private algorithm developed by Dr. Elhaik called “AIMsFinder” (PCA approach). Dr. Elhaik personally collected over 300 population datasets from which they had genotype data from thirty thousand to over one million base pairs and did very exhaustive pairwise comparisons between difficult-to-distinguish populations to build a unique database of AIMs.
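As a minimal illustration of what "substantial differences in allele frequency" means, the sketch below ranks SNPs by the absolute frequency difference between two populations. This is a deliberately simple stand-in, not infocalc and not AIMsFinder; the frequencies, SNP names, population names and the 0.4 threshold are all invented for the example.

```python
def allele_frequency_delta(freq_a, freq_b):
    """Absolute difference in allele frequency between two populations."""
    return abs(freq_a - freq_b)


def rank_candidate_aims(frequencies, pop_a, pop_b, min_delta=0.4):
    """Return (snp, delta) pairs at or above min_delta, largest first.

    min_delta is an arbitrary illustrative cutoff, not a published value.
    """
    ranked = []
    for snp, by_pop in frequencies.items():
        delta = allele_frequency_delta(by_pop[pop_a], by_pop[pop_b])
        if delta >= min_delta:
            ranked.append((snp, round(delta, 3)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Invented allele frequencies for three made-up SNPs.
FREQS = {
    "snp_1": {"pop_x": 0.90, "pop_y": 0.10},
    "snp_2": {"pop_x": 0.55, "pop_y": 0.50},
    "snp_3": {"pop_x": 0.20, "pop_y": 0.75},
}

candidates = rank_candidate_aims(FREQS, "pop_x", "pop_y")
print([snp for snp, _ in candidates])
```

A SNP such as the made-up snp_2, with nearly equal frequencies in both populations, says little about ancestry and drops out of the ranking.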

They also wanted to address the question of how much interbreeding occurred between modern humans and ancient hominins. Once again, they collected all relevant SNPs from existing literature on the subject and included those on the chip. However, they wanted to go further so they used a novel approach. They identified regions in which modern humans and Neanderthal shared the derived allele where Denisovan and Chimp share the ancestral and then repeated the exercise for derived alleles in Denisovan, but not Neanderthal and Chimp. Ultimately, they collected about 30,000 such SNPs that they feel can help identify interbreeding between ancient hominins and modern humans.
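The selection pattern described above can be sketched as a simple filter over allele states. This is an interpretation of my notes rather than the team's actual pipeline: the site records are invented, and the second pattern assumes that modern humans share the derived allele with Denisovan, which the description implies but does not spell out.

```python
ANCESTRAL, DERIVED = "A", "D"


def classify_site(site):
    """Label a site by which archaic genome shares the derived allele
    with modern humans. Returns None when neither pattern applies."""
    human, neanderthal = site["human"], site["neanderthal"]
    denisovan, chimp = site["denisovan"], site["chimp"]
    # Both patterns need a derived human allele and an ancestral chimp allele.
    if human != DERIVED or chimp != ANCESTRAL:
        return None
    if neanderthal == DERIVED and denisovan == ANCESTRAL:
        return "neanderthal-like"
    if denisovan == DERIVED and neanderthal == ANCESTRAL:
        return "denisovan-like"
    return None


# Invented example sites; real data would come from sequenced genomes.
SITES = [
    {"human": "D", "neanderthal": "D", "denisovan": "A", "chimp": "A"},
    {"human": "D", "neanderthal": "A", "denisovan": "D", "chimp": "A"},
    {"human": "D", "neanderthal": "D", "denisovan": "D", "chimp": "A"},
    {"human": "A", "neanderthal": "D", "denisovan": "A", "chimp": "A"},
]

labels = [classify_site(site) for site in SITES]
print(labels)
```

Sites where both archaic genomes carry the derived allele are left unlabeled here, because they cannot point to one archaic group over the other.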



The team also included SNPs from underrepresented populations such as Paleo-Eskimos and Aboriginal Australians. What they call “control SNPs” came from 7,500 random SNPs that have high frequency in the HapMap and 1000 Genomes Project. They were included to facilitate future studies on these SNPs and how they distribute in different populations. They excluded a large number of SNPs that had high linkage disequilibrium (LD) in all populations, excluding those found in the Hunter Gatherer and Papuan populations because these are of special interest for future studies. (An interesting side note: when these high LD SNPs are removed from the commercial platform chips, only about half of the total remains.) The team only included SNPs that were confirmed by both HapMap and 1000 Genomes to reduce the number of erroneous SNPs.

To ensure that the genetic results will not be used for unethical purposes such as political ends, pharmaceutical ventures, etc… all samples are anonymous, no medical or trait data is collected, and all SNPs are non-coding and have no known function. In order to facilitate this process, the team built a huge database that included all SNPs that were known, suspected or implied to have associations with disease or traits. To avoid imputation, they also removed high LD SNPs. They are confident that phenotype cannot be inferred.

More details:
- 23,962 Neanderthal SNPs
- 1,357 Denisovan SNPs
- 12,027 Aboriginal SNPs
- 10,159 Eskimo Saqqaq SNPs
- 998 Chimpanzee SNPs
- 975 X chromosome SNPs (the team is looking for more X chromosome AIMs from citizen scientists)
- 76% of SNPs overlap with Illumina 660k array
- 55% of SNPs overlap with Illumina HumanOmni1-Quad and Express and Affy 6.0
- 40% of SNPs overlap with Affy 5.0 and Human Origins Chip
- GenoChip is enriched for Common Alleles
- Heat maps




Summary
All of this adds up to an unprecedented effort by National Geographic and Family Tree DNA to move genetic genealogy in an innovative new direction. This is a very exciting time for all of us citizen scientists since it appears that there is increasing opportunity to contribute to this advancing field and recognition for those who do.

This blog post is really just a start. There will be much more to report in the coming weeks, including a product review. So, be sure and check back!



[Update 12/13/12 - You can see my results here and others here and here.]

Monday, May 14, 2012

"Finding Your Roots with Henry Louis Gates, Jr." - DNA in the Ninth Episode


The ninth episode of Finding Your Roots with Henry Louis Gates, Jr. featuring comedian Wanda Sykes, musician John Legend and 98-year-old Margarett Cooper aired last night. Unfortunately, this was the shortest DNA segment of any episode so far (starting at 48:30), clocking in at just under two minutes. I have to admit that even without much genetic genealogy, I really enjoyed the thorough research tracing all three of these African Americans' family trees to their free ancestors of color. That was some outstanding genealogy work!

Although the genetic genealogy that was discussed in the episode was squeezed into a very small segment, I thought the explanations offered were very clear, so I will quote some of Dr. Gates' words here. In order to trace some of the guests' African ancestry back to its origin in Africa, the show used the company African Ancestry again. African Ancestry performs only Y-DNA and mitochondrial DNA (mtDNA) tests, so keep in mind that they are only examining the direct paternal and/or direct maternal ancestral lines.

The paths of Y-DNA (in black) and mtDNA (in red) in our family trees

Dr. Gates explains, "Fathers pass on exact copies of their Y-DNA to each of their sons and mothers pass on replicas of their mitochondrial DNA to all of their children." This means that the Y-DNA follows only the direct male line because fathers pass their Y-chromosome only to their sons. Conversely, mtDNA follows only the direct maternal line because only women pass their mitochondrial DNA on to their children. For example, if you are a female and have a brother, you share the same mtDNA that was inherited from your mother, but only you will pass it to your children, while your brother's children will receive their mother's mtDNA instead of his. (One caveat to Dr. Gates' statement is that both Y-DNA and mtDNA occasionally mutate, so they are not always "exact copies".)
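The two inheritance paths can be traced through a small family tree in code. The pedigree and the function names below are invented for illustration only.

```python
# A tiny invented pedigree; None marks the end of the known tree.
PEOPLE = {
    "me":          {"sex": "F", "father": "dad", "mother": "mom"},
    "brother":     {"sex": "M", "father": "dad", "mother": "mom"},
    "dad":         {"sex": "M", "father": "pat_grandpa", "mother": "pat_grandma"},
    "mom":         {"sex": "F", "father": "mat_grandpa", "mother": "mat_grandma"},
    "pat_grandpa": {"sex": "M", "father": None, "mother": None},
    "pat_grandma": {"sex": "F", "father": None, "mother": None},
    "mat_grandpa": {"sex": "M", "father": None, "mother": None},
    "mat_grandma": {"sex": "F", "father": None, "mother": None},
}


def direct_line(person, parent_key):
    """Follow one parent type ('father' or 'mother') as far as it goes."""
    line = []
    current = PEOPLE[person][parent_key]
    while current is not None:
        line.append(current)
        current = PEOPLE[current][parent_key]
    return line


def y_dna_line(person):
    """Ancestors who passed down this person's Y-DNA (males only)."""
    if PEOPLE[person]["sex"] != "M":
        return []
    return direct_line(person, "father")


def mt_dna_line(person):
    """Ancestors who passed down this person's mtDNA."""
    return direct_line(person, "mother")


print(y_dna_line("brother"), mt_dna_line("brother"))
print(y_dna_line("me"), mt_dna_line("me"))
```

In this made-up tree the sister and brother share the same mtDNA line, but only the brother has a Y-DNA line to trace.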

Dr. Gates goes on to say, "So, if an African American shares either of these DNA segments with a member of a present day ethnic group in Africa, then it is likely that they share a common ancestor." It is on this basis that African Ancestry reaches their conclusions. (I commented on this idea further in an earlier episode.) In order to be able to do this, African Ancestry has compiled a database of samples collected from "the ethnic groups in West and Central Africa that were most heavily raided during the slave trade."

Map of the regions sampled for African Ancestry's DNA database

Each of Dr. Gates' guests was tested to determine which tribe this small portion of their DNA most closely matches. Being male, John Legend was fortunate to be able to trace both his Y-DNA and his mtDNA. His mtDNA was most similar to the Mende people in Sierra Leone and his Y-DNA was most similar to the Fula people in Guinea-Bissau. As Dr. Gates states, this "suggests" that these specific branches of his family tree lead to these areas.

John Legend's mtDNA traces to the Mende People

John Legend's Y-DNA traces to the Fula tribe.

The women only had the opportunity to trace their mitochondrial DNA since they do not have Y-chromosomes. Wanda didn't seem to have received a very specific result. The show stated that her mtDNA matched several groups, including the Tikar and Fulani (appears to be the same as the Fula) people of Cameroon. Incidentally, without seeing her family tree, this demonstrates to us that her white ancestor Elizabeth Banks who mothered Wanda's line of free ancestors of color was not her direct maternal ancestor (mother's mother's mother, etc....). If she had been, Wanda would have likely possessed European mtDNA.

Wanda Sykes's mtDNA traces to the Tikar and Fulani people

As a side note, I thought it was a lot of fun watching the lovely Margarett fulfill one of her stated life goals by learning more about her ancestors before she dies. Her welcome inclusion in the show supports the idea that watching non-celebrities unravel the secrets of their family trees can be just as compelling as (and sometimes more so than) the stream of celebrities being offered these opportunities.

Margarett right after learning the origins of her mtDNA

Margarett was extremely pleased to discover that her maternal roots traced back to the Temne people of Sierra Leone. (I strongly suggest following the links to read about all of these interesting African groups.)

Margarett's mtDNA traces to the Temne Tribe of Sierra Leone

Let's not forget that in focusing exclusively on the Y-DNA and mtDNA, large portions of the guests' genetic heritage were ignored. My wishlist for this episode would have, of course, included the popular pie-charts with the ancestral origin percentages (admixture) for each of the guests. It would have been interesting to see how much European DNA each of these guests possesses, as well as whether there is potential for Native American ancestors in their family trees. Usually Dr. Gates uses 23andMe's Ancestry Painting for this, although he sometimes uses Family Tree DNA's Population Finder as well. At the top of the list would have been an autosomal DNA test performed on John Legend's delightful fourth cousin John Hale to determine if they share any detectable DNA segments inherited from their mutual third great grandparents, the legendary Peyton Polly and his unknown wife. A test such as 23andMe or Family Tree DNA's Family Finder has about a 50% chance of detecting shared DNA between fourth cousins. As I have said before, the use of genetic genealogy in exploring these types of questions is my favorite application of the science. Autosomal DNA testing works best for examining a theory that two people are related to each other in relatively recent times. A negative result does not disprove the connection at this level of relatedness, but a positive one can strongly support it. Unfortunately, while Dr. Gates' team likely performed many of the DNA tests on my list, there wasn't time to show all of the results.

Sadly, next week is the last episode of the series, so I hope it is chock full of genetic genealogy! I don't know where all of us will go to get our television genealogy fix since "Who Do You Think You Are?" is also ending next week (apparently for good). I sure hope that Dr. Gates hurries up and produces another one of his great family history series!

For the last episode, Dr. Gates and his team will be exploring the multiracial ancestry of Michelle Rodriguez, Adrian Grenier and Linda Chavez. See you then!



I have been writing a review of the DNA testing used in each episode:
Week 1 - Episode 1 & Episode 2 - Harry Connick Jr. & Branford Marsalis; Cory A. Booker & John Lewis
Week 2 - Episode 3 - Barbara Walters & Geoffrey Canada
Week 3 - Episode 4 - Kevin Bacon & Kyra Sedgwick
Week 4 - Episode 5 - Rick Warren, Angela Buchdahl & Yasir Qadhi
Week 5 - Episode 6 - Robert Downey, Jr. & Maggie Gyllenhaal
Week 6 - Episode 7 - Condoleezza Rice, Samuel L. Jackson & Ruth Simmons
Week 7 - Episode 8 - Martha Stewart, Margaret Cho & Dr. Sanjay Gupta

Monday, April 16, 2012

"Finding Your Roots with Henry Louis Gates Jr." - DNA in The Fifth Episode


The fifth episode of Finding Your Roots with Henry Louis Gates Jr., which featured religious leaders Rick Warren, Angela Buchdahl and Yasir Qadhi, aired last night on PBS. This means we are halfway through the miniseries! As we have progressed through the episodes, it appears to me that Gates and his team are featuring DNA testing less and less. This is a shame and I hope this trend will reverse itself in the second half of the season.

The DNA portion of the show last night comprised only about 5 minutes (starting at 44:55), but drew some interesting genetic comparisons between guests of different religious backgrounds, demonstrating again that DNA testing is "deconstructing the notion of race" (quote from Gates in an earlier episode). Qadhi's Y-DNA haplogroup, which Gates referred to as his "paternal haplogroup" and which was inherited from his father's father's father (etc.), is J2. This haplogroup is common among those with Indian ancestry like Qadhi, but also "reaches levels of about 20% in Ashkenazi Jews". Qadhi said that he didn't find this too surprising since "Muslims and Jews consider themselves cousins...(as) descendants of Abraham."

Yasir Qadhi reviewing his Y-DNA Haplogroup

Qadhi also had a surprising "matrilineal cousin" in a guest from an earlier episode - Barbara Walters - with whom he shares his mtDNA haplogroup, which each of them inherited from their mother's mother's mother (etc.). Unfortunately this haplogroup was not revealed in either episode, so we can only guess what it may be. I would imagine that in each case, while sharing the major haplogroup, Qadhi, being of Indian ancestry, may belong to a slightly different subclade than his Jewish cousins. (See the Y-Haplogroup J Project for details.)

Less surprisingly, Rabbi Buchdahl is also a distant cousin to Barbara Walters. Gates said that Buchdahl and Walters share a common ancestor within about the last 300 years. He mentioned "three segments" in common between them. This test was clearly an autosomal DNA test by 23andMe or Family Tree DNA's Family Finder. Normally three matching segments of DNA would imply a significantly closer relationship than distant cousins, but among the highly endogamous Ashkenazi Jewish population, this is not uncommon. I was pleased to see Gates' team incorporating all three types of DNA used for genealogy in this segment.

Angela Buchdahl reviewing her father's Warnick Y-DNA (?)

In the voiceover starting at 46:15, Gates explains, "DNA Analysis can tell us...where our earliest ancestors originated thousands of years ago, to whom we might be related today and the percentages of our African, European and Asian ancestry over the past 500 years." I thought that this may have caught some viewers' interest, so I wanted to detail which tests he was referring to in that comment.

The first part of the quote - "where our earliest ancestors originated thousands of years ago" -  refers to deep ancestry that is revealed through our Y-DNA and mtDNA haplogroups, tracing the migration of mankind. You can learn your haplogroups from 23andMe's DNA test or by taking the Y-DNA STR test and/or the mtDNA test at FTDNA.

Next, "to whom we might be related today" refers to autosomal DNA tests that match you with cousins from your more recent ancestry by examining matching blocks of DNA between testers. These types of autosomal tests are currently only offered by 23andMe and FTDNA. 23andMe's "Relative Finder" feature is included in their single test and FTDNA's test for this purpose is called "Family Finder".

Lastly, "the percentages of our African, European and Asian ancestry over the past 500 years" refers to biogeographical ancestry analysis or admixture tools like 23andMe's Ancestry Painting or FTDNA's Population Finder. In this case, he was specifically referring to 23andMe's Ancestry Painting. This is the most basic of the genetic ancestral origin tools, using only the three populations mentioned. FTDNA's Population Finder breaks down genetic origins into more granular populations and 23andMe's Ancestry Finder feature does so as well, with a somewhat different approach.

Earlier episodes have shown the pie chart with the guests' ancestral origins or genetic admixture by percentages. In this episode, we caught a glimpse of Pastor Rick Warren's dark blue chart, signifying 100% European. While he and Gates laughed at this result, it isn't necessarily as boring as it might sound since, in this case, "European" encompasses ancestry from all over Europe and beyond, including Russia, France, Germany, Finland, the British Isles, the Middle East, Scandinavia and many other regions.

It would have been interesting to see all of the guests' Ancestry Paintings in this episode because there would have been much commonality between them. As a person of Indian heritage, Qadhi would have likely possessed a large European component and a smaller Asian one and Buchdahl's chart, being half Jewish and half Korean, would have probably been equally split between Asian and European. (Upon further review of the episode, it appears that I was correct - see below.) The specific differences between the Asian and European ancestry among the guests would only be revealed with a more detailed breakdown, such as FTDNA's Population Finder or 23andMe's Ancestry Finder.

Angela Buchdahl looking at her "Ancestry Painting"

Next week we will be privy to details of the family trees (and hopefully DNA!) of movie stars Robert Downey Jr and Maggie Gyllenhaal. That sounds like fun! See you then...


I have been writing a review of the DNA testing used in each episode:
Week 1 - Episode 1 and Episode 2 - Harry Connick Jr. & Branford Marsalis; Cory A. Booker & John Lewis
Week 2 - Episode 3 - Barbara Walters & Geoffrey Canada

Week 3 - Episode 4 - Kevin Bacon & Kyra Sedgwick

Monday, November 7, 2011

Family Tree DNA's 7th International Conference on Genetic Genealogy- Day One

I just arrived home from a whirlwind weekend in Houston attending FTDNA's 7th International Conference on Genetic Genealogy. I saw lots of friends, had many interesting conversations and thoroughly enjoyed all of the presentations. There's nothing like a full weekend completely immersed in my favorite subject - DNA!

I will attempt to summarize here the news and interesting tidbits from the conference using my notes. (I was writing fast, so please comment if something doesn't make sense or needs clarification.) I have added Tim Janzen's comments below. [TJ]

DAY ONE

Opening Remarks:
• FTDNA has the largest collection of full mitochondrial sequences in the world.
• FTDNA and Archives.com have entered into a partnership to integrate resources on the FTDNA website to facilitate family tree uploads and research.
• FTDNA has tested 600,000 people to date.
• John Spottiswood told us that Archives.com, through a partnership with Family Search, will offer the 1940 census images April 2, 2012. They will have the index done by the end of 2012, shooting for October. 6.5 million dollars have been invested in new records. They have 400,000 active subscribers. Archives.com has 18 of the top 20 collections at Ancestry. All conference attendees received a free 1 year+ membership to Archives.com. It is regularly $39.95 per year.

Spencer Wells and News from The Genographic Project:
• They are wrapping up Phase 1 of The Genographic Project by the end of next year and beginning Phase 2, which is leveraging the resources gathered from Phase 1.
• The Genographic Project has collected 75,000 samples from indigenous  peoples in more than 130 countries from ~1000 populations.
• The majority of Canadian native tribes refused to DNA test. As a result, the project still does not have adequate sample coverage of the indigenous peoples of North America. South America is better.
• The National Genographic Project sold 10,000 kits the first day and 100,000 kits in the first 8 months with over 415,000 kits sold worldwide to date.
• The Project has raised 3 million dollars for its Legacy Fund and given away 1.5 million dollars of that so far, funding 52 grants currently with ten more being funded in the next couple of weeks.
• Two papers on Basque DNA are coming out this week. One is on full mtDNA sequences and one is on Y-DNA. "Tracks pre-Roman tribal culture".
• We are losing a language in the world every two weeks!
• Until recently there was no genetic evidence of Asian impact on Hungarian DNA. The Project is now seeing 2%-3% Asian haplogroups in both mtDNA and Y-DNA in the 2334 Hungarian samples from the public. (Originally only sampled 100 indigenous Hungarians.) [Tim Janzen notes that it was, more specifically, 3% of the Y-DNA and 2% of the mtDNA.]
• DNA evidence is showing that the Indian caste system is older than Indo-European influence. (paper)
• They are doing things with ancient DNA that 10 years ago were impossible, that they "wouldn't have dreamed of doing".
• A paper is coming next year on ancient DNA research "transecting time", including information on farmers replacing hunter gatherers in Central Germany and mtDNA Haplogroup U5, which Spencer called "the hunter-gatherer haplogroup". They found different frequencies of haplogroups from samples at different layers. He says that the debate about the age of R1b has not yet been resolved. Spencer commented that outstanding issues about STR mutation rates are "not helping" and that we still "have to figure out Y-STR mutation rates."
• There is a correlation between linguistic dates and genealogical dates. Dr. Wells feels that the evolutionary rate overestimates these. He stated that the genealogical rates are more accurate over relatively short time spans and the evolutionary rate works better for deeper time spans.
• 1 in 17 men now living in the Mediterranean descend from Phoenician traders.
• 2000 Caucasus language speakers were sampled, finding a "remarkable concordance between genetic contrasts and language groups."
• The Project is starting to look at autosomal DNA. They are first looking at the X-Chromosome as a new genetic marker. They are using the pattern of recombination to infer history, an approach called the "Theory of Junctions." (paper)
• Over 100,000 of the Genographic participants have converted their results to FTDNA.
• In Central Asia Y-DNA Haplogroup R1a has a high frequency - 40%-60% in some areas. R1a's frequency is higher in the mountains than in the plains on the same longitude. There is a ring pattern where R1a virtually disappears in the middle. According to Dr. Wells, this "central hole" was probably created by a replacement of R1a by East Asian haplogroups entering through the Dzhungarian gap.
• East Asian expansion corresponds to rivers as borders instead of the mountains.
• East Indians have the most Eurasian diversity.
• The project would like to look at the Australian "song lines" to determine if there is an overlap between where the songs intersect with other tribes and the genes.
• 10 papers are going to journals in the next two weeks with about a dozen more in the pipeline.
• The goal is to get the genetic information collected out to us "citizen scientists" for public participation. Spencer was not able to give a timeline as to when the database will be available.
• A big announcement is due next year from The Genographic Project.

Spencer Wells'  slide, photo courtesy Katherine Borges

Bruce Walsh covered DNA basics, including useful autosomal DNA information:
• Each human cell has 46 chromosomes and multiple copies of mtDNA
• 1 cM = 1 centimorgan, a 1% chance of recombination per generation; it roughly corresponds to one million DNA base pairs.
• 1st-3rd generations can be estimated simply by the shared percent of DNA. Distant relatives are estimated by the largest block of shared DNA. There is a wide variation expected for the more distant relatives. Odds are that for relatives greater than 5 generations apart (10 total), all shared blocks have been lost.

• For Family Finder:
TMRCA    Average Size of Blocks
1        44.06 cM
2        19.15 cM
3        12.3 cM
4        9.07 cM
5        7.19 cM
6        5.95 cM
7        5.08 cM

• Some autosomal DNA is dominant and will be passed down for a greater number of generations than expected.
• Looking for a common ancestor at 5 cM - 7 cM shared blocks is "deep sea fishing". It is not strong evidence. At 7 cM there is about a 50/50 chance that the segment is identical by descent and there is a shared common ancestor in genealogical times. At 10 cM it is safe to assume that there is a common ancestor in genealogical times.
• The male X chromosome is phased since there is only one allele.
• Bruce Walsh discussed phasing and various other topics. FTDNA is exploring the option of phasing data where two parents and at least one child have done FTDNA's Family Finder test or the 23andMe v3 test and using the phased data to run comparisons against other people in the FF database. The use of phased data in Family Finder would significantly reduce the number of matches that are simply identical by state. [TJ]
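The "shared percent" rule of thumb for close relatives can be sketched in a few lines of Python. This is not taken from the presentation; it is the standard textbook expectation for full cousins, where each of the two common ancestors contributes one half per transmission along both lines of descent. The function name is my own, and real comparisons vary widely around these averages.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Average autosomal fraction shared by full n-th cousins.

    Assumption: full cousins with exactly two common ancestors (a couple)
    and no other relationship between the testers.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    # Each tester is (cousin_degree + 1) transmissions below the common couple.
    meioses = 2 * (cousin_degree + 1)
    # Two common ancestors, each path halving the shared DNA at every step.
    return 2 * 0.5 ** meioses


if __name__ == "__main__":
    for degree in range(1, 5):
        percent = expected_shared_fraction(degree) * 100
        print(f"Full cousins, degree {degree}: about {percent:.3f}% shared on average")
```

For second cousins this gives 3.125%. By fourth cousins the average is so small that many pairs share no detectable segment at all.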

Phasing and Analysis of Family Finder Data from David Pike:
• Phasing is separating the alleles to distinguish which are inherited from each parent.
• David gave us a hands-on demonstration on using a number of his tools for analyzing Family Finder raw data, available here. These include:
  • Search for Runs of Homozygosity (ROHs)
  • Search for Heterozygous Sequences
  • Search for Shared DNA Strands in Two Raw Data Files
  • Inspect a Shared DNA Strand in Two Raw Data Files
  • Inspect Shared DNA Strands in a Trio of Raw Data Files
  • Search for Discordant SNPs in Parent-Child Raw Data Files
  • Search for Discordant SNPs when given data for child and both parents
  • Search for Differently Reported SNPs
  • Phase a Child when given data for child and both parents
  • Phase Siblings with Data from One Parent
  • Phase Siblings with Data from Both Parents
• Parents' DNA can be phased if you have enough data from their children. David is working on reconstructing his grandparents' DNA!
• A deceased parent's genomic data can be reconstructed from testing the other parent and the children.
• Cousins' data can also be utilized to help phase portions of relatives' genome.
• In particular, I enjoyed his discussion of microdeletions of autosomal DNA segments, which can generally be found by checking for discordant data. [TJ]
Very enthusiastic presentation!
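To give a feel for what "phasing a child when given data for child and both parents" involves, here is a minimal Python sketch. It is not David Pike's code. The function name and the sample genotypes are my own inventions, and I am assuming genotypes are written as two-letter strings with "-" marking a no-call.

```python
from typing import Optional, Tuple


def phase_child(child: str, father: str, mother: str) -> Optional[Tuple[str, str]]:
    """Split one child genotype into (paternal allele, maternal allele).

    Returns None when the SNP cannot be phased (a no-call, or all three
    people heterozygous). Raises ValueError when the child's genotype is
    discordant with the parents' genotypes.
    """
    # Assumption: "-" marks a no-call in any of the three genotypes.
    if "-" in child + father + mother:
        return None
    first, second = child
    candidates = set()
    # Try both assignments of the child's two alleles to the two parents.
    for paternal, maternal in ((first, second), (second, first)):
        if paternal in father and maternal in mother:
            candidates.add((paternal, maternal))
    if not candidates:
        raise ValueError("child genotype is discordant with the parents")
    if len(candidates) > 1:
        return None  # both assignments fit, so this SNP stays unphased
    return next(iter(candidates))


if __name__ == "__main__":
    # Hypothetical genotypes, for illustration only.
    print(phase_child("AG", "AA", "GG"))  # paternal A, maternal G
    print(phase_child("AG", "AG", "AG"))  # ambiguous, cannot be phased
```

A run of discordant SNPs in the same stretch is the kind of signal that can point to a microdeletion or a data problem, which is why the discordance check raises an error instead of skipping the SNP silently.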

Thomas Krahn and News from Walk Through The Y:
• 366 participants, 125.8 million basepairs sequenced, 180,000 bp average coverage per participant, 450 undocumented new Y-SNPs have been found, 167 participants did not find a new SNP.
• 90% of participants have chosen their results be public on the Finch2 platform.
• Very advanced customers have mined the data from the 1000 Genomes Project and extracted new SNP candidates (Z series), suggesting promising markers and helping to design the primers.
• Covered new features on the draft tree.
• Currently 350k-400k base pairs for Walk Through The Y, 20 times more data via the new Roche 454 - ran Saturday night. 
• Thomas Krahn summarized the latest Roche 454 sequencer Y chromosome sequencing results. He is doing Y chromosome enrichment of the DNA prior to sequencing so that he can maximize the Y chromosome sequence data from each sequencing run. In his latest run he tested 8 samples, but only 2 came out reasonably well. He plans to reduce the number of beads he uses in the sequencer and he hopes that will improve the quality of his data. In the latest experiment he got about 19,000 reads from one sample, of which about 48% were from the Y chromosome after Y enrichment. The average read length was in the 400-600 base pair range. Thomas plans to put the latest sequencing results on his FTP server as a downloadable file of about 300 million megabytes of data for Y SNP hunters to review. Thomas plans to continue to work on Y sequencing until he can perfect the sequencing. Thomas said that there are about 20 million base pairs on the Y that are worth sequencing. The first 2 million base pairs on the p arm are pseudoautosomal and thus aren't helpful from a Y SNP search perspective. The palindromic regions also generally don't have many Y SNPs. The new 454 sequencer will allow about 20 times as many bases to be sequenced as can be done with the WTY project currently. Now the WTY results generally include about 400,000 base pairs. Thus Thomas anticipates at least 6-8 million base pairs of the Y chromosome can be sequenced with the new 454 sequencer in the short term and hopefully about 20 million base pairs can be sequenced in the long term. [TJ]
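As a quick sanity check on the figures in these notes, the arithmetic can be run in Python. The variable names are mine, and the "raw Y bases" figure is only the number of Y reads times the read length; it counts overlapping reads more than once, so it should not be read as unique coverage.

```python
# Figures quoted in the notes above.
WTY_BASES_PER_SAMPLE = 400_000   # typical current WTY result, in base pairs
SCALE_UP = 20                    # "about 20 times as many bases"
READS_FROM_ONE_SAMPLE = 19_000
Y_READ_FRACTION = 0.48           # share of reads from the Y after enrichment

projected_bases = WTY_BASES_PER_SAMPLE * SCALE_UP
y_reads = round(READS_FROM_ONE_SAMPLE * Y_READ_FRACTION)


def raw_y_bases(read_length: int) -> int:
    """Raw (not unique) Y bases for a given average read length."""
    return y_reads * read_length


if __name__ == "__main__":
    print(f"20 x 400,000 bp = {projected_bases:,} bp")
    print(f"Y reads from one sample: {y_reads:,}")
    for length in (400, 600):
        print(f"Raw Y bases at {length} bp per read: {raw_y_bases(length):,}")
```

The 20-fold scale-up lands at 8 million base pairs, which matches the top of the "6-8 million" short-term range in the notes.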


Thanks to Dr. Krahn, the full presentation is available here. The download is near the bottom and called "Walk Through The Y - Update 2011". 

Peter Hrechdakian and Findings from the Armenian DNA Project:
• Tested over 600 Armenians since 2009 in the Armenian DNA Project.
• Armenians are very diverse with 14 major Y-DNA haplogroups, 80 distinct Y-DNA subclades and 13 major mtDNA Haplogroups with 67 subclades.
• Armenians and Assyrians have very similar YDNA and mtDNA distribution patterns.
• There have been 13 Armenian Walk Through The Y participants so far. New SNPs have been found in 10 of them.
• Please visit the Houshamadyan Project website. This project is attempting to reconstruct Ottoman Armenian town and village life.

Peter Hrechdakian's slide, Photo courtesy Katherine Borges
• Peter Hrechdakian gave an overview of the very important Armenian project, which now has over 600 people in it. Armenia and the nearby regions in the Caucasus are the cradle of many Y haplogroups and thus additional testing of people from this region is very important. Peter's presentation included many excellent diagrams and graphs summarizing the mtDNA, Y, and autosomal data from the Armenian project. [TJ]

Stephen Morse explained how to use his One Step Webpages.
Dr. Stephen Morse discussed many of the "one step" tools that he has available on his web site at www.stevemorse.org. It had been some years since I had last visited his web site and I was pleased to learn that he has added some DNA tools to his web site, including a genetic distance calculator. Some other web sites he mentioned that can be helpful for finding living people include www.PrivateEye.com and www.ZabaSearch.com. [TJ]

Question and Answer Panel:
• Julie Hill walked us through the Archives.com site.
• Instead of importing Gedcoms to FTDNA, users will be able to link through to their live family tree on Archives.com.
• In November/December 2008, FTDNA started advertising on Facebook, offering 12 markers for $59. They sold ZERO tests from these ads.
• In October 2010, FTDNA started a Facebook page. They have had their biggest ever promotions there and have ~16,000 "likes". Facebook users are now regularly demanding promotions.
• Bennett - "23andMe raw data uploads will be coming in the next 4-6 weeks for about $50." (v3 only)
• Sometime next year all Genographic Project samples will be destroyed in line with their original terms if not transferred to FTDNA. It will take about a year to destroy them all. The Genographic Project has made the FTDNA logo larger and larger on the bottom of their page, but they cannot directly contact participants due to their anonymous collection method.
• FTDNA is considering extending sample storage from 25 to 50 years.
• FTDNA will have some presentations from the conference available for download.
• FTDNA has tested samples for Family Finder that were up to 8 years old. Some have worked, some have not. The Illumina chip is much more robust than the Affy chip was with a 99% success rate the first time a sample is run. Bennett "would be more on the liberal side about trying old samples than 6-8 months ago".
• FTDNA will allow uploads of Ancestry.com autosomal data, but will not provide customer support for it.
• Recommended reading: "The Great Human Diaspora" by Cavalli-Sforza.
• Archives.com does not yet have searchable family trees or the New York passenger lists. They are working on adding these.

You can find Day Two here.

[Disclosure - my company StudioINTV has an existing production agreement with FTDNA that has no bearing on the opinions I express. I also receive a small commission from FTDNA on non-sale orders through my affiliate link, which I use to fund DNA tests. I receive no other compensation in relation to any of the companies or products referenced in my blog.]