Your Genetic Genealogist: November 2012

Friday, November 30, 2012

More on Geno 2.0: Third Party Resources and Images from a Second Set of Results

It turns out that an additional correspondent of mine received her National Geographic Geno 2.0 results in the last batch and she has kindly given me permission to share them. I have also received word that several third party tools are ready to accept GenoChip (Geno 2.0) raw data.

THIRD PARTY TOOLS
Mitochondrial DNA researcher James Lick's mthap tool (mtDNA haplogroup predictor) is now capable of processing the new GenoChip raw data. In fact, for the K1a1b1 result shown below, his tool returns a more detailed subclade (K1a1b1f) than Geno 2.0 currently reports. The same was true for a K1 sample, which Jim's tool extended to K1e.

Happily, reversing his earlier statements, Dr. Doug McDonald has successfully adapted his admixture analysis program to be able to work with the Geno 2.0 files. However, as expected, he reported that the results are inferior when compared to using 23andMe and Family Finder files due to the lower number of SNPs that he is able to incorporate in his program.

Dienekes announced today that he has created a converter to run Geno 2.0 files on his DIYDodecad tool.

After working with a raw data file, Mike Cariaso, cofounder of SNPedia, tells me that versions of Promethease 0.1.149 and later will be able to read the Geno 2.0 files directly.

Leon Kull reports that he is accepting Geno 2.0 files for inclusion in his HIR Search autosomal DNA matching database.

Y-DNA researcher David Reynolds compiled a list of Y-SNPs on the GenoChip from the raw data file. He explains on the ISOGG Facebook page, "While we don't know the exact location yet of most of the 12,000+ (e.g., the CTS, F, PF SNPs), this will answer the questions about which DF/L/Z etc SNPs are included."

Dr. Tim Janzen, 23andMe Ancestry Ambassador and ISOGG Y-SNP Tree Committee Member, is currently working on creating a file that will include the SNP positions for all of the SNPs on the GenoChip that are also found on the 23andMe v3 chip. I will add the link here when it is completed (probably tomorrow).

Dr. Ann Turner (who needs no introduction from me!) has informed me that she is ready to share the spreadsheet method that she has been working on to facilitate Geno 2.0 raw data usage with third party applications. She tells me, "GenoConvertTemplate.xlsm contains a macro to convert the GenoChip raw data download to the format used by 23andMe, which many 3rd party utilities can handle. It uses Build 37 numbers for the chromosomal locations of the SNPs." She would like to test it with additional Geno 2.0 data files, especially from people who have also tested at 23andMe and are willing to share both files with her. Please contact Ann directly for access or to donate your raw data (you can write to me for her email address if you don't have it).
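For readers curious what such a conversion involves, here is a minimal Python sketch. It is not Ann's macro and makes several assumptions: that the GenoChip CSV has the columns ID, SNP, Chromosome, Allele1 and Allele2 (only guessed from the raw data layout), that a separate lookup table of Build 37 positions is available (the `positions` dictionary is hypothetical), and that the target is the tab-separated rsid/chromosome/position/genotype layout of a 23andMe download. The SNP names and positions in the sample are invented placeholders.

```python
import csv
import io


def geno_to_23andme(geno_csv_text, positions):
    """Convert GenoChip-style CSV text to 23andMe-style tab-separated text.

    ASSUMPTION: the CSV columns are ID, SNP, Chromosome, Allele1, Allele2.
    `positions` is a hypothetical lookup of SNP name -> Build 37 position.
    SNPs with no known position are skipped.
    """
    reader = csv.DictReader(io.StringIO(geno_csv_text))
    lines = ["# rsid\tchromosome\tposition\tgenotype"]
    for row in reader:
        snp = row["SNP"]
        if snp not in positions:
            continue  # no Build 37 position available for this SNP
        # An empty allele is written as "-", so a full no-call becomes "--".
        allele1 = row["Allele1"] or "-"
        allele2 = row["Allele2"] or "-"
        lines.append(f"{snp}\t{row['Chromosome']}\t{positions[snp]}\t{allele1}{allele2}")
    return "\n".join(lines) + "\n"


# Invented sample data, for illustration only.
sample_csv = (
    "ID,SNP,Chromosome,Allele1,Allele2\n"
    "X1,rsEXAMPLE1,1,A,G\n"
    "X1,rsEXAMPLE2,2,,\n"
    "X1,rsEXAMPLE3,3,C,C\n"
)
sample_positions = {"rsEXAMPLE1": 1000, "rsEXAMPLE2": 2000}

converted = geno_to_23andme(sample_csv, sample_positions)
print(converted)
```

The third sample SNP is dropped because it has no entry in the lookup table, which is why a complete list of SNP positions matters so much for any converter of this kind.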

Mitochondrial DNA researcher and mtDNA Haplogroup K Project administrator Bill Hurst has spent quite a bit of time examining the data as well. After working with two files, he commented (in agreement with Jim Lick) that so far he "was impressed by the full coverage of the test, even while testing only 19% of the mtDNA."

I had full confidence that our amazing community of researchers/citizen scientists would come through for us and they have, even more quickly than imagined! Thank you to all those who have contributed so far to this groundbreaking project. (There's a long way to go yet!) I would be remiss if I didn't note that this expedient work could not have happened without the generosity of both Sharon and Anna in sharing their early data. Thanks go out to both of them from all of the community!

The researchers listed above are still looking to examine more raw data files, so if anyone reading this has received their results and would like to contribute to the cause, please contact me or any of the researchers directly. If you have a third party tool that is accepting Geno 2.0 raw data and is not listed, please comment below or email me to be added. I will continue to update this list as I receive more info, so please check back periodically.

**To access your Geno 2.0 raw data for use with these tools, go to "Expert Options" under the Profile tab and download the .CSV file to your desktop.**

MORE RESULTS FROM GENO 2.0:

YOUR MATERNAL LINE
Here is the newest set of mtDNA results:

WHO AM I?
These are her autosomal results:

Note: Clicking on a percentage brings up information about that population

National Geographic has recently clarified that the matching reference populations listed as first and second (shown in this screen shot) are not in order of closest matching, but simply the top two.

Interestingly, her Denisovan ancestry component is significantly lower than that of the other account that I reviewed.

OUR STORY
It looks like there are a few more "stories" in the community section now:

There are a total of 30 as of today (more than when I looked yesterday!).

TRANSFERS
I tried to transfer her data to Family Tree DNA (found in Expert Options under the "Download Data" button),

but I got this:

"The kit you entered already has a Population Finder, new PopFinder data cannot be transferred"

It appears that if you already have a Family Finder test from FTDNA (which includes the Population Finder feature), then you cannot transfer your Geno 2.0 results at this time. I'm surprised by this since Family Finder (PopFinder) and Geno 2.0 are very different products. Further, this customer does not have an mtDNA test in her FTDNA account, so she should be able to transfer her mtDNA results at the very least. One project administrator has already reported that a woman has transferred her results into an FTDNA Project, so contrary to earlier reports, women are not automatically blocked from transferring their Geno 2.0 results to FTDNA. I'm quite confident that National Geographic or FTDNA will clarify this issue for us soon.

MORE TO COME...
I'm sure this will be only one of my many future posts on Geno 2.0, so stay tuned.

Wednesday, November 21, 2012

Genographic Project 2.0 - First Look!

I am very fortunate to have been given the opportunity to get a glimpse into one of the first test results returned for National Geographic's new Geno 2.0 (now called the GenoChip) and I am happy to be able to share some screen shots with my readers. I haven't had time to reach any conclusions yet and probably won't write a review until I receive my own results, which currently sit at 60% complete.

However, the screenshots can speak for themselves.

YOUR STORY

The results open with:

You can click through to the different sections for MUCH more information. These results are for a female, thus the question mark under paternal line.

YOUR MAP
There are heat maps for the haplogroups. For mtDNA Haplogroup U, they walk you through the migration from the root of Hg L3 to Hg N to Hg R and, finally, to Hg U.

WHO AM I?
NatGeo has discovered that nine ancestral regions make up each of our genomes. They are Northeast Asian, Mediterranean, Southern African, Southwest Asian, Native American, Oceanian, Southeast Asian, Northern European and Sub-Saharan African. According to NatGeo, each of us is a blend of these nine ancestral regions, except Native Americans, Oceanians and the Khoisan people.

They compare our DNA to 43 reference populations, each made up of distinct blends of these nine regions:

Each of us will receive an estimate of which of these 9 regional affiliations and 43 populations we most closely resemble genetically. Here is an example:

There is also an estimate of hominid ancestry:

OUR STORY
The community aspect of the results is called "Our Story". You will be the center of your universe with those with the most similar genetics clustering around you:

So far there aren't enough participants sharing stories,

but that will soon change!

RAW DATA
In case you were wondering what the raw data looks like, here is the beginning of the file, which contains the mtDNA SNPs (with the ID removed from Column A and allele values removed from columns D and E):

and some of the Y-SNPs (no-calls here):

In the download of the raw data, the Y-SNPs are listed as no-calls for women, as above. This file can be accessed by going to "Expert Options" under Profile and downloaded as a CSV file. (Download example file here.)
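If you want to check this in your own download, a short Python sketch like the following can tally calls and no-calls per chromosome. The column names (ID, SNP, Chromosome, Allele1, Allele2) and the way a no-call is written (empty, "-" or "--") are assumptions on my part, so adjust them to match your actual file. The sample rows are invented.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: how a no-call appears in an allele column; adjust to your file.
NO_CALL_VALUES = {"", "-", "--"}


def tally_calls(csv_text):
    """Return (calls, no_calls) Counters keyed by chromosome label.

    ASSUMPTION: the CSV has Chromosome, Allele1 and Allele2 columns.
    A row counts as a no-call only when both alleles are missing.
    """
    calls, no_calls = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        allele1 = (row.get("Allele1") or "").strip()
        allele2 = (row.get("Allele2") or "").strip()
        if allele1 in NO_CALL_VALUES and allele2 in NO_CALL_VALUES:
            no_calls[row["Chromosome"]] += 1
        else:
            calls[row["Chromosome"]] += 1
    return calls, no_calls


# Invented sample rows resembling a female's file: Y-SNPs are all no-calls.
sample_csv = (
    "ID,SNP,Chromosome,Allele1,Allele2\n"
    "X1,mtEXAMPLE1,MT,A,A\n"
    "X1,mtEXAMPLE2,MT,G,G\n"
    "X1,yEXAMPLE1,Y,--,--\n"
    "X1,yEXAMPLE2,Y,--,--\n"
)

calls, no_calls = tally_calls(sample_csv)
print("calls:", dict(calls))
print("no-calls:", dict(no_calls))
```

For a woman's file, you would expect every Y row to land in the no-call tally, as in the screen shot above.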

That's all for now... Enjoy!

My deepest appreciation to Sharon Schmidt for her generosity and willingness to share with me and our entire community.

[*For additional screen shots, clarifications to some of the questions raised in the comments below and information on third party resources, please see this post.]

Thursday, November 15, 2012

GeneaBlog Awards 2012...Guess Who?

I am extremely proud to have been honored by Tamura Jones in the fifth annual GeneaBlog Awards, not once but TWICE! This year Tamura recognized genealogy blogs in five categories:

Best New Genealogy Blog: AQ Will Do by Dale McIntyre
Best In Depth Industry Coverage: Your Genetic Genealogist for AncestryDNA coverage
Most Pronounced Genealogy Activism: The Legal Genealogist by Judy Russell for SSDI coverage
Best Guest Series: DNA Testing for Genealogy by CeCe Moore on Geni.com
Best Uniquely Informative Series: The Legal Genealogist by Judy Russell for "Terms of Service"

I was especially happy to read that Tamura recognized my efforts to fairly and accurately report Ancestry.com's new venture into autosomal DNA testing with these words:

This year, Ancestry.com started offering autosomal DNA tests through their AncestryDNA service, at relatively affordable prices.
No one has covered this industry development more extensively than CeCe Moore on her Your Genetic Genealogist blog.

In fact, her coverage starts late last year, well before the Beta period.
She was the first to point out that AncestryDNA may be cheap, but does not provide you with your own data, an important issue many other authors missed completely.
She has consistently taken the position that Ancestry.com should release the actual test results to their customers.

Your Genetic Genealogist was the first to report that Ancestry.com had bought GeneTree.

CeCe Moore exclusively reported on erroneous results by AncestryDNA, provided follow-up explaining what went wrong,
and did not hesitate to report an AncestryDNA success story later.
Your Genetic Genealogist has regularly provided AncestryDNA information while misinformation abounded.
There is no better source of independent information on AncestryDNA.

Congratulations to my colleagues Judy and Dale for their much deserved awards. Thank you to Tamura for the honor and to my faithful readers because, without you, none of this would be possible.

[**Please see Judy's latest post on The Legal Genealogist for background on Tamura Jones and the GeneaBlog Awards.**]

Tuesday, November 13, 2012

Family Tree DNA's Annual Holiday Sale Begins a Little Early This Year!

I just returned home from the wonderfully inspiring Family Tree DNA Conference held last weekend in Houston and received an email announcing the start of their seven week holiday sale. So, if you have been wanting to order a DNA test for yourself or as a gift, please see below for the annual holiday sale prices.

As we ended our 8th Annual Genetic Genealogy Conference, several conference participants asked us to start our year-end sale as soon as possible. In answer to those requests we decided to start it immediately:

New Kits	Current Price	SALE PRICE
Y-DNA 37	~~$169~~	$119
Y-DNA 67	~~$268~~	$199
mtDNAPlus	~~$159~~	$139
mtFullSequence (FMS)	~~$299~~	$199
SuperDNA (Y-DNA 67 and mtFullSequence)	~~$548~~	$398
Family Finder	~~$289~~	$199
Family Finder + mtDNAPlus	~~$438~~	$318
Family Finder + mtFullSequence	~~$559~~	$398
Family Finder + Y-DNA 37	~~$438~~	$318
Comprehensive (FF + FMS + Y-67)	~~$837~~	$597
Upgrades	Current Price	SALE PRICE
Y-Refine 12-25 Marker	~~$59~~	$35
Y-Refine 12-37 Marker	~~$109~~	$69
Y-Refine 12-67 Marker	~~$199~~	$148
Y-Refine 25-37 Marker	~~$59~~	$35
Y-Refine 25-67 Marker	~~$159~~	$114
Y-Refine 37-67 Marker	~~$109~~	$79
Y-Refine 37-111 Marker	~~$220~~	$188
Y-Refine 67-111 Marker	~~$129~~	$109
mtHVR1toMega	~~$269~~	$179
mtHVR2toMega	~~$239~~	$179
mtFullSequence Add-on	~~$289~~	$199

ALL ORDERS MUST BE PLACED AND PAID FOR BY MONDAY, DECEMBER 31, 2012 11:59:00 PM CST TO RECEIVE THE SALE PRICES. Order here.

I know that many of you are waiting for my annual conference summary. I am working on getting that together as quickly as possible for you, but in the meantime, Roberta Estes has a very comprehensive summary on her blog DNAeXplained and Jennifer Zinck has her review of Day 1 on her blog Ancestor Central. Enjoy!

Wednesday, November 7, 2012

GenoChip Kit Update: DNA Isolation and on to Analysis!

It is moving along...

I can hardly wait for the results!

Monday, November 5, 2012

Ken Chahine Answers My Questions and Reveals Behind-the-Scenes Information about AncestryDNA

I recently had a long and enlightening conversation with Dr. Ken Chahine, General Manager of AncestryDNA. Due to the lack of information from reliable sources regarding the details of what is going on behind the scenes at AncestryDNA, there has been considerable speculation among those interested in the product. As it turns out, much of this has been incorrect.

AncestryDNA's Matching Threshold
First, mostly due to the large number of matches, it has been widely speculated that AncestryDNA is allowing for a much lower threshold than either 23andMe or Family Tree DNA, reporting matches based on as little as two cM. In reality, Ken tells me that AncestryDNA has been using a 5 Mb cutoff [Mb = mega base pairs = 1,000,000 base pairs] for reporting matches in their lowest category - "very low confidence". He explains how they came to this decision and what AncestryDNA sees as the benefits to their customers:

AncestryDNA, we believe, is the only service that phases the genotyping data and has validated the matching algorithm with large pedigrees. That leads to two important differences. First, it allowed us to test various segment cutoffs from 5-10 Mb* with and without a proprietary filter that preferentially removes incorrect matches. We've initially selected the 5 Mb cutoff with the filter as providing the best balance between false negative (true matches that we fail to call a match) and false positives (false matches that we call true matches). Second, it allowed us to make a better cousinship prediction. For example, our data suggest that most relationships that are theoretically predicted to be third cousins are really fourth cousins or deeper. Therefore, a fourth cousin match at AncestryDNA, we believe, is a third cousin match at other services.

Ken's assertion that AncestryDNA is using a more conservative prediction calculation does appear to be in agreement with what I and many of my colleagues have observed. Time will tell if it is indeed more accurate. The filter aimed at reducing the number of IBS (Identical By State) matches sounds like a promising addition. When we have the ability to examine the raw data we should be able to reach conclusions about how effective the filter is at fulfilling its purpose.
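To make the trade-off Ken describes more concrete, here is a small Python sketch that counts false positives and false negatives at different segment cutoffs. It does not model AncestryDNA's proprietary filter or use any real data: the segment lengths and true/false labels are invented, and the function name is my own.

```python
def error_counts(segments, cutoff_mb):
    """Count (false_positives, false_negatives) at a given length cutoff.

    `segments` is a list of (length_mb, is_true_match) pairs.
    A segment is reported as a match when its length is >= cutoff_mb.
    """
    false_pos = sum(1 for length, is_true in segments
                    if length >= cutoff_mb and not is_true)
    false_neg = sum(1 for length, is_true in segments
                    if length < cutoff_mb and is_true)
    return false_pos, false_neg


# Invented candidate segments: (length in Mb, whether the match is genuine).
SEGMENTS = [
    (4.0, False), (5.2, False), (5.5, True), (6.1, False),
    (7.0, True), (8.3, True), (9.5, False), (12.0, True),
]

results = {cutoff: error_counts(SEGMENTS, cutoff) for cutoff in (5, 7, 10)}
for cutoff, (fp, fn) in results.items():
    print(f"cutoff {cutoff} Mb: {fp} false positives, {fn} false negatives")
```

With this toy data, a low cutoff misses no true matches but lets in more false ones, while a high cutoff does the opposite. A filter that removes incorrect matches before the cutoff is applied would, in principle, allow a lower cutoff without as many false positives.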

Mb vs cM - What does it mean to us?
As you may know, the centimorgan, rather than the mega base pair, is the unit used by Family Tree DNA's Family Finder, and primarily by 23andMe as well, to measure the length of matching autosomal DNA segments. So, how does this 5 Mb threshold compare to the 5.5 cM* threshold used by Family Tree DNA (*edited from 7.7 cM after I was sent this) and the 7 cM threshold used by 23andMe in their Relative Finder feature? The National Institutes of Health website tells us that in human genetics, "one centimorgan is equivalent, on average, to one million base pairs" or 1 Mb. Genome.gov agrees, "Generally, one centimorgan equals about 1 million base pairs." However, in reviewing my Ancestry Finder download at 23andMe, which lists the length of segments both in Mb and in cM, I came to the conclusion that, unfortunately, it isn't that simple - at least for our purposes. In some cases, the numerical value in Mb was larger than in cM for the same segment, but in other cases it was smaller. I copied portions of my Ancestry Finder download to demonstrate examples. If you have tested at 23andMe, take a look at your own file to get a feel for the comparison.

The first chart shows the respective values when the Mb value was 11 and the second when the cM value was 11:

Notice the wide range of values in both directions. It appears to be impossible to find a direct correlation between segment lengths measured in Mb and cM. In some cases, the two values weren't even close:

The number of base pairs that a centimorgan corresponds to varies so widely because the two units measure different things. When the distance along a chromosome is measured in mega base pairs (Mb), the value strictly reflects how many millions of base pairs there are in a matching segment. When centimorgans (cM) are used, the value reflects the frequency or chance of recombination expected within that segment. Some portions of the genome are expected to recombine more often than others; therefore, sometimes a segment of 1 Mb has a relatively good chance of remaining intact and sometimes it does not.
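A short Python sketch may help illustrate why the same length in Mb can translate to very different lengths in cM. It assumes a simple piecewise recombination map in which each stretch of a chromosome has its own rate in cM per Mb. The map and the segment coordinates below are invented for illustration and are not taken from any real genetic map.

```python
def segment_cm(start_mb, end_mb, rate_map):
    """Genetic length (cM) of a segment under a piecewise-constant rate map.

    `rate_map` is a list of (region_start_mb, region_end_mb, cm_per_mb).
    The length in cM is the sum, over regions, of overlap (Mb) times rate.
    """
    total = 0.0
    for region_start, region_end, cm_per_mb in rate_map:
        overlap = min(end_mb, region_end) - max(start_mb, region_start)
        if overlap > 0:
            total += overlap * cm_per_mb
    return total


# Invented map: a low-recombination stretch, a hotspot, then an average stretch.
RATE_MAP = [(0, 20, 0.5), (20, 30, 2.0), (30, 60, 1.0)]

# Two segments that are both exactly 11 Mb long.
cold_segment_cm = segment_cm(5, 16, RATE_MAP)   # entirely in the 0.5 cM/Mb stretch
hot_segment_cm = segment_cm(19, 30, RATE_MAP)   # mostly in the 2.0 cM/Mb stretch

print(f"11 Mb in a low-recombination region: {cold_segment_cm} cM")
print(f"11 Mb across a hotspot: {hot_segment_cm} cM")
```

Under these made-up rates, the first 11 Mb segment measures 5.5 cM and the second measures 20.5 cM, which is the kind of spread seen in the charts above. A cutoff expressed in Mb and one expressed in cM will therefore keep and discard different segments.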

Different Predictions and More Matches
This difference between AncestryDNA's way of calculating the length of segments and that of the other two companies may explain, in part, the reason that some of us are seeing the same matches at AncestryDNA as we have at the other two companies, with very different predictions. The fact that AncestryDNA is using a phasing engine before running the matching algorithms will also account for some of the reported discrepancies. When asked why AncestryDNA is, on average, returning more matches than the other two companies, Ken offered one possible explanation. He said that it may be a result of the AncestryDNA database containing primarily customers with deep roots in the United States and, in many cases, descending from large Colonial New England families.

Adding International Customers to the Database
This discussion prompted me to inquire as to when AncestryDNA plans to offer the test to international customers. Ken said that it is certainly "on the radar", but they do not have an estimate of when this will happen yet. He explained some of the reasons for this:

1. Demand is still high within the United States and they are "processing samples as fast as we can right now".
2. The privacy laws in Europe are, in some cases, different from those in the US. Therefore, this will take additional time to address.
3. They will need to work out logistical issues.

He emphasized that Ancestry.com is a large company, which necessitates significant forethought and planning before taking action.

Uploadable Raw Data
During our conversation, Ken also addressed the questions surrounding the format the raw data will be presented in as well as the much-hoped-for matching segment data.

"We will be providing raw data download in early 2013. We have not made any formal decision on segment data. We understand that it is important to some of our customers and are taking it into serious consideration."

When I asked him whether the raw data would be formatted in such a way that will be compatible with uploading to third party sites such as Gedmatch.com, he assured me that it would. Fortunately, this puts to rest all of our speculation that the "related security enhancements" Ken referred to in his keynote address at the Consumer Genetics Conference last month would interfere with the data's usability. When I inquired further about the future availability of segment data, he said that he cannot promise anything in that regard, but was open to discussing what presentation formats of that data might be acceptable to genetic genealogists.

Admixture and Reasonable Assumptions
Some have also interpreted Ken's statement (reported by Esquire), that some customers are using their own knowledge to make reasonable assumptions that are leading to incorrect conclusions, to mean that AncestryDNA is using an altogether different method of determining our matches than the other two companies offering autosomal DNA tests for genealogy. He explained to me that what he was actually referring to was that many customers assume that, because autosomal DNA matching is only applicable to relatively recent ancestry, the admixture results also reflect that time period and should match what we know of our ancestral origins from our family trees. He emphasized that AncestryDNA's "Genetic Ethnicity" feature (like any admixture tool) is not looking at the large segments used for relative matching, but rather is examining much smaller blocks and single markers that are ancestrally informative. Therefore, some of this admixture is very old - offering a glimpse much further back in time than our known family trees. He offered reassurance to those who feel that this portion of the test is not yet as accurate as it should be (me included):

AncestryDNA is data-driven. Our team of scientists are constantly analyzing the data looking for ways to improve the ethnicity and matching prediction algorithms. The science, and hence the customer experience, is only going to improve with time.

At some point, the Sorenson data will likely be incorporated into the AncestryDNA test, which should improve the admixture predictions tremendously.

Even the CEO is working with his matches!
In closing, it was very nice to hear that "everyone from the CEO down is working with their matches" at Ancestry.com. This should lead to a management team that is educated about what we as genetic genealogists are trying to accomplish and how best to do it. As a result, I look forward to improved tools and results at AncestryDNA in the future.

I want to thank Ken Chahine and Stephen Baloglu for their recent efforts to shed some light on aspects of the AncestryDNA test and clear up some of the confusion. As I told both of them, the more transparency that AncestryDNA can offer to the genetic genealogy community, the more satisfied we will all be with the product. According to Ken, it is likely that more details will be revealed soon. That is a very good thing because as I was writing this, I thought of many more questions for him!