Your Genetic Genealogist: 2013

Tuesday, December 10, 2013

Family Tree DNA's Latest Updates

Family Tree DNA made many genetic genealogists very happy with their updates last week. I wasn't expecting to see anything new from them for at least two weeks, but they are already releasing another update today. I am extremely encouraged by the wonderful progress that Family Tree DNA is making toward giving its customers exactly what they have been asking for. With these wonderful improvements, they are demonstrating that they really are listening to what we want and need. Thanks again, FTDNA!

These changes are scheduled to go live today.

Rebekah Canada tells us:

Weekly Information Technology/Engineering Update (10 Dec 2013)

Matches Maps Locations Clear Button

Some users have requested the ability to clear their stored map coordinates for their most distant known maternal or paternal ancestors. We have added a Remove Location button to Step 3 of the Update Most Distant Ancestor’s Location wizard.

Family Tree DNA myFTDNA BETA Family Finder – Matrix

Today, we are happy to release our new BETA Family Finder – Matrix page. The Matrix tool can tell you if two or more of your matches match each other. This is most useful when you discover matches with wholly or partly overlapping DNA segments on the Family Finder - Chromosome Browser page.

Due to privacy concerns, the suggested relationship of your two matches (if related) is not revealed. However, we can tell you whether they are related according to our Family Finder program. To use it, you select up to 10 names from the Match list on the left side of the page and add them to the Selected Matches list on the right side of the page. A grid will populate below the lists. It will indicate whether there is a match (a blue check mark) or there is not a match (an empty white tile).

You access the BETA Family Finder – Matrix page through the Family Finder menu in your myFTDNA account.

The page starts out with two list areas: Matches and Selected Matches. You add Matches to the Selected Matches list by clicking on a name and then on the Add button.

Here is a screenshot of the BETA Family Finder – Matrix page with a few matches added to the Selected Matches list.
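For readers who like to see the logic spelled out, here is a minimal sketch of how a grid like the Matrix could be built from a list of known pairwise matches. This is not FTDNA's implementation; the function names, the sample names and the "x"/"." rendering are assumptions made for illustration only.

```python
def build_matrix(selected, match_pairs):
    """Return grid[a][b] -> True when a and b are listed as matching each other.

    `selected` is a list of up to 10 names (the limit described above);
    `match_pairs` is an iterable of 2-tuples of names. Both are hypothetical inputs.
    """
    if len(selected) > 10:
        raise ValueError("at most 10 selected matches are allowed")
    # Store pairs as frozensets so (a, b) and (b, a) are treated the same.
    pairs = {frozenset(p) for p in match_pairs}
    return {
        a: {b: a != b and frozenset((a, b)) in pairs for b in selected}
        for a in selected
    }


def render(grid):
    """Render the grid as text: 'x' for a match, '.' for an empty tile."""
    names = list(grid)
    lines = []
    for a in names:
        cells = ["x" if grid[a][b] else "." for b in names]
        lines.append(f"{a:<8}" + " ".join(cells))
    return "\n".join(lines)


grid = build_matrix(["Ann", "Bob", "Cy"], [("Ann", "Bob")])
print(render(grid))
```

Note that the grid only records whether two people match, not how closely, which mirrors the privacy restriction described above.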

Saturday, December 7, 2013

23andMe Releases a Sample of Their New V4 File: First Look and Analysis

23andMe has released access to a sample file of their new v4 chip. This is a 100% custom chip with hand-selected SNPs. In a departure from the other two companies offering autosomal DNA testing to the genealogy community, they are now using the Illumina iSelect chip as the foundation instead of the Illumina OmniExpress chip. On the new v4 chip, the total number of autosomal DNA and X DNA SNPs has decreased substantially, while the total number of mtDNA SNPs and Y-SNPs has increased.

Since I heard about this new chip, I have been concerned about the impact it will have on the genetic genealogy community and, particularly, on the compatibility with 3rd party uploads and tools. If 23andMe customers are unable to take advantage of the extremely beneficial opportunity to upload their data into Family Tree DNA's Family Finder database and to make use of the wonderful tools at GEDmatch, that would be a huge loss for all of us. It would also be a shame if the usefulness of the Y-SNPs and mtDNA SNPs tested by 23andMe is reduced for our community. To try to determine if this will be the case, Dr. Tim Janzen has helped me to analyze the new file overall and Larry Vick has specifically analyzed the Y-SNPs.

The v4 chip currently has just over 602,000 total SNPs versus 967,000 total SNPs on the v3 chip. This chip is not as robust as the chip previously used by 23andMe or the chips used by Family Tree DNA and AncestryDNA for the autosomal and X-SNPs. This is of great concern, but 23andMe has stated that they plan to impute a large number of SNP allele values from our results (see comments at http://blog.23andme.com/news/23andmes-new-custom-chip/), so hopefully Family Tree DNA and citizen scientists will be able to do the same to extract the most utility and compatibility from this platform. I am continually surprised and amazed by the resourcefulness of our community, so I remain optimistic.

This change in platforms was intended to help the company ramp up their processing capacity in conjunction with their massive marketing campaign to acquire one million customers and beyond. (They can run 24 samples on each v4 chip at once instead of 8 on each of the v3 chips.) Especially in light of 23andMe's recent decision to provide ancestry-related interpretation and raw data exclusively, the v4 chip does not appear to be a beneficial development. In fact, it may result in additional loss of sales. Currently, the genetic genealogy and the DNAAdoption communities generally recommend first testing at 23andMe and then transferring the raw data into FTDNA's Family Finder database in order to be in two databases at a reduced price. If it turns out that FTDNA is unable to continue to accept 23andMe transfers, these recommendations will likely change. Loss of compatibility with GEDmatch would also have a very detrimental effect on the utility for those using the data for genealogical and admixture research.

The following analysis is in depth and intended for those who want the specific details of the changes. Instructions to download the v4 file will follow the analysis.

First let's look at the Y-SNPs.

Larry Vick tells us:
I compared my Y-SNP file I downloaded on 14 Aug 2012 to the Y-SNP file I downloaded today to see if my current file has any changes. There weren't any. I then compared the (v4) file for Greg MENDEL that I downloaded today to a v3 file I downloaded for a friend (CRL) on 21 Mar 2013.

There are 2,329 Y-SNPs in the Greg MENDEL v4 file. CRL had 1,766 Y-SNPs in his v3 file. So the v4 file has 563 more SNPs than CRL's v3 file. Looking at the SNPs, there were 446 in CRL's v3 file that were not in the MENDEL v4 file. There were 1,009 in the MENDEL v4 file that weren't in the CRL v3 file. Of the 446 in the CRL v3 file that weren't in the MENDEL v4 file, 314 were no calls.

Of the 1,009 in the MENDEL v4 file that weren't in CRL's v3 file, 295 were no calls. When I compared the 1,009 in the MENDEL v4 file to my file I downloaded today, 494 were not in my file (although they could have been in past downloads but were removed prior to today's download). I have SNPs from v1, v2, and v3. Of the 515 that were in my file, 99 were no calls.

I compared the 494 that were not in my file to Adriano's file in the Y-Chromosome Comparison Project, and 268 were in his file. All but one of those 268 had i prefixes for the reference sequence number.

I then compared the 226 that weren't in Adriano's file to the ISOGG Y-SNP Compendium by position number (build 37), and 200 were in the ISOGG file. I created a list of those 200 with the first note field. The 26 that weren't in the ISOGG file included the one SNP with an rs number (rs5603911). The MENDEL file had 46 no calls for the 200 SNPs in the ISOGG file. All but one of the 226 that weren't in Adriano's file had i prefix reference sequence numbers.
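If you would like to repeat this kind of comparison on your own files, the following is a rough sketch rather than the method Larry used. It assumes the usual 23andMe raw data layout (comment lines starting with "#", then tab-separated SNP id, chromosome, position and genotype) and treats any genotype beginning with "-" as a no call; the function names are made up for this example.

```python
def read_snps(lines, chromosome="Y"):
    """Map SNP id -> genotype for one chromosome of a 23andMe-style raw data file."""
    snps = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        snp_id, chrom, _position, genotype = line.split("\t")[:4]
        if chrom == chromosome:
            snps[snp_id] = genotype
    return snps


def compare(old, new):
    """Count shared SNPs, SNPs unique to each file, and no calls among the unique ones."""
    only_old = old.keys() - new.keys()
    only_new = new.keys() - old.keys()

    def no_calls(ids, source):
        # Assumption: a no call is written with dashes, e.g. "--".
        return sum(1 for snp_id in ids if source[snp_id].startswith("-"))

    return {
        "shared": len(old.keys() & new.keys()),
        "only_old": len(only_old),
        "only_new": len(only_new),
        "only_old_no_calls": no_calls(only_old, old),
        "only_new_no_calls": no_calls(only_new, new),
        "net_change": len(new) - len(old),
    }


# Tiny made-up sample in place of real files:
v3 = read_snps(["# header", "rs1\tY\t100\tA", "rs2\tY\t200\t--", "rs3\t1\t300\tAG"])
v4 = read_snps(["rs1\tY\t100\tA", "i500\tY\t400\t--", "i501\tY\t500\tC"])
print(compare(v3, v4))
```

With real files you would pass an open file handle, for example `read_snps(open("my_v3_file.txt"))`, where the file name is whatever you saved your download as.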

Now let's look at the overall composition of the new v4 chip as compared to the other platforms.

Tim reviewed the v4 file and compared it to v2, v3, and FF files. The following are some general statistics on which he based his analysis.

atDNA SNPs
v2:
561,846 atDNA SNPs in 23andMe v2 data in a 2009 download.
515 i atDNA SNPs in 23andMe v2 data in a 2009 download.
556,787 atDNA SNPs in 23andMe v2 data in a fresh download.
758 i atDNA SNPs in 23andMe v2 data in a fresh download.

v3:
930,381 atDNA SNPs in 23andMe v3 data in a fresh download.
7,455 i atDNA SNPs in 23andMe v3 data in a fresh download.

v4:
577,382 atDNA SNPs in 23andMe v4 data.
41,855 i atDNA SNPs in 23andMe v4 data.

Family Finder:
708,092 atDNA SNPs in Family Finder data in general, but a fresh download only had 707,269 SNPs in it.

Y-SNPs
v2:
1880 Y SNPs in 23andMe v2 data in a fresh download. Of these 213 are i SNPs.

v3:
1766 Y SNPs in 23andMe v3 data in a fresh download. Of these 232 are i SNPs.

v4:
2329 Y SNPs in 23andMe v4 data. Of these 526 are i SNPs.

X-SNPs
v2:
13,876 X SNPs in 23andMe v2 data in a 2009 download. Of these 19 are i SNPs.
13,828 X SNPs in 23andMe v2 data in a fresh download. Of these 96 are i SNPs.

v3:
26,007 X SNPs in 23andMe v3 data in a fresh download. Of these 1006 are i SNPs.

v4:
19,487 X SNPs in 23andMe v4 data in a fresh download. Of these 4227 are i SNPs.

Family Finder:
18,022 X SNPs in Family Finder build 37 data in a fresh download.

mtDNA
v2:
2019 mtDNA SNPs in 23andMe v2 data in a fresh download. Of these 1572 are i SNPs.

v3:
2459 mtDNA SNPs in 23andMe v3 data in a fresh download. Of these 2016 are i SNPs.

v4:
3154 mtDNA SNPs in 23andMe v4 data. Of these 2681 are i SNPs.

Here are his comparisons between the various platforms.

atDNA:

453,854 atDNA SNPs in 23andMe v4 data are also found in 23andMe v2 data in a 2009 download. Of these SNPs, 419 are i SNPs.

453,357 atDNA SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download. Of these SNPs, 546 are i SNPs.

509,630 atDNA SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download. Of these SNPs, 6153 are i SNPs.

304,864 atDNA SNPs in 23andMe v4 data that have rs numbers are also found in Family Finder in a fresh download.

I then checked the 41,855 i atDNA SNPs in 23andMe v4 data and checked for matching positions in the Family Finder data. I found that there were 2556 i atDNA SNPs in 23andMe v4 data that had matching positions in the Family Finder data.

Assuming that all of those matching positions correspond with the same SNP in the Family Finder data, there are a maximum of 307,420 atDNA SNPs in 23andMe v4 data that are also found in Family Finder in a fresh download.
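The two-step overlap count described above (match rs-numbered SNPs by name, then match i-numbered SNPs by position) can be sketched as follows. This is only an illustration of the idea, not Tim's actual procedure; the dictionaries, their layout and the function name are assumptions.

```python
def overlap_counts(v4, other):
    """Count overlapping SNPs between two datasets.

    Both arguments map SNP id -> (chromosome, position). SNPs with rs numbers
    are matched by id; SNPs with 23andMe's internal "i" ids are matched by
    position, which assumes both files use the same genome build.
    """
    by_name = {snp for snp in v4 if snp.startswith("rs") and snp in other}
    other_positions = set(other.values())
    by_position = {
        snp for snp in v4
        if snp.startswith("i") and v4[snp] in other_positions
    }
    return len(by_name), len(by_position), len(by_name) + len(by_position)


# Made-up miniature example:
v4 = {"rs1": ("1", 100), "rs2": ("1", 200), "i10": ("2", 300), "i11": ("2", 400)}
ff = {"rs1": ("1", 100), "rs9": ("2", 300)}
print(overlap_counts(v4, ff))

# The maximum quoted above is just the sum of the two counts:
print(304_864 + 2_556)  # 307420
```

A shared position does not guarantee that the same SNP is being reported at that position, which is why the combined total is described as a maximum.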

Y-SNPs:

979 Y DNA SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download.

1320 Y DNA SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download.

This means that there are 1009 Y SNPs found on the v4 chip that aren’t found on the v3 chip.

There are 563 more Y SNPs in v4 data than in v3 data.

X-SNPs:

11,070 X SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download.

11,009 X SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a 2009 download.

14,437 X SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download.

7,513 X DNA SNPs in 23andMe v4 data that are also found in Family Finder in a fresh download.

mtDNA:

1698 mtDNA SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download.

2208 mtDNA SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download.

This means that there are 946 mtDNA SNPs found on the v4 chip that aren’t found on the v3 chip.
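As a quick sanity check, the Y-SNP and mtDNA figures above follow from the chip totals and the shared counts by simple subtraction. The small helper below is my own illustration (the function name is made up); the numbers fed into it are the ones listed in the statistics above.

```python
def chip_delta(v4_total, v3_total, shared):
    """Derive the v4-only, v3-only and net-change counts from totals and overlap."""
    return {
        "v4_only": v4_total - shared,
        "v3_only": v3_total - shared,
        "net_change": v4_total - v3_total,
    }


y = chip_delta(v4_total=2329, v3_total=1766, shared=1320)
mt = chip_delta(v4_total=3154, v3_total=2459, shared=2208)
print(y)   # {'v4_only': 1009, 'v3_only': 446, 'net_change': 563}
print(mt)  # {'v4_only': 946, 'v3_only': 251, 'net_change': 695}
```

The Y-SNP values agree with both Larry's and Tim's counts: 1,009 SNPs are new in v4, 446 were dropped from v3, and the net gain is 563.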

Tim's preliminary conclusions
The fact that there are only 307,420 atDNA SNPs in 23andMe v4 data that are also found in Family Finder is highly concerning. The specificity of matches when comparing v4 data to FF or AncestryDNA data will be significantly reduced in projects such as my Mennonite autosomal project. atDNA SNP coverage for the overlapping SNPs between v4 data and FF or AncestryDNA data will only be about 44 SNPs per cM. I don’t know if FTDNA and GEDmatch will be able to allow imports of 23andMe v4 data. The fact that there are about 130,000 more atDNA SNPs in a Family Finder dataset than in v4 and the fact that v4 data won’t be readily uploadable to GEDmatch is forcing me to rethink 23andMe as my primary testing lab for distant relatives.

Third Parties and Download
My sincere hope is that Family Tree DNA and GEDmatch are able to adjust their systems to work with this new data from 23andMe. As soon as I hear anything, I will be sure to report it.

If you are interested in analyzing the file yourself and you have a 23andMe account, these are the directions for downloading the sample v4 file from 23andMe:

Enable the Mendel family in your account here:
https://www.23andme.com/user/edit/examples/

Then select Greg Mendel's raw data file from the drop-down list here:
https://www.23andme.com/you/download/

This file will probably still change slightly as they complete their validation process, but it should be pretty close to what we will start to see for new customers at 23andMe in the coming weeks.

Please let me know what you think. I am especially interested to hear analysis from the citizen scientists and the creators of our community's third party features.

Wednesday, December 4, 2013

FTDNA Releases Updates in Response to Requests from Project Administrators

I received an email from Bennett Greenspan, CEO of Family Tree DNA, today with some very welcome news! FTDNA is not wasting any time making good on the promises that they made at their conference last month. Obviously, they really were listening to us! (Take a look at #6...this is a very important update that will make a lot of genetic genealogists happy!)

Today we are releasing some great updates that were requested during our 9th International Conference on Genetic Genealogy. Here is a quick summary with some screen shots of what to expect.

1. The timeout for myFTDNA has been increased from 30 min to 2 hrs. This will benefit everyone but will especially be appreciated by our Group Admins when they are impersonating into a kit.

2. Changed the word "Triangulation" to "Common Matches" for Family Finder matching.

3. Instead of using the word "Steps" on the matching pages we will now use "Genetic Distance." This will affect both the Y-DNA and mtDNA matching pages.

4. Fixed the Interactive Tour. It was getting stuck at the Family Finder section, but will now complete.

5. Updated the Profile Pop up on matching pages with a new design and restored the "About Me" section and badges. This profile is available on all matching pages: Y-DNA, mtDNA, Family Finder, and Advanced Matching.

6. Added the ability for a user to download chromosome browser data for all of their matches. This new option is towards the top right side of the chromosome browser page and will be in Excel format.

These features are already live, so go check them out! I am told that there will be more updates every two weeks, so I'm really looking forward to seeing what they release next (on or about December 18th). Thanks, FTDNA!

Wednesday, November 20, 2013

A List of Alternate Names for the Y-SNPs from BritainsDNA's Chromo2 Test

(For Advanced Y-DNA Researchers)

Dr. James Wilson from BritainsDNA has sent me a list of the alternate names for the Chromo2 Y-SNPs and has given me permission to post it for public access with his comments, as follows:

Please find attached a list of alternate names for chromo2 Y SNPs...This is based on comparisons some months ago now, although a few SNPs have been updated in the meantime...Where there is no alternative name for an S SNP it means it was not listed/named in any compendium, browser or database I had available when this file was put together and is not available in any other product as far as I am aware (of course apart from a complete Y chromosome sequence).

Note that all the SNPs on this list were manufactured on the chip, but a small proportion do not give good genotyping clusters; I haven't had time to clean them out. There are also a small number of SNPs not on this list, e.g. S28 and S250, which we Taqman in the appropriate samples as Illumina appear to have removed them from the design and they have no proxies on the chip or none known.

In due course I also intend to share the genome co-ordinates to allow comparisons with whole Y chromosome sequences, despite this being a case of handing larger competitors the fruits of our investment. At present I have started to do that for individual SNPs that are queried on a case-by-case basis, as I don't as yet have the permissions of all of the sequenced individuals to hand out their SNPs in this way.

You can find the list here. (Make sure you download the entire list and not just what is visible through Google Docs.)

Thanks, Jim!

Wednesday, November 13, 2013

News from Family Tree DNA: "Big Y" Sequencing, Conference and Holiday Sale

Family Tree DNA's 9th Annual International Conference on Genetic Genealogy was held this past weekend in Houston. It was clear to those in attendance that, with the recent acquisition of Arpeggi, things are changing over at Gene by Gene. There has been an infusion of "new blood" into the company and with it has come new enthusiasm, resources and promise. The team was very open to hearing the community's needs and priorities, with the new staff listening in on our "roundtable" discussions where attendees were encouraged to share their ideas. As a result, they have promised the genetic genealogy community some of our most requested features in the near future.

Y-DNA Sequencing
These are exciting times for our Y-DNA citizen scientists! Following right on the heels of the delivery of the first results from Full Genomes, Family Tree DNA announced their new Big Y next gen sequencing product at the conference. Justin Petrone covered this development in his article in BioArray News yesterday as have a number of genetic genealogy bloggers.

In all of this excitement there has been a lot of discussion regarding what these competing products will and will not deliver. For the answers to many of these questions, we will have to wait until the first Big Y results start to be returned in approximately 10-12 weeks. However, one of the most pervasive concerns can be addressed now.

Throughout the genetic genealogy community for the last couple of days, there has been speculation that Family Tree DNA's Big Y product will not include raw data downloads. I found this very difficult to believe with FTDNA's track record of transparency, so I asked Gene by Gene's new Chief Scientific Officer Dr. David Mittelman for clarification on this matter. He confirmed that, while the Big Y results will consist of SNP calls, the raw data will be available for those who want it - just as it is for the company's exome and whole genome products. He added, "We will want to set up some infrastructure to support downloading these big raw files and we will need to clarify for customers that our CSRs are obviously not able to offer advice or support on how to use them."

I am relieved (although not surprised) to hear this, as I am sure many of you are. I appreciate Dr. Mittelman's quick response to my inquiry even though he was traveling.

The Big Y is being offered at a discount through November 30, 2013 for $495 and will increase to $695 after that. Previous "Walk Through the Y" customers will receive an additional $50 discount. (Various project admins have reported a lot of orders, so the offer appears to be a success already.)

Holiday Sale
The Family Tree DNA holiday sale has begun and is running through Dec 31st. Any customer whose order includes a Family Finder autosomal DNA test will receive a $100 Restaurant.com gift certificate. This sale includes new orders, upgrades and 23andMe/AncestryDNA transfers:

Y-37 for $119 (reg. $169)

Y-67 for $189 (reg. $268)

Y-111 for $289 (reg. $359)

mtFull for $169 (reg. $199)

Family Finder for $99 (includes a free $100 Restaurant.com gift certificate)

Family Finder + Y-37 for $218 (reg. $268) inc. $100 Restaurant.com gift certificate

Family Finder + Y-67 for $288 (reg. $367) inc. $100 Restaurant.com gift certificate

Family Finder + mtFull for $268 (reg. $298) inc. $100 Restaurant.com gift certificate

Y-37 + mtFull for $288 (reg. $366)

Y-67 + mtFull for $358 (reg. $457)

Comprehensive for $457 (reg. $566) inc. $100 Restaurant.com gift certificate

Autosomal DNA Transfer for $49 (reg. $69)

Y-Refine 12 to 37 for $69 (reg. $109)

Y-Refine 12 to 67 for $148 (reg. $319)

Y-Refine 25 to 37 for $35 (reg. $59)

Y-Refine 25 to 67 for $114 (reg. $59)

Y-Refine 37 to 67 for $79 (reg. $109)

Y-Refine 37 to 111 for $188 (reg. $220)

Y-Refine 67 to 111 for $109 (reg. $129)

mtHVR1 to Mega for $149 (reg. $169)

You can order here.

Conference Coverage
When I first started blogging the FTDNA conference, I was largely alone. Now with the excellent and thorough conference coverage by other bloggers and on Twitter, it is no longer necessary for me to give a blow-by-blow account here. Thanks to Debbie Kennett for compiling a comprehensive list of conference posts on her blog.

Monday, November 4, 2013

Upcoming Presentations: November 8th and 14th

I am speaking twice in the next week and a half - once locally and once in Austin, Texas. I hope to see some of you there! Details follow:

November 8, 2013 - Adoption Knowledge Affiliates Conference, Austin, TX.

1:15 – 2:45pm, Session A: Using DNA Testing to Discover the Genealogical Roots of Adoptees
Due to recent technological advances, adoptees and individuals without knowledge of their genealogy are increasingly turning to DNA testing to discover their genetic roots. The tight-knit AdoptionDNA community has been in the forefront of innovation in this area, helping many adoptees rediscover their birth families and ancestral origins. Attendees will learn some of the techniques and tools successfully used.

November 14, 2013 - Diamond Gateway Women's Club at 6:30pm, Penasquitos, CA. Reservations required, contact Dael at 619-252-0804 or daelnk612@yahoo.com. There is a $5 fee. Mt. Carmel Church of the Nazarene, 10060 Carmel Mt. Rd., San Diego, 92129.

Sunday, November 3, 2013

First Look at the Full Genomes Y-Sequencing Results from Itaï Perez

(**Warning - Advanced Content**)

Guest blogger Itaï Perez reviews his Full Genomes results for my readers:

For those wondering what the results from Full Genomes look like, here’s a first look.

After a long wait while my kit was sequenced and analysed, I finally received an email from Full Genomes with an attached rar archive containing 9 files.

Almost all of these files are in a formatted text format and can easily be converted to an Excel table (which I did).

Here’s the description of the 9 files, as much as I understand it, one by one:

File #1 - PrivateSNPs

This one is easy to understand. It is a list of all Private SNPs discovered in my sequencing. Here is the description found in the beginning of the file (which I removed when converting to Excel).

#based on 20131001 variantCompare analysis using PGP083013.filt.pyfilt.1kGfilt.vcf and ALL.1kG.samplelist.redo.sorted.paths.20130812.curated_pm.filt2.called.pyfilterCG2k.vcf reference files

And here is the file itself:

The columns are SNP name, position, ancestral and derived base. The positions of these new SNPs have been removed from each image in order to give the Full Genomes team time to register and name them.

File #2 - yknot

This is the only file which is not a table. This text file includes a tree, following my positive SNPs from Y-Adam to my current most recent SNP, as defined in the ISOGG Y-tree.

File #3 - variantCompare

This file is more complex. Here is the description in the beginning of the file:

#FGC report: Analysis of Called Variants

#this report analyzes variants called as differing from the GRCh37 reference sequence

#for best viewing, open with tab-delimiting in a spreadsheet viewer

#reliability flag key: no flag: over 99% likely genuine; *: over 95% likely genuine; **: about 40% likely genuine; ***: about 10% likely genuine

#it is strongly suggested that results analysis be restricted to variants with zero or one asterisks

#citations for reference data include:

# 1000 Genomes Project: An integrated map of genetic variation from 1,092 human genomes, McVean et al., Nature 491, 56-65 (01 November 2012) doi: 10.11038/nature11632

# Personal Genome Project: Ball, Madeleine P., et al. A public resource facilitating clinical use of genomes. Proceedings of the National Academy of Sciences 109.30 (2012): 11920-11927. http://www.pnas.org/content/109/30/11920.long

# A High-Coverage Genome Sequence from an Archaic Denisovan Individual, Meyer et al. Science 338, 222-226 doi:10.1126/science.1224344

# R. Drmanac, et. al. Science 327(5961), 78. [DOI: 10.1126/science.1181498]

GRCh37 is the Genome Reference Consortium human genome (build 37). I guess it is a reference genome similar to CRS or RSRS for mtDNA. This table lists all the SNPs which vary from this reference. The fields are position, base change, rsID, SNP name, reliability and a list of the reference genomes which share this change. There are four successive sections: shared SNPs, private SNPs, shared INDELs and private INDELs.
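The header's advice to restrict analysis to variants with zero or one asterisks could be applied with a small filter like the one below. This is only a sketch: it assumes the file is tab-delimited as the header says, that header lines start with "#", and that the reliability flag sits in the fifth column (following the field order listed above). The column index, the function name and the sample rows are all assumptions, not taken from an actual report.

```python
RELIABILITY_COLUMN = 4  # assumed zero-based index of the reliability flag


def reliable_variants(text, max_asterisks=1):
    """Keep rows whose reliability flag has at most `max_asterisks` asterisks."""
    kept = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the descriptive header
        row = line.split("\t")
        flag = row[RELIABILITY_COLUMN].strip() if len(row) > RELIABILITY_COLUMN else ""
        if flag.count("*") <= max_asterisks:
            kept.append(row)
    return kept


# Made-up rows: position, base change, rsID, SNP name, reliability, shared-with
sample = (
    "#FGC report: Analysis of Called Variants\n"
    "14000000\tA->G\trs1\tM1\t\tsampleA\n"
    "14000001\tC->T\t.\tFGC1\t*\t\n"
    "14000002\tG->A\t.\t.\t**\t\n"
    "14000003\tT->C\t.\t.\t***\t\n"
)
for row in reliable_variants(sample):
    print(row[0], row[3])
```

With the default setting, the rows flagged `**` and `***` (described in the header as roughly 40% and 10% likely genuine) are dropped.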

Here’s how the file looks:

I also received a small manual describing this file and how to use it:

File #4 - strcall203.lobystr203report

This table contains the list of all STRs.

Here’s the description in the file:

#FGC Y-STR report generated based on lobSTR pre-v2.0.3 (sourceforge git revision 34534b) processing

#lobSTR citation: Gymrek M, Golan D, Rosset S, & Erlich Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Research. 2012 April 22.

#Notes:

#Repeat counts reported according to lobSTR standards; conversion required in certain cases to produce results based on other counting standards

#chrY coordinates based on hg19 / b37 reference sequence

#Marker conversions to FTDNA standards for DYS448, DYS449, DYS607, DYS576, DYS511, DYS640, and DYS485 are provisional

#Marker results known to be unreliable include: DYS413a/b, DYS490, DYS572, DYS726, DYS534, DYS446, and DYS487

#default lobSTR database has been augmented with results for DYS540, DYS712, DYS593, DYS715, DYS513, DYS561, DYS497, DYS510, DYF385.1, and DYF385.2, which should be treated as provisional

#Only two copies of DYS464 and DYF371 are called here; fully-spanning read details can provide insight into additional copies

#DYF371 includes DYS425

#NR = not reported / no reads

#NA = not available

#call confidence: 1 corresponds to highest confidence, 0 corresponds to lowest confidence; results with call confidence below 0.2 should be considered very speculative

#conflict flags: ? = conflicting fully-spanning reads; * = conflicting partially-spanning reads; % = het result in diploid calling for marker not recognized as multicopy; & = not called in diploid calling

#read details: Format is [repeat count]|[number of reads supporting given repeat count], with different counts separated by ';'. In the case of multicopy markers like DYS464, the fully-spanning read details can be used to determine repeat counts for additional copies

And here is what the file looks like:

Now this gets very technical and I don’t understand everything, but from what I can figure out, first we have the STR name and the estimated result, and then follows information explaining how this result was found and how sure the program is of it.
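To make the "read details" and "call confidence" notes in the header a little more concrete, here is a small sketch that parses a read-details string in the `[repeat count]|[number of reads]` format, with entries separated by ';', and flags low-confidence calls using the 0.2 threshold mentioned in the header. The function names and the sample string are invented for illustration; the real report may contain variations this does not handle.

```python
def parse_read_details(details):
    """Parse 'count|reads;count|reads' into {repeat_count: number_of_reads}.

    Repeat counts are kept as strings so that no assumption is made about
    whether they are always whole numbers. 'NR' and 'NA' yield an empty dict.
    """
    details = details.strip()
    if details in ("", "NR", "NA"):
        return {}
    counts = {}
    for item in details.split(";"):
        repeat, reads = item.split("|")
        counts[repeat.strip()] = int(reads)
    return counts


def best_supported(counts):
    """Return the repeat count backed by the most reads, or None if there are none."""
    return max(counts, key=counts.get) if counts else None


def is_speculative(call_confidence, threshold=0.2):
    """True when the call confidence falls below the header's 0.2 guideline."""
    return call_confidence < threshold


reads = parse_read_details("12|5;13|1")
print(reads, best_supported(reads), is_speculative(0.15))
```

For multicopy markers such as DYS464, the header suggests looking at every repeat count in the parsed result rather than only the best-supported one.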

File #5 - strcall203.lobystr203report ftdna

This table also lists the STRs, but in a much simpler form. You simply have the name and the results, and the STRs are in the order they are found at Family Tree DNA.

The description in the file is:

#Marker conversions to FTDNA standards for DYS448, DYS449, DYS607, DYS576, DYS511, DYS640, and DYS485 are provisional

#see main Y-STR report for further information regarding reliability, etc.

And here’s the table:

File #6 - mttype.RSRS.MT

This table gives the mtDNA results in RSRS format. It gives for each SNP the position and the ancestral and derived result.

Here’s the description:

#FGC mtDNA report

#Variants with respect to RSRS

And here is the file:

File #7 - mttype.rCRS.MT

This one is exactly the same, but using the rCRS format.

#FGC mtDNA report

#Variants with respect to rCRS

File #8 - haplogroupCompare

This table lists my SNPs and compares them to some reference results from my haplogroup or close to it. It is quite similar to the variantCompare file. The fields are position, base change, rsID, SNP name, reliability and the reference results mine is compared to. There are two successive sections: shared SNPs and private SNPs.

Here’s the description:

#FGC report: Detailed Analysis of Called SNPs

#refer to Analysis of Called Variants for citations and other details

#in the reporting below, it is assumed that the reference allele is ancestral ("-") and the sample allele is derived ("+"); "x"=ambiguous and "?"=no-read/no-call

#note that this report uses a different, simplified variant calling approach from that used in the Analysis of Called Variants report, so results may differ, especially for less-reliable variants

Haplogroups in the neighborhood of G-L91 being considered; includes: G-L91;G-L166;G-M286

And here’s the file:

File #9 - gtype

This one is also a bit complex. It lists the Y-SNPs and seems to detail how the results were determined.

Here’s what it looks like:

This ends the description of these nine analysis files. Note that I am still waiting for access to my results on the website and to my sequencing raw file. If you are interested I’ll write another article to show it to you then.

Thanks Itaï!

These tools were developed by Dr. Greg Magoon with the supervision of Justin Loe. Justin tells us "these are not final versions and will be upgraded to a more user-friendly presentation by specialists in user-interfaces."

BGI provided the sequencing services and developed the Y chromosome chip.

If you have any questions, please post them below and I will try to get them answered. I'm sure we will be seeing a lot more regarding the Full Genomes test soon...

Thursday, October 17, 2013

AncestryDNA's New Ethnicity Predictions Rolling Out to Customers

AncestryDNA customers will be happy to see that the new ethnicity results are starting to appear in their DNA accounts today. If you haven't checked yours yet, be sure to do so. I have been working with the new predictions for a little over a month and feel that they are a huge improvement over the original version. That's not to say that they are perfect, but no admixture predictions are without weaknesses. As Dr. Catherine Ball emphasizes in the new introductory video, that is why they are called estimates.

I have intended to write about this new feature ever since I was invited to participate in a webinar and conference call by AncestryDNA with a handful of other bloggers back in early September, but I have been overwhelmed with other work that I will write about soon. I plan to post a couple of upcoming blogs to catch up with all of the exciting genetic genealogy news, including additional coverage on AncestryDNA's advancements.

Is Ancestry Using the Sorenson Samples?
First, I want to discuss the role of the Sorenson data in these updated and more refined predictions. There has been a lot of conflicting information shared on the blogs, in the forums and even by Ancestry.com employees in this regard. I spoke with Dr. Ken Chahine, AncestryDNA's General Manager, to clarify the role of the Sorenson data and the status of the DNA samples collected by Sorenson.

AncestryDNA's first version of their Genetic Ethnicity feature used public data sets for the reference populations. For the new version, Ken confirmed, they have transitioned to using Sorenson samples and the associated pedigree data "almost exclusively" (and not unsourced Ancestry.com personal member trees).

Contrary to what has been claimed by some and in agreement with what Ken has told me in the past, AncestryDNA does, in fact, have possession of the physical DNA samples. How else could they have been integrated into this new cutting-edge technology? They have been retesting an increasing number of those samples on the Illumina chip that they use for the AncestryDNA test in order to improve their ethnicity predictions. Upon hearing this, I know that many people will wish to test those samples of deceased family members who donated their DNA to Sorenson on the AncestryDNA platform; however, it was explained to me that for legal reasons AncestryDNA is not currently able to allow that. They are attempting to work out the legalities involved, but cannot guarantee that this will be an option in the future. Ken said that he "definitely understands the desire, the need" to access these samples to take advantage of the more advanced genetic testing technology, but AncestryDNA is required to ensure that everyone's privacy is protected and that they, as a corporation, are covered legally. Ken explained that AncestryDNA would very much like to come up with a solution to be able to genotype these samples and deliver the results to their family members, but they just don't have an answer at this point as to if and when this will be possible. Further, he explained that it may turn out that AncestryDNA will not be able to overcome the legal difficulties involved with allowing third party access to these samples. Moving forward, AncestryDNA is looking into creating an option for designating a beneficiary for current AncestryDNA accounts/samples in order to avoid this dilemma in the future.

Now, to discuss the details of the new release...

Transparency
What I like most about the presentation of these new ethnicity estimates is that AncestryDNA has worked very hard to make the science transparent, just as the genetic genealogy community has been requesting. They have released an extensive white paper delving into the minute details of their work. You can find it on your ethnicity estimate page by clicking on the little "i" in the upper right hand corner.

The New Ethnicity Estimate home page

They have also provided the option to click through to simpler explanations throughout the interface. I recommend that everyone take the time to go through and read each of these in order to get a better understanding of how this feature works. They do such a good job of this that it is probably unnecessary for me to go into extensive detail here.

As you will see, the graphics are extremely well done in these explanations. For example, the Regional Overlap Chart (below) helps to explain why it is so difficult to break continental regions into sub-regional categories.

Additionally, the AncestryDNA team has done a very good job of illustrating the reality of this challenge with the graphic depictions of the ranges integrated with the estimates.

There is more detail offered for the customers who wish to "dive down" into the technical details in the click-thru explanations.

New Home Page
I also like the updated look for the new home page. It summarizes the important details of your results in an easy-to-understand format.

Some customers' pages are really colorful now!

Increased Resolution and Detail
All of the results in my account are much more accurate, judging by the research that I have done over the last several years, both in my traditional genealogy and in working with my autosomal DNA matches. I also got a couple of surprises with the enhanced resolution of this test.

AncestryDNA has increased their coverage tenfold by analyzing 300,000 SNPs in their predictions as compared to the 30,000 that they were using previously. The new version of their reference panel uses "3,000 DNA samples from people in 26 global regions". In an ambitious move, they are the first company to offer customers with African ancestry an estimate of the specific regions in Africa to which their DNA can be traced. I look forward to hearing from the African Americans in our community about how well they think AncestryDNA has done with this first attempt. (23andMe will be releasing their own effort soon.)

Coming Soon...
I will examine my results in more depth and share some thoughts and interesting details from my conversation with Ken Chahine.

Disclosure of Material Connection: Some of the links in the post above are “affiliate links.” This means if you click on the link and purchase the item, I will receive an affiliate commission. Regardless, I only recommend products or services I use personally and believe will add value to my readers. I am disclosing this in accordance with the Federal Trade Commission’s 16 CFR, Part 255: “Guides Concerning the Use of Endorsements and Testimonials in Advertising.”