Your Genetic Genealogist: Illumina


Saturday, December 7, 2013

23andMe Releases a Sample of Their New V4 File: First Look and Analysis

23andMe has released access to a sample file of their new v4 chip. This is a 100% custom chip with hand-selected SNPs. In a departure from the other two companies offering autosomal DNA testing to the genealogy community, they are now using the Illumina iSelect chip as the foundation instead of the Illumina OmniExpress chip. On the new v4 chip, the total number of autosomal DNA and X DNA SNPs has decreased substantially, while the total number of mtDNA SNPs and Y-SNPs has increased.

Since I heard about this new chip, I have been concerned about the impact it will have on the genetic genealogy community and, particularly, on the compatibility with 3rd party uploads and tools. If 23andMe customers are unable to take advantage of the extremely beneficial opportunity to upload their data into Family Tree DNA's Family Finder database and to make use of the wonderful tools at GEDmatch, that would be a huge loss for all of us. It would also be a shame if the usefulness of the Y-SNPs and mtDNA SNPs tested by 23andMe is reduced for our community. To try to determine if this will be the case, Dr. Tim Janzen has helped me to analyze the new file overall and Larry Vick has specifically analyzed the Y-SNPs.

The v4 chip currently has just over 602,000 total SNPs versus 967,000 total SNPs on the v3 chip. For the autosomal and X-SNPs, this chip is not as robust as the chip previously used by 23andMe or the chips used by Family Tree DNA and AncestryDNA. This is of great concern, but 23andMe has stated that they plan to impute a large number of SNP allele values from our results (see comments at http://blog.23andme.com/news/23andmes-new-custom-chip/), so hopefully Family Tree DNA and citizen scientists will be able to do the same to extract the most utility and compatibility from this platform. I am continually surprised and amazed by the resourcefulness of our community, so I am cautiously optimistic.

This change in platforms was intended to help the company ramp up their processing capacity in conjunction with their massive marketing campaign to acquire one million customers and beyond. (They can run 24 samples on each v4 chip at once instead of 8 on each of the v3 chips.) Especially in light of 23andMe's recent decision to provide only ancestry-related interpretation and raw data, the v4 chip does not appear to be a beneficial development. In fact, it may result in additional loss of sales. Currently, the genetic genealogy and the DNAAdoption communities generally recommend first testing at 23andMe and then transferring the raw data into FTDNA's Family Finder database in order to be in two databases at a reduced price. If it turns out that FTDNA is unable to continue to accept 23andMe transfers, these recommendations will likely change. Loss of compatibility with GEDmatch would also have a very detrimental effect on the utility for those using the data for genealogical and admixture research.

The following analysis is in depth and intended for those who want the specific details of the changes. Instructions to download the v4 file will follow the analysis.

First let's look at the Y-SNPs.

Larry Vick tells us:
I compared my Y-SNP file I downloaded on 14 Aug 2012 to the Y-SNP file I downloaded today to see if my current file has any changes. There weren't any. I then compared the (v4) file for Greg MENDEL that I downloaded today to a v3 file I downloaded for a friend (CRL) on 21 Mar 2013.

There are 2,329 Y-SNPs in the Greg MENDEL v4 file. CRL had 1,766 Y-SNPs in his v3 file. So the v4 file has 563 more SNPs than CRL's v3 file. Looking at the SNPs, there were 446 in CRL's v3 file that were not in the MENDEL v4 file. There were 1,009 in the MENDEL v4 file that weren't in the CRL v3 file. Of the 446 in the CRL v3 file that weren't in the MENDEL v4 file, 314 were no calls.

Of the 1,009 in the MENDEL v4 file that weren't in CRL's v3 file, 295 were no calls. When I compared the 1,009 in the MENDEL v4 file to my file I downloaded today, 494 were not in my file (although they could have been in past downloads but were removed prior to today's download). I have SNPs from v1, v2, and v3. Of the 515 that were in my file, 99 were no calls.

I compared the 494 that were not in my file to Adriano's file in the Y-Chromosome Comparison Project, and 268 were in his file. All but one of those 268 had i prefixes for the reference sequence number.

I then compared the 226 that weren't in Adriano's file to the ISOGG Y-SNP Compendium by position number (build 37), and 200 were in the ISOGG file. I created a list of those 200 with the first note field. The 26 that weren't in the ISOGG file included the one SNP with an rs number (rs5603911). The MENDEL file had 46 no calls for the 200 SNPs in the ISOGG file. All but one of the 226 that weren't in Adriano's file had i prefix reference sequence numbers.
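For readers who want to repeat this kind of file-to-file comparison, here is a minimal Python sketch. It assumes the tab-separated raw data layout (rsid, chromosome, position, genotype, with `#` comment lines) and that a no call is written with dashes; the two tiny sample files are invented for illustration and are not Larry's data. The last line only re-checks the arithmetic quoted above.

```python
import io

def read_raw(handle):
    """Parse a 23andMe-style raw data file into {rsid: (chromosome, genotype)}."""
    calls = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        rsid, chrom, _position, genotype = line.split("\t")
        calls[rsid] = (chrom, genotype)
    return calls

def is_no_call(genotype):
    # Assumption: a no call is written with dashes only (e.g. "--").
    return genotype.strip("-") == ""

def compare(a, b, chrom="Y"):
    """Count shared and file-specific SNPs on one chromosome, plus no calls."""
    ids_a = {r for r, (c, _) in a.items() if c == chrom}
    ids_b = {r for r, (c, _) in b.items() if c == chrom}
    only_a, only_b = ids_a - ids_b, ids_b - ids_a
    return {
        "shared": len(ids_a & ids_b),
        "only_a": len(only_a),
        "only_a_no_calls": sum(is_no_call(a[r][1]) for r in only_a),
        "only_b": len(only_b),
        "only_b_no_calls": sum(is_no_call(b[r][1]) for r in only_b),
    }

# Invented sample data, standing in for a v3 and a v4 download.
V3_SAMPLE = "# rsid\tchromosome\tposition\tgenotype\nrs1\tY\t100\tA\ni2\tY\t200\t--\nrs3\tY\t300\tG\nrs9\t1\t50\tAG\n"
V4_SAMPLE = "rs3\tY\t300\tG\ni4\tY\t400\t--\ni5\tY\t500\tT\n"

summary = compare(read_raw(io.StringIO(V3_SAMPLE)), read_raw(io.StringIO(V4_SAMPLE)))
print(summary)

# Consistency check on the counts reported above: both subtractions
# should give the number of Y-SNPs the two files have in common.
assert 1766 - 446 == 2329 - 1009 == 1320
```

Matching here is by SNP name only. Comparing by position, as Larry did against the ISOGG list, would need both files to be on the same build.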

Now let's look at the overall composition of the new v4 chip as compared to the other platforms.

Tim reviewed the v4 file and compared it to v2, v3, and FF files. The following are some general statistics on which he based his analysis.

atDNA SNPs
v2:
561,846 atDNA SNPs in 23andMe v2 data in a 2009 download.
515 i atDNA SNPs in 23andMe v2 data in a 2009 download.
556,787 atDNA SNPs in 23andMe v2 data in a fresh download.
758 i atDNA SNPs in 23andMe v2 data in a fresh download.

v3:
930,381 atDNA SNPs in 23andMe v3 data in a fresh download.
7,455 i atDNA SNPs in 23andMe v3 data in a fresh download.

v4:
577,382 atDNA SNPs in 23andMe v4 data.
41,855 i atDNA SNPs in 23andMe v4 data.

Family Finder:
708,092 atDNA SNPs in Family Finder data in general, but a fresh download only had 707,269 SNPs in it.

Y-SNPs
v2:
1880 Y SNPs in 23andMe v2 data in a fresh download. Of these 213 are i SNPs.

v3:
1766 Y SNPs in 23andMe v3 data in a fresh download. Of these 232 are i SNPs.

v4:
2329 Y SNPs in 23andMe v4 data. Of these 526 are i SNPs.

X-SNPs
v2:
13,876 X SNPs in 23andMe v2 data in a 2009 download. Of these 19 are i SNPs.
13,828 X SNPs in 23andMe v2 data in a fresh download. Of these 96 are i SNPs.

v3:
26,007 X SNPs in 23andMe v3 data in a fresh download. Of these 1006 are i SNPs.

v4:
19,487 X SNPs in 23andMe v4 data in a fresh download. Of these 4227 are i SNPs.

Family Finder:
18,022 X SNPs in Family Finder build 37 data in a fresh download.

mtDNA
v2:
2019 mtDNA SNPs in 23andMe v2 data in a fresh download. Of these 1572 are i SNPs.

v3:
2459 mtDNA SNPs in 23andMe v3 data in a fresh download. Of these 2016 are i SNPs.

v4:
3154 mtDNA SNPs in 23andMe v4 data. Of these 2681 are i SNPs.
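Counts like these can be reproduced from any raw data download with a short script. The sketch below is only one way to do it, not necessarily how Tim produced his numbers: it assumes the tab-separated layout (rsid, chromosome, position, genotype), chromosome labels 1-22, X, Y and MT, and that "i SNPs" are those whose identifier starts with `i`. The sample lines are invented.

```python
from collections import Counter

AUTOSOMES = {str(n) for n in range(1, 23)}

def classify(chrom):
    """Map a chromosome label to the groups used in the statistics above."""
    if chrom in AUTOSOMES:
        return "atDNA"
    if chrom == "X":
        return "X"
    if chrom == "Y":
        return "Y"
    if chrom == "MT":
        return "mtDNA"
    return "other"

def tally(lines):
    """Return (total SNPs per group, i-prefixed SNPs per group)."""
    totals, internal = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        rsid, chrom = line.split("\t")[:2]
        group = classify(chrom)
        totals[group] += 1
        if rsid.startswith("i"):
            internal[group] += 1
    return totals, internal

# Invented sample lines, for illustration only.
SAMPLE = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs1\t1\t1000\tAG",
    "i100\t2\t2000\tCC",
    "rs2\tX\t3000\tT",
    "i101\tY\t4000\t--",
    "i102\tMT\t5000\tA",
    "rs3\tMT\t6000\tG",
]
totals, internal = tally(SAMPLE)
print(dict(totals), dict(internal))

# The four v4 counts listed above add up to the "just over 602,000" total.
V4_COUNTS = {"atDNA": 577_382, "Y": 2_329, "X": 19_487, "mtDNA": 3_154}
print(sum(V4_COUNTS.values()))  # 602352
```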

Here are his comparisons between the various platforms.

atDNA:

453,854 atDNA SNPs in 23andMe v4 data are also found in 23andMe v2 data in a 2009 download. Of these SNPs, 419 are i SNPs.

453,357 atDNA SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download. Of these SNPs, 546 are i SNPs.

509,630 atDNA SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download. Of these SNPs, 6153 are i SNPs.

304,864 atDNA SNPs in 23andMe v4 data that have rs numbers are also found in Family Finder in a fresh download.

I then checked the 41,855 i atDNA SNPs in 23andMe v4 data and checked for matching positions in the Family Finder data. I found that there were 2556 i atDNA SNPs in 23andMe v4 data that had matching positions in the Family Finder data.

Assuming that all of those matching positions correspond with the same SNP in the Family Finder data, there are a maximum of 307,420 atDNA SNPs in 23andMe v4 data that are also found in Family Finder in a fresh download.
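The two-step overlap count described above (first by rs number, then by position for the i SNPs) can be sketched as follows. This is an illustration under stated assumptions: each SNP is represented as a (name, chromosome, position) tuple that has already been read from the files, both files use the same build, and the sample data is invented.

```python
def overlap(v4_snps, ff_snps):
    """Return (matches by name, extra i-SNP matches by position).

    Each SNP is a (name, chromosome, position) tuple.
    """
    ff_names = {name for name, _, _ in ff_snps}
    ff_positions = {(chrom, pos) for _, chrom, pos in ff_snps}
    by_name = sum(1 for name, _, _ in v4_snps if name in ff_names)
    by_position = sum(
        1
        for name, chrom, pos in v4_snps
        if name.startswith("i")
        and name not in ff_names
        and (chrom, pos) in ff_positions
    )
    return by_name, by_position

# Invented sample data, for illustration only.
v4 = [("rs1", "1", 100), ("rs2", "1", 200), ("i300", "2", 300), ("i400", "2", 400)]
ff = [("rs1", "1", 100), ("rs7", "2", 300), ("rs8", "3", 900)]
print(overlap(v4, ff))  # (1, 1)

# The maximum quoted above is the sum of the two counts.
assert 304_864 + 2_556 == 307_420
```

As Tim notes, a shared position does not prove that the two entries are the same SNP, so the position-based count is an upper bound.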

Y-SNPs:

979 Y DNA SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download.

1320 Y DNA SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download.

This means that there are 1009 Y SNPs found on the v4 chip that aren’t found on the v3 chip.

There are 563 more Y SNPs in v4 data than in v3 data.

X-SNPs:

11,070 X SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download.

11,009 X SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a 2009 download.

14,437 X SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download.

7,513 X DNA SNPs in 23andMe v4 data that are also found in Family Finder in a fresh download.

mtDNA:

1698 mtDNA SNPs in 23andMe v4 data that are also found in 23andMe v2 data in a fresh download.

2208 mtDNA SNPs in 23andMe v4 data that are also found in 23andMe v3 data in a fresh download.

This means that there are 946 mtDNA SNPs found on the v4 chip that aren’t found on the v3 chip.

Tim's preliminary conclusions
The fact that there are only 307,420 atDNA SNPs in 23andMe v4 data that are also found in Family Finder is highly concerning. The specificity of matches when comparing v4 data to FF or AncestryDNA data will be significantly reduced in projects such as my Mennonite autosomal project. The atDNA SNP coverage for the SNPs that overlap between v4 data and FF or AncestryDNA data will only be about 44 SNPs per cM. I don’t know if FTDNA and GEDmatch will be able to allow imports of 23andMe v4 data. The fact that there are about 130,000 more atDNA SNPs in a Family Finder dataset than in v4 and the fact that v4 data won’t be readily uploadable to GEDmatch is forcing me to rethink 23andMe as my primary testing lab for distant relatives.
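The figures in these conclusions can be checked against the counts listed earlier. The short calculation below uses only numbers from this post. The last value is simply the overlap divided by 44; the post does not say what genetic map length Tim used, so treat it as what his figure implies, not as a stated input.

```python
ff_atdna = 708_092    # atDNA SNPs in Family Finder data
v4_atdna = 577_382    # atDNA SNPs in 23andMe v4 data
shared_max = 307_420  # maximum v4/Family Finder overlap
snps_per_cm = 44      # density quoted in the conclusions

extra_in_ff = ff_atdna - v4_atdna          # "about 130,000 more"
share_of_v4 = shared_max / v4_atdna        # fraction of v4 atDNA SNPs shared
share_of_ff = shared_max / ff_atdna        # fraction of FF atDNA SNPs shared
implied_map_cm = shared_max / snps_per_cm  # map length implied by 44 SNPs/cM

print(f"{extra_in_ff:,} more atDNA SNPs in Family Finder than in v4")
print(f"{share_of_v4:.1%} of v4 atDNA SNPs overlap with Family Finder")
print(f"{share_of_ff:.1%} of Family Finder atDNA SNPs overlap with v4")
print(f"about {implied_map_cm:,.0f} cM implied by {snps_per_cm} SNPs per cM")
```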

Third Parties and Download
My sincere hope is that Family Tree DNA and GEDmatch are able to adjust their systems to work with this new data from 23andMe. As soon as I hear anything, I will be sure to report it.

If you are interested in analyzing the file yourself and you have a 23andMe account, these are the directions for downloading the sample v4 file from 23andMe:

Enable the Mendel family in your account here:
https://www.23andme.com/user/edit/examples/

Then select Greg Mendel's raw data file from the drop-down list here:
https://www.23andme.com/you/download/

This file will probably still change slightly as they complete their validation process, but it should be pretty close to what we will start to see for new customers at 23andMe in the coming weeks.

Please let me know what you think. I am especially interested to hear analysis from the citizen scientists and the creators of our community's third party features.

Wednesday, November 20, 2013

A List of Alternate Names for the Y-SNPs from BritainsDNA's Chromo2 Test

(For Advanced Y-DNA Researchers)

Dr. James Wilson from BritainsDNA has sent me a list of the alternate names for the Chromo2 Y-SNPs and has given me permission to post it for public access with his comments, as follows:

Please find attached a list of alternate names for chromo2 Y SNPs...This is based on comparisons some months ago now, although a few SNPs have been updated in the meantime...Where there is no alternative name for an S SNP it means it was not listed/named in any compendium, browser or database I had available when this file was put together and is not available in any other product as far as I am aware (of course apart from a complete Y chromosome sequence).

Note that all the SNPs on this list were manufactured on the chip, but a small proportion do not give good genotyping clusters; I haven't had time to clean them out. There are also a small number of SNPs not on this list, e.g. S28, S250, which we Taqman in the appropriate samples as Illumina appear to have removed them from the design and they have no proxies on the chip or none known.

In due course I also intend to share the genome co-ordinates to allow comparisons with whole Y chromosome sequences, despite this being a case of handing larger competitors the fruits of our investment. At present I have started to do that for individual SNPs that are queried on a case-by-case basis, as I don't as yet have the permissions of all of the sequenced individuals to hand out their SNPs in this way.

You can find the list here. (Make sure you download the entire list and not just what is visible through Google Docs.)

Thanks, Jim!

Friday, July 26, 2013

Family Tree DNA Will Keep $99 Price for Family Finder

I just received the GREAT news from Max Blankfeld that Family Tree DNA will be able to keep the low and competitive price of $99 for Family Finder:

Family Tree DNA Will Keep Reduced Prices

One month ago Family Tree DNA reduced its Family Finder price to $99 with the promise that if we achieved a minimum volume of orders during our Sizzling Summer Promotion, Illumina would help us keep this price moving forward. We are happy to announce that the genetic genealogy community responded in a big way, and thanks to you we are maintaining the price of the Family Finder test at $99.

We hope that with this price reduction you can reach out to family and friends, so that more and more people can join our growing database and find new matches.

Thank you for your continued support!

Max Blankfeld, Vice-President and COO

http://www.GeneByGene.com

max@genebygene.com

713-868-1438

This development has leveled the playing field for all three companies offering autosomal DNA tests to the genealogy community and allows those who prefer not to receive health results and/or wish to have their DNA sample stored for 25 years to affordably do so. I hope that this will encourage more genealogists to get themselves and their families into the Family Tree DNA Family Finder database. It benefits all of us to have genealogists with well-documented family trees participating in our groundbreaking autosomal DNA research. These price drops are really helping us to get to the critical mass that we need to get the most out of these databases. Recently, I have seen great strides in this regard in my research. Thank you to everyone who ordered tests during this trial run and to FTDNA and Illumina for making this possible.

Monday, March 25, 2013

AncestryDNA, Raw Data and RootsTech

Tim Janzen and I discussing the AncestryDNA features at RootsTech with AncestryDNA staff

Since RootsTech there has been lots of discussion regarding the features that AncestryDNA is and is not planning to offer their customers. I will address the many questions that I have received about the meetings in which I participated at the show, but first let's review:

Raw Data Downloads
On Thursday, AncestryDNA fulfilled their promise to allow customers to download their raw data. As Dr. Ken Chahine had assured me back in November, the file is not encrypted and is compatible with third party tools.

I sent my file to a number of third party providers:

After working with it a bit, John Olson announced on the site that he expects that Gedmatch will be accepting AncestryDNA uploads in about two weeks.
David Pike told me that he has updated his tools to work with the AncestryDNA files.
Leon Kull has reportedly updated his HIR search site to work with them as well.
Dr. Ann Turner has created an Excel macro to convert the AncestryDNA files to 23andMe format.

At the "Ask the Expert" Genetic Genealogy panel that I moderated at RootsTech on Saturday:

Bennett Greenspan told the audience that Family Tree DNA will be accepting AncestryDNA transfers into Family Finder starting on May 1st.
Dr. Catherine Ball confirmed that the raw data file is not phased and that they are delivering it as they receive it from the chip manufacturer Illumina. She also confirmed what Dr. Ann Turner had already discovered - the data labeled as "Chromosome 25" is from the PAR region. Further, the "Chromosome 23" label refers to the X chromosome data and "Chromosome 24" refers to the Y chromosome.
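To illustrate what a conversion like Ann Turner's macro has to do with those chromosome labels, here is a rough Python sketch. It is not her macro. It assumes the AncestryDNA file is tab-separated with the columns rsid, chromosome, position, allele1 and allele2, that `0` marks a no call, and that the target layout is rsid, chromosome, position and a combined genotype with `--` for no calls. Writing the PAR rows ("Chromosome 25") under X is also an assumption.

```python
# 23 = X and 24 = Y as confirmed at the panel; 25 is the PAR region.
# Filing PAR rows under X is an assumption made for this sketch.
CHROM_LABELS = {"23": "X", "24": "Y", "25": "X"}

def convert_line(line):
    """Convert one AncestryDNA-style row to a 23andMe-style row.

    Returns None for blank lines, comments and the column header.
    """
    line = line.strip()
    if not line or line.startswith("#") or line.startswith("rsid\t"):
        return None
    rsid, chrom, position, allele1, allele2 = line.split("\t")
    chrom = CHROM_LABELS.get(chrom, chrom)
    # Assumption: "0" marks a missing allele; write the pair as a no call.
    genotype = "--" if "0" in (allele1, allele2) else allele1 + allele2
    return "\t".join([rsid, chrom, position, genotype])

# Invented sample rows, for illustration only.
rows = [
    "#AncestryDNA raw data download",
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs1\t1\t1000\tA\tG",
    "rs2\t23\t2000\tC\tC",
    "rs3\t24\t3000\t0\t0",
]
converted = [out for out in map(convert_line, rows) if out is not None]
print("\n".join(converted))
```

This sketch keeps both alleles for every row, so X and Y calls for males stay two letters long. A tool that expects single-letter calls there would need an extra step.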

Additional notes:

Unlike Family Tree DNA, AncestryDNA is not removing any SNPs from the data - medically relevant or not.
The overlap between AncestryDNA's raw data file and 23andMe's should be around 690,000 SNPs due to the fact that they are both using the same Illumina OmniExpress Plus base chip. The ~10,000 SNP difference can be accounted for by a different set of poorly performing probes and test SNPs. Family Tree DNA's should have a similar overlap for the same reasons.
There is no mitochondrial DNA included in the raw data file because it is not included on the Illumina chip that they are using. (23andMe adds the mtDNA SNPs).

Search Function
As I expected from earlier conversations with AncestryDNA, a search function is next on the list. Kenny Freestone, Product Manager for AncestryDNA, discussed it in his presentation under the heading "What's Next". Although it is already in the works, Kenny could not provide a firm timeline for its availability when I asked.

We will be able to filter our list of matches by surname, location and username. As anyone who has worked with their AncestryDNA matches knows, this is sorely needed. There is no doubt that the many requests from customers pushed this up their list of priorities.

Genetic Ethnicity Update
Later this year, AncestryDNA will be updating their Genetic Ethnicity feature. They will provide more granularity in Europe and West Africa. We can also expect more accurate breakdowns. A number of AncestryDNA personnel acknowledged to me over the weekend that certain "ethnicities" (e.g., Scandinavian) are overestimated for many customers. However, they also emphasized that much of the perceived problem with their admixture analysis stems from the question of "where and when". What they mean by this is that it is very difficult (and sometimes impossible) to pinpoint where specific DNA signatures were at an exact time in history.

As I always remind my readers, this portion of the science has a long way to go and will improve with more data and time. On the "more data" front, during her speech at the AncestryDNA luncheon on Friday, Dr. Ball was reportedly requesting that genealogists who know that all eight of their great grandparents were born in the same place share this information with AncestryDNA. This seems to imply that, like 23andMe has successfully done, AncestryDNA plans to use customer data to improve their predictions. They are also starting to work on incorporating the coveted SMGF collection into their admixture analysis, which should improve it greatly.

The good news is that AncestryDNA customers don't have to wait for this update to gain more insight into their ancestral origins. Now that AncestryDNA has made the raw data available, customers will be able to upload their raw data file to the various third party sites to try out the admixture calculators and/or send it to Dr. McDonald for his very highly regarded analysis.

Matching
AncestryDNA is currently working on an algorithm to improve matching for endogamous populations, specifically Ashkenazi Jews.

As I reported in November, the minimum threshold for matching is 5 megabase pairs. This was reconfirmed in a conversation I had with Dr. Ball on Friday. I also learned that there is no minimum SNP requirement. We discussed the possibility of AncestryDNA switching to centiMorgan measurements in the future.

Price
The test is now $99 for everyone - subscribers and non-subscribers. This was likely in response to 23andMe's recent price drop. Having attracted well over 120,000 customers in less than a year in business, AncestryDNA is proving to be an important player in this field. This new policy to attract subscribers and non-subscribers alike will only improve their market share.

International Customers
It does not appear that AncestryDNA has plans to offer their test to international customers in the near future, instead choosing to focus on the U.S. market for now.

Matching Segment Data and Chromosome Browser
On Friday at RootsTech, Dr. Tim Janzen and I sat down for a meeting with AncestryDNA management. Among others, we were joined by Dr. Ken Chahine, Senior Vice President and General Manager of DNA, and Dr. Catherine Ball, Vice President of Genomics and BioInformatics. (Dave Dowell also attended a portion of the meeting.) I found them to be very receptive to hearing our requests and the reasons behind them. At no time did they state that they had decided not to build a chromosome browser or release matching segment data to their customers in the future. Dr. Ball did express some privacy concerns, but was open to hearing ideas of how this could be addressed.

Tim Janzen explains his feelings while Ken Chahine looks on

During the meeting, Tim very emphatically explained his feelings on the need for matching segment data (above) and I resorted to begging (below)... {hehe}

Catherine Ball, Ken Chahine, Tim Janzen, me, Dave Dowell and Steve Baloglu

On Saturday, after attending Kenny Freestone's presentation, four advanced genetic genealogists approached him to discuss the chromosome browser issue. In addition to myself, Tim Janzen, Angie Bush, and Nathan Machula were present for the conversation. Kenny didn't have much to say and mostly listened to the arguments that we presented covering why we feel that it is essential that AncestryDNA offer the matching segment data behind their relative predictions. At no time did he state that AncestryDNA would not offer a chromosome browser or that the delay in doing so was because AncestryDNA didn't think that their customers could understand it. He did, however, confirm that it was not a top priority at this time. He also said that he personally reads all of the requests sent through the feedback button, so if you want them to reassess their priorities, then be sure and let them know.

Tim emphasized that both 23andMe and Family Tree DNA included a chromosome browser feature at the launch of their autosomal DNA product and wondered aloud why AncestryDNA had not done so as well. I explained to Kenny (as well as in my meeting with management) that, as genealogists, we expect conclusions to be evidence based. It is not in line with this principle to simply be told that a certain common ancestor is responsible for a DNA match and be expected to take AncestryDNA's word for it. Where is the proof? Since Kenny had shown a chart during his presentation of his ancestral lines that he claimed were genetically confirmed by AncestryDNA matches, I also pointed out the fact that those lines that he had shaded in weren't really confirmed without the actual genetic data to support that claim. To illustrate, I laid out my experience as follows:

On my AncestryDNA account, I was happy to find a shaky leaf hint a few weeks ago.

Upon reviewing the match, I noted that the common ancestor was through my mother's side. I was initially excited to see that I had inherited DNA from my 7th great grandparents on paper, Joseph Denison and Prudence Miner.

The only problem is that this match doesn't appear anywhere on my mother's 47 pages of matches. Do you know what this means? It means that I must have inherited the DNA responsible for the match through my father's side. Since all DNA inherited through my mother's line must come through her, AncestryDNA has identified the wrong common ancestor as the source of the DNA shared between LGB and me. A fluke of the algorithms...? Perhaps. Let's look at some more of my matches.

Once again, as you can see, the common ancestor identified by AncestryDNA is on my mother's side. A thorough search of my mother's matches shows that, once again, this person is not reported as a match to my mother. From this, we can only reach the conclusion that the DNA responsible for this match comes through my father's side - not my mother's. The common ancestor that I share with "Baerion" must be beyond a brick wall in her family tree or on my paternal side. In general, I have had more success filling in the branches of the maternal side of my family tree than the paternal side, so this is certainly possible.

Just to demonstrate that this isn't an isolated occurrence, here is another one:

This match doesn't appear on my mother's match list either! So, out of my ten matches that have shaky leaves attached, three of them apparently have common ancestors wrongly identified as the source of our matching DNA. Do you see the problem here? Does AncestryDNA? If this match were, instead, at 23andMe or Family Tree DNA, I could check the DNA segment that we share and compare it to my other matches and/or my chromosome map. This would provide additional information and/or evidence to help me determine through which of my ancestral lines this segment of DNA was inherited. Might there be other explanations for these discrepancies? It is certainly possible, but without the underlying genetic data, it is impossible to say.
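The parent check used in these three examples boils down to a simple set comparison, sketched below in Python. The data structures are hypothetical and only "LGB" and "Baerion" are names from this post; the other entries are placeholders. A flagged match is a candidate for a wrongly identified common ancestor, not proof of one, since other explanations remain possible.

```python
def flag_hints(hints, mother_matches):
    """Return hinted matches attributed to the maternal side
    that do not appear on the mother's own match list."""
    return [
        name
        for name, side in hints
        if side == "maternal" and name not in mother_matches
    ]

# Hypothetical data: (match name, side of the hinted common ancestor).
my_hints = [
    ("LGB", "maternal"),
    ("Baerion", "maternal"),
    ("third_match", "maternal"),   # placeholder name
    ("cousin_on_list", "maternal"),  # placeholder name
    ("paternal_match", "paternal"),  # placeholder name
]
# Placeholder stand-in for the mother's 47 pages of matches.
mother_matches = {"cousin_on_list", "someone_else"}

print(flag_hints(my_hints, mother_matches))
```

With segment data, each flagged match could then be checked against a chromosome map instead of being left as an open question.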

I am in the fortunate position to have tested my mother at AncestryDNA in addition to myself, so I can clearly see there is an issue. What about all of those people who have not tested a parent and are blindly accepting AncestryDNA's shared ancestor hints because they don't know otherwise? Isn't that kind of like copying someone's tree and just taking their word for it that it is correct with no sources or evidence attached? For now, those of us who do understand the finer points of autosomal DNA matching will have to do our best to convince our matches to upload to Gedmatch so they can see for themselves what they are missing.

As much as I, too, am disappointed that AncestryDNA has not yet provided the matching segment data, it is clear to me that the reasons behind this decision are far more complex than what others may claim is an attempt to dumb down the product because Ancestry.com thinks its customers are stupid. From my many conversations with Ken Chahine and others from AncestryDNA over the past year, I have come to appreciate that working within the framework of this 1.6 billion dollar corporation comes with its own set of challenges.

The Future
Tim Sullivan, CEO, has made it clear that Ancestry.com is committed to the DNA business and Ken Chahine has always been upfront with me and come through with his promises. So, I am going to give them the benefit of the doubt. From our very first conversation, I have advocated for the genetic genealogy community and looked out for our best interests and I won't stop doing so. I believe that they will do the right thing for their customers and the genetic genealogy community eventually. It may not happen as quickly as we would all like (yesterday!), but they are not the big bad wolf and I think it does us all a disservice to continually paint their intentions in a negative light. We are in early days yet. Let's give them a break.

Saturday, February 9, 2013

A Visit to Family Tree DNA's State-of-the-Art Lab

The array viewer of FTDNA's Applied Biosystems 3730xl DNA Analyzer - that's DNA!

I have been a bit behind in my postings, but I didn't want to miss the chance to write about the incredible lab tour of Family Tree DNA's awesome facility, which was definitely one of the highlights of the 2012 FTDNA Administrator's Conference for me. I don't know if you all realize it, but Family Tree DNA is the ONLY company in the field that has their very own lab. They process everything from start to finish in their state-of-the-art facility in Houston for the thousands of different DNA tests that they offer. (Thomas Krahn notes that the individual Y-SNP tests alone number over 2900!) Some people might find it comforting to know that when ordering a test through FTDNA, this very trustworthy company is the sole handler of their DNA sample.

Bennett Greenspan, CEO of FTDNA with stored B-Swab Samples

This lab is even capable of processing exome and whole genome sequences. (These are available through Family Tree DNA's sister division DNA DTC, both a part of Gene by Gene). Max Blankfeld proudly told me that almost immediately after the announcement, they were already receiving orders for both tests. This is really exciting if you think about it. The first company to offer genetic genealogy testing is, according to The Genomics Law Report, also the only company currently offering these advanced tests-of-the-future in a "truly direct-to-consumer manner". ("Gene By Gene probably does represent, however, the only commercial company currently offering a whole genome sequence in a truly direct-to-consumer (DTC) manner." DNA DTC: The Return of Direct to Consumer Whole Genome Sequencing, Dan Vorhaus, November 29, 2012)

Okay, on to the lab tour! Tim Janzen has kindly shared both his notes and his photos with my readers. I have added to Tim's notes just a bit, but most of what you see below was written by him. (I lost my notes and, although I did use some of them, the photos from my cell phone aren't as nice as Tim's photos from a "real" camera!)

Tim was smiling like this pretty much the entire tour!

On November 12, 2012, Family Tree DNA graciously allowed approximately 30 attendees from the FTDNA conference to take one-hour tours of their laboratory facilities in Houston, Texas. Bennett Greenspan, the president of FTDNA, primarily led the tours and Max Blankfeld led some supplemental tours (for those of us who showed up late!). The tour participants all donned lab jackets for the tour, making us feel very official (I got to be Thomas Krahn). Bennett explained the functions of a number of very important pieces of DNA equipment as he took us through the lab. (Update - Thomas Krahn has added some very interesting and educational commentary throughout this post. I have added it below or just after what it is referencing. Thanks Thomas!)

"Your Genetic Genealogist" parading as Thomas Krahn
Do you think I fooled anyone?

One of the first pieces of equipment we viewed is used to extract the DNA from samples that are sent to FTDNA. The DNA is extracted through an automated process. Ninety-six samples can be processed at a time. FTDNA can complete 600 extractions per day and has a 98.8% success rate of extracting DNA on the first try.

Another piece of equipment held primers that are used to test for specific short tandem repeat (STR) tests and single nucleotide polymorphism (SNP) tests. (Note from Thomas Krahn: We're taking the DNA from the DNA tubes and transferring them to a reaction plate which contains the primers and the rest of the PCR assay. In a previous step the primers were distributed in the correct wells by automatically picking the correct primer pair so that the customer gets the segment sequenced where the marker is located that he's interested in.) This process is also entirely robotized so that multiple tests can be run at a time. The lab must repeatedly change the plastic covers on the pipettes to ensure that DNA from two different samples is never mixed as part of the testing. (Note from Thomas: We use the covers just as a precaution. In case a droplet is released from the pipetting tips during the movement of the robot we want to make sure that the primer assays are not contaminated with human DNA. With lids the primer library is safe. We would easily recognize such a contamination because of mixed basecalls, however primers are expensive and we want to protect the primer library from such a contamination. The chance that DNA samples mix is pretty small because we reduce the speed of the robot drastically while we have the tips over the DNA plate. In any case such a cross contamination could probably not be prevented with a lid method. The pipetting tip is discarded after the DNA from one person is distributed into all the assays that he ordered. Each DNA sample gets a brand new tip so that cross-contamination through the tip itself can be excluded.)

FTDNA's automated equipment that is used to combine DNA with primers

Bennett Greenspan proudly showed off the robotic DNA storage freezer that the company purchased in autumn of 2011. It was designed and manufactured by the engineering company Matrical Bioscience in Spokane, Washington. This piece of equipment took many months to design and build. After it had been built, it was disassembled and shipped to Houston. The installation process in the FTDNA lab took approximately 6 weeks. This chamber stores multiple small trays that hold 96 DNA samples each. The samples are held in small vials about 3/4 inch in height and about 3/8 inch in diameter. The trays holding the samples are about 4 inches by 7 inches in size. There are thousands of trays stacked on top of each other in a -20 degrees Celsius chamber approximately 5 feet by 8 feet in size. A robot inside this chamber retrieves DNA samples from the approximately 175,000 samples that are stored there in a very strictly regimented and automated fashion. (Note from Thomas: I have just made a query and there are currently 173,012 DNA tubes in the store. However more than 50% of them have been added during the last year. 500,000 is the approximate number when the store is 100% full. We have rigorously sorted out empty and bad tubes from our repository in the past and we run regular compression procedures so that the racks are not half empty.) This robot is wired to a computer station outside the room. At the computer terminal a lab technician can enter a series of kit numbers for DNA samples for which additional testing has been ordered. The technician can then leave the area to do other things while the robot automatically retrieves the samples that were chosen. Ninety-six samples can be retrieved at a time and the retrieval process takes approximately 30 minutes. 
When the retrieval process has been completed the technician then returns to the storage unit, picks up the tray of samples and takes it to another piece of equipment for the additional testing that the customer has ordered, such as upgraded STR panels or individual SNP tests.
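
For readers who like to check the numbers, here is a rough back-of-the-envelope sketch in Python that combines the storage figures quoted above. The constants come from the tour and from Thomas's note; the function name `retrieval_minutes` is my own, and the assumption that every batch takes the full 30 minutes (even a partly filled one) and that batches run back to back is mine, not something FTDNA stated.

```python
import math

# Figures quoted in the text above
TUBES_STORED = 173_012     # result of Thomas's query
STORE_CAPACITY = 500_000   # approximate number of tubes when the store is 100% full
TUBES_PER_TRAY = 96        # samples held by one tray
BATCH_MINUTES = 30         # approximate time to retrieve one batch of up to 96 samples

# How full the store is, and how many 96-tube trays a full store implies
fill_fraction = TUBES_STORED / STORE_CAPACITY
trays_at_capacity = math.ceil(STORE_CAPACITY / TUBES_PER_TRAY)

# Retrieval throughput if batches run back to back (assumption)
tubes_per_hour = TUBES_PER_TRAY * 60 / BATCH_MINUTES


def retrieval_minutes(n_tubes: int) -> int:
    """Estimated minutes to retrieve n_tubes, assuming each batch of up to
    96 tubes takes the full 30 minutes (hypothetical simplification)."""
    batches = math.ceil(n_tubes / TUBES_PER_TRAY)
    return batches * BATCH_MINUTES


print(f"Store is about {fill_fraction:.1%} full")
print(f"A full store implies about {trays_at_capacity} trays")
print(f"Roughly {tubes_per_hour:.0f} tubes per hour can be retrieved")
print(f"1,000 tubes would take about {retrieval_minutes(1000)} minutes")
```

Under these assumptions the store is roughly one third full, which fits Thomas's remark that more than half of the tubes were added during the last year.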

The interior of the MiniStore storage chamber. The robot is retrieving samples.

The functions of other pieces of equipment in the lab were also explained by Bennett. One machine is used for mitochondrial DNA sequencing. There is also another piece of equipment that is used to lyse the cells in each of the DNA samples and prepare them for extraction of the DNA. Additionally, Bennett showed us a room that contains thousands of DNA samples that have not yet been processed. These samples are held in long-term storage at room temperature for eventual use by customers who wish to order additional testing.

Bennett also took us to a different room where the Geno 2.0 SNP chip tests are being processed. Approximately 154,000 SNPs are tested on a single chip for the Geno 2.0 test. Considering this, the chips are relatively small at approximately 1 inch by 4 inches in size. The actual scanning area is only 5 mm by 5 mm per assay. (Note from Thomas: On one of those glass slides there are 12 samples processed simultaneously.) At the time we were there, thousands of chips were being processed.

Bennett holding one of the Geno 2.0 SNP chips

Bennett showed us the new sequencing machines that were recently purchased so that the company can do large-scale complete genome sequencing. The two new Illumina HiSeq 2000 machines can sequence 10 times as much as the Applied Biosystems DNA Analyzers are capable of using 454 sequencing and can sequence a complete genome in three runs. These machines are approximately 2000 times as efficient as using the primers used in Walk Through the Y testing and are 100 times as efficient as the Applied Biosystems 454 sequencer. (Note from Thomas: The HiSeq instruments are producing more sequences so that you get a higher coverage. Whether the machines are also more "efficient" at finding new mutations by the same scaling factor is still to be demonstrated. Simply scaling the average coverage of 400 KB from the WTY times 2000 would yield 800 MB; however, the Y chromosome is only 50 MB long and only 20 MB can be effectively aligned to the reference sequence. To overcome this problem you'll want to enrich for target specific DNA and you'll want to barcode several samples so that they can be pooled together on one instrument run. This again reduces the "efficiency" by quite a significant percentage.) FTDNA has two new Illumina MiSeq DNA sequencers as well. The lab now has the capability to sequence six whole genomes in two weeks and 64 exomes at 80x coverage in one week.
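
Thomas's arithmetic can be written out as a short Python sketch. The four input numbers are the ones he quotes; the variable names are mine, and I am assuming that 1 MB equals 1,000 KB for the unit conversion. This only restates his scaling argument; it says nothing about what the instruments actually deliver.

```python
# Figures from Thomas's note
WTY_COVERAGE_KB = 400   # average coverage of a Walk Through the Y test
SCALING_FACTOR = 2000   # the claimed "2000 times as efficient"
Y_LENGTH_MB = 50        # approximate length of the Y chromosome
Y_ALIGNABLE_MB = 20     # portion that can be effectively aligned to the reference

# Naive scaling: 400 KB x 2000, converted to MB (assuming 1 MB = 1,000 KB)
scaled_mb = WTY_COVERAGE_KB * SCALING_FACTOR / 1000

# How many times over the naive figure would span the Y chromosome
fold_over_whole_y = scaled_mb / Y_LENGTH_MB
fold_over_alignable = scaled_mb / Y_ALIGNABLE_MB

# Share of the Y chromosome that is alignable at all
alignable_fraction = Y_ALIGNABLE_MB / Y_LENGTH_MB

print(f"Naively scaled output: {scaled_mb:.0f} MB")
print(f"That is {fold_over_whole_y:.0f} times the whole Y chromosome")
print(f"and {fold_over_alignable:.0f} times the alignable portion")
print(f"Alignable share of the Y chromosome: {alignable_fraction:.0%}")
```

In other words, the naively scaled figure is many times larger than the region that can be aligned, which is my reading of why Thomas suggests pooling several barcoded samples on one run instead of treating the factor of 2000 as a direct gain.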

Bennett discussing the two new Illumina HiSeq 2000 high output DNA sequencers

The FTDNA lab tour was an exciting experience. It was very interesting to see all of the technicians at work running the various DNA tests that we as FTDNA customers have ordered. I was incredibly impressed by their vast array of state-of-the-art equipment. Hopefully, Bennett and the rest of his lab staff will continue to allow these tours for attendees at future FTDNA conferences. If you haven't seen it yet, it is well worth your time and I highly recommend it.

The following may be overkill for some of you. If so, I will just say goodbye to you here. However, if you are like me and you just can't get enough or want to gain a better understanding of how this all works from the inside, here is a series of photos from the lab tour.


FTDNA's Biomek FXP Laboratory Automation Workstation, used for DNA extraction and primer stamping for pre-arrayed assays (such as FMS or WTY)

FTDNA's automated equipment used to prepare PCR assays

Comment from Thomas: FTDNA's automated equipment that is used to combine genomic DNA with primers to prepare a PCR assay. In-house developed software controls the pipetting steps so that the ordered assay is run with the correct DNA.

FTDNA DNA lab technicians at work


FTDNA lab tech pipetting a cheek swab sample to prepare for DNA extraction

Comment from Thomas: This is the only step that still needs to be processed manually because we haven't found a robot that is able to remove the cotton swab from the vial yet.

More FTDNA lab techs at work

Tim checking out FTDNA's MiniStore re-arraying platform inside the DNA freezer

Comment from Thomas: (This is) where the individual DNA sample tubes are re-arranged to the position where they will be processed in the assay. We try to prevent thaw and freeze cycles wherever possible because they will degrade DNA. So instead of taking out a complete plate (where we may only need a single sample) and thawing it completely, we use this technology of re-arraying tubes in the frozen state so that only the processed samples will be thawed and all other samples will stay frozen. The DNA samples/plates are actually stored in the back compartment and on the left side. Those areas are not clearly visible in the image.
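
To illustrate why re-arraying matters, here is a minimal Python sketch that counts how many tubes would be thawed under each approach. The plate names, the requested positions, and both function names are made up for illustration, and I am assuming every plate is completely full with 96 tubes; this is not FTDNA's software.

```python
TUBES_PER_PLATE = 96  # assumes every plate is completely full (simplification)


def thawed_whole_plate(requested: dict[str, set[int]]) -> int:
    """Tubes thawed if each plate containing a requested sample is
    taken out and thawed completely."""
    return len(requested) * TUBES_PER_PLATE


def thawed_rearrayed(requested: dict[str, set[int]]) -> int:
    """Tubes thawed if only the requested tubes are picked while frozen
    and moved to a new plate."""
    return sum(len(positions) for positions in requested.values())


# Hypothetical order: four samples spread over three plates
requested = {
    "plate_A": {3},
    "plate_B": {10, 11},
    "plate_C": {95},
}

whole = thawed_whole_plate(requested)
picked = thawed_rearrayed(requested)
print(f"Whole-plate approach thaws {whole} tubes")
print(f"Re-arraying thaws {picked} tubes")
print(f"Unnecessary thaw events avoided: {whole - picked}")
```

In this made-up order, thawing whole plates would put 288 tubes through a thaw and freeze cycle to reach only 4 samples, while re-arraying thaws just those 4.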

The interior of the MiniStore storage chamber, which can hold approximately 500,000 DNA samples
(seen at the back side)

One of the thousands of plastic trays that FTDNA uses to hold DNA samples

Shaking incubation chamber used to lyse cells so that DNA can be extracted

Comment from Thomas: The recipe for the liquid in the sample vials is optimized for long term storage under a large variety of storage conditions. Essentially the high salt concentration will suck out all water from inside the cheek cells (by osmosis) so that the cells form compact clumps and conserve their ingredients. This makes the sample more resistant to mechanical shearing and the high salt concentration also inhibits growth of micro-organisms that could digest the cells. The downside of this is that we need to use harsh mechanical forces when we enzymatically want to open the cells to extract the DNA. Shaking a 2 ml sample tube in an upright position doesn't really move the liquid in the tube a lot. Therefore we turn the tubes horizontally and shake them along the tube axis where the liquid has a longer path to accelerate. This method has been proven to be very effective.

Biomek FXP Laboratory Automation Workstation

Eppendorf Mastercycler thermal cyclers amplify DNA segments

Linda Magellan with 1 of 3 Applied Biosystems 3730xl DNA Analyzers

Bennett with FTDNA's 2 Illumina MiSeq DNA sequencers in the background

FTDNA technician processing Geno 2.0 SNP chip tests

Close-up of tech processing tests, note the SNP chips in the liquid

Stacks of hundreds of Illumina SNP chips that have been completed

Comment from Thomas: Before we dispose of the chips at the glass recycling facility, we temporarily store them for a few weeks. During a 48-hour time frame they are still good, and since they are barcoded we may have the chance in rare cases to re-scan a chip in case some bad read happened at the scanner.

Bennett holding a box of 100 unprocessed samples being held in long-term storage

FTDNA's storage room where 1000s of unprocessed samples are held...


...Can you see yours?

Water purification system

Comment from Thomas: In order to get ultrapure water for molecular genetic assays (such as PCR) the tap water needs to pass through several steps:
1.) Pre-filter to remove particles such as sand and rust from pipes
2.) Membrane filters to get rid of small particulate material
3.) Ion exchangers to remove ions and salts
4.) Reverse osmosis to remove organic compounds. At this stage we use the water for general cleaning purposes such as for the lab dishwasher and for rinsing flasks. However, to get PCR grade water we continue with:
5.) Another step of ion exchange and reverse osmosis in a laboratory grade water purification system to remove any contamination coming from the plastic piping inside the lab.
6.) Then the water passes through a sterile filter to remove bacteria and other micro-organisms before it is bottled in a glass flask.
7.) Finally the bottled water is heat sterilized in an autoclave at a temperature that should degrade all possible DNA chains that may have survived to this point.

We also have a room with two air compressors that supply compressed air for the robots. One of them is a backup system, because if the air supply is interrupted all robots will stand still. They are in a separate room across the hallway because they're quite noisy and produce a lot of heat.

Bennett with Geno 2.0 SNP chips

Illumina MiSeq machine

Bennett demo-ing the Illumina MiSeq machine

Automated pipetting system

Applied Biosystems 3730xl DNA Analyzer, used for sequencing

Comment from Thomas: Each lane represents a sequencing trace where the bases are displayed in 4 different colors. This display format gives the technician a quick overview of the quality of the complete run. The fluorescence intensities are digitized by an analog-to-digital converter so that they can be saved on a computer, but they still represent analog measurement values. When talking about a digital output, the impression could arise that the display shows the scored base-calls, which are really produced in a later step.