Showing posts with label raw data. Show all posts

Tuesday, April 18, 2017

The Fourth Pond: MyHeritage DNA

For years we have been advising DNA testers, specifically those searching for birth family and attempting to solve family mysteries, to test at all three of the major DNA testing companies, in other words to “fish in all three ponds.” These three autosomal DNA databases (AncestryDNA, 23andMe and Family Tree DNA) now contain between five and six million testers in total.

For another company to be able to compete in this space, they must offer a test comparable in resolution and features, and be prepared to tackle the challenging proposition of “catching up” with the databases of the other three companies. That is a tall order and, so far, there have been no other companies to earn our recommendation. With that said, I believe the time has come that we must consider revising our advice to include a fourth “pond,” MyHeritage DNA.


When a genealogist or person of unknown parentage is seeking to answer a specific question about their ancestors, near or far, the chance of success relies at least partially on luck. Who else has tested at the same company? For unknown parentage this is especially true. For example, an adoptee may test at only one company, while the birth parent or sibling has tested at another. If the search goes no further, there will be no successful outcome. Since all of these databases are proprietary and only a relatively small percentage of testers upload to the third party comparison site Gedmatch, it is essential for those engaged in unresolved searches to make sure their DNA is submitted to all databases where there may be a unique match.

Recently, MyHeritage launched their DNA matching service. For most of us, it may be easy to ignore them for now while they work out their questionable matching algorithms and grow their database to a size that earns our interest, but for those of unknown parentage or for birth parents who have yet to find success in their searches, it may not be prudent to do so. In the last couple of weeks, I have been made aware of several unknown parentage cases that were resolved through MyHeritage DNA. These searchers had made sure that their DNA was “fishing in all three of the ponds,” plus Gedmatch, and yet had not found the answers they were seeking in any of those databases. Since MyHeritage offers a free upload of the raw data files from the other three DNA testing companies, this has encouraged some in my DNA Detectives Facebook group to try it out.

Well, it turns out that MyHeritage is having success at attracting its own unique group of testers who are not at the other three companies. Lo and behold, for some, what they have been looking for is in that database and nowhere else. Since it is, undoubtedly, still the smallest database, the odds of finding a close match are presently low, but they are clearly not zero.

Consider these three recent MyHeritage DNA success stories.

STORY ONE

From Robin:

The father was my first love, high school sweetheart. He was three years older.  We had talked about getting married but something happened…he turned and suddenly didn’t want anything to do with me.  I was devastated and distraught. 

I gave my daughter up through the LDS Social Services in a closed adoption.  At the time they did not do adoptions with pictures or information given to the birth parent after the birth.  I had told my counselor that someday I wanted to meet her.

I had tried everything to try and find her.  I had always thought she was adopted in California.  It wasn’t but about 20 years ago that I found out it was actually in Davis County, Utah.  I had been looking in the wrong place.  I tried to register in the Utah adoption registry, but they wouldn’t let me because the birth had to be in that state.  I tried to register in the Hawaii adoption registry, but they wouldn’t let me because the adoption had to be there.  The birth was in Hawaii and adoption in Utah -- just opposite from their rules.  I tried to send for the amended birth certificate hoping someone would screw-up and send it to me.  I got the original one….  I tried talking with people in Hawaii.  I tried talking with people in Utah.  I tried writing the court to tell them I had cancer in 2003 and it was imperative that I get a hold of her to let her know the medical history.  They never wrote me back… I even had a friend attorney try to find a loophole in the Hawaii law code that would permit me to have the records open. No luck, nothing.  I was pretty discouraged.  My mother passed in 2001 and I had always wanted her to meet my daughter but it didn’t happen.  I even would say, “Mom, I know you know who she is now and please just whisper her name in my ear.”  If I had a name I knew the chances of finding her were pretty good.

My husband and I also wanted to do our DNA even though we had a fairly good idea of our roots and where we were from. For Christmas 2016 we decided to both do our DNA through AncestryDNA.  My friend Jennifer was helping me… and in the process I told her my story about having a child at 15 and giving her up for adoption.  She said, “You have to meet my sister-in-law!!” Her sister-in-law Mckell, came over to my house and told me how she helps people find people.  She told me that I have to upload my DNA data with other sites.  I was a little skeptical at first because that was really putting myself out there, but, oh well, the government knows everything about us anyway, what the heck!  She had me go on this site and that site and to MyHeritage. This was in January 2017.  I was grateful to her but really didn’t think about it much after that day.  Every now and then I would get an email from the sites saying they found my 14th cousin….ok, that’s an exaggeration but you get what I mean.  No big deal, right. 

So on Sunday April 2nd I had received a notice on my phone that I had an email from MyHeritage.  Oh another one of those….  I hadn’t been feeling good so I pretty much lay around, watched TV all day.  That evening I got ready for bed and decided to look at my emails.  It was about 10pm.  I pulled the email up and started to read...

Hi Robin,
Good news! We’ve discovered new DNA Matches for you.
(OK another one….)
Your top new DNA Matches
Becky
Age 40's
From USA
49.1% shared DNA suggest the following possible relationship:
Daughter
(What the heck…)
It took my breath away. 


Robin's MyHeritage Match

I quickly called Jennifer, she didn’t answer so I texted her: "MyHeritage…..Daughter….call me ASAP!" She called Mckell and Mckell called me all calm like.  I told her and she said,  "Robin, that is HER!"  I kept questioning because I just couldn’t believe it. The next two hours Mckell and I were on the phone trying to find out everything I could about Becky.  I still couldn’t find her birthday.  That was the one piece that would cinch this whole puzzle for me to really know if it was her.

At 7:40 am I sent Becky a private message to her Facebook page,
"Hi Becky my name is Robin … and I live in Mesa, AZ.  My Heritage DNA messaged me yesterday and if you are who I think you are, I have been looking for you practically my whole life.  When is your birthday? Please call me 480 -…"

I went to work and stewed all day.  I couldn’t focus and I tried to keep myself busy.  Finally at 2:33pm I got a response:
Hi Robin! What a surprise! Can you tell me the birthdate of the person you think I am? (Winky face)

Me: Yes I gave birth to a daughter January 10, 19xx [removed for privacy] in Queens Hospital in Honolulu Hawaii. I was 15 yrs old.

Her: (Big smiley face) OMGoodness!!! WOW!!! Yes, it’s me (cheezy grin) Forgive me, I’m kind of in shock.  Can we text for a bit before we talk? 

Me: Yes, I found out last night about 10pm.  I have a friend that made me sign up in MyHeritage…I was up till 1am,  got up this am at 6. Had to take a sleeping pill I was so excited….I’m at work but its ok.  Whenever you are ready…I’ve waited this long :)

We continued to talk back and forth until she had to go get ready for work.  I told her we have seven children and that she has five sisters and two brothers.  She was blown away, but in a good way.  She was so excited to have sisters.  She always wanted a big family.  I told her we have 30 in our family -- and that is just my husband and I, our kids, their spouses, and grandchildren. 

Becky had done her DNA through MyHeritage to find out her roots….she got a lot more than she bargained for.

So much more happened…. Then we met….that’s another story….

Robin and her biological daughter Becky meeting for the first time

Robin's daughter Becky had only tested at one DNA company.  

Unlike a person of unknown parentage searching for their birth parents, when a birth parent is searching for their biological child, it is like searching for a needle in a haystack. This is because that one person (or their descendants) has to have also taken a DNA test. Very importantly, they must be in the same database. In this case, if Robin had only submitted her DNA to one, two or three of the DNA testing companies, and if Mckell had not encouraged Robin to upload to MyHeritage, she would not be reunited with her daughter today. 


STORY TWO


Nancy used MyHeritage in her search for her mother's birth parents

From Nancy:

Well thanks to you and a 20/20 piece you did, I took my first DNA test with AncestryDNA last year. (My husband did as well and found his birth father!) I was trying to uncover my mother’s true origins. The story I had heard was that my mom's birthmother had my mom and went away with her. She then came back to the birth father’s house and dropped her off to never be seen again. In the end, my mom was raised by neither birth parent and ended up being adopted by someone else.

I took all available tests out there and transferred my raw DNA data to all sites that were free. My best match was a 4th cousin on AncestryDNA.

About two weeks ago I get an email from MyHeritage about a match with 870.8 cM shared and, at the same time, I got a match on AncestryDNA with 355 cM shared.  The MyHeritage match turned out to be my half-aunt on my maternal grandmother’s side and the AncestryDNA match was my half-first cousin on my maternal grandfather’s side, so each match identified one of my mother’s birth parents! 

My aunt told me that my mom’s birth father and grandmother came and took the baby from her and told her to stay away! She said the family knew about my mom and they would celebrate her birthday and keep her memory alive in the family. Tragically, according to my aunt, my maternal grandmother died heartbroken over losing her daughter.


Nancy's mom and her birthparents

Bittersweet discovery, but finally some answers. 

It turns out that while Nancy's DNA was in all the databases, her aunt had ONLY tested at MyHeritage. 

She explained, "I was bored one day from surgery and I started looking into it. I bought my kit, sent in my DNA, and the rest is history! I have never done this before. I am so glad I did -- LOOK. We found each other, Mija. I am so happy." 

She further explained her main goal in testing was to confirm Native American ancestry. It is difficult to predict why a person might test in one database and not another – even the smaller ones.


STORY THREE


Get your tissues out for this beautiful story of sisters, Morgan and Jennisara, finding each other.


  

---------

These recent success stories have convinced me that for those searching for close biological family members, if all else fails, it is time to give MyHeritage a try. Thanks to the company's offer of free raw data uploads it will cost nothing to do so and the small effort may pay off handsomely. You just never know who is sitting in that database waiting to be matched to you, or who will test there next week or next year. It is worth the effort to make sure we are covering all the bases.

If you are looking to break down more distant genealogical brick walls, you may also want to consider uploading your raw data while it is still free. Please note, however, that at this time there seem to be issues with the matching algorithms, so I would approach the matches with caution. (Of course, any close family matches like the ones in the stories above should be very reliable due to the ease of detecting/predicting these.) For those researching more recent European roots, I believe MyHeritage DNA will continue to grow in importance due to their appeal to testers outside of the United States.


You can upload here.


Best of luck with your searches/research and I would love to hear about any more MyHeritage DNA success stories in the comments below.


[Edited to add: MyHeritage has offered my readers a free 14-day trial for their genealogical records Complete Plan, plus over 50% off for the year for those who continue after the trial period. This offer is good for new customers only. The trial can be found here.]

Thursday, June 13, 2013

A Sneak Peek at the UPDATED AncestryDNA Search Filter

AncestryDNA has been hard at work perfecting the new search filter these last few weeks. Not surprisingly, it has undergone some changes since I last shared it with my readers.

Stephen Baloglu, Ancestry.com's Director of Product Marketing, described what appears to be the version that will go live in the next few weeks:

The search includes looking within your DNA matches for surnames and birth locations in their family tree that they have linked to their DNA results. It does not currently include searching for username, but we may update that in the future. The top request was surnames/birth locations, so we started there. Also, we expect people to more likely know surnames they're looking for much more often than ancestry username.

He also shared these screenshots and accompanying descriptions with me on June 6th. Click to enlarge them for a closer look.

In a YouTube video on Ancestry.com's account (dated June 3, 2013), this slide is shown describing the AncestryDNA search filters.

Notice that the username search was, apparently, still expected at that time. Ancestry.com employees Anna Swayne and Crista Cowan discuss it in the video from about 15:30 to 16:00.  Crista clarifies that the username filter will be "an ability to search by the name of somebody who is a DNA match" and Anna says that it "will be coming soon". However, when I asked Ken Chahine, General Manager of AncestryDNA, about the username filter at the recent DNA conference on June 6th, he explained that Ancestry.com does not currently have the option to search by username built into their system, which is limiting AncestryDNA's ability to provide it. [Update: I have asked for clarification on this point.]

Although this additional filter would be convenient, in my opinion, it isn't essential. For example,  I would like to search my mother's account to determine which of my matches with a surname of interest also appear in her account, but I can do so almost as easily by searching on that surname. Further, to search your account to see if your research partners appear as matches, you can also filter by your shared surname(s). I would really like to see an "In Common With" filter to determine which of my matches are from my mother's side, but, fortunately, I can already use Jeff Snavely's tool for that purpose.
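To make the behavior of such a filter concrete, here is a minimal Python sketch of a surname and birth-location search over a match list, plus the "does this match also appear in my mother's account" comparison described above. The `Match` record, its fields, and the function names are hypothetical; nothing here reflects how AncestryDNA actually implements the feature.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical record for one DNA match and the tree linked to it."""
    username: str
    surnames: set = field(default_factory=set)
    birth_locations: set = field(default_factory=set)


def filter_matches(matches, surname=None, location=None):
    """Return matches whose linked tree contains the surname and/or location.

    Comparison is case-insensitive; a criterion left as None is ignored.
    """
    result = []
    for m in matches:
        if surname is not None and surname.lower() not in {s.lower() for s in m.surnames}:
            continue
        if location is not None and location.lower() not in {b.lower() for b in m.birth_locations}:
            continue
        result.append(m)
    return result


def in_both_accounts(my_matches, mothers_matches, surname):
    """Usernames carrying the surname that appear in both match lists."""
    mine = {m.username for m in filter_matches(my_matches, surname=surname)}
    hers = {m.username for m in filter_matches(mothers_matches, surname=surname)}
    return sorted(mine & hers)
```

Running the same surname search on two accounts and intersecting the results is only a rough stand-in for a true "In Common With" filter, since it depends on the match having that surname in a linked tree.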

With these upcoming filters, Jeff's terrific tool and Family Tree DNA's recent announcement that they are accepting raw data uploads from AncestryDNA (for only $49!), this test is becoming much more useful for genetic genealogists. (Now, if only we could get that chromosome browser onsite!)

Stephen tells me that there still isn't a confirmed date for the new search filter's arrival, but I expect it to appear sometime this summer.

Monday, March 25, 2013

AncestryDNA, Raw Data and RootsTech

Tim Janzen and I discussing the AncestryDNA features at RootsTech with AncestryDNA staff

Since RootsTech there has been lots of discussion regarding the features that AncestryDNA is and is not planning to offer their customers. I will address the many questions that I have received about the meetings in which I participated at the show, but first let's review:

Raw Data Downloads
On Thursday, AncestryDNA fulfilled their promise to allow customers to download their raw data. As Dr. Ken Chahine had assured me back in November, the file is not encrypted and is compatible with third party tools.

I sent my file to a number of third party providers:
  1. After working with it a bit, John Olson announced on the site that he expects that Gedmatch will be accepting AncestryDNA uploads in about two weeks. 
  2. David Pike told me that he has updated his tools to work with the AncestryDNA files.
  3. Leon Kull has reportedly updated his HIR search site to work with them as well.  
  4. Dr. Ann Turner has created an Excel macro to convert the AncestryDNA files to 23andMe format. 
At the "Ask the Expert" Genetic Genealogy panel that I moderated at RootsTech on Saturday:
  1. Bennett Greenspan told the audience that Family Tree DNA will be accepting AncestryDNA transfers into Family Finder starting on May 1st.
  2. Dr. Catherine Ball confirmed that the raw data file is not phased and that they are delivering it as they receive it from the chip manufacturer Illumina.  She also confirmed what Dr. Ann Turner had already discovered - the data labeled as "Chromosome 25" is from the pseudoautosomal region (PAR). Further, the "Chromosome 23" label refers to the X chromosome data and "Chromosome 24" refers to the Y chromosome.
Additional notes:
  1. Unlike Family Tree DNA, AncestryDNA is not removing any SNPs from the data - medically relevant or not. 
  2. The overlap between AncestryDNA's raw data file and 23andMe's should be around 690,000 SNPs because both are using the same Illumina OmniExpress Plus base chip. The ~10,000 SNP difference is accounted for by a different set of poorly performing probes and test SNPs. Family Tree DNA's should have a similar overlap for the same reasons.
  3. There is no mitochondrial DNA included in the raw data file because it is not included on the Illumina chip that they are using. (23andMe adds the mtDNA SNPs).
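
As a rough illustration of the chromosome labels described above, the following Python sketch relabels the numeric codes and joins the two allele columns into a single genotype, which is the general idea behind converting to a 23andMe-style layout. It assumes a tab-separated input with the columns rsid, chromosome, position, allele1 and allele2, preceded by `#` comment lines; that layout, the function names, the sample rows, and the choice to label chromosome 25 as `PAR` are all assumptions made for illustration, not a description of Dr. Turner's macro.

```python
import csv
import io

# Numeric labels described in the post, mapped to conventional names.
# "PAR" is a placeholder label chosen for this sketch.
CHROMOSOME_LABELS = {"23": "X", "24": "Y", "25": "PAR"}


def relabel_chromosome(label):
    """Translate 23/24/25 to X/Y/PAR; leave autosomes 1-22 unchanged."""
    return CHROMOSOME_LABELS.get(label, label)


def convert_rows(lines):
    """Yield (rsid, chromosome, position, genotype) tuples.

    Assumes tab-separated rows of rsid, chromosome, position, allele1,
    allele2, with '#' comment lines and a header row starting with 'rsid'.
    """
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(data, delimiter="\t"):
        if row[0] == "rsid":
            continue  # skip the header row
        rsid, chrom, pos, a1, a2 = row[:5]
        yield rsid, relabel_chromosome(chrom), pos, a1 + a2


if __name__ == "__main__":
    # Made-up rows, not real SNP identifiers or genotypes.
    sample = io.StringIO(
        "#hypothetical export\n"
        "rsid\tchromosome\tposition\tallele1\tallele2\n"
        "rs0000001\t1\t1000\tA\tG\n"
        "rs0000002\t23\t2000\tC\tC\n"
    )
    for converted in convert_rows(sample):
        print("\t".join(converted))
```

Two files' overlap could then be estimated by collecting the rsid values from each into sets and taking the intersection.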

Search Function
As I expected from earlier conversations with AncestryDNA, a search function is next on the list. Kenny Freestone, Product Manager for AncestryDNA, discussed it in his presentation under the heading "What's Next". Although it is already in the works, Kenny could not provide a firm timeline for its availability when I asked.

We will be able to filter our list of matches by surname, location and username. As anyone who has worked with their AncestryDNA matches knows, this is sorely needed. There is no doubt that the many requests from customers pushed this up their list of priorities.

Genetic Ethnicity Update
Later this year, AncestryDNA will be updating their Genetic Ethnicity feature. They will provide more granularity in Europe and West Africa. We can also expect more accurate breakdowns. A number of AncestryDNA personnel acknowledged to me over the weekend that certain "ethnicities" (e.g., Scandinavian) are overestimated for many customers. However, they also emphasized that much of the perceived problem with their admixture analysis stems from the question of "where and when". What they mean by this is that it is very difficult (and sometimes impossible) to pinpoint where specific DNA signatures were at an exact time in history.

As I always remind my readers, this portion of the science has a long way to go and will improve with more data and time. On the "more data" front, during her speech at the AncestryDNA luncheon on Friday, Dr. Ball was reportedly requesting that genealogists who know that all eight of their great grandparents were born in the same place share this information with AncestryDNA. This seems to imply that, like 23andMe has successfully done, AncestryDNA plans to use customer data to improve their predictions. They are also starting to work on incorporating the coveted SMGF collection into their admixture analysis, which should improve it greatly.

The good news is that AncestryDNA customers don't have to wait for this update to gain more insight into their ancestral origins. Now that AncestryDNA has made the raw data available, customers will be able to upload their raw data file to the various third party sites to try out the admixture calculators and/or send it to Dr. McDonald for his very highly regarded analysis.

Matching
AncestryDNA is currently working on an algorithm to improve matching for endogamous populations, specifically Ashkenazi Jews.

As I reported in November, the minimum threshold for matching is 5 megabase pairs. This was reconfirmed in a conversation I had with Dr. Ball on Friday. I also learned that there is no minimum SNP requirement. We discussed the possibility of AncestryDNA switching to centiMorgan measurements in the future.
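
A length threshold of this kind is simple to express. The Python sketch below keeps only shared segments that are at least 5 megabase pairs long, measured by physical position. The `Segment` tuple and its field names are hypothetical, and AncestryDNA's actual matching pipeline is not public, so this illustrates the stated threshold only.

```python
from typing import NamedTuple

MIN_SEGMENT_BP = 5_000_000  # 5 megabase pairs, the threshold stated above


class Segment(NamedTuple):
    """Hypothetical shared segment, given by physical start/end positions."""
    chromosome: str
    start_bp: int
    end_bp: int

    @property
    def length_bp(self):
        """Physical length of the segment in base pairs."""
        return self.end_bp - self.start_bp


def segments_meeting_threshold(segments, min_bp=MIN_SEGMENT_BP):
    """Keep segments whose physical length is at least min_bp."""
    return [s for s in segments if s.length_bp >= min_bp]
```

A centiMorgan-based threshold would not be a simple change of units: recombination rates vary along each chromosome, so converting positions to centiMorgans requires a genetic map.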

Price
The test is now $99 for everyone - subscribers and non-subscribers. This was likely in response to 23andMe's recent price drop. Having attracted well over 120,000 customers in less than a year in business, AncestryDNA is proving to be an important player in this field. This new policy to attract subscribers and non-subscribers alike will only improve their market share.

International Customers
It does not appear that AncestryDNA has plans to offer their test to international customers in the near future, instead choosing to focus on the U.S. market for now.

Matching Segment Data and Chromosome Browser
On Friday at RootsTech, Dr. Tim Janzen and I sat down for a meeting with AncestryDNA management. Among others, we were joined by Dr. Ken Chahine, Senior Vice President and General Manager of DNA, and Dr. Catherine Ball, Vice President of Genomics and BioInformatics. (Dave Dowell also attended a portion of the meeting.)  I found them to be very receptive to hearing our requests and the reasons behind them. At no time did they state that they had decided not to build a chromosome browser or release matching segment data to their customers in the future. Dr. Ball did express some privacy concerns, but was open to hearing ideas of how this could be addressed.

Tim Janzen explains his feelings while Ken Chahine looks on

During the meeting, Tim very emphatically explained his feelings on the need for matching segment data (above) and I resorted to begging (below)... {hehe}

Catherine Ball, Ken Chahine, Tim Janzen, me, Dave Dowell and Steve Baloglu

On Saturday, after attending Kenny Freestone's presentation, four advanced genetic genealogists approached him to discuss the chromosome browser issue. In addition to myself, Tim Janzen, Angie Bush, and Nathan Machula were present for the conversation. Kenny didn't have much to say and mostly listened to the arguments that we presented covering why we feel that it is essential that AncestryDNA offer the matching segment data behind their relative predictions. At no time did he state that AncestryDNA would not offer a chromosome browser or that the delay in doing so was because AncestryDNA didn't think that their customers could understand it.  He did, however, confirm that it was not a top priority at this time. He also said that he personally reads all of the requests sent through the feedback button, so if you want them to reassess their priorities, then be sure and let them know.

Tim emphasized that both 23andMe and Family Tree DNA included a chromosome browser feature at the launch of their autosomal DNA product and wondered aloud why AncestryDNA had not done so as well.  I explained to Kenny (as well as in my meeting with management) that, as genealogists, we expect conclusions to be evidence based. It is not in line with this principle to simply be told that a certain common ancestor is responsible for a DNA match and be expected to take AncestryDNA's word for it. Where is the proof?  Since Kenny had shown a chart during his presentation of his ancestral lines that he claimed were genetically confirmed by AncestryDNA matches, I also pointed out the fact that those lines that he had shaded in weren't really confirmed without the actual genetic data to support that claim. To illustrate, I laid out my experience as follows:

On my AncestryDNA account, I was happy to find a shaky leaf hint a few weeks ago.




Upon reviewing the match, I noted that the common ancestor was through my mother's side. I was initially excited to see that I had inherited DNA from my 7th great grandparents on paper, Joseph Denison and Prudence Miner.


The only problem is that this match doesn't appear anywhere on my mother's 47 pages of matches. Do you know what this means? It means that I must have inherited the DNA responsible for the match through my father's side. Since all DNA inherited through my mother's line must come through her, AncestryDNA has identified the wrong common ancestor as the source of the DNA shared between LGB and me. A fluke of the algorithms...? Perhaps. Let's look at some more of my matches.






Once again, as you can see, the common ancestor identified by AncestryDNA is on my mother's side. A thorough search of my mother's matches shows that, once again, this person is not reported as a match to my mother. From this, we can only reach the conclusion that the DNA responsible for this match comes through my father's side - not my mother's. The common ancestor that I share with "Baerion" must be beyond a brick wall in her family tree or on my paternal side. In general, I have had more success filling in the branches of the maternal side of my family tree than the paternal side, so this is certainly possible.

Just to demonstrate that this isn't an isolated occurrence, here is another one:






This match doesn't appear on my mother's match list either! So, out of my ten matches that have shaky leaves attached,  three of them apparently have common ancestors wrongly identified as the source of our matching DNA. Do you see the problem here? Does AncestryDNA?  If this match were, instead, at 23andMe or Family Tree DNA, I could check the DNA segment that we share and compare it to my other matches and/or my chromosome map. This would provide additional information and/or evidence to help me determine through which of my ancestral lines this segment of DNA was inherited. Might there be other explanations for these discrepancies? It is certainly possible, but without the underlying genetic data, it is impossible to say.
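
The check I performed by hand can be sketched in a few lines of Python. It flags any hinted match whose suggested common ancestor is on the maternal side but who is absent from the mother's own match list. The function name, data structures, and usernames are hypothetical. The check assumes the mother has tested at the same company, and it cannot tell a wrong hint apart from a match that is missing from her list for some other reason.

```python
def suspect_maternal_hints(hints, mothers_matches):
    """Return usernames whose maternal-side hint is not backed by the mother's list.

    hints: dict mapping match username -> side of the hinted common
           ancestor ("maternal" or "paternal").
    mothers_matches: iterable of usernames on the mother's match list.
    """
    mothers = set(mothers_matches)
    return sorted(
        user for user, side in hints.items()
        if side == "maternal" and user not in mothers
    )


if __name__ == "__main__":
    # Hypothetical usernames, for illustration only.
    hints = {
        "match_one": "maternal",
        "match_two": "maternal",
        "match_three": "paternal",
    }
    mothers_list = ["match_two", "someone_else"]
    print(suspect_maternal_hints(hints, mothers_list))
```

A flagged match is a prompt to look more closely, not proof of an error; only the underlying segment data could settle which line the shared DNA came through.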

I am in the fortunate position to have tested my mother at AncestryDNA in addition to myself, so I can clearly see there is an issue. What about all of those people who have not tested a parent and are blindly accepting AncestryDNA's shared ancestor hints because they don't know otherwise? Isn't that kind of like copying someone's tree and just taking their word for it that it is correct with no sources or evidence attached? For now, those of us who do understand the finer points of autosomal DNA matching will have to do our best to convince our matches to upload to Gedmatch so they can see for themselves what they are missing.

As much as I, too, am disappointed that AncestryDNA has not yet provided the matching segment data, it is clear to me that the reasons behind this decision are far more complex than what others may claim is an attempt to dumb down the product because Ancestry.com thinks its customers are stupid. From my many conversations with Ken Chahine and others from AncestryDNA over the past year, I have come to appreciate that working within the framework of this 1.6 billion dollar corporation comes with its own set of challenges.

The Future
Tim Sullivan, CEO, has made it clear that Ancestry.com is committed to the DNA business and Ken Chahine has always been upfront with me and come through with his promises. So, I am going to give them the benefit of the doubt. From our very first conversation, I have advocated for the genetic genealogy community and looked out for our best interests and I won't stop doing so. I believe that they will do the right thing for their customers and the genetic genealogy community eventually. It may not happen as quickly as we would all like (yesterday!), but they are not the big bad wolf and I think it does us all a disservice to continually paint their intentions in a negative light. We are in early days yet. Let's give them a break.

Friday, November 30, 2012

More on Geno 2.0: Third Party Resources and Images from a Second Set of Results

It turns out that an additional correspondent of mine received her National Geographic Geno 2.0 results in the last batch and she has kindly given me permission to share them. I have also received word that several third party tools are ready to accept GenoChip (Geno 2.0) raw data.

THIRD PARTY TOOLS
Mitochondrial DNA researcher James Lick's mthap tool (mtDNA haplogroup predictor) is now capable of processing the new GenoChip raw data. In fact, his tool returns an even more detailed subclade for the K1a1b1 result (shown below) than Geno 2.0 currently reports (K1a1b1f). This was also true for a K1 sample that was extended to K1e with Jim's tool.

Happily, reversing his earlier statements, Dr. Doug McDonald has successfully adapted his admixture analysis program to be able to work with the Geno 2.0 files. However, as expected, he reported that the results are inferior when compared to using 23andMe and Family Finder files due to the lower number of SNPs that he is able to incorporate in his program.

Dienekes announced today that he has created a converter to run Geno 2.0 files on his DIYDodecad tool.

After working with a raw data file, Mike Cariaso, cofounder of SNPedia, tells me that versions of Promethease 0.1.149 and later will be able to read the Geno 2.0 files directly.

Leon Kull reports that he is accepting Geno 2.0 files for inclusion in his HIR Search autosomal DNA matching database.

Y-DNA researcher David Reynolds compiled a list of Y-SNPs on the GenoChip from the raw data file. He explains on the ISOGG Facebook page, "While we don't know the exact location yet of most of the 12,000+ (e.g., the CTS, F, PF SNPs), this will answer the questions about which DF/L/Z etc SNPs are included."

Dr. Tim Janzen, 23andMe Ancestry Ambassador and ISOGG Y-SNP Tree Committee Member, is currently working on creating a file that will include the SNP positions for all of the SNPs on the GenoChip that are also found on the 23andMe v3 chip. I will add the link here when it is completed (probably tomorrow).

Dr. Ann Turner (who needs no introduction from me!) has informed me that she is ready to share the spreadsheet method that she has been working on to facilitate Geno 2.0 raw data usage with third party applications. She tells me, "GenoConvertTemplate.xlsm contains a macro to convert the GenoChip raw data download to the format used by 23andMe, which many 3rd party utilities can handle. It uses Build 37 numbers for the chromosomal locations of the SNPs." She would like to test it with additional Geno 2.0 data files, especially from people who have also tested at 23andMe and are willing to share both files with her. Please contact Ann directly for access or to donate your raw data (you can write to me for her email address if you don't have it).

Mitochondrial DNA researcher and mtDNA Haplogroup K Project administrator  Bill Hurst has spent quite a bit of time examining the data as well. After working with two files, he commented (in agreement with Jim Lick) that so far he "was impressed by the full coverage of the test, even while testing only 19% of the mtDNA."

I had full confidence that our amazing community of researchers/citizen scientists would come through for us and they have, even more quickly than imagined! Thank you to all those who have contributed so far to this groundbreaking project. (There's a long way to go yet!) I would be remiss if I didn't note that this expedient work could not have happened without the generosity of both Sharon and Anna in sharing their early data. Thanks go out to both of them from all of the community!

The researchers listed above are still looking to examine more raw data files, so if anyone reading this has received their results and would like to contribute to the cause, please contact me or any of the researchers directly. If you have a third party tool that is accepting Geno 2.0 raw data and is not listed, please comment below or email me to be added. I will continue to update this list as I receive more info, so please check back periodically.

**To access your Geno 2.0 raw data for use with these tools, go to "Expert Options" under the Profile tab and download the .CSV file to your desktop.**


MORE RESULTS FROM GENO 2.0:

YOUR MATERNAL LINE
Here is the newest set of mtDNA results:

WHO AM I?
These are her autosomal results:

Note: Clicking on a percentage brings up information about that population
National Geographic has recently clarified that the matching reference populations listed as first and second (shown in this screen shot) are not in order of closest matching, but simply the top two.

Interestingly, her Denisovan ancestry component is significantly lower than the other account that I reviewed.



OUR STORY
It looks like there are a few more "stories" in the community section now:


There are a total of 30 as of today (more than when I looked yesterday!).


TRANSFERS
I tried to transfer her data to Family Tree DNA (found in Expert Options under the "Download Data" button), but I got this:

"The kit you entered already has a Population Finder, new PopFinder data cannot be transferred"

It appears that if you already have a Family Finder test from FTDNA (which includes the Population Finder feature), then you cannot transfer your Geno 2.0 results at this time. I'm surprised by this since Family Finder (PopFinder) and Geno 2.0 are very different products. Further, this customer does not have an mtDNA test in her FTDNA account, so she should be able to transfer her mtDNA results at the very least. One project administrator has already reported that a woman has transferred her results into an FTDNA Project, so contrary to earlier reports, women are not automatically blocked from transferring their Geno 2.0 results to FTDNA. I'm quite confident that National Geographic or FTDNA will clarify this issue for us soon.

MORE TO COME...
I'm sure this will be only one of my many future posts on Geno 2.0, so stay tuned.

Monday, November 5, 2012

Ken Chahine Answers My Questions and Reveals Behind-the-Scenes Information about AncestryDNA

I recently had a long and enlightening conversation with Dr. Ken Chahine, General Manager of AncestryDNA. Due to the lack of information from reliable sources regarding the details of what is going on behind the scenes at AncestryDNA, there has been considerable speculation among those interested in the product. As it turns out, much of this has been incorrect.

AncestryDNA's Matching Threshold
First, mostly due to the large number of matches, it has been widely speculated that AncestryDNA is allowing for a much lower threshold than either 23andMe or Family Tree DNA, reporting matches based on as little as two cM. In reality, Ken tells me that AncestryDNA has been using a 5 Mb cutoff [Mb = mega base pairs = 1,000,000 base pairs] for reporting matches in their lowest category - "very low confidence". He explains how they came to this decision and what AncestryDNA sees as the benefits to their customers:

AncestryDNA, we believe, is the only service that phases the genotyping data and has validated the matching algorithm with large pedigrees.  That leads to two important differences.  First, it allowed us to test various segment cutoffs from 5-10 Mb* with and without a proprietary filter that preferentially removes incorrect matches. We've initially selected the 5 Mb cutoff with the filter as providing the best balance between false negative (true matches that we fail to call a match) and false positives (false matches that we call true matches). Second, it allowed us to make a better cousinship prediction.  For example, our data suggest that most relationships that are theoretically predicted to be third cousins are really fourth cousins or deeper.  Therefore, a fourth cousin match at AncestryDNA, we believe, is a third cousin match at other services.

Ken's assertion that AncestryDNA is using a more conservative prediction calculation does appear to be in agreement with what I and many of my colleagues have observed. Time will tell if it is indeed more accurate. The filter aimed at reducing the number of IBS (Identical By State) matches sounds like a promising addition. When we have the ability to examine the raw data we should be able to reach conclusions about how effective the filter is at fulfilling its purpose.

Mb vs cM - What does it mean to us?
As you may know, the centimorgan, rather than the mega base pair, is used by Family Tree DNA's Family Finder and also primarily by 23andMe as the unit of measurement for the length of matching autosomal DNA segments. So, how does this 5 Mb threshold compare to the 5.5 cM* threshold used by Family Tree DNA (*edited from 7.7 cM after I was sent this) and the 7 cM threshold used by 23andMe in their Relative Finder feature? The National Institutes of Health website tells us that in human genetics, "one centimorgan is equivalent, on average, to one million base pairs" or 1 Mb. Genome.gov agrees, "Generally, one centimorgan equals about 1 million base pairs." However, in reviewing my Ancestry Finder download at 23andMe, which lists the length of segments both in Mb and in cM, I came to the conclusion that, unfortunately, it isn't that simple - at least for our purposes. In some cases, the numerical value in Mb was larger than in cM for the same segment, but in other cases it was smaller. I copied portions of my Ancestry Finder download to demonstrate examples. If you have tested at 23andMe, take a look at your own file to get a feel for the comparison.

The first chart shows the respective values when the Mb value was 11 and the second when the cM value was 11:

                 

Notice the wide range of values in both directions. It appears to be impossible to find a direct correlation between segment lengths measured in Mb and cM. In some cases, the two values weren't even close:

            
The number of base pairs that a centimorgan corresponds to varies so widely because the two units measure different things. When the distance along a chromosome is measured in mega base pairs (Mb), the value strictly reflects how many millions of base pairs there are in a matching segment. When centimorgans (cM) are used to express the distance along the chromosome, what is being measured is the frequency or chance of recombination expected within that segment. Some portions of the genome are expected to recombine more often than others; therefore, sometimes a segment of 1 Mb has a relatively good chance of remaining intact and sometimes it does not.
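For readers who like to tinker, here is a minimal Python sketch of the comparison described above. It assumes you have a list of segments with a start position, an end position and a length in cM; the tuple layout, the function names and every number in it are hypothetical and are not taken from my Ancestry Finder download or from any company's file format.

```python
# Hypothetical matching segments: (chromosome, start_bp, end_bp, length_cM).
# All values are invented for illustration and are not from any real download.
SEGMENTS = [
    ("1", 10_000_000, 21_000_000, 6.5),
    ("5", 40_000_000, 51_000_000, 22.0),
    ("9", 100_000_000, 105_500_000, 11.0),
]


def length_mb(start_bp, end_bp):
    """Physical length of a segment in mega base pairs (Mb)."""
    return (end_bp - start_bp) / 1_000_000


def cm_per_mb(start_bp, end_bp, length_cm):
    """Genetic length divided by physical length for one segment."""
    return length_cm / length_mb(start_bp, end_bp)


# Print both measurements side by side, plus the ratio between them.
for chrom, start, end, cm in SEGMENTS:
    mb = length_mb(start, end)
    ratio = cm_per_mb(start, end, cm)
    print(f"chr{chrom}: {mb:.1f} Mb, {cm:.1f} cM, {ratio:.2f} cM per Mb")
```

If the "one centimorgan equals about 1 million base pairs" rule held for every segment, the ratio would always be close to 1. Running something like this against your own segment list is a quick way to see how far individual segments stray from that average.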

Different Predictions and More Matches
This difference between AncestryDNA's way of calculating the length of segments and that of the other two companies may explain, in part, the reason that some of us are seeing the same matches at AncestryDNA as we have at the other two companies, with very different predictions. The fact that AncestryDNA is using a phasing engine before running the matching algorithms will also account for some of the reported discrepancies. When asked why AncestryDNA is, on average, returning more matches than the other two companies, Ken offered one possible explanation. He said that it may be a result of the AncestryDNA database containing primarily customers with deep roots in the United States and, in many cases, descending from large Colonial New England families.
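To make the effect of the different units concrete, here is a rough Python sketch that applies the three cutoffs quoted in this post to the same segments. It treats each cutoff as a simple minimum on a single segment, which is my own simplification; it ignores phasing, AncestryDNA's proprietary filter and any other rules the companies apply, and the two example segments are made up.

```python
# Cutoffs quoted in this post, treated as single-segment minimums.
# This is a simplification: the companies' full matching rules are not modeled.
CUTOFFS = {
    "AncestryDNA": ("Mb", 5.0),
    "Family Tree DNA": ("cM", 5.5),
    "23andMe": ("cM", 7.0),
}


def passes(segment, company):
    """Return True if the segment meets the company's quoted cutoff."""
    unit, minimum = CUTOFFS[company]
    return segment[unit] >= minimum


# Hypothetical segments measured both ways; the values are invented.
long_but_quiet = {"Mb": 11.0, "cM": 4.0}   # physically long, little recombination
short_but_busy = {"Mb": 4.0, "cM": 11.0}   # physically short, much recombination

for name, segment in [("long_but_quiet", long_but_quiet),
                      ("short_but_busy", short_but_busy)]:
    for company in CUTOFFS:
        print(name, company, passes(segment, company))
```

Under these assumptions, a physically long segment in a region with little recombination clears a 5 Mb cutoff while missing both cM cutoffs, and the reverse happens for a short segment in a region with a lot of recombination.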

Adding International Customers to the Database
This discussion prompted me to inquire as to when AncestryDNA plans to offer the test to international customers. Ken said that it is certainly "on the radar", but they do not have an estimate of when this will happen yet. He explained some of the reasons for this:

1. Demand is still high within the United States and they are "processing samples as fast as we can right now".
2. The privacy laws in Europe are, in some cases, different from those in the US. Therefore, this will take additional time to address.
3. They will need to work out logistical issues.

He emphasized that Ancestry.com is a large company, which necessitates significant forethought and planning before taking action.

Uploadable Raw Data
During our conversation, Ken also addressed the questions surrounding the format the raw data will be presented in as well as the much-hoped-for matching segment data.

"We will be providing raw data download in early 2013.  We have not made any formal decision on segment data.  We understand that it is important to some of our customers and are taking it into serious consideration." 

When I asked him whether the raw data would be formatted in such a way that will be compatible with uploading to third party sites such as Gedmatch.com, he assured me that it would. Fortunately, this puts to rest all of our speculation that the "related security enhancements" Ken referred to in his keynote address at the Consumer Genetics Conference last month would interfere with the data's usability. When I inquired further about the future availability of segment data, he said that he cannot promise anything in that regard, but was open to discussing what presentation formats of that data might be acceptable to genetic genealogists.

Admixture and Reasonable Assumptions
Some have also interpreted Ken's statement (reported by Esquire), that some customers are using their own knowledge to make reasonable assumptions that lead to incorrect conclusions, to mean that AncestryDNA is using an altogether different method of determining our matches than the other two companies offering autosomal DNA tests for genealogy. He explained to me that what he was actually referring to was that many customers assume that, because autosomal DNA matching is only applicable to relatively recent ancestry, the admixture results also reflect that time period and should match what we know of our ancestral origins from our family trees. He emphasized that AncestryDNA's "Genetic Ethnicity" feature (like any admixture tool) is not looking at the large segments used for relative matching, but rather is examining much smaller blocks and single markers that are ancestrally informative. Therefore, some of this admixture is very old - offering a glimpse much further back in time than our known family trees. He offered reassurance to those who feel that this portion of the test is not yet as accurate as it should be (me included):

AncestryDNA is data-driven.  Our team of scientists are constantly analyzing the data looking for ways to improve the ethnicity and matching prediction algorithms. The science, and hence the customer experience, is only going to improve with time.

At some point, the Sorenson data will likely be incorporated into the AncestryDNA test, which should improve the admixture predictions tremendously.

Even the CEO is working with his matches!
In closing, it was very nice to hear that "everyone from the CEO down is working with their matches" at Ancestry.com. This should lead to a management team that is educated about what we as genetic genealogists are trying to accomplish and how best to do it. As a result, I look forward to improved tools and results at AncestryDNA in the future.

I want to thank Ken Chahine and Stephen Baloglu for their recent efforts to shed some light on aspects of the AncestryDNA test and clear up some of the confusion. As I told both of them, the more transparency that AncestryDNA can offer to the genetic genealogy community, the more satisfied we will all be with the product. According to Ken, it is likely that more details will be revealed soon. That is a very good thing because as I was writing this, I thought of many more questions for him!

Thursday, October 25, 2012

AncestryDNA Launch and Other Related News


AncestryDNA's Wide Release
AncestryDNA is now available to all without an invitation. The new price is $129 for subscribers.


For non-subscribers the price is $199; however, if you order a package deal with the 6-month Ancestry.com US Discovery membership, the price drops to $189. The package deal with 6 months of the Ancestry.com World Explorer membership is $249. Obviously, this is a loss leader with the intent of acquiring and retaining subscribers. (You can order here.)

There is a chart provided to compare the options. I have reproduced the relevant portion below:

Based on the information provided in the chart, it appears that even non-subscribers will receive new matches and be able to contact them; however, it is unclear if they will be able to access their matches' trees (see line 4 in the graphic). I called AncestryDNA customer service, but the rep wasn't sure of the answer, although he said that he thought they would be accessible since "that is an important part of the service". [*10/26/12 Update - I spoke with another customer service rep today named Jeremy. He told me that "Connect with your DNA matches" from the chart above does NOT mean that you will be able to contact them unless they contact you first. It only means that you will be able to see the match and review their family tree. So, non-subscribers WILL be able to see their matches' family trees, but they will NOT be able to initiate contact with them. 10/31/12 Update - The information contained in the last update was inaccurate. Please see the official clarification here.]

In other AncestryDNA News... 
Earlier this month, Dr. Ken Chahine announced a significant addition to the AncestryDNA service in his keynote presentation at the Consumer Genetics Conference in Boston. Crista Cowan reported the news from the conference on the Ancestry.com blog:

AncestryDNA believes that our customers have the right to their own genetic data. It is your DNA, after all. So we’re working to provide access to your raw DNA data in early 2013, which includes related security enhancements to ensure its safety during every step of the process. Moving forward, we plan to add even more tools and improvements for our customers, and any new features will be available to all AncestryDNA members.

I'm very glad to hear that AncestryDNA is listening to its customers (well, the really vocal ones at least). Back in March when I was first introduced to the product, I cited the lack of raw data as one of the major drawbacks to their offering and have continued to beat (and beat) on that drum since then. So, along with the other bloggers and customers who have repeatedly asked for this feature, I feel like this is a victory for genetic genealogy. (Now, we just need to keep pushing for the matching segment data!) I was concerned that the raw data would not be released in a downloadable format due to the wording in the announcement, "access to your raw DNA data...which includes related security enhancements", especially after reading Dr. Chahine's comments to the Presidential Commission for the Study of Bioethical Issues in Washington, D.C. in August. So, I was happy to be informed by Stephen Baloglu, Ancestry.com Director of Product Marketing, during a recent AncestryDNA webinar in which I participated, that the raw data would be in a "downloadable format". Hopefully, that format will also be uploadable to third party sites like Gedmatch.com so, in the absence of AncestryDNA providing it, we can access the matching segment data that genetic genealogists require for their research.

Stephen Baloglu also provided some insight into AncestryDNA's matching system when he responded to an inquiry from Shannon Christmas, "We use total amount of DNA shared and contiguous shared length of segments to calculate how closely you match someone else." I know this doesn't seem like earth-shattering news since it sounds pretty much like what their competitors are doing, but with so little of the scientific method behind AncestryDNA's algorithms being public knowledge, every little bit is noteworthy to some of us.
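Just to illustrate the two quantities Stephen mentions, here is a tiny Python sketch that computes a total shared amount and a longest contiguous segment from a list of segment lengths. This is not AncestryDNA's algorithm, which has not been published; the function name, the choice of cM as the unit and the sample lengths are all assumptions of mine.

```python
def summarize(segment_lengths):
    """Return (total shared, longest single segment) for one pair of testers.

    segment_lengths is a list of shared segment lengths, all in the same
    unit (cM is assumed here purely for illustration).
    """
    total = sum(segment_lengths)
    longest = max(segment_lengths, default=0.0)
    return total, longest


# Hypothetical shared segments for one match; the values are invented.
shared = [12.5, 7.5, 30.0]
total, longest = summarize(shared)
print(f"total shared: {total} cM, longest segment: {longest} cM")
```

How a company turns those two numbers into a predicted relationship is exactly the part we cannot see without the matching segment data.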

Crista also reports:
We plan to continue to be a part of the genetics landscape moving forward and will be at more events, including the American Society of Human Genetics in San Francisco, where we will present some amazing new discoveries that our scientists have been working on.

I will be keeping my eyes and ears open to hear what that might be!

And, last but not least, I'm sure everyone has already heard about Ancestry.com's impending sale to Permira and might be wondering what this will mean for the DNA portion of the business. I am told that there should not be any major changes and we can see from the press release that DNA appears to be an area they expect to expand in the future, "Ancestry.com's focus will continue to be on investing in content, technology and its user experience, expanding its product offerings in areas like DNA, and building the Ancestry.com brand and the family history category, all on a global basis." Although according to the website, the "AncestryDNA test is not yet available for purchase outside of the United States", this sounds like they intend to promote the DNA tests beyond the United States in the future which would be a positive development for those seeking their genetic connections overseas. 

As always, I will continue to follow the developments of all genetic genealogy related news and make sure my readers are updated. 

[While I was writing this post, I glanced at my new AncestryDNA matches and found a confirmed 5th cousin on my Roderick/Long line with a predicted range of 4th-6th cousins and a 96% confidence level. Yay! I sure do wish I could see our matching segments though! In that vein, Tim Janzen reports that he called today and asked if Ancestry.com would be releasing the matching segment data as well, "The representative said that Ancestry.com might not do that, but that they might create an 'opt in' option that would allow people to share the matching segment data if they are interested in doing so."]