Monday, March 25, 2013

AncestryDNA, Raw Data and RootsTech

Tim Janzen and I discussing the AncestryDNA features at RootsTech with AncestryDNA staff

Since RootsTech there has been lots of discussion regarding the features that AncestryDNA is and is not planning to offer their customers. I will address the many questions that I have received about the meetings in which I participated at the show, but first let's review:

Raw Data Downloads
On Thursday, AncestryDNA fulfilled their promise to allow customers to download their raw data. As Dr. Ken Chahine had assured me back in November, the file is not encrypted and is compatible with third party tools.

I sent my file to a number of third party providers:
  1. After working with it a bit, John Olson announced on the site that he expects that Gedmatch will be accepting AncestryDNA uploads in about two weeks. 
  2. David Pike told me that he has updated his tools to work with the AncestryDNA files.
  3. Leon Kull has reportedly updated his HIR search site to work with them as well.  
  4. Dr. Ann Turner has created an Excel macro to convert the AncestryDNA files to 23andMe format. 
At the "Ask the Expert" Genetic Genealogy panel that I moderated at RootsTech on Saturday:
  1. Bennett Greenspan told the audience that Family Tree DNA will be accepting AncestryDNA transfers into Family Finder starting on May 1st.
  2. Dr. Catherine Ball confirmed that the raw data file is not phased and that they are delivering it as they receive it from the chip manufacturer Illumina.  She also confirmed what Dr. Ann Turner had already discovered - the data labeled as "Chromosome 25" is from the PAR region. Further, the "Chromosome 23" label refers to the X chromosome data and "Chromosome 24" refers to the Y chromosome.
Additional notes:
  1. Unlike Family Tree DNA, AncestryDNA is not removing any SNPs from the data - medically relevant or not. 
  2. The overlap between AncestryDNA's raw data file and 23andMe's should be around 690,000 SNPs because both companies use the same Illumina OmniExpress Plus base chip. The ~10,000 SNP difference can be attributed to a different set of poorly performing probes and test SNPs. Family Tree DNA's file should have a similar overlap for the same reasons.
  3. There is no mitochondrial DNA included in the raw data file because it is not included on the Illumina chip that they are using. (23andMe adds the mtDNA SNPs).
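For anyone who would rather script the conversion than use a spreadsheet, here is a minimal Python sketch of the chromosome relabeling described above. It assumes the download is a tab-separated text file with `#` comment lines, one header row, and the columns rsid, chromosome, position, allele1, allele2, and that the target is a 23andMe-style four-column layout (rsid, chromosome, position, genotype). Neither layout is spelled out in this post, so check both against your own files. The label `PAR` for chromosome 25 is a placeholder choice, and the function name and sample rows are made up for illustration. This is not Dr. Turner's macro.

```python
# Sketch only: the column layouts are assumptions, not taken from AncestryDNA documentation.
CHROMOSOME_LABELS = {"23": "X", "24": "Y", "25": "PAR"}  # "PAR" is a placeholder label


def convert_ancestry_lines(lines):
    """Turn AncestryDNA-style rows into 23andMe-style rows (assumed layouts)."""
    converted = ["# rsid\tchromosome\tposition\tgenotype"]
    for line in lines:
        line = line.rstrip("\n")
        # Skip blanks, comment lines, and the assumed header row.
        if not line or line.startswith("#") or line.startswith("rsid"):
            continue
        rsid, chromosome, position, allele1, allele2 = line.split("\t")
        chromosome = CHROMOSOME_LABELS.get(chromosome, chromosome)
        converted.append(f"{rsid}\t{chromosome}\t{position}\t{allele1}{allele2}")
    return converted


# Invented sample rows, not real genotype data.
sample = [
    "#AncestryDNA raw data download",
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs0000001\t1\t1000\tA\tG",
    "rs0000002\t23\t2000\tC\tC",
    "rs0000003\t25\t3000\tT\tT",
]
for row in convert_ancestry_lines(sample):
    print(row)
```

How a given third party tool expects the PAR rows to be labeled is something to confirm with that tool before uploading.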

Search Function
As I expected from earlier conversations with AncestryDNA, a search function is next on the list. Kenny Freestone, Product Manager for AncestryDNA, discussed it in his presentation under the heading "What's Next". Although it is already in the works, Kenny could not provide a firm timeline for its availability when I asked.

We will be able to filter our list of matches by surname, location and username. As anyone who has worked with their AncestryDNA matches knows, this is sorely needed. There is no doubt that the many requests from customers pushed this up their list of priorities.

Genetic Ethnicity Update
Later this year, AncestryDNA will be updating their Genetic Ethnicity feature. They will provide more granularity in Europe and West Africa. We can also expect more accurate breakdowns. A number of AncestryDNA personnel acknowledged to me over the weekend that certain "ethnicities" (e.g., Scandinavian) are overestimated for many customers. However, they also emphasized that much of the perceived problem with their admixture analysis stems from the question of "where and when". What they mean by this is that it is very difficult (and sometimes impossible) to pinpoint where specific DNA signatures were at an exact time in history.

As I always remind my readers, this portion of the science has a long way to go and will improve with more data and time. On the "more data" front, during her speech at the AncestryDNA luncheon on Friday, Dr. Ball was reportedly requesting that genealogists who know that all eight of their great grandparents were born in the same place share this information with AncestryDNA. This seems to imply that, like 23andMe has successfully done, AncestryDNA plans to use customer data to improve their predictions. They are also starting to work on incorporating the coveted SMGF collection into their admixture analysis, which should improve it greatly.

The good news is that AncestryDNA customers don't have to wait for this update to gain more insight into their ancestral origins. Now that AncestryDNA has made the raw data available, customers will be able to upload their raw data file to the various third party sites to try out the admixture calculators and/or send it to Dr. McDonald for his very highly regarded analysis.

Matching
AncestryDNA is currently working on an algorithm to improve matching for endogamous populations, specifically Ashkenazi Jews.

As I reported in November, the minimum threshold for matching is 5 megabase pairs. This was reconfirmed in a conversation I had with Dr. Ball on Friday. I also learned that there is no minimum SNP requirement. We discussed the possibility of AncestryDNA switching to centiMorgan measurements in the future.
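To make the megabase-versus-centiMorgan distinction concrete, here is a small Python sketch. The 5 megabase threshold comes from the conversation above; the centiMorgan-per-megabase rates and the 5 cM comparison threshold are invented purely for illustration (real rates vary along each chromosome and come from a genetic map), and none of this is AncestryDNA's actual code.

```python
MIN_MEGABASES = 5.0      # threshold reported in this post
MIN_CENTIMORGANS = 5.0   # hypothetical alternative threshold, for comparison only


def segment_report(start_bp, end_bp, cm_per_mb):
    """Describe one shared segment under both kinds of threshold.

    cm_per_mb is a made-up average recombination rate for the region.
    """
    length_mb = (end_bp - start_bp) / 1_000_000
    length_cm = length_mb * cm_per_mb
    return {
        "length_mb": length_mb,
        "length_cm": length_cm,
        "passes_mb_rule": length_mb >= MIN_MEGABASES,
        "passes_cm_rule": length_cm >= MIN_CENTIMORGANS,
    }


# Longer physical length in a low-recombination region (invented rate of 0.5 cM per Mb).
slow = segment_report(10_000_000, 16_000_000, cm_per_mb=0.5)
# Shorter physical length in a high-recombination region (invented rate of 2.5 cM per Mb).
fast = segment_report(10_000_000, 14_000_000, cm_per_mb=2.5)

print(slow)
print(fast)
```

With these made-up rates, the first segment counts as a match under a megabase rule but would fall short in centiMorgans, while the second is the reverse.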

Price
The test is now $99 for everyone - subscribers and non-subscribers. This was likely in response to 23andMe's recent price drop. Having attracted well over 120,000 customers in less than a year in business, AncestryDNA is proving to be an important player in this field. This new policy to attract subscribers and non-subscribers alike will only improve their market share.

International Customers
It does not appear that AncestryDNA has plans to offer their test to international customers in the near future, instead choosing to focus on the U.S. market for now.

Matching Segment Data and Chromosome Browser
On Friday at RootsTech, Dr. Tim Janzen and I sat down for a meeting with AncestryDNA management. Among others, we were joined by Dr. Ken Chahine, Senior Vice President and General Manager of DNA, and Dr. Catherine Ball, Vice President of Genomics and BioInformatics. (Dave Dowell also attended a portion of the meeting.)  I found them to be very receptive to hearing our requests and the reasons behind them. At no time did they state that they had decided not to build a chromosome browser or release matching segment data to their customers in the future. Dr. Ball did express some privacy concerns, but was open to hearing ideas of how this could be addressed.

Tim Janzen explains his feelings while Ken Chahine looks on

During the meeting, Tim very emphatically explained his feelings on the need for matching segment data (above) and I resorted to begging (below)... {hehe}

Catherine Ball, Ken Chahine, Tim Janzen, me, Dave Dowell and Steve Baloglu

On Saturday, after attending Kenny Freestone's presentation, four advanced genetic genealogists approached him to discuss the chromosome browser issue. In addition to myself, Tim Janzen, Angie Bush, and Nathan Machula were present for the conversation. Kenny didn't have much to say and mostly listened to the arguments that we presented covering why we feel that it is essential that AncestryDNA offer the matching segment data behind their relative predictions. At no time did he state that AncestryDNA would not offer a chromosome browser or that the delay in doing so was because AncestryDNA didn't think that their customers could understand it.  He did, however, confirm that it was not a top priority at this time. He also said that he personally reads all of the requests sent through the feedback button, so if you want them to reassess their priorities, then be sure and let them know.

Tim emphasized that both 23andMe and Family Tree DNA included a chromosome browser feature at the launch of their autosomal DNA product and wondered aloud why AncestryDNA had not done so as well.  I explained to Kenny (as well as in my meeting with management) that, as genealogists, we expect conclusions to be evidence based. It is not in line with this principle to simply be told that a certain common ancestor is responsible for a DNA match and be expected to take AncestryDNA's word for it. Where is the proof?  Since Kenny had shown a chart during his presentation of his ancestral lines that he claimed were genetically confirmed by AncestryDNA matches, I also pointed out the fact that those lines that he had shaded in weren't really confirmed without the actual genetic data to support that claim. To illustrate, I laid out my experience as follows:

On my AncestryDNA account, I was happy to find a shaky leaf hint a few weeks ago.




Upon reviewing the match, I noted that the common ancestor was through my mother's side. I was initially excited to see that I had inherited DNA from my 7th great grandparents on paper, Joseph Denison and Prudence Miner.


The only problem is that this match doesn't appear anywhere on my mother's 47 pages of matches. Do you know what this means? It means that I must have inherited the DNA responsible for the match through my father's side. Since all DNA inherited through my mother's line must come through her, AncestryDNA has identified the wrong common ancestor as the source of the DNA shared between LGB and me. A fluke of the algorithms...? Perhaps. Let's look at some more of my matches.






Once again, as you can see, the common ancestor identified by AncestryDNA is on my mother's side. A thorough search of my mother's matches shows that, once again, this person is not reported as a match to my mother. From this, we can only reach the conclusion that the DNA responsible for this match comes through my father's side - not my mother's. The common ancestor that I share with "Baerion" must be beyond a brick wall in her family tree or on my paternal side. In general, I have had more success filling in the branches of the maternal side of my family tree than the paternal side, so this is certainly possible.

Just to demonstrate that this isn't an isolated occurrence, here is another one:






This match doesn't appear on my mother's match list either! So, out of my ten matches that have shaky leaves attached,  three of them apparently have common ancestors wrongly identified as the source of our matching DNA. Do you see the problem here? Does AncestryDNA?  If this match were, instead, at 23andMe or Family Tree DNA, I could check the DNA segment that we share and compare it to my other matches and/or my chromosome map. This would provide additional information and/or evidence to help me determine through which of my ancestral lines this segment of DNA was inherited. Might there be other explanations for these discrepancies? It is certainly possible, but without the underlying genetic data, it is impossible to say.
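For readers who have also tested a parent, the comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the match lists are typed in by hand (AncestryDNA does not provide this as a tool), the function name is hypothetical, and apart from the two usernames quoted in this post every name in the sample data is a placeholder.

```python
def flag_inconsistent_hints(hints, parent_matches, parent_side):
    """Return hinted matches on parent_side that the tested parent does not match.

    hints: dict mapping match username -> side of the tree named in the hint.
    parent_matches: set of usernames on the tested parent's match list.
    """
    return sorted(
        name
        for name, side in hints.items()
        if side == parent_side and name not in parent_matches
    )


# Usernames "LGB" and "Baerion" appear in the post; everything else is a placeholder.
my_hints = {
    "LGB": "maternal",
    "Baerion": "maternal",
    "third_match": "maternal",
    "placeholder_cousin": "maternal",
}
mother_matches = {"placeholder_cousin", "someone_else"}

print(flag_inconsistent_hints(my_hints, mother_matches, "maternal"))
```

A flagged name only says that the hint and the parent's match list disagree. It cannot say why, which is exactly the gap that matching segment data would fill.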

I am in the fortunate position to have tested my mother at AncestryDNA in addition to myself, so I can clearly see there is an issue. What about all of those people who have not tested a parent and are blindly accepting AncestryDNA's shared ancestor hints because they don't know otherwise? Isn't that kind of like copying someone's tree and just taking their word for it that it is correct with no sources or evidence attached? For now, those of us who do understand the finer points of autosomal DNA matching will have to do our best to convince our matches to upload to Gedmatch so they can see for themselves what they are missing.

As much as I, too, am disappointed that AncestryDNA has not yet provided the matching segment data, it is clear to me that the reasons behind this decision are far more complex than what others may claim is an attempt to dumb down the product because Ancestry.com thinks its customers are stupid. From my many conversations with Ken Chahine and others from AncestryDNA over the past year, I have come to appreciate that working within the framework of this 1.6 billion dollar corporation comes with its own set of challenges.

The Future
Tim Sullivan, CEO, has made it clear that Ancestry.com is committed to the DNA business and Ken Chahine has always been upfront with me and come through with his promises. So, I am going to give them the benefit of the doubt. From our very first conversation, I have advocated for the genetic genealogy community and looked out for our best interests and I won't stop doing so. I believe that they will do the right thing for their customers and the genetic genealogy community eventually. It may not happen as quickly as we would all like (yesterday!), but they are not the big bad wolf and I think it does us all a disservice to continually paint their intentions in a negative light. We are in early days yet. Let's give them a break.

58 comments:

  1. Excellent post, CeCe. And yes, let's all give Ancestry a break! As Tim Sullivan noted in his keynote speech at RootsTech, it is an extremely difficult task to please beginners and experts.

    It is also very important that both genetics and genealogy are seen as available and accessible to everyone, and I think the community will be better served if we are willing to be a bit patient as Ancestry works to develop the best product they can. It is a tricky balance to put things to market in a timely fashion while still taking the time needed to develop a quality product.

    Let's hope that moving forward we can all work to build the community together, rather than tearing down the companies building the technologies that are supporting this exciting field. Ancestry has delivered on their promised features to this point, and I believe they will continue to do so going forward, and we should at least give them credit for that.

    Replies
    1. "...it is an extremely difficult task to please beginners and experts."

      If they're worried about intimidating beginners, they don't have to clutter their pages with advanced features. They could, for example, tuck "matching segment" information behind a link that says "ADVANCED FEATURES."

    2. I also agree with this statement, TickedMD, and I have recommended the same thing to management at AncestryDNA for many months.
      Thanks for your comments,
      CeCe

  2. Thank you CeCe. Keep the spotlights on them. We can be sure that we would not have the raw data files if you and others had not advocated.

    Rebekah

  3. Thanks CeCe for this and all of your efforts to persuade Ancestry to be more user friendly to genetic genealogists.

  4. Great post! If I were there I would have also begged for the ability to see matching segment data. I just can't imagine how useful this could be without it. What exactly were their privacy concerns about that? Did they ever explicitly say?

  5. Thank you CeCe for supplying a fair assessment of the AncestryDNA matching segment/chromosome browser issue and providing illustrative examples of AncestryDNA identifying the wrong common ancestor as the source of shared DNA.

    As you mentioned, AncestryDNA has not outright refused to install a chromosome browser, but has communicated that the task is not a top priority. I suggest the genetic genealogy community mobilize to motivate AncestryDNA to get shared DNA segment data to us.

    In addition to flooding Ancestry’s feedback and social media channels, I also recommend that participants in AncestryDNA’s Human Genetic Diversity Project remove themselves from the project until Ancestry delivers the shared segment data/chromosome browser. You can withdraw all of your information from AncestryDNA’s project by emailing a request to consent@ancestry.com. Be sure to mention the lack of matching segment data and chromosome browser functions as the reason for your decision.

    Withdrawing from the project will not impact your results nor your access to your results: http://ldna.ancestry.com/legal/consentAgreement.aspx

    If you do not remember whether you accepted the Consent Agreement for this project, check your test settings on the AncestryDNA page; information about how you responded to the Consent Agreement appears in the right column.

  6. Thanks for the detailed analysis, Cece. I'm hoping a good number of AncestryDNA's customers utilize Gedmatch.

    In regard to the presumed erroneous matches, playing Devil's Advocate, could the same issue not also occur if both you and your mother were near the 5 mb cut-off, with you being just over, and her being just under...a rounding error, so to speak? I would think this, or the possibility of more than one shared lineage would be more probable than an outright incorrect match. Just an alternative hypothesis to consider.

    Replies
    1. @Xavier - The cut-off issue is what I was referring to when I asked "A fluke of the algorithms?" That was why I included all three of my examples since it is unlikely that this explanation could account for all three. I didn't say that it was an incorrect match. I believe it is probably a legitimate match, but just not through the identified common ancestor. I think the most likely answer is that in one or more of these scenarios, I just happen to share a common ancestor on both my maternal and paternal sides. AncestryDNA has simply identified the wrong relationship. The default is for the closest common ancestor in your tree. This will not always be an accurate assumption. Our common ancestor who is responsible for the shared DNA may be behind one of our brick walls as well.

      Thanks for your comments, especially since they allowed me to clarify my thoughts.
      CeCe

    2. Yes, that'll be a hard one to answer, without both the matching segments and at least one parent tested (I haven't bothered to test any family at Ancestry). They may need to adopt something like in FTM where you can list all possible relationship paths. Of course, I've been asking for this in the web-based trees and haven't gotten it, so I won't hold my breath!

    3. I failed to mention that one of the reasons I am fairly confident that the situation is as it appears, and as I hypothesized above, is that it is not uncommon at 23andMe for me to find a common ancestor on my mother's side when the match is on my father's side. (I have tested over 30 family members there and three at AncestryDNA.) I was already aware of the possibility that this could occur due to my experience at 23andMe and am always very thankful that I am fortunate to have tested extended family to be able to sort these things out.

  7. @Xavier: You posed a plausible alternative hypothesis, a hypothesis that one could explore if AncestryDNA provided access to matching DNA segment data. Since Ancestry has concealed that data, we can only speculate. That is why AncestryDNA customers and members of the genetic genealogy community must take action: pure speculation has no place in genetic genealogy.

    Replies
    1. I have no idea how they contend that the chromosome browser, or at least matching segment information, is not relevant. The alternative is poring through every tree looking for common surnames and trying to triangulate...a useful tactic 10 years ago, but the rest of the genealogy world has moved well past that!

    2. @Xavier - From my perspective they haven't prioritized it to the level it deserves, but I don't think they have ever claimed that it isn't relevant.

  8. Thanks for the post. How did you obtain the Ancestry DNA raw data? I have not received any communication from Ancestry. I looked at my Ancestry DNA page and nothing has changed. I was hoping to see a 'Download' button.

    Replies
    1. It's under the "Manage Settings" link on the main page. A couple of different bloggers have given detailed screen shots as to where to find it, and Ancestry blogged it here: http://blogs.ancestry.com/ancestry/2013/03/24/the-latest-installment-of-new-ancestrydna-features/

    2. I am unable to download my Ancestry DNA raw data because I never receive the "confirming" email.

      I have not recently changed the email address associated with my Ancestry account. I have 1) verified that my email address is still receiving emails, 2) verified that Ancestry has previously sent me messages using that email address, and 3) verified that my "confirming" email did not end up in my "spam" folder.

      Anyone else unable to download their Ancestry DNA raw data?

    3. Bert, I have had the same problem. I have tried several times, and I have never received the email either. Did you ever resolve your problem?

    4. Bert and Drew,

      I have had the same experience as you. I followed up the lack-of-response with a terse and irritated piece of feedback. Still, no response to either!! I guess it's just not convenient for them right now. If I could do it over again (money spent on this DNA test), I would, with a different organization. The "results," in my mind, were all but worthless . . . Please post if you get a response! Thanks!

    5. All:

      Shortly after I posted (attempted to post) this note, I received notice from AncestryDNA that my download was available. So, there you have it, for what it's worth. Thank you!

  9. Great article and synopsis. I was shocked to learn that Ancestry has used megabases rather than centiMorgans, and has ignored any minimal SNP criteria. Isn't that akin to malpractice, seriously?

    Replies
    1. Dear Dwight,
      I don’t consider it “malpractice” to use 5 million base pairs as the criterion for a match as Ancestry.com has done with their AncestryDNA product. That said, I would have definitely used cMs instead of a set number of base pairs as the criterion for establishing a match with the AncestryDNA test if I had been designing the product. However, your point is well taken that using a 5 million base pair threshold for the AncestryDNA test has the effect of producing matching segments that could be as short as 1 or 2 cMs and obscuring some matching segments that have 10 cMs or more in them. As a general rule, finding a genealogical connection with people who share only a 1 to 2 cM DNA segment is significantly more challenging than finding a connection with people who share 5 or more cMs with you. There are typically about 5 million base pairs in a 5 cM DNA segment. However, as the cM to base pair comparison charts at http://web.archive.org/web/20070113005025/http://compgen.rutgers.edu/maps/compare.pdf illustrate, there is not always a consistent correlation at each location on the various autosomal chromosomes between the base pairs and cMs.

      I have not read anywhere that Ancestry.com has a minimum number of SNPs that they require for a match. In any case, the data is phased before the match lists are produced, so a lower number of SNPs could be used as a threshold as compared to the threshold that would be advised when using unphased data. 23andMe uses a 200 SNP threshold when generating matches on the X chromosome between males, where the X chromosome is obviously already phased. 23andMe requires 700 SNPs as a threshold using unphased data and FTDNA with their Family Finder product requires a minimum of 500 SNPs. The AncestryDNA test includes 682,549 autosomal SNPs. Since there are about 3 billion base pairs in a human genome, this would suggest that there are on average about 1137 SNPs on the chip for each 5 million base pair segment of the genome. This would suggest that on average matching segments in the AncestryDNA test have 5 times as many SNPs in them as the X chromosome segments that meet 23andMe’s threshold for matches between two males. Thus, the number of SNPs in the matching segments that Ancestry.com uses as the basis for generating the match lists in the AncestryDNA test should generally be quite adequate.
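The arithmetic in the comment above can be checked with a few lines of Python. All four inputs are the figures quoted in that comment; the only assumption added here is the one the comment already makes, namely that the SNPs are spread evenly across roughly 3 billion base pairs. The variable names are illustrative.

```python
AUTOSOMAL_SNPS = 682_549             # figure quoted in the comment above
GENOME_BASE_PAIRS = 3_000_000_000    # "about 3 billion", as quoted
SEGMENT_BASE_PAIRS = 5_000_000       # AncestryDNA's reported matching threshold
X_SNP_THRESHOLD = 200                # 23andMe male X chromosome threshold, as quoted

# Average number of chip SNPs expected in one 5 million base pair segment,
# assuming a uniform spread across the genome.
snps_per_segment = AUTOSOMAL_SNPS * SEGMENT_BASE_PAIRS / GENOME_BASE_PAIRS

# How many times larger that is than the 200-SNP threshold.
ratio = snps_per_segment / X_SNP_THRESHOLD

print(f"average SNPs per 5 Mb segment: {snps_per_segment:.1f}")
print(f"multiple of the 200-SNP threshold: {ratio:.1f}")
```

The result lands between 1,137 and 1,138 SNPs per segment, which is a little more than five times the 200-SNP figure. Real SNP density is uneven, so individual segments can fall well above or below this average.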

  10. Thank you CeCe for this wonderful and informative article about AncestryDNA. I am very pleased that they have allowed us to download our raw data. And thankful for the other companies that will allow us to load our raw data to their sites. I know we all want it "now"; but we should give AncestryDNA a bit of a break...after all these things take time to work through. And the fact that they seem open to working towards a common goal is reason to be thankful. Again, thank you for all of this information. You have the wonderful ability of making this technical information so much more understandable. Bless you for all you do for the DNA and Adoptee Communities.

  11. @Bernie - You can obtain your raw data from the "Manage Test Settings" link next to the "View Results" button. On the right sidebar after clicking the Settings link is the button to begin the process of getting the raw data.

  12. CeCe,

    Thanks for the excellent recap and for representing genetic genealogists' interests with AncestryDNA.

    Regards,

    Larry

  13. Does anyone know if ancestry will ever accept 23andme uploads just like FTDNA currently does?

  14. Thanks, CeCe, for your synopsis as well as your advocacy.

    I think you may have indirectly identified the reason Ancestry is not eager to release its own chromosome browser: it may well undermine their current "shaky leaf" ecosystem. If users could assess potential AncestryDNA matches using a chromosome-comparison tool, what would they make of the disparities that you have shown are possible? Wouldn't Ancestry have to retool its matching algorithms to accommodate this gap? And to what degree would this impact Ancestry's ability to display a DNA-related "shaky leaf" for potential matches?

    Of course, the flip side of that challenge is that Ancestry might be able to substantially improve its matching algorithms and improve the precision of its suggested matches. But if doing so reduced Ancestry's ability to report potential matches, or at least to link the suggested match to an uploaded tree, it might appear to Ancestry as a net loss.

  15. Big round of applause to CeCe for assembling the fantastic genetic DNA panel at RootsTech. You did a great job of moderating and keeping the discussion moving. You probably noticed the man on the front row taking a movie. I learned later on that he is a distant DNA cousin to my father-in-law. I happened to find out through a discussion with him and checking FamilyFinder at FTDNA. It is a small world.

  16. A big thank you, CeCe, for this and your ongoing involvement. As a UK/European customer, I sincerely hope Ancestry will soon open up autosomal testing to those outside the USA. After all, an important motivation for people in the US to test must surely be to trace their migrant ancestors from other parts of the world. What better way to do this than to compare genes with those who stayed behind??

  17. Even at 23andme and FTDNA, it's easy to accidentally assign a segment to a supposed ancestor if you don't have multiple family members tested. For example, my father matched a man, who is his likely (though not 100% confirmed, waiting on an mtdna test for confirmation) 5th cousin. I was initially very excited to see this as I thought it more or less confirmed our relationship since they otherwise do not share a common ancestry. Additionally, he did not match my grandmother, which was another plus for this connection. However, when I contacted him, I found out that my father matched his father (I could not see b/c he was anonymous at 23andme), which was the wrong parent!

    Replies
    1. I am weighing which test to purchase for my relatives.

      CeCe's post mentions that 23andMe's test includes SNPs from mtDNA, which pushes me in that direction, but I don't know if what they test is equivalent to what a separate mtDNA test would provide. I see that you had to get an mtDNA test for confirmation; was that to supplement a 23andMe test, or a test from another company?

  18. Thanks so much CeCe. We are fortunate to have you as our advocate.

    I am hoping that enough real-life examples will help push the chromosome browser higher on the list. I am trying to "prove" my father line past the 7th generation, and need confirmation on several of the matches to determine if we indeed found his family. It would be exciting since several much more experienced genealogists have been searching for years. And good publicity for Ancestry DNA - hmmmm....

  19. There is another possible explanation for the hint/mother mismatch. It has been stated before, that AncestryDNA is performing a statistical phasing of the data, and using the phased sequences for matching. Because the phasing is based on statistical data, not actual data (like phasing a child's data using the parent's data) there likely will be some errors. It may be that your mother's data had more of these phasing errors - enough that it surpassed the error allowance of the segment matching algorithm, so that you matched, but your mother didn't. To prove this idea would require comparing the raw data of one of those matches with you and your mother's data.

    Replies
    1. Hi Steve,
      The same thought occurred to me too since I had discussed the limitations of the phasing engine with Dr. Ball and am familiar with the issues from 23andMe's scientists as well. To keep the examples simple, I didn't mention these other scenarios that I had considered and mostly ruled out. The reason I doubt that this is the explanation is that my mother is extremely easy to phase because she is 50% Finnish and 50% Northern European mutt. 23andMe had no problem at all with her phasing using a similar engine since the Finnish DNA is so unique. You can see her phased results on 23andMe's AC here (scroll down to the chromosome painting that says "one Finnish parent"): http://www.yourgeneticgenealogist.com/2012/12/23andmes-new-ancestry-painting-first.html
      I'll see if any of these matches will be willing to upload to Gedmatch when it is ready in a couple of weeks to see if I can learn anything more.
      It is much more likely that the situation is what I proposed in at least one of the above three instances and likely more since both my parents have Colonial NE ancestry.
      Thanks for the comment,
      CeCe

    2. You make good points. Something to consider, though, before ruling the idea out entirely is that AC analysis is likely much more forgiving of some incorrectly phased SNPs than segment matching. In a 15 cM segment, a handful of mis-phased SNPs having little allele frequency variation between populations might have little or no effect on population assignment, but be the difference between matching and not matching a segment that would have matched had the phasing been completely accurate.

    3. You make good points too, Steve, but why would my mother's phasing suffer from more switch errors than mine when, theoretically, she should be much easier to phase? Her parents come from very different populations while mine have very similar genetic components making up at least half of their DNA. Not to mention, this is three separate cases.

    4. I can only speculate without seeing real data, or knowing more about how their matching really works. It might be something as simple as this: having parents with dissimilar ancestry, she may have more heterozygous SNPs than you. Each heterozygous SNP is another opportunity for a switch error. Homozygous SNPs are the easy ones to phase. How do your and your mother's homozygous counts compare?

      I have almost no detail about Ancestry's matching algorithm, so I'm going mostly on intuition here. I'm looking at this from a software developer's viewpoint. If I had written the matching software and read your problem description, the first thing I would check is whether the structure my algorithm imputed onto your mother's data was preventing true matches from being made. The fact that it happened in three cases would actually make me MORE suspicious, not less.

      Sorry, I didn't mean to drag this out, because at this point we can only guess. I'm interested to hear what you find if you're able to get the data you need to find some real answers.
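
      For anyone who wants to compare homozygous and heterozygous counts from a raw data download, a minimal sketch follows. It assumes the file is tab-separated with a header row of rsid, chromosome, position, allele1, allele2, that comment lines start with #, and that no-calls are anything other than A, C, G or T; check these assumptions against your own file. Only chromosomes 1 to 22 are counted, since 23 to 25 are the X, Y and PAR data. The rsIDs in the sample are invented.

```python
import csv
import io

VALID_BASES = {"A", "C", "G", "T"}
AUTOSOMES = {str(n) for n in range(1, 23)}


def zygosity_counts(lines):
    """Count homozygous, heterozygous and no-call autosomal SNPs.

    `lines` is any iterable of text lines in the assumed raw data layout.
    """
    counts = {"homozygous": 0, "heterozygous": 0, "no_call": 0}
    # Skip blank lines and '#' comments so the first remaining line is the header.
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.DictReader(data, delimiter="\t"):
        if row["chromosome"] not in AUTOSOMES:
            continue
        a1, a2 = row["allele1"], row["allele2"]
        if a1 not in VALID_BASES or a2 not in VALID_BASES:
            counts["no_call"] += 1
        elif a1 == a2:
            counts["homozygous"] += 1
        else:
            counts["heterozygous"] += 1
    return counts


# Invented sample in the assumed layout.
sample = "\n".join([
    "#example raw data file",
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs0001\t1\t1000\tA\tA",
    "rs0002\t1\t2000\tA\tG",
    "rs0003\t2\t1500\tC\tT",
    "rs0004\t2\t2500\t0\t0",
    "rs0005\t23\t3000\tG\tG",
])
counts = zygosity_counts(io.StringIO(sample))
print(counts)
```

      With a real file, `zygosity_counts(open("your_raw_data.txt"))` would give the totals to compare between two people (the file name is a placeholder).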

  20. Thank you CeCe for this informative explanation on what Ancestry DNA results might hold for us in the future. Without the chromosome matching data, I feel the data is unusable. Of my over 300 matches, NO ONE has contacted me for follow up! I have sent email to over 40 of the matches, and only one person replied. More than a bit frustrating. This causes me to wonder if most of the 120,000 customers are just beginning their genealogy learning curve.

  21. I think dna.ancestry.com is near useless. All this talk about "well, that's not a priority right now" etc... is lame - working on a search function - give me a break... it's a few weeks' worth of coding at most! CeCe what the heck ARE THEY working on? In 1 year there has been zero advancement of the service they offer or the tools available. What have they been doing??? CeCe why don't you ask them when they're going to offer a product that actually generates meaningful cousin matching results... not the garbage we get now. Your examples above are proof enough that there is absolutely no value in the matches they offer. And by the way, half the participants have their trees locked so as to make even the faintest chance of finding success an act of futility.

  22. Aw gee, CeCe! Ancestry.com has been a gold mine for me, especially a month ago, when I confidently announced to my Bodamer cousins that one of my Ancestry.com DNA links w/no surname match had a proven 4th ggm w/maiden name. I hypothesized that she COULD be a sister to our 4th ggm Anna Elisabeth, who has remained otherwise nameless to generations of Bodamer genealogists. Now I have to slink back w/ tail between legs.
    Never saw the need to test my 94-yr-old mama, but now? And should it be Ancestry.com, so maybe can give my results there some [sic] confidence?
    Anyhow, if you're a Stonington CT Hewitt, then we're probably cousins, since Eliphet Hobart Hewitt somehow made his way down to low-country SC back in the dark ages of 1800.
    Thanks for all the good stuff you do!

  23. It's good to get an update from Ancestry, but also disappointing to learn what is/isn't happening. AncestryDNA went live last May - almost a year and they're still working on a search feature?! And to see the number of likely incorrect matches you have? Seriously discouraging, especially since chromosome segments aren't a priority. I don't see any additional AncestryDNA kits in my family's future. (On top of the fact that I got new matches today, some with hints but no leaf - even the features they already have don't work!)

  24. CeCe and Tim...Congratulations on your wonderful work in educating AncestryDNA to what is needed in the genetic genealogy world. In time they may succumb to our wishes, and if they wish to be viable players at this, they will. Sadly, however, they may be discouraging the Newbie market and/or misleading them and will continue to do so until they realize what is needed and what is reliable.

    I'm sure they will pull it together. I just want them to act quickly on what is inaccurate (as you mentioned regarding that SHAKY LEAF business) and add the features which will make them a contender.

    At this point in time, I must let my audiences know the current facts of the situation for each company so they can decide if they have the patience to wait until Ancestry catches up with Family Tree DNA and 23andMe (who did realize the value of the Chromosome Browser from the onset) or not. Many of us do not wish to wait a year or more for such features. As fast as this field moves, AncestryDNA may always lag behind if they do not realize what is needed now and act on it.

    Kudos to you both, and especially to Tim as I know his passion on being able to use the raw data.

  25. Thank you for the excellent info on Ancestry. My autosomal test showed 5 different nationalities when I can trace known Italian and Irish ancestors back over 200 years. It is a very upsetting subject for me. Out of all the folks I emailed only ONE matched me and the others nothing at all. Some people never even bothered to return my note to them. I find it very interesting about the high rates of Scandinavian results. They gave me 30 percent! I will now be taking my business to FTDNA and 23andMe and hopefully I can figure out who I am. It angers me that I spent all this money and STILL have gotten very few answers. I know it's not a perfect science BUT I expected more from Ancestry. Keep up the good work and don't let them off the hook!!!

    Replies
    1. What bothers me most about the Ancestry.com DNA project is the lack of response from matches and/or their private trees. It's very disappointing.

  26. Hi all,

    I'm a genetic genealogy newbie whose first experience with DNA was as an Ancestry DNA Beta participant. I'm happy to have my raw data now, but because of privacy concerns I'm unsure if I should upload to third party providers. Can I be assured that if I upload my DNA data, it will remain private?

    CeCe, thank you for all of the helpful information you have provided!

  27. Demand accuracy and transparency from AncestryDNA - Sign this petition: https://www.change.org/petitions/ancestry-com-dna-llc-give-ancestrydna-customers-dna-segment-data-a-chromosome-browser-now

  28. How do we talk Ancestry into offering their DNA test in Poland?

  29. Thanks for another excellent post and for raising these issues with Ancestry.

    I'm wondering if anyone at Ancestry has ever indicated if a customer currently has a way to tell if a match includes an Identical by Descent segment. For example, can we be assured of an IBD segment in any match prior to the "Distant Cousin" orange tab? Perhaps there is a finer cut, say "5th - 8th cousin, moderate confidence" or better.

  30. I have a question: is there a tool that lets me upload my mtDNA from 23andMe to my ancestry.com account? I am in Canada, so that's how I ended up testing with 23andMe. I found an area on 23andMe that showed me the following:

    View differences from rCRS as CSV
    # Differences between user's SNP calls and the rCRS
    # Format:
    # chromosome,snp_id,rCRS_position,user_call,ancestral_call
    MT,i3002176,208,A,T
    MT,rs2853515,263,G,A
    MT,i4001199,300,C,A
    MT,i3001450,456,T,C
    MT,i3001357,469,G,C
    MT,rs2853518,750,G,A
    MT,rs2001030,1438,G,A
    MT,i3001462,4336,C,T
    MT,rs3021086,4769,G,A
    MT,i3002272,5820,G,C
    MT,i3000537,5877,G,C
    MT,i4000892,8285,I,C
    MT,rs2001031,8860,G,A
    MT,i3000918,10388,T,A
    MT,i3001273,14290,A,T
    MT,i3001290,14422,A,T
    MT,i3001352,15072,T,A
    MT,rs2853508,15326,G,A
    MT,i4001351,16179,T,C
    MT,i4001352,16180,C,A
    MT,i3001821,16304,G,T
    MT,i3001902,16390,A,G

    And I was pretty sure that this was what I needed to manually enter on ancestry but I ended up getting an error message when I tried.

    Please tell me there is a way to make this work.

    Thanks.
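
    A listing like the one above can be turned into compact difference labels with a few lines of Python. This is only a sketch: it assumes the fifth column (`ancestral_call`) holds the reference base at that rCRS position, and it says nothing about which format Ancestry's manual entry form accepts. The function name is hypothetical; the three sample rows are copied from the listing.

```python
import csv


def rcrs_differences(lines):
    """Turn 'chromosome,snp_id,rCRS_position,user_call,ancestral_call' rows
    into labels of the form <reference base><position><user base>."""
    # Drop blank lines and the '#' comment lines that precede the data.
    rows = csv.reader(ln for ln in lines if ln.strip() and not ln.startswith("#"))
    labelled = []
    for chrom, snp_id, position, user_call, reference_call in rows:
        labelled.append((int(position), f"{reference_call}{position}{user_call}"))
    # Sort numerically by position, then keep only the labels.
    return [label for _, label in sorted(labelled)]


sample = """# Differences between user's SNP calls and the rCRS
# Format:
# chromosome,snp_id,rCRS_position,user_call,ancestral_call
MT,rs2853515,263,G,A
MT,rs2853518,750,G,A
MT,rs2001030,1438,G,A
""".splitlines()

differences = rcrs_differences(sample)
print(differences)
```

    Rows with a non-base call, such as the `I` at position 8285, would come out as `C8285I` and would need separate handling.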

  31. If ancestry.com would add matching segment data, they'd be ahead of everyone else with respect to using DNA to explore ancestry. Linking their user-friendly trees with DNA results was brilliant. Now they need to take one more step.

  32. It really sucks to have to keep asking other people that I have matches with on Ancestry to upload their data to Gedmatch. It is quite a pain, and Ancestry does a total disservice to those of us who are trying to confirm and break through some 2nd-3rd grandparent walls, since there is no real way to do that on Ancestry.

  33. A year later and Ancestry has done zero, zip, nada. Still have to try to get their data up on Gedmatch or the DNA on ancestry is pointless.

  34. "The overlap between AncestryDNA's raw data file and 23andMe's should be around 690,000 SNPs due to the fact that they are both using the same Illumina OmniExpress Plus base chip. The ~10,000 SNP difference can be accounted for due to a different set of poorly performing probes and test SNPs."

    So when I compare the raw data side by side, I notice there are pairs missing on both sides, ones that one file has and the other doesn't, but the matching pairs I've seen are in the same order. This tells me that neither DNA set is complete, but they match well enough on GedMatch.com to read as a 100% match.

    1) So is this solely "due to a different set of poorly performing probes and test SNPs"?

    2) Is there a master list to compare both DNA results against so I can shuffle the non-matching pairs into a more complete list? Are there any tools out there that do this automatically? I would think a more complete list would provide greater accuracy in matching to relatives.
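
    A side-by-side comparison like this can be scripted once each file has been read into a dictionary keyed by rsID. The sketch below is not an existing tool, and the names and toy genotypes are invented. It assumes genotypes are two-letter strings such as "AG", and it treats "AG" and "GA" as the same call because the raw data is unphased.

```python
def normalize(genotype):
    """Order the two alleles alphabetically so 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype))


def compare_kits(kit_a, kit_b):
    """Compare two {rsid: genotype} dictionaries.

    Returns the rsIDs shared by both kits, those unique to each kit, the
    shared rsIDs whose calls disagree, and a merged dictionary that keeps
    kit_a's call wherever both kits report the SNP.
    """
    shared = kit_a.keys() & kit_b.keys()
    only_a = kit_a.keys() - kit_b.keys()
    only_b = kit_b.keys() - kit_a.keys()
    discordant = {r for r in shared
                  if normalize(kit_a[r]) != normalize(kit_b[r])}
    merged = {**kit_b, **kit_a}  # later entries win, so kit_a takes priority
    return shared, only_a, only_b, discordant, merged


# Invented toy kits standing in for two companies' files for the same person.
kit_one = {"rs1": "AG", "rs2": "CC", "rs3": "TT"}
kit_two = {"rs1": "GA", "rs2": "CC", "rs4": "AA"}

shared, only_one, only_two, discordant, merged = compare_kits(kit_one, kit_two)
print(sorted(shared), sorted(only_one), sorted(only_two),
      sorted(discordant), len(merged))
```

    A merged list only adds positions. A comparison with a relative can still use only the SNPs that the relative's kit also reports.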

  35. It isn't even essential to have a chromosome browser, per se. Even just text data that shows the start and end points would suffice.
