Thursday, January 12, 2012

Using Public Y-DNA Profiles to Track Down Criminals: Would You?

For the last couple of days, the genetic genealogy world has been inflamed by the news that a 20-year-old Washington murder case is being re-examined in light of a Y-DNA analysis performed by the forensic genealogist Colleen Fitzpatrick. Many surname DNA project administrators are, understandably, extremely concerned about the implications and repercussions of this national story for DNA testing for genealogy. I hesitated to write about it because it has stirred such strong feelings in our community, but finally realized that is exactly why I must write about it. Many would prefer to sweep it under the rug because of the damage it could do to our avocation, but as the story has been picked up by national news outlets, I have reached the conclusion that this is no longer realistic. As a result, I will add my voice to the discussion.

Apparently, Fitzpatrick utilized public Y-DNA databases such as Y-search and, possibly, public surname DNA projects in an attempt to determine the potential surname of the murderer of 16-year-old cheerleader Sarah Yarborough of Federal Way. Blaine Bettinger of "The Genetic Genealogist" gives a good overview in his post, so I won't rehash the details here, except to say that I share the concerns of many in the community that, amid the sensationalism, the probability of the killer bearing this particular surname has been overestimated. I also worry, along with my colleagues, that these types of stories could discourage some people from participating in DNA testing for genealogical or other non-essential purposes. While I think these are important considerations, we may be focusing on the wrong things.

Bettinger emphasizes that this is, in reality, not a new technique. Along those lines, I'd like to share a comment that I wrote to the ISOGG mailing list for project administrators in October of last year when a BBC article on the FBI's potential use of familial DNA, including Y-DNA, was introduced and discussed. When another member wondered where law enforcement might get its matches, I responded, "Our surname projects." Met with skepticism, I explained:

Because many FTDNA surname projects are the best and most thorough source of Y-DNA research for many surnames. I wouldn't think they would use individual results, rather the surname research as a whole to compare Y-DNA samples against. For instance, "Anonymous Sample X matches Group Two of the well-researched XXYZX Surname DNA Project."  Hmmm...maybe we should be looking at male XXYZXs in the area?
Yes, because of the absence of chain-of-custody maybe they couldn't use these databases for proof in court, but they could certainly use them to get leads. If you are saying that these projects are meaningless because people may use aliases when they test, then all of our combined surname research has been pointless.
What other options are there if they [law enforcement] are now interested in Y-DNA, assuming in the past they have routinely only tested CODIS markers? For them to retest all of the existing samples for Y-DNA and create surname matching databases? That seems like a very time and money intensive proposition. I know that presented with an alleged criminal's anonymous Y-DNA profile most of us here could, in some cases, pinpoint the likely surname without much difficulty using our public resources - Y-search and public surname projects. What would stop the FBI from doing the same? How do we know that they don't already?
I've been thinking about this for awhile. If one of my loved ones was murdered and I had access to that DNA sample, you better believe that I would be using our databases to try to figure out who was guilty. Wouldn't you? 

Am I wrong?

No one wanted to discuss the subject then. 

I'm not claiming to be clairvoyant; in fact, this was bound to happen sooner or later. If it was an obvious possibility to me, someone who has no experience in law enforcement and doesn't even watch CSI, it had to be abundantly clear to those in a position to be able to do something about it. The Yarborough case may be the first time these tools have publicly been used for this purpose, but we would be naive to think it is truly the first time these methods have come into play in the process of an investigation of this kind.

Fortunately, I believe that we still have a leg to stand on when we tell our potential project members that law enforcement cannot use their DNA, tested by genetic genealogy companies, against them. As I also wrote that day on the mailing list:

I also strongly believe that YOUR (as in a single testee's) sample cannot be used against YOU because of the chain of custody issue...I just believe that our PUBLIC databases (not FTDNA's private database) are a resource in which GROUPS of samples combined with our research can be used against criminals and I am all for that!

I am aware of the ethical arguments for and against this practice, but, in my opinion, what it comes down to is this: If one of your loved ones was murdered and you believed that you could identify the guilty party using the same resources that we use for our hobby...wouldn't you?

Related Stories around the blogosphere (for more excellent analysis of these issues, please also read the comment section below):

"Does DNA Link 1991 Killing to Colonial Era Family?" by Blaine Bettinger of The Genetic Genealogist

"Unexpected Use of DNA" by Debbie Parker Wayne of Deb's Delvings in Genealogy


  1. This case has raised some interesting ethical issues. I do not see anything wrong in principle with using Y-STR testing to find leads in criminal investigations, provided that the technique is used appropriately and does not unnecessarily stigmatise innocent parties. However, even with the standard 37-marker test that is used in Y-DNA projects, people can often have many matches with different surnames. As an extreme example, I have one project member who has over 400 matches at 37 markers with several hundred different surnames. The genetic genealogy databases are also biased in favour of the genealogical lines that are of most interest. While a search might yield one surname of interest, there might, therefore, be many other surnames that could potentially match but for which no test results are available. The vast majority of the million or so surnames that might be found in the United States are not yet represented in any genetic genealogy database. I would suggest that a Y-DNA test can only be used effectively in such investigations if a minimum of 67 markers are used and the match is very close (i.e., a genetic distance of no more than 1 or 2). If the person is of Irish ancestry from the Niall of the Nine Hostages line or of Ashkenazi Jewish ancestry, then even 67 markers would be insufficient. At the upper extreme, people from these groups can still have several hundred matches with a huge variety of different surnames at 67 markers. The police have not released the details of their testing in the current case, but I would hope that they will have tested the sample on a minimum of 67 markers, especially in view of the speculation that is now being aired on internet forums about people with the surname in the locality. If only a small number of markers have been used, then the results are about as helpful as knowing the colour of the suspect's car.

  2. The whole idea of seeking a lead on a suspect's surname from Y-DNA predates most genealogical testing. Although haplotyping is much easier today and the sequences available to compare to are much more numerous, the limitations are the same now as they were when it was first proposed 15 years ago:

    "Y chromosomes are co-inherited with surnames in many societies, and, in an ideal world, a sufficiently detailed Y haplotype could give police officers the surname of the person who left a sample at a crime scene. To be realistic, however, the practicalities of haplotyping and the high frequency of non-paternity (itself an issue where Y typing is relevant) make this a highly fanciful scenario."

    Jobling, M.A., Pandya, A. and Tyler-Smith, C. (1997) The Y chromosome in forensic analysis and paternity testing. Int. J. Legal Med. 110, 118-124.

  3. I like your take on this, Cece. Blaine's article gave me a lot to think about. Debbie Kennett's comments above make excellent points.

    I am all for using the best means possible to find violent criminals and get them off the street. Based on the fact that there is still so much we don't know about DNA and that humans have a history of thinking we know more than we do, I still prefer the genealogical DNA databases not be used at this time for criminal investigations. With faulty assumptions based on the DNA findings, an investigator could concentrate his efforts in one direction and ignore leads pointing in a different direction. I know, this happens today without DNA being involved and will continue - it's human nature.

    I am sure you are right that the genealogical DNA databases have been used before and this is just the first widely-publicized admission of the use. But every time I think of the well-known genetic genealogist who states in his presentations that he differs on two markers from his brothers, a closed-list message that indicates a father and son differ on three markers, and my own family where there are more Y-DNA marker differences than we would normally expect to see in four generations, it brings home to me how much more we need to learn about our DNA. Law enforcement needs to have good concrete evidence aside from matching familial DNA before focusing in on one suspect.

  4. While I am all for generating leads and bringing criminals to justice, the sensationalism surrounding this case bothers me.

    For one, it only generated a possible surname as a lead - not a lead to a person, not an arrest, and not a conviction. Publicizing this information prior to an arrest could drive the perpetrator further underground and he may permanently escape conviction because too much was leaked to the media. It was more about the story and its publicity than it was about capturing the responsible party in the crime.

    Second, it also bothers me that the information regarding the surname in question was garnered under the guise of seeking a genealogical query for a friend. I have a real ethical concern regarding the methods used in gathering this information. All of the rationalization regarding this tactic does not diminish the fact that some of the methods were not ethically (and perhaps not morally) sound.

    Third, the publicity of the usage of genealogical Y-DNA resources without the public's general understanding of the process only serves to generate paranoia for those sitting on the fence regarding whether to test or not. This creates suspicion on Y-DNA projects as a whole. This would be the case even if the information was gathered in an ethically and morally sound manner. Tying this to a DNA project puts that project as well as others in jeopardy.

    Fourth, since we do not know the number of markers used in comparison, we have no way of knowing if the science was sound. Someone on the ISOGG list reported that FTDNA believed that Yfiler did the testing. If so, this test only offers 17 markers for comparison. Therefore the original match could have yielded numerous surnames, many of which would have matched exactly, but as the resolution increases, these typically fall by the wayside. I have over 100 different surnames matching my 12-marker panel exactly.

    Can we be sure that this is not the case? Was it really the correct family surname? Could it be that someone with the perp's surname lineage never tested and the real surname is something different? Perhaps there was an adoption or an NPE in the perp's ancestry and the surname is of no consequence. There are too many unknown variables here to point to this one particular family with the bravado of "now we know this criminal's patrilineal ancestry. We've almost solved this 20-year-old cold case."

    These are my concerns regarding this particular story.

  5. Wow, I am happy to see such well-informed comments on my blog. I think all of you have raised relevant and very valid points. I see the wisdom in each one and have shared all of your concerns ever since first being confronted with this news. I didn't delve into most of these specific issues because they had been covered quite well by others before I had the chance to post and I wanted to look at it in a different light; however, I do feel that it is valuable to have all of this information in one place for the visitor who may not be reading the same blogs and lists as those of us who are a part of the community. Thank you all for your reasonable analyses of this issue. I will post an update in the body of the blog recommending that readers view your comments.

  6. There are various ethical questions to consider, but this had to be considered inevitable.

    Given a 10% or better likelihood that the individual does not share the surname associated with his Y-DNA, depending on how common the Y-DNA is, failing to acknowledge this and sensationalizing the story puts it up there with the one about European Y-DNA and King Tut a few months ago.

  7. Jim is right that the sensationalism that has surrounded this case is also of concern. The link with the Fuller surname has been reported with a high degree of certainty (90% or 97.5% depending on which report you read). Such a high level of confidence would only be justified if the police had a 67-marker sample which had a very close match. Given that the original sample was taken from a crime scene and is now over 20 years old it seems highly unlikely that the lab would have been able to get results for this many markers.

    Even then the rate of NPEs needs to be taken into account. The effect of NPEs is cumulative. According to the Sykes/Irven study, using an NPE rate of 1.3% per generation, after 700 years only around 50% of men can be expected to bear the same surname.

    Sykes, B. and Irven, C. (2000) Surnames and the Y chromosome. American Journal of Human Genetics 66(4), 1417–1419.

    1. My guess is that a DNA sample will have been kept deep-frozen (probably at -70°C) since soon after it was collected, and will not have deteriorated in those 20 years. The critical factor determining its quality will be the temperature at which it was held, and for how long, before it was frozen. As the murder was in midwinter in Washington state, and the body seems to have been found fairly quickly, I doubt that there was much degradation in that time either. I would expect any number of markers would be measurable. However, if they can get a good number of STRs from the sample, they ought to be able to get probable eye and hair colour from autosomal SNPs as well, which would seem to be as useful a lead as a putative surname.

  8. You make a very good point about the use of autosomal SNPs for hair colour and eye colour. I do know that Family Tree DNA have had problems using some of their stored DNA samples. There have been cases of people who have not been able to upgrade existing samples because there was insufficient DNA, though I believe this has happened mostly with the autosomal Family Finder test which requires a lot of DNA.

  9. Great dialogue on the subject. I am no expert, but this sounds similar to cases where researchers were searching for possible DNA matches to the remains of military persons. I believe these were some of the early uses of DNA by people like Smolenyak. If you have a common surname, chances are someone with a similar surname may have their DNA in a criminal data bank. I have often wondered what level of crime needs to be committed for police to have someone’s DNA in a databank. We are also talking about Y-DNA, so the focus is on men only. Genealogy DNA is worldwide and I don’t know if police DNA databanks are also this broad. It is also my understanding that many military personnel now submit a DNA sample for possible identification in war. I wonder if the police also search the military DNA database, or if they can even get access. You may not even have an exact match with your cousins, so the entire exercise seems to be causing some to become paranoid. When the media have gotten involved in the past, it has sometimes caused individuals to shy away from a request for genealogy DNA. DNA is just another tool for genealogists and the police, and a tool that doesn’t provide all the answers by itself. We all know this from all our years of genealogy research. Genetic identification will become more commonplace as science progresses, so one day everyone may have a DNA identification taken on the day they are born.