Showing posts with label Jim Wilson. Show all posts

Tuesday, March 4, 2014

Dr. Jim Wilson and ScotlandsDNA Release Y-SNP Positions for Chromo2

In a move that I know will make our community of citizen scientists and Y-SNP researchers very happy, Dr. Jim Wilson sent me an email with a file containing the ScotlandsDNA/BritainsDNA Chromo2 Y-SNP positions and this announcement today:

ScotlandsDNA are happy to share the chromosome positions for the Y chromosome SNPs on the chromo2 chip. We hope these catalyse a step change in the development of the Y chromosome tree. Wherever we have looked the structure has increased greatly in resolution, but we simply do not have time to analyse all the data and so are sharing this file with the community to allow everyone to take part. These are the first fruits of whole Y chromosome sequencing, taken out into a much larger population - the beginning of understanding what all the new markers mean.

The file can be found here

Thanks again, Jim!


Wednesday, February 19, 2014

BritainsDNA Chromo2 Y-SNP Results Spreadsheet

I received an email from Dr. Jim Wilson of BritainsDNA today which included a link to a spreadsheet with Chromo2 Y-SNP results. He has given me permission to publish it with his comments:

We have finally got round to releasing an anonymised dataset of ~2000 chromo2Y results. This is an excel sheet with ~14,200 SNP results for ~2000 random men using the chromo2 chip, so will be a goldmine for discovering further genealogical structure in European haplogroups. I think it will be of great interest to genetic genealogists and others who are interested in breaking down their haplogroups and subgroups. 

The link is:  
https://www.britainsdna.com/download/C2_2000_v2.zip
(Updated 2/24/14)

Thanks again, Jim!


Wednesday, November 20, 2013

A List of Alternate Names for the Y-SNPs from BritainsDNA's Chromo2 Test

(For Advanced Y-DNA Researchers)

Dr. James Wilson from BritainsDNA has sent me a list of the alternate names for the Chromo2 Y-SNPs and has given me permission to post it for public access with his comments, as follows:

Please find attached a list of alternate names for chromo2 Y SNPs...This is based on comparisons some months ago now, although a few SNPs have been updated in the meantime...Where there is no alternative name for an S SNP it means it was not listed/named in any compendium, browser or database I had available when this file was put together and is not available in any other product as far as I am aware (of course apart from a complete Y chromosome sequence).

Note that all the SNPs on this list were manufactured on the chip, but a small proportion do not give good genotyping clusters; I haven't had time to clean them out. There are also a small number of SNPs not on this list, e.g. S28 and S250, which we Taqman in the appropriate samples, as Illumina appear to have removed them from the design and they have no proxies on the chip, or none known.


In due course I also intend to share the genome co-ordinates to allow comparisons with whole Y chromosome sequences, despite this being a case of handing larger competitors the fruits of our investment. At present I have started to do that for individual SNPs that are queried on a case-by-case basis, as I don't as yet have the permissions of all of the sequenced individuals to hand out their SNPs in this way.


You can find the list here. (Make sure you download the entire list and not just what is visible through Google Docs.)

Thanks, Jim!

Thursday, September 20, 2012

Let's All Start Using Terminal SNP Labels Instead of Y Haplogroup Subclade Names, Okay?


Is it just me or have the subclade names for Y-DNA just gotten out of control? I work with DNA all day long and I can't even keep up with all of the changes, so I have decided to start using the Terminal SNP labels exclusively. May I gently suggest that you do so also?

I frequently receive emails from otherwise well-informed people asking what their Y-DNA haplogroup subclade means, and it isn't their fault they are confused. You see, if they try to Google it, they are often unable to find information. If they try to locate academic papers on it, they are usually unsuccessful. Why is this? Well, the subclade name that they are given by their testing company may not be the same name that another testing company uses, or even the same as it was when they were first assigned it...and, quite likely, it isn't the same as the one on the most up-to-date tree at the International Society of Genetic Genealogy.

I have to admit that when R1b1b2 was changed to R1b1a2, I just started saying "R1b...whatever" when referring to it. Isn't it easier to just remember the defining SNP name R-M269?

For example, if you are R-L21+, then according to Family Tree DNA's Haplotree, you are R1b1a2a1a1b4, the ISOGG 2011 Haplogroup Tree's name for it. At 23andMe, you are R1b1b2a1a2f, in agreement with the 2010 ISOGG Haplogroup Tree. If you tested in 2008, you might still think you are R1b1b2a1b6. On ISOGG's 2011 Haplogroup Tree, L21+ was R1b1a2a1a1b4, but on ISOGG's 2012 Haplotree, you are R1b1a2a1a1b3. Apparently, R1b1a2a1a1b4 now refers to L238/S182! I mean, really, how can anyone keep track? (Ah, for the days of the simple little R tree.) I don't know how our ISOGG Haplogroup Tree Committee* does it anymore! Apparently, the academics are getting tired of it too, and it's just going to get worse when results from Geno 2.0 start rolling in with LOTS of new SNPs and subclades being defined.
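To make the ambiguity concrete, here is a minimal Python sketch that records only the ISOGG names quoted in the paragraph above, keyed by tree version. The dictionary and the two helper functions are my own illustration, not anything published by ISOGG or the testing companies.

```python
# (tree version, long subclade name) -> defining SNP, copied from the text above.
# This is a hand-typed illustration, not an official ISOGG data file.
LONG_NAME_TO_SNP = {
    ("ISOGG 2010", "R1b1b2a1a2f"): "L21",
    ("ISOGG 2011", "R1b1a2a1a1b4"): "L21",
    ("ISOGG 2012", "R1b1a2a1a1b3"): "L21",
    ("ISOGG 2012", "R1b1a2a1a1b4"): "L238/S182",
}


def snps_for_long_name(long_name):
    """Return {tree version: SNP} for every tree that used this long name."""
    return {
        tree: snp
        for (tree, name), snp in LONG_NAME_TO_SNP.items()
        if name == long_name
    }


def long_names_for_snp(snp):
    """Return every long name that has been attached to one SNP, sorted."""
    return sorted(name for (_, name), s in LONG_NAME_TO_SNP.items() if s == snp)


# The same long name points at two different SNPs depending on the tree year.
print(snps_for_long_name("R1b1a2a1a1b4"))

# The same SNP has carried three different long names in three years.
print(long_names_for_snp("L21"))
```

The long name only means something when you also know which year's tree it came from; the SNP label needs no such qualifier.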

Take a look at the history of R-M222, the "Ui Niall Subclade", on just the ISOGG SNP Tree:
2007 = R1b1c7
2008 = R1b1b2a1b6b
2009 = R1b1b2a1a2f2
2010 = R1b1b2a1a2f2
2011 = R1b1a2a1a1b4b
2012 = R1b1a2a1a1b3a1a1

Reportedly, Geno 2.0 will define at least three new subclades beneath M222, but I hear it may be more. Do you think those subclade names might get even longer?
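For anyone who wants to see the growth in numbers, the M222 history listed above can be tabulated in a few lines of Python. The names are copied from that list; the variable names are just my own choices for this sketch.

```python
# ISOGG long names for R-M222 by tree year, copied from the list above.
M222_HISTORY = {
    2007: "R1b1c7",
    2008: "R1b1b2a1b6b",
    2009: "R1b1b2a1a2f2",
    2010: "R1b1b2a1a2f2",
    2011: "R1b1a2a1a1b4b",
    2012: "R1b1a2a1a1b3a1a1",
}
TERMINAL_LABEL = "R-M222"  # the label that never changed

# Length of the long name in each year, and the set of distinct names used.
name_lengths = {year: len(name) for year, name in M222_HISTORY.items()}
distinct_names = sorted(set(M222_HISTORY.values()))

for year, name in sorted(M222_HISTORY.items()):
    print(f"{year}: {name:<18} ({len(name)} characters) -> {TERMINAL_LABEL}")

print(f"{len(distinct_names)} different long names, 1 terminal SNP label")
```

The long name grew from 6 characters in 2007 to 16 in 2012, while the terminal SNP label stayed the same throughout.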

The R Haplogroup Tree is definitely the worst, but the problem is starting to affect other haplogroups too. At FTDNA, my dad is I2b1. Same at 23andMe. Sounds simple, right? Not anymore! The subclade name was recently changed to I2a2a on the ISOGG 2012 tree. I am so confused! This was one subclade name that I felt very comfortable with. I think I will just learn to call it I-M223 from now on. (I'll just ignore that his brother recently tested Z2062+, which isn't even on any of the trees yet!)

Actually, there is some rhyme and reason to these discrepancies, so let me share it for those of you who have no idea what I am writing about. FTDNA last updated their Y haplogroup tree in 2010 and 23andMe in 2011.* So, they are going by the subclade names that were recognized at those times. In contrast, the ISOGG Haplogroup Tree has been updated over 60 times JUST THIS YEAR! Every time a new SNP is discovered that is upstream of a known SNP (which is happening faster and faster all the time), it has to be inserted into the tree, and every subclade name below the insertion point changes as a result. This is why it is so much simpler to just learn the Terminal SNP label.
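Here is a rough Python sketch of why one insertion ripples down the tree. It uses a simplified version of the alternating number/letter naming pattern and a toy tree whose SNP labels (SNP_TOP, SNP_X, SNP_Y, SNP_LEAF, SNP_NEW) are made up for illustration. It is not the procedure the ISOGG committee actually follows, only a demonstration of the principle.

```python
def name_tree(tree, root_snp, root_name):
    """Assign long names by walking down from the root.

    tree maps a parent SNP to an ordered list of child SNPs.
    Suffixes alternate between digits (1, 2, ...) and letters (a, b, ...),
    a simplified stand-in for the real nomenclature rules.
    """
    names = {root_snp: root_name}
    stack = [root_snp]
    while stack:
        parent = stack.pop()
        parent_name = names[parent]
        # After a letter comes a digit; after a digit comes a letter.
        use_digits = not parent_name[-1].isdigit()
        for i, child in enumerate(tree.get(parent, [])):
            suffix = str(i + 1) if use_digits else chr(ord("a") + i)
            names[child] = parent_name + suffix
            stack.append(child)
    return names


# Toy tree BEFORE a new discovery (all labels hypothetical).
before = {
    "SNP_TOP": ["SNP_X", "SNP_Y"],
    "SNP_Y": ["SNP_LEAF"],
}

# AFTER: SNP_NEW is found to sit between SNP_TOP and SNP_Y.
after = {
    "SNP_TOP": ["SNP_X", "SNP_NEW"],
    "SNP_NEW": ["SNP_Y"],
    "SNP_Y": ["SNP_LEAF"],
}

names_before = name_tree(before, "SNP_TOP", "R")
names_after = name_tree(after, "SNP_TOP", "R")

for snp in ("SNP_Y", "SNP_LEAF"):
    print(f"{snp}: {names_before[snp]} -> {names_after[snp]}")
```

In this toy example SNP_Y goes from R2 to R2a and SNP_LEAF from R2a to R2a1, even though nothing about those two SNPs changed. The SNP labels themselves are the only stable handles.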

The ISOGG Haplogroup Tree is a tremendous resource that anyone who is doing Y-DNA research should be utilizing. It helps to keep things straight by giving the various names of the SNPs that are being used by different companies and labs. When two or more SNPs are identical, meaning that they are at the same place on the Y-haplogroup tree with the same mutation, ISOGG shows the names in a series punctuated by "/". For example, let's look at M173/P241/Page29. M173 comes from Peter Underhill's lab at Stanford; P241 comes from Michael Hammer's lab at the University of Arizona; and Page29 comes from the Page lab at the Whitehead Institute for Biomedical Research. They appear in academic publications with these names, and ISOGG lets you know that they are identical SNPs. That way, if you are Googling or looking for academic papers about your SNP, you know to try those names too.

Going back to my first example of R-L21+, Wikipedia states, "R1b1a2a1a1b4 (R-L21) is defined by the presence of the marker L21, also referred to as M529 and S145." The label L21 comes from Thomas Krahn's FTDNA lab in Houston, M529 comes from Peter Underhill's lab at Stanford and S145 comes from Jim Wilson's lab at the University of Edinburgh. ISOGG shows the SNP as L21/M529/S145. The bottom line is that if you test L21, M529 or S145 at any company, your assignment is in an identical place on the Y-DNA tree, so the subclade name is not the significant factor; the SNP name is.
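If you keep your own notes or spreadsheets, the "/" convention is easy to put to work. The sketch below builds a small lookup from the two slash-separated groups mentioned above, so that any of the equivalent names resolves to the same entry. The function names and the choice of the first name in each group as the "canonical" one are assumptions of mine, not an ISOGG rule.

```python
# Slash-separated groups of equivalent SNP names, as quoted in the text above.
ISOGG_GROUPS = ["M173/P241/Page29", "L21/M529/S145"]


def normalize(label):
    """Reduce labels like 'R-L21+' or ' s145 ' to a bare upper-case SNP name.

    Only a trailing '+' and a haplogroup prefix such as 'R-' are handled;
    a trailing '-' (negative result) is not handled in this sketch.
    """
    return label.strip().rstrip("+").split("-")[-1].upper()


def build_alias_index(groups):
    """Map every alias to the first name listed in its group."""
    index = {}
    for group in groups:
        aliases = [name.strip() for name in group.split("/")]
        canonical = aliases[0]
        for alias in aliases:
            index[normalize(alias)] = canonical
    return index


def same_snp(index, a, b):
    """True if both labels are known and belong to the same group."""
    ca = index.get(normalize(a))
    cb = index.get(normalize(b))
    return ca is not None and ca == cb


ALIASES = build_alias_index(ISOGG_GROUPS)

print(same_snp(ALIASES, "M529", "S145"))    # both are names for L21
print(same_snp(ALIASES, "R-L21+", "s145"))  # prefix, '+' and case are ignored
print(same_snp(ALIASES, "M173", "L21"))     # different SNPs
```

A fuller version would load the complete list of equivalent names from the ISOGG tree rather than two hand-typed groups.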

Of course, with so many new SNPs being discovered and assigned to the tree, there will likely be a certain amount of continuing confusion among those of us doing Y-DNA research for the time being, but I hope you will all consider joining me in taking the next step forward in the evolution of Y-DNA research in genetic genealogy and stop trying to remember those mind-bending subclade names! And, while we're at it, let's give our ISOGG Y-Haplotree Committee a well-deserved virtual pat on the back too!



*History:
The Y Chromosome Consortium (YCC) is a cooperative association of geneticists, led by Dr. Michael Hammer, which in 2002 published the paper "A Nomenclature System for the Tree of Human Y-Chromosomal Binary Haplogroups", introducing the modern haplogroup nomenclature of Y-DNA. The tree was subsequently revised in 2003 by Mark A. Jobling and Chris Tyler-Smith in another paper, "The human Y chromosome: an evolutionary marker comes of age". Next, Family Tree DNA created the 2005 Y-Chromosome Phylogenetic Tree, which was the first online tree and was available only to their customers. Soon thereafter, in 2006, ISOGG created the first public online tree. In 2008, Tatiana Karafet of Dr. Hammer's lab (and others) published a paper further refining the Y chromosome tree, "New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree". As a result, both FTDNA and ISOGG updated their trees at that time. Then in 2010, FTDNA came out with a YCC-sanctioned tree, which was distributed at the FTDNA conference, and ISOGG promptly did a major update to stay in alignment with the YCC. Since then, no updates have come from the YCC. Undaunted, the ISOGG Y Haplogroup Tree Committee has continued to add information as it becomes available from various sources, and its tree is now the most up-to-date source of this information. In November 2011 at the FTDNA Project Administrator's Conference, Spencer Wells of National Geographic, Michael Hammer of the University of Arizona, Thomas Krahn and Bennett Greenspan of FTDNA and Alice Fairhurst of ISOGG reportedly all agreed to stay in alignment with the most current Y-DNA nomenclature to the best of their abilities. As always, there is new research that has not yet become public. As it is released, ISOGG will again align its tree with the most current information and will continue to add updates as they become available.
With the upcoming launch of Geno 2.0, the ISOGG Committee will have their work cut out for them! Current ISOGG members who work with the tree and deserve our great appreciation are: Coordinator: Alice Fairhurst. Design team: Tanmoy Bhattacharya, Tom Hutchison, Richard Kenyon, Doug McDonald. Content experts: Abdulaziz Ali, Whit Athey, Ray H. Banks, Katherine Hope Borges, Aaron R. Brown, Phil Goff, Gareth Henson, Tim Janzen, Bob May, Eugene Matyushonok, Lawrence Mayka, Charles Moore, Ana Oquendo Pabon, Marja Pirttivaara, David Reynolds, Bonnie Schrack, Vince Tilroe, Aaron Salles Torres, Steve Trangsrud, Ann Turner and David Wilson.