(For Advanced Y-DNA Researchers)
Dr. James Wilson from BritainsDNA has sent me a list of the alternate names for the Chromo2 Y-SNPs and has given me permission to post it for public access with his comments, as follows:
Please find attached a list of alternate names for chromo2 Y SNPs...This is based on comparisons some months ago now,
although a few SNPs have been updated in the meantime...Where there is no alternative name for an S SNP it means it was not
listed/named in any compendium, browser or database I had available when
this file was put together and is not available in any other product as
far as I am aware (of course apart from a complete Y chromosome
sequence).
Note that all the SNPs on this list are manufactured on the chip, but a
small proportion do not give good genotyping clusters; I haven't had
time to clean them out. There are also a small number of SNPs not on
this list eg S28, S250, which we Taqman in the appropriate samples as
Illumina appear to have removed them from the design and they have no
proxies on the chip or none known.
In due course I also intend to share the genome co-ordinates to allow
comparisons with whole Y chromosome sequences, despite this being a case
of handing larger competitors the fruits of our investment. At present I
have started to do that for individual SNPs that are queried on a
case-by-case basis, as I don't as yet have the permissions of all of the
sequenced individuals to hand out their SNPs in this way.
You can find the list here. (Make sure you download the entire list and not just what is visible through Google Docs.)
Thanks, Jim!
Discover the fascinating world of genetic genealogy! Written for the non-scientist, YGG is a source of unbiased news on the major genealogy DNA testing companies. Written by CeCe Moore, an investigative genetic genealogist and television consultant.
Wednesday, November 20, 2013
Wednesday, November 13, 2013
News from Family Tree DNA: "Big Y" Sequencing, Conference and Holiday Sale
Family Tree DNA's 9th Annual International Conference on Genetic Genealogy was held this past weekend in Houston. It was clear to those in attendance that, with the recent acquisition of Arpeggi, things are changing over at Gene by Gene. There has been an infusion of "new blood" into the company and with it has come new enthusiasm, resources and promise. The team was very open to hearing the community's needs and priorities, with the new staff listening in on our "roundtable" discussions where attendees were encouraged to share their ideas. As a result, they have promised the genetic genealogy community some of our most requested features in the near future.
Y-DNA Sequencing
These are exciting times for our Y-DNA citizen scientists! Following right on the heels of the delivery of the first results from Full Genomes, Family Tree DNA announced their new Big Y next gen sequencing product at the conference. Justin Petrone covered this development in his article in BioArray News yesterday as have a number of genetic genealogy bloggers.
In all of this excitement there has been a lot of discussion regarding what these competing products will and will not deliver. For the answers to many of these questions, we will have to wait until the first Big Y results start to be returned in approximately 10-12 weeks. However, one of the most pervasive concerns can be addressed now.
Throughout the genetic genealogy community for the last couple of days, there has been speculation that Family Tree DNA's Big Y product will not include raw data downloads. I found this very difficult to believe with FTDNA's track record of transparency, so I asked Gene by Gene's new Chief Scientific Officer Dr. David Mittelman for clarification on this matter. He confirmed that, while the Big Y results will consist of SNP calls, the raw data will be available for those who want it - just as it is for the company's exome and whole genome products. He added, "We will want to set up some infrastructure to support downloading these big raw files and we will need to clarify for customers that our CSRs are obviously not able to offer advice or support on how to use them."
I am relieved (although not surprised) to hear this, as I am sure many of you are. I appreciate Dr. Mittelman's quick response to my inquiry even though he was traveling.
The Big Y is being offered at a discount through November 30, 2013 for $495 and will increase to $695 after that. Previous "Walk Through the Y" customers will receive an additional $50 discount. (Various project admins have reported a lot of orders, so the offer appears to be a success already.)
Holiday Sale
The Family Tree DNA holiday sale has begun and is running through Dec 31st. Any customer whose order includes a Family Finder autosomal DNA test will receive a $100 Restaurant.com gift certificate. This sale includes new orders, upgrades and 23andMe/AncestryDNA transfers:
- Y-37 for $119 (reg. $169)
- Y-67 for $189 (reg. $268)
- Y-111 for $289 (reg. $359)
- mtFull for $169 (reg. $199)
- Family Finder for $99 (includes a free $100 Restaurant.com gift certificate)
- Family Finder + Y-37 for $218 (reg. $268) inc. $100 Restaurant.com gift certificate
- Family Finder + Y-67 for $288 (reg. $367) inc. $100 Restaurant.com gift certificate
- Family Finder + mtFull for $268 (reg. $298) inc. $100 Restaurant.com gift certificate
- Y-37 + mtFull for $288 (reg. $366)
- Y-67 + mtFull for $358 (reg. $457)
- Comprehensive for $457 (reg. $566) inc. $100 Restaurant.com gift certificate
- Autosomal DNA Transfer for $49 (reg. $69)
- Y-Refine 12 to 37 for $69 (reg. $109)
- Y-Refine 12 to 67 for $148 (reg. $319)
- Y-Refine 25 to 37 for $35 (reg. $59)
- Y-Refine 25 to 67 for $114 (reg. $59)
- Y-Refine 37 to 67 for $79 (reg. $109)
- Y-Refine 37 to 111 for $188 (reg. $220)
- Y-Refine 67 to 111 for $109 (reg. $129)
- mtHVR1 to Mega for $149 (reg. $169)
Conference Coverage
When I first started blogging the FTDNA conference, I was largely alone. Now with the excellent and thorough conference coverage by other bloggers and on Twitter, it is no longer necessary for me to give a blow-by-blow account here. Thanks to Debbie Kennett for compiling a comprehensive list of conference posts on her blog.
Monday, November 4, 2013
Upcoming Presentations: November 8th and 14th
I am speaking twice in the next week and a half - once locally and once in Austin, Texas. I hope to see some of you there! Details follow:
- November 8, 2013 - Adoption Knowledge Affiliates Conference, Austin, TX.
Due to recent technological advances, adoptees and individuals without knowledge of their genealogy are increasingly turning to DNA testing to discover their genetic roots. The tight-knit AdoptionDNA community has been in the forefront of innovation in this area, helping many adoptees rediscover their birth families and ancestral origins. Attendees will learn some of the techniques and tools successfully used.
- November 14, 2013 - Diamond Gateway Women's Club at 6:30pm, Penasquitos, CA. Reservations required, contact Dael at 619-252-0804 or daelnk612@yahoo.com. There is a $5 fee. Mt. Carmel Church of the Nazarene, 10060 Carmel Mt. Rd., San Diego, 92129.
Sunday, November 3, 2013
First Look at the Full Genomes Y-Sequencing Results from Itaï Perez
(**Warning - Advanced Content**)
Guest Blogger, Itaï Perez, reviews his Full Genomes results for my readers:
For those wondering what the results from Full Genomes look like, here’s a first look.
After a long wait while my kit was sequenced and analysed, I finally received an email from Full Genomes with an attached rar archive containing 9 files.
Almost all these files are formatted text, and can easily be converted to an Excel table (which I did).
Here’s the description of the 9 files, as much as I understand it, one by one:
File #1 - PrivateSNPs
This one is easy to understand. It is a list of all Private SNPs discovered in my sequencing. Here is the description found in the beginning of the file (which I removed when converting to Excel).
#based on 20131001 variantCompare analysis using PGP083013.filt.pyfilt.1kGfilt.vcf and ALL.1kG.samplelist.redo.sorted.paths.20130812.curated_pm.filt2.called.pyfilterCG2k.vcf reference files
And here is the file itself:
The columns are SNP name, position, ancestral and derived base. The positions of these new SNPs have been removed from each image in order to give the Full Genomes team time to register and name them.
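For readers who would rather load this file in a script than in Excel, here is a minimal Python sketch. It assumes the file is tab-delimited, that comment lines start with "#", and that the four columns appear in the order listed above. None of this is confirmed by Full Genomes, and the sample rows are invented placeholders, not real SNP names or positions.

```python
import csv
from typing import NamedTuple


class PrivateSnp(NamedTuple):
    name: str
    position: int
    ancestral: str
    derived: str


def parse_private_snps(text: str) -> list[PrivateSnp]:
    """Parse tab-delimited rows, skipping '#' comment lines and blank lines.

    Assumed column order: SNP name, position, ancestral base, derived base.
    """
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.reader(data_lines, delimiter="\t")
    return [PrivateSnp(row[0], int(row[1]), row[2], row[3]) for row in reader]


# Invented placeholder content; not taken from any real results file.
SAMPLE = (
    "#header comment line\n"
    "EXAMPLE1\t1000001\tC\tT\n"
    "EXAMPLE2\t1000002\tG\tA\n"
)

snps = parse_private_snps(SAMPLE)
for snp in snps:
    print(snp.name, snp.position, f"{snp.ancestral}>{snp.derived}")
```

If the real file uses a different delimiter or column order, only the `delimiter` argument and the row indexes would need to change.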
File #2 - yknot
This is the only file which is not a table.
This text file includes a tree, following my positive SNPs from Y-Adam to my
current most recent SNP, as defined in the ISOGG Y-tree.
File #3 - variantCompare
This file is more complex. Here is the
description in the beginning of the file:
#FGC report: Analysis of Called Variants
#this report analyzes variants called as differing from the GRCh37 reference sequence
#for best viewing, open with tab-delimiting in a spreadsheet viewer
#reliability flag key: no flag: over 99% likely genuine; *: over 95% likely genuine; **: about 40% likely genuine; ***: about 10% likely genuine
#it is strongly suggested that results analysis be restricted to variants with zero or one asterisks
#citations for reference data include:
# 1000 Genomes Project: An integrated map of genetic variation from 1,092 human genomes, McVean et al., Nature 491, 56-65 (01 November 2012) doi: 10.11038/nature11632
# Personal Genome Project: Ball, Madeleine P., et al. A public resource facilitating clinical use of genomes. Proceedings of the National Academy of Sciences 109.30 (2012): 11920-11927. http://www.pnas.org/content/109/30/11920.long
# A High-Coverage Genome Sequence from an Archaic Denisovan Individual, Meyer et al. Science 338, 222-226 doi:10.1126/science.1224344
# R. Drmanac, et. al. Science 327(5961), 78. [DOI: 10.1126/science.1181498]
GRCh37 is the Genome Reference Consortium human genome (build 37). I guess it is a reference genome similar to CRS or RSRS for mtDNA. This table lists all the SNPs which vary from this reference. The fields are position, base change, rsID, SNP name, reliability and a list of the reference genomes which share this change. There are four successive sections: shared SNPs, private SNPs, shared INDELs and private INDELs.
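The file header suggests restricting analysis to variants with zero or one asterisks. A rough Python sketch of that filter follows. It assumes the reliability flag sits in the fifth tab-separated field, matching the field order described above, and that an unflagged variant has an empty value there. Both are assumptions, and the sample rows are invented.

```python
RELIABILITY_COLUMN = 4  # assumption: fifth tab-separated field holds the flag


def asterisk_count(flag: str) -> int:
    """Count the asterisks in a reliability flag ('' means no flag)."""
    return flag.strip().count("*")


def keep_reliable(lines, max_asterisks: int = 1):
    """Return the split fields of rows whose flag has at most max_asterisks."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= RELIABILITY_COLUMN:
            continue  # row too short to carry a flag column
        if asterisk_count(fields[RELIABILITY_COLUMN]) <= max_asterisks:
            kept.append(fields)
    return kept


# Invented rows: position, base change, rsID, SNP name, reliability, references.
SAMPLE_ROWS = [
    "#FGC report: Analysis of Called Variants",
    "1000001\tC->T\t.\tEXAMPLE1\t\tsampleA",
    "1000002\tG->A\t.\tEXAMPLE2\t*\tsampleA",
    "1000003\tA->G\t.\tEXAMPLE3\t**\t",
    "1000004\tT->C\t.\tEXAMPLE4\t***\t",
]

reliable = keep_reliable(SAMPLE_ROWS)
print([fields[3] for fields in reliable])
```

Using the flag key in the header, this keeps only the variants rated over 95% likely genuine and drops those rated about 40% or about 10%.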
Here’s how the file looks:
I also received a small manual describing this file and how to use it:
File #4 - strcall203.lobystr203report
This table contains the list of all STRs.
Here’s the description in the file:
#FGC Y-STR report generated based on lobSTR pre-v2.0.3 (sourceforge git revision 34534b) processing
#lobSTR citation: Gymrek M, Golan D, Rosset S, & Erlich Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Research. 2012 April 22.
#Notes:
#Repeat counts reported according to lobSTR standards; conversion required in certain cases to produce results based on other counting standards
#chrY coordinates based on hg19 / b37 reference sequence
#Marker conversions to FTDNA standards for DYS448, DYS449, DYS607, DYS576, DYS511, DYS640, and DYS485 are provisional
#Marker results known to be unreliable include: DYS413a/b, DYS490, DYS572, DYS726, DYS534, DYS446, and DYS487
#default lobSTR database has been augmented with results for DYS540, DYS712, DYS593, DYS715, DYS513, DYS561, DYS497, DYS510, DYF385.1, and DYF385.2, which should be treated as provisional
#Only two copies of DYS464 and DYF371 are called here; fully-spanning read details can provide insight into additional copies
#DYF371 includes DYS425
#NR = not reported / no reads
#NA = not available
#call confidence: 1 corresponds to highest confidence, 0 corresponds to lowest confidence; results with call confidence below 0.2 should be considered very speculative
#conflict flags: ? = conflicting fully-spanning reads; * = conflicting partially-spanning reads; % = het result in diploid calling for marker not recognized as multicopy; & = not called in diploid calling
#read details: Format is [repeat count]|[number of reads supporting given repeat count], with different counts separated by ';'. In the case of multicopy markers like DYS464, the fully-spanning read details can be used to determine repeat counts for additional copies
And here is what the file looks like:
Now this gets very technical and I don’t
understand everything, but from what I can figure out, first we have the STR
name and the estimated result, and then follows information
explaining how this result was found and how sure the program is of it.
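To make the "read details" format from the header more concrete, here is a small Python sketch. It assumes a cell looks like "13|8;12|1", meaning 8 reads support 13 repeats and 1 read supports 12. The example string is invented, and treating "NR" or "NA" in this column as "no data" is also an assumption on my part.

```python
SPECULATIVE_BELOW = 0.2  # threshold quoted in the file header


def parse_read_details(details: str) -> dict[str, int]:
    """Map each repeat count (kept as text) to its number of supporting reads."""
    details = details.strip()
    if details in ("", "NR", "NA"):
        return {}  # assumption: these markers mean there is nothing to parse
    support = {}
    for entry in details.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        repeat_count, read_count = entry.split("|")
        support[repeat_count.strip()] = int(read_count)
    return support


def is_very_speculative(call_confidence: float) -> bool:
    """True when the call confidence is below the 0.2 level from the header."""
    return call_confidence < SPECULATIVE_BELOW


example = parse_read_details("13|8;12|1")  # invented example value
best = max(example, key=example.get)       # repeat count with the most reads
print(example, best, is_very_speculative(0.15))
```

Repeat counts are kept as text rather than converted to numbers, since the header notes that lobSTR counting standards can differ from other conventions.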
File #5 - strcall203.lobystr203report ftdna
This table also lists the STRs, but in a much
simpler form. You simply have the name and the results, and the STRs are in the
order they are found at Family Tree DNA.
The description in the file is:
#Marker conversions to FTDNA standards for DYS448, DYS449, DYS607, DYS576, DYS511, DYS640, and DYS485 are provisional
#see main Y-STR report for further information regarding reliability, etc.
And here’s the table:
File #6 - mttype.RSRS.MT
This table gives the mtDNA results in RSRS
format. It gives for each SNP the position and the ancestral and derived
result.
Here’s the description:
#FGC mtDNA report
#Variants with respect to RSRS
And here is the file:
File #7 - mttype.rCRS.MT
This one is exactly the same, but using the rCRS format.
#FGC mtDNA report
#Variants with respect to rCRS
File #8 - haplogroupCompare
This table lists my SNPs and compares them to some reference results from my haplogroup or close to it. It is quite similar to the variantCompare file. The fields are position, base change, rsID, SNP name, reliability and the reference results mine is compared to. There are two successive sections: shared SNPs and private SNPs.
Here’s the description:
#FGC report: Detailed Analysis of Called SNPs
#refer to Analysis of Called Variants for citations and other details
#in the reporting below, it is assumed that the reference allele is ancestral ("-") and the sample allele is derived ("+"); "x"=ambiguous and "?"=no-read/no-call
#note that this report uses a different, simplified variant calling approach from that used in the Analysis of Called Variants report, so results may differ, especially for less-reliable variants
Haplogroups in the neighborhood of G-L91 being considered; includes: G-L91;G-L166;G-M286
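The four symbols in this header can be tallied with a few lines of Python. This is only an illustrative sketch: the symbol meanings come from the header above, but the list of calls is invented and the way the real file lays out its columns is not assumed here.

```python
from collections import Counter

# Meanings as given in the file header.
CALL_MEANINGS = {
    "+": "derived",
    "-": "ancestral",
    "x": "ambiguous",
    "?": "no-read/no-call",
}


def summarize_calls(calls) -> Counter:
    """Count how many calls fall under each meaning."""
    return Counter(CALL_MEANINGS.get(call, "unrecognized") for call in calls)


# Invented example: one reference sample's calls across five SNPs.
summary = summarize_calls(["+", "+", "-", "?", "x"])
print(summary)
```

A tally like this gives a quick sense of how much of a comparison rests on clear derived or ancestral calls rather than ambiguous or missing ones.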
And here’s the file:
File #9 - gtype
This one is also a bit complex. It lists the
Y-SNPs and seems to detail how the results were determined.
Here’s what it looks like:
This ends the description of these nine analysis
files. Note that I am still waiting for access to my results on the website and
to my sequencing raw file. If you are interested I’ll write another article to
show it to you then.
Thanks Itaï!
These tools were developed by Dr. Greg Magoon under the supervision of Justin Loe. Justin tells us "these are not final versions and will be upgraded to a more user-friendly presentation by specialists in user-interfaces."
BGI provided the sequencing services and developed the Y chromosome chip.
If you have any questions, please post them below and I will try to get them answered. I'm sure we will be seeing a lot more regarding the Full Genomes test soon...