Tuesday, November 25, 2014

Ancient Hungarian Neolithic (NE1) DNA matches living people!

The Great Hungarian Plain was a crossroads of cultural transformations that have shaped European prehistory. The study’s authors analysed a 5,000-year transect of human genomes, sampled from petrous bones giving consistently excellent endogenous DNA yields, from 13 Hungarian Neolithic, Copper, Bronze and Iron Age burials, including two sequenced to high (~22×) coverage and seven to ~1× coverage, to investigate the impact of these transformations on Europe’s genetic landscape. I converted the raw data of NE1, from the Polgár-Ferenci-hát site in Hungary, into formats familiar to genetic genealogists and uploaded them here. I also filtered the data down to the SNPs tested by DNA testing companies such as FTDNA, 23andMe and Ancestry and uploaded the result to GEDmatch as kit# F999937. After batch processing, NE1 has significant matches with living people.

Matches on GEDmatch


Top NE1 matches in GEDmatch

Matches in FTDNA


Top FTDNA matches for NE1.


Monday, November 24, 2014

Autosomal Pedigree Creator Tutorial

This tool automatically creates pedigree trees based on segment matches from a set of autosomal files. It also shows how a seemingly unrelated genetic match is connected to you through a matching segment, and through whom.

Project Page: Autosomal Pedigree Creator

Getting Started

Folder Structure
Download the Autosomal Pedigree Creator.zip file from the website (it is usually less than 1 MB). Extracting it gives you the following files and folders.
  • bin – contains the bare minimum Graphviz binaries required to convert a .gv DOT file to a PNG image file.
  • data – intermediate folder
  • ibd – intermediate folder
  • tmp – intermediate folder
  • Autosomal Pedigree Creator.exe – Executable
  • README.txt – Readme file giving a quick overview of the software just in case you haven’t looked at the website.

Kit Preparation

Before using this tool, some basic preparation is needed: rename the files with human-readable filenames. Please don’t change the file extensions, and please use only alphanumeric names.
E.g.,
  • 264652-autosomal-o37-results.csv.gz can be renamed to Felix.gz
  • 264652-autosomal-o37-results.csv can be renamed to Felix
  • genome_v3_Full_20131006120000.zip can be renamed to Felix.zip
Once renamed, place all the renamed kits into a folder. This folder will be selected from the interface.
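The renaming step can also be scripted. Below is a minimal sketch, not part of the tool itself. The function names and the mapping of raw file names to people are hypothetical, and it assumes that "keeping the extension" means keeping the last extension of the raw file (so .csv.gz becomes .gz, as in the first example above).

```python
import shutil
from pathlib import Path


def prepared_name(original: str, person: str) -> str:
    """Return the new file name: an alphanumeric label plus the last extension."""
    if not person.isalnum():
        raise ValueError(f"kit name must be alphanumeric: {person!r}")
    # Path.suffix yields only the final extension, e.g. ".gz" for ".csv.gz".
    return person + Path(original).suffix


def prepare_kits(mapping: dict, source_dir: str, kit_dir: str) -> list:
    """Copy each raw file from source_dir into kit_dir under its readable name."""
    out = Path(kit_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for original, person in mapping.items():
        target = out / prepared_name(original, person)
        shutil.copy2(Path(source_dir) / original, target)  # originals stay untouched
        copied.append(target)
    return copied
```

Copying rather than renaming in place keeps the original downloads intact, and the resulting kit_dir is the folder to select in the interface.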

Note: To get the best results, include the parents’ kits along with your own kit. If you don’t have any parents’ kits and you get a blank pedigree, or some individuals are omitted, try the Dump All option.

User Interface




Using the tool is largely self-explanatory. The brief steps are below.
  1. Click Browse and select the folder where you placed all the prepared kits.
  2. Dump All – This option is only required when your kits are totally unrelated to each other and you want to dump every possible segment connection.
  3. Click Start and the process begins. It can run for a few minutes to several hours depending on the number of autosomal DNA files.

Execution

The process sometimes runs for several hours. The progress bar may appear to get stuck at 15% and then at 75%. It is not really stuck: the tool is trying to extract as much information as possible in order to construct the tree, and it does not know in advance how far it has to go. To speed things up, the comparisons run in parallel, with as many running at once as there are processors in your computer.
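To give a feel for why the run time grows quickly with the number of kits, here is a minimal sketch of running pairwise comparisons in parallel with one worker per processor. It is an illustration only, not the tool's actual implementation. compare_pair is a hypothetical placeholder, and a thread pool is used for simplicity where CPU-bound work would normally use a process pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def compare_pair(pair):
    """Placeholder comparison; a real one would scan both kits for shared segments."""
    a, b = pair
    return (a, b, f"{a}-{b}")


def compare_all(kits, compare=compare_pair):
    """Run every pairwise comparison with one worker per processor."""
    pairs = list(combinations(sorted(kits), 2))  # n kits give n*(n-1)/2 pairs
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compare, pairs))
```

With 6 kits there are 15 pairs to compare, and with 20 kits there are 190, so the amount of work rises much faster than the number of files.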

Pedigree Output

When the process finishes, a PNG file called pedigree.png, which contains the tree, opens automatically. If for some reason the PNG file does not open, you can always find it in the root folder of Autosomal Pedigree Creator.


The tool uses Graphviz to generate the PNG output from a .gv DOT file. The .gv file can be found inside the tmp folder as tree.gv.
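If you edit tree.gv by hand and want to regenerate the image, the conversion can be repeated with Graphviz directly. The sketch below assumes a standard Graphviz installation with the dot program on your PATH (rather than the bundled binaries in the bin folder), and the default paths simply follow the file names mentioned above.

```python
import shutil
import subprocess


def dot_command(gv_path="tmp/tree.gv", png_path="pedigree.png", dot_exe="dot"):
    """Build the Graphviz command line: dot -Tpng <input> -o <output>."""
    return [dot_exe, "-Tpng", gv_path, "-o", png_path]


def render(gv_path="tmp/tree.gv", png_path="pedigree.png"):
    """Render the .gv file to PNG, failing clearly if Graphviz is missing."""
    exe = shutil.which("dot")
    if exe is None:
        raise FileNotFoundError("Graphviz 'dot' was not found on PATH")
    subprocess.run(dot_command(gv_path, png_path, exe), check=True)
```

Calling render() from the tool's root folder should overwrite pedigree.png with a tree that reflects your manual edits.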

Tracing the Connection

If you want to check a connection between two common ancestors or two autosomal files, you can do so by following the procedure below.

In the pedigree output, each line is a match, the terminals are autosomal files and the 4-letter ovals are common ancestors. The mapping between these 4-letter names and what they mean can be found inside the tmp folder in the file common_ancestors.csv, which can be opened in Excel.

As mentioned, each arrow is a connection: a matching segment, or a group of segments, from a common ancestor.



XML Representation

The complete list of common ancestors, and how each is related to the others, is in the XML file atree.xml.



This file contains a CA tag for each common ancestor, together with the list of segments that match. Please note that every sub-node matches all the segments listed at its parent level. Even though the root element is ADAM-EVE, its sub-nodes are not automatically connected to the root. The root element exists only because XML requires one, and it is not reproduced in the pedigree tree.

The XML is generated from a text file ‘atree.txt’. The XML file is simply a hierarchical representation of the text file.



Matching Segments

All matching segments can be found inside the ‘ibd’ folder. Please note that ‘ibd’ is just a folder name; it does not automatically mean that the segments are Identical By Descent or have not undergone recombination. However, all matching segments inside the ‘ibd’ folder are compound segments.



A file named, say, Arulraj-Chandrakumar-Esther-SathiaGnanaraj holds a segment that is common to the Arulraj, Chandrakumar, Esther and SathiaGnanaraj autosomal files, and it represents the common ancestor of the kits involved.
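Since the file names in the ‘ibd’ folder are just kit names joined by hyphens, they can be split back into kit names to find every segment file that involves a given person. This is a small sketch with hypothetical function names. It assumes kit names contain no hyphens, which the alphanumeric-only rule in Kit Preparation guarantees.

```python
from pathlib import Path


def kits_in_segment_file(file_name: str) -> list:
    """Split an ibd file name such as 'A-B-C' into the kit names it covers."""
    stem = Path(file_name).stem  # drops a trailing extension if there is one
    return stem.split("-")


def files_involving(kit: str, file_names: list) -> list:
    """Return the segment files whose name lists the given kit."""
    return [name for name in file_names if kit in kits_in_segment_file(name)]
```

For example, kits_in_segment_file("Arulraj-Chandrakumar-Esther-SathiaGnanaraj") yields the four kit names in that order.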



Output Interpretation



You might wonder why some common ancestors, shown as 4 characters in ovals, have only one descendant, which is itself a common ancestor shown as 4 characters in an oval. The reason is that these intermediate common ancestors do have population data, or segments that match the individuals but do not match the parents. If you want to include all such matching segments from population data, you can enable the ‘Dump All’ option. However, be warned that ‘Dump All’ can create a cluttered pedigree, because every individual may match every common ancestor depending on how closely they are related.

The above output is close to correct, but it still requires some manual intervention and adjustment to get an accurate pedigree.

For the above pedigree, below are the true relations.
  • Felix (self)
  • Chandrakumar (Father)
  • Selvarani (Mother)
  • Sathia Gnanaraj (Paternal grandfather)
  • Esther (Wife)
  • Arulraj (Father-in-law)
There is no common ancestor between Felix and Chandrakumar, because Chandrakumar is my father. So the node named VLXQ, shown as a common ancestor between me and my father, is none other than my father himself. The same holds for all parent/child relations. This cannot be fully automated, because an algorithm can tell that a relation is parent/child but cannot tell which of the two is the parent unless it has all the surrounding data, which is not always possible or feasible. Correcting the parent/child relations leads to the modified pedigree below.



As you can see, I can infer the following from the autosomal pedigree tree.
  • My wife’s tree is a separate line.
  • There are three individual common ancestors giving three lines.
  • My parents are distant cousins.
The above tutorial can also be downloaded here: Autosomal Pedigree Creator.pdf 

Let me know if you find this tool useful, and what you discover with it.

Friday, November 21, 2014

Google+ Awesome trick for the previous post images ...


Google+ automatically created an awesome animated image from the images uploaded for the previous post. Just thought of sharing this ...

Previous Post: GEDmatch Archaic DNA matches

GEDmatch Archaic DNA matches

I would like to thank John Olson from GEDmatch for preparing an excellent tool to compare all ancient DNA uploaded to GEDmatch on one page. The tool lets you lower the cM threshold to as low as 0.5. However, based on my experience and experiments, I recommend using a threshold of 1.5 to 2 cM or above.

Please note: if the page shows exactly the same results even after you change the thresholds, refresh the page using the F5 key (this seems to be a bug?).

While it is useful to compare your autosomal DNA with ancient DNA kits, it is also helpful to see how the ancient DNA kits match each other. For the comparisons below, I used 2 cM.















Thursday, November 20, 2014

NE1 Ancient DNA Analysis

The Great Hungarian Plain was a crossroads of cultural transformations that have shaped European prehistory. The study’s authors analysed a 5,000-year transect of human genomes, sampled from petrous bones giving consistently excellent endogenous DNA yields, from 13 Hungarian Neolithic, Copper, Bronze and Iron Age burials, including two sequenced to high (~22×) coverage and seven to ~1× coverage, to investigate the impact of these transformations on Europe’s genetic landscape. I converted the raw data of NE1, from the Polgár-Ferenci-hát site in Hungary, into formats familiar to genetic genealogists and uploaded them here. I also filtered the data down to the SNPs tested by DNA testing companies such as FTDNA, 23andMe and Ancestry and uploaded the result to GEDmatch as kit# F999937.

Admixture

Dodecad v3

Eurogenes

MDLP


Eye Color

Eye Color


Runs of Homozygosity

RoH reveals parents of Kostenki14 are not related in their genealogical timeframe.

Mt-DNA

NE1 is female and belongs to mtDNA haplogroup U5b2c.

Archaic DNA matches

GEDmatch Archaic DNA Matches.
(Note: The browser seems to cache the image for GEDmatch's Archaic DNA matches. So, if you get the same result after changing the thresholds, make sure you refresh the browser.)

NE1 is closely related to LBK. The comparison below uses thresholds of 700 SNPs / 7 cM. At 500 SNPs / 5 cM, it has 2 matching segments.

NE1 and LBK.

BR2 is also related to NE1. BR2 could possibly be a descendant of NE1.

NE1 and BR2

Ust'-Ishim could be an ancestor to NE1.

NE1 and Ust'-Ishim

Loschbour could also be an ancestor to NE1.

NE1 and Loschbour

HIrisPlex Eye and Hair Colour DNA Phenotyping

PBlueEye 0.798765
PIntermediateEye 0.078415
PBrownEye 0.12282
Full_AUC_BlueEye 0.940398
Full_AUC_IntermediateEye 0.743643
Full_AUC_BrownEye 0.94528
Numb_missingSNPs_Eye 0
Name_missingSNPs_Eye
AUC_Loss_BlueEye 0
AUC_Loss_IntermediateEye 0
AUC_Loss_BrownEye 0
PBlondHair 0.052597
PBrownHair 0.401653
PRedHair 0.000564
PBlackHair 0.545185
Full_AUC_BlondHair 0.810616
Full_AUC_BrownHair 0.751061
Full_AUC_RedHair 0.922712
Full_AUC_BlackHair 0.848114
Numb_missingSNPs_Hair 3
Name_missingSNPs_Hair rs86insA_A / Y152OCH_A / rs2228479_A
AUC_Loss_BlondHair 0.004374
AUC_Loss_BrownHair 0.002165
AUC_Loss_RedHair 0.010717
AUC_Loss_BlackHair 0.002165
PLightHair 0.154469
PDarkHair 0.845531
Full_AUC_HairShade 0.905444
Numb_missingSNPs_HairShade 1
Name_missingSNPs_HairShade rs2228479_A
AUC_Loss_HairShade 0.000133

S. Walsh, L. Chaitanya, L. Clarisse, L. Wirken, J. Draus-Barini, L. Kovatsi, H. Maeda, T. Ishikawa, T. Sijen, P. de Knijff, W. Branicki, F. Liu, M. Kayser, Developmental validation of the HIrisPlex system: DNA-based eye and hair colour prediction for forensic and anthropological usage. Forensic Science International: Genetics. Submitted. 

According to HIrisPlex, NE1 had blue eyes and brownish/black hair.
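The colour call can be read off the probabilities listed above by taking the highest value in each group. The sketch below does only that, using the P... values copied from the output. It is a simplification for illustration and not the official HIrisPlex interpretation procedure, and the function name is hypothetical.

```python
# Probabilities copied from the HIrisPlex output above.
EYE = {"Blue": 0.798765, "Intermediate": 0.078415, "Brown": 0.12282}
HAIR = {"Blond": 0.052597, "Brown": 0.401653, "Red": 0.000564, "Black": 0.545185}
SHADE = {"Light": 0.154469, "Dark": 0.845531}


def most_likely(probabilities):
    """Return the (label, probability) pair with the highest probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


if __name__ == "__main__":
    for name, table in (("Eye", EYE), ("Hair", HAIR), ("Hair shade", SHADE)):
        print(name, *most_likely(table))
```

Blue is the clear call for eye colour, while for hair the black and brown probabilities are close to each other, which is why the hair call is best described as brownish/black with a dark shade.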


Kit in FTDNA Database


NE1 has matches in FTDNA. I want to thank Roberta and FTDNA for helping to unlock ancient DNA kits.

Matches in FTDNA