Wednesday, August 27, 2014

Minor Thesis - An evidence-based Android cache forensics model

Please have a look at the minor thesis I submitted for my master's degree.

Thesis Abstract

Android is the most popular and widely used mobile operating system. Although Android is one of the most actively researched areas in the field of mobile forensics, analysis of Android caches is an understudied research topic – the focus of this thesis. Due to the diversity of caches and the developer’s heavy reliance on third-party libraries, this thesis proposes a cache taxonomy based on its usage, as the key to investigating Android caches is to first classify and identify them. This helps to ensure the choice of appropriate tool(s) to extract potential evidential data. A systematic process to forensically extract, analyse and investigate Android caches is proposed, which is based on the widely accepted McKemmish (1995) forensic model. The proposed Android Cache Forensic Process, the primary contribution of this thesis, is validated using nearly 100 popular apps. Previously unknown cache formats are decoded and several undocumented cache formats used commonly by Android apps are documented. Based on the findings, an Android Cache Viewer prototype is developed, which is the secondary contribution of this thesis. This working prototype, as demonstrated in this thesis, is able to successfully decode Android caches and display the contents in a user-friendly manner.

Source Code at GitHub.
License: MIT

Tuesday, June 10, 2014

$100 Off Coupon for Big Y Tests

I received a coupon from Family Tree DNA which gives $100 off a Big-Y test. Unfortunately, I don't intend to do any more Big-Y tests this year. Hence, I am posting my coupon code on my blog. Please note that this coupon code can only be used once, which makes it first come, first served.

The Coupon Code is FDS140876. You can order Big-Y from the FamilyTreeDNA website by following the pink banner.

Monday, June 9, 2014

23andMe V4 not compatible with some Autosomal Genetic Genealogy Tools!

I got several complaints that 23andMe data doesn't work with some of the genetic genealogy tools, so I went ahead and investigated why. I then learnt about the new V4 chip from 23andMe. After reading several forum posts and doing my own investigation, I found the following: even though 23andMe V4 has around 596869 SNPs, only 310690 of the 714533 SNPs in Family Tree DNA match (for Chr 1-22 and X). So, to compare a V4 file with FTDNA, one must assume the remaining 403843 SNPs as matching, which gives very inaccurate results and makes V4 incompatible with any reasonable autosomal comparison. This may be the reason why FTDNA does not allow V4 transfers into their database. Hence, I regret to say that 23andMe V4 will not be compatible with the below Genetic Genealogy tools for now.
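The overlap figures above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `overlap_stats` and `missing_from_counts` are hypothetical, and it assumes each raw data file has already been reduced to a set of SNP identifiers (for example rsIDs).

```python
def overlap_stats(reference_snps, other_snps):
    """Return (shared, missing, missing_fraction) for reference_snps vs other_snps.

    Both arguments are iterables of SNP identifiers (an assumption of this
    sketch; real raw data files would need to be parsed first).
    """
    reference = set(reference_snps)
    shared = len(reference & set(other_snps))
    missing = len(reference) - shared
    return shared, missing, missing / len(reference)


def missing_from_counts(reference_total, shared):
    """Same calculation when only the counts are known."""
    missing = reference_total - shared
    return missing, missing / reference_total


# Counts quoted in the post: 714533 FTDNA SNPs, of which 310690 match V4.
missing, fraction = missing_from_counts(714533, 310690)
print(missing, f"{fraction:.1%}")
```

With the counts from this post, more than half of the FTDNA SNPs have no counterpart in a V4 file, which is why assuming them as matching distorts the comparison.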

Genetic Genealogy Tools affected:

Monday, May 19, 2014

mt-Tree Mutation Timeline

Irrespective of how long each mutation takes, I always wanted a view of each haplogroup based on the number of mutations. Why do I want that? Because when a haplogroup or subclade has many defining mutations, the other branches from the ancestor of that clade were wiped out, leaving behind just this maternal lineage, which appears on the tree. In other words, when a haplogroup or subclade has many defining mutations, it suggests a major war, invasion, natural disaster or holocaust-like event. If a haplogroup or clade has many sister branches, then those were peaceful times or a population explosion in which no lineages were wiped out. I made this view first for Y-DNA. Then I thought, why not for mtDNA?

Below is a quick view of how the text file will look. The first column is the number of mutations from mt-Eve and the tree haplogroup width is relative to the number of defining mutations - thus giving a visual timeline.
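As a rough illustration of how such a text timeline can be built, the sketch below walks a tree and prints each clade at the cumulative mutation count where it starts, with a bar as wide as its number of defining mutations. The tree structure, the function names and the toy clade names and counts are all made up for the example; they are not taken from the actual mtDNA tree or from the file offered for download.

```python
def timeline_rows(tree, root):
    """Return (start, end, name) for every clade, depth-first from root.

    tree maps a clade name to (defining_mutation_count, [child names]).
    start is the number of mutations from the root at which the clade's
    branch begins; end = start + its own defining mutations.
    """
    rows = []
    stack = [(root, 0)]
    while stack:
        name, start = stack.pop()
        count, children = tree[name]
        end = start + count
        rows.append((start, end, name))
        # Push children in reverse so they are visited in listed order.
        for child in reversed(children):
            stack.append((child, end))
    return rows


def render(rows):
    """First column: mutations from the root; bar width: defining mutations."""
    return [f"{start:4d} {' ' * start}{'-' * (end - start)}{name}"
            for start, end, name in rows]


# Toy tree with invented names and counts, for illustration only.
toy_tree = {
    "Root": (0, ["X1", "X2"]),
    "X1": (3, ["X1a"]),
    "X2": (1, []),
    "X1a": (2, []),
}

for line in render(timeline_rows(toy_tree, "Root")):
    print(line)
```

A clade with a long bar and no siblings is the pattern the post reads as a bottleneck; many short sibling bars starting at the same column correspond to an expansion.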

The entire mt-Tree (based on mtDNA tree Build 16, 19 Feb 2014) is done. You can download it from here.

Sunday, May 18, 2014

Y-Tree Mutation Timeline

Irrespective of how long each mutation takes, I always wanted a view of each haplogroup based on the number of mutations. Why do I want that? Because when a haplogroup or subclade has many defining mutations, the other branches from the ancestor of that clade were wiped out, leaving behind just this lineage, which appears on the tree. In other words, when a haplogroup or subclade has many defining mutations, it suggests a major war, invasion, natural disaster or holocaust-like event. If a haplogroup or clade has many sister branches, then those were peaceful times or a population explosion in which no lineages were wiped out.

Take a look at the Y-Tree below, graphed on a mutation timeline.

As you can see, A0-T branches out after 30 mutations. This means that from Y-Adam until the 30 mutations of A0-T, only 2 lineages survived. Around this time, some disaster might have happened that wiped out all other sister branches. Again, after 33 mutations from A0, both A00 and A0 sister clades are wiped out at exactly the same time. The same applies to other clades.

Similar to the above diagram, the entire Y-Tree (based on the ISOGG Tree as of 10 March 2014) is also done. You can download it from here.

Through this, we can trace the Black Death, invasions and battles, severe famines, etc.

Example: Black Death





The yellow line signifies the Black Death: there are numerous branches at that line, signifying a population explosion from a few lineages, which means other lineages were wiped out.

Something seems wrong with G because, if 281 mutations define it, there must have been a catastrophic large-scale disaster that completely wiped out its sister clades, so that only 1 lineage survived.

Please comment and let me know if this model can be used to trace catastrophic events such as natural disasters, invasions, battles and holocaust-like events, where most of the population was wiped out, leaving behind very few lineages.

Edit: I got some confirmation that G indeed has that many defining mutations (the exact comment was that their ancestor slept over a uranium deposit :) and that R has the minimum number of mutations from Y-Adam, confirmed from various sources.

Saturday, May 17, 2014

Male lineages of Dravidians

After painstakingly mapping the male Y-haplogroups of Indians to historical events using frequency distributions and heat maps, I was finally able to come up with the below Y-chart. This male lineage chart explains the history of Tamils / Dravidian people and their admixture.

The above chart, which is based on the ISOGG Y-Tree, is self-explanatory. To summarize, modern Dravidian people (including Tamils, Malayalees, Telugus and Kannadas) are an admixture from:
  • Army of Alexander (J-Z2432) - 300 BC
  • Indus Valley Civilization (L-M20) - 1500 BC
  • Native Americans (Q-Y2659*) - 1500 BC
  • Europe (R1a1a1b2) - 1600 AD
  • Central Asia (R2) - 1500 BC
  • Indigenous people (H3) - beyond 2000 BC
This is consistent with my own and my relatives' Y-DNA results, as I am a Tamil. While I am H3, my maternal grandfather is R2 and my father-in-law is L.

Sunday, April 27, 2014

Why Evolution doesn't fit the Quantum World: The Observer Effect

Many evolutionists and evolutionary biologists alike may have heard of a century-old experiment called the Double Slit Experiment. This is a high school experiment that proves that light can behave as a wave and as a particle. But not all of the experiment's results are discussed in detail in high school. What is missed could change the way everyone thinks about what reality itself is. It's called the observer effect.

So, what's the big deal about the Double Slit Experiment and the Observer Effect? Below is a quick video, and I recommend watching it.

In short, light behaves like a wave as long as you don't look at it. If you look at it, it behaves like a particle. The very act of observing makes light behave like a particle. No one has a damn clue why light behaves like both a particle and a wave, and the wave equation is actually just a fluke. This experiment lays the cornerstone of the weird world of quantum mechanics. Going further into quantum mechanics, not all equations agree on a single universe. So, a multiverse was proposed. While the multiverse is gaining popularity, it is way too crazy to begin with, because the number of universes that exist would be the number of possibilities that can happen even at the electron level. The only other remaining explanation is the Copenhagen interpretation. Without going into details, it simply means that the knowledge of the observer collapses the wave function. So, what does that really mean, and what does knowledge of something have to do with a particle?

There is yet another good explanation that is not discussed at all: the zeroverse. It is as if nobody wants to talk about it, even though it is very obvious. It's a simple concept: the entire world itself is a simulation. Why do I say that? I will take you through the experiment again. We believe that if we don't watch something, it is still there. This is the classical view. But in reality, deep within the microscopic quantum world, all particles exist as a probability: a particle isn't there and it is also there. This means a particle can exist and not exist at the same time. However, the moment someone watches it, the wave function collapses and it becomes a particle. This is solid scientific proof that we aren't just material made of random chemicals, but that our very consciousness and our knowledge of a particle's state collapse its probability. This cannot happen if we evolved from a single-celled organism. This can only happen if we are inside a simulated world.

I suggest everyone watch the Google Tech Talk: The Quantum Conspiracy: What Popularizers of QM Don't Want You to Know.

Hence, based on the double slit experiment and the observer effect, humans evolving from a single cell to a monkey and finally to humans is simply nonsense. It also doesn't make any sense why something had to evolve in a simulated environment when humans, simply by observing, can collapse the wave function and change the way particles behave. Please note that the double slit experiment and the observer effect are not limited to light or electrons. Recently, molecules too have been shown to produce an interference pattern. (Ref: Largest Molecules Yet Behave Like Waves in Quantum Double-Slit Experiment)

Wednesday, April 9, 2014

Big-Y - Processing .BAM files

After receiving my Big-Y results and getting the interpretation from YFull, I learnt that the file contains not only significant mtDNA but also autosomal data. Below are the commands I used to extract the information.


# Sort and index the Big-Y BAM (older samtools syntax: the second argument is an output prefix)
samtools sort bigy.bam bigy_sorted
samtools index bigy_sorted.bam
# Index the hg19 reference and build the sequence dictionary that GATK needs
samtools faidx ucsc.hg19.fasta
java -Xmx2g -jar ~/picard-tools/CreateSequenceDictionary.jar R=ucsc.hg19.fasta O=ucsc.hg19.dict
# Find indel intervals, realign reads around them, then index the realigned BAM
java -Xmx2g -jar ~/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta -I bigy_sorted.bam -o bigy.intervals
java -Xmx2g -jar ~/GATK/GenomeAnalysisTK.jar -T IndelRealigner -R ucsc.hg19.fasta -I bigy_sorted.bam -targetIntervals bigy.intervals -o bigy_sorted_realigned.bam
samtools index bigy_sorted_realigned.bam
# Call genotypes and write every confidently called site to a VCF
java -Xmx2g -jar ~/GATK/GenomeAnalysisTK.jar -l INFO -R ucsc.hg19.fasta -T UnifiedGenotyper -I bigy_sorted_realigned.bam -rf BadCigar -o bigy_out.vcf --output_mode EMIT_ALL_CONFIDENT_SITES

Finally, you can use the tool BigY-FTDNA-VCF to extract autosomal, X, Y and mtDNA data from the VCF into the FTDNA format, which is familiar to most genetic genealogists.
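To get a quick feel for how much autosomal, X, Y and mtDNA data ended up in the VCF, the records can be counted by chromosome. The sketch below is not how BigY-FTDNA-VCF works; it is a small stand-alone illustration that assumes UCSC hg19 style chromosome names (chr1 to chr22, chrX, chrY, chrM), and the sample lines in it are made up.

```python
from collections import Counter


def classify_chrom(chrom):
    """Map a VCF CHROM value to autosomal, X, Y, mtDNA or other."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name in {"M", "MT"}:
        return "mtDNA"
    if name in {"X", "Y"}:
        return name
    if name.isdigit() and 1 <= int(name) <= 22:
        return "autosomal"
    return "other"  # unplaced contigs, alternate haplotypes, etc.


def count_by_class(vcf_lines):
    """Count VCF data records per chromosome class, skipping header lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        counts[classify_chrom(chrom)] += 1
    return counts


# Made-up records for illustration; only the first column matters here.
sample = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\t.\t.",
    "chr22\t2000\t.\tC\tT\t50\t.\t.",
    "chrY\t3000\t.\tG\tA\t50\t.\t.",
    "chrM\t4000\t.\tT\tC\t50\t.\t.",
    "chr6_apd_hap1\t5000\t.\tA\tC\t50\t.\t.",
]
print(dict(count_by_class(sample)))
```

For a real file, pass an open file handle instead of the sample list, for example `count_by_class(open("bigy_out.vcf"))`.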

Software Used:


  • Several internet forums.

Automated Tool:

I made an automated tool, Big-Y BAM Analysis Tool, to automatically convert a .BAM file into files that are familiar to genetic genealogists. It is designed to work on a normal Windows PC (64-bit only). Depending on the speed of your computer, the process may take from 4 to 8 hours. You can download it here.

Saturday, March 29, 2014

Big Y - YFull - Y-Chr Sequence Interpretation Service

I bought Big Y during the initial sale in November 2013, and around mid-March 2014 I received my Big Y results. After contacting the FTDNA helpdesk, I received my Big Y .BAM files and uploaded them to YFull to get them interpreted. Today, I received my results from them. I am going to share what I received. This will help you understand how valuable the service is and encourage you to use it.

Order Big Y

If you haven't ordered Big Y yet, I recommend ordering it from the FamilyTreeDNA website. I believe it is currently available only to existing customers and can be ordered by clicking the upgrade button at the top right corner.

Get the .BAM file

After getting the Big Y results, I recommend installing the Big Y AddOn for Google Chrome, which helps you download and plot/mark on the latest Y-Tree. Getting the .BAM file must be requested from . Please note that the .BAM file will be around 1 GB in size and may take a while to download.

Ordering - YFull Interpretation

There seem to be two interpretation services, one for researchers and the other anonymous. I went for the one specified for researchers.

Once you click order now, you get the following form:

Fill in your details and click submit.

Results - YFull Interpretation

After a week, I got login details for YFull and could log in to see my results. The screenshot below shows what I see when I log in to YFull.

Homepage after login

Haplogroup and SNPs

Below are the results you will get for positive, negative and ambiguous SNP results.

Positive SNPs

Negative SNPs

Ambiguous SNPs
You can also view your terminal haplogroup results at the top of the screen, based on ISOGG v9.29 as of 2 March 2014 and YFull Experimental YTree v2.10.

Y-Haplogroup and Terminal SNP

You can also click the download CSV button to download the results to your hard disk. This download includes all positive, negative and ambiguous SNPs.

STR results

You can view all the 481 STR results.

Y-STR results
You can also filter by 12, 25, 37, 67 and 111 Y-STR markers. For example, below screenshot is filtered by the Y-STR 111 markers.

Y-STR filtered by 111 markers
There are some STRs marked 'Loci is not available' and some with values but greyed out as 'Uncertain'. If you have previously done a Y-STR test with fewer than 111 markers, this result is an excellent value-add that fills in the missing gaps.

You can also click the download CSV button to download the results to your hard disk.

Private SNPs

There are Best qual, Acceptable qual, Unreliable qual, Low qual, One reading! and INDELs tabs.

Private SNPs

Browse raw data

Apart from haplogroup, SNPs and STRs, you can also browse the .BAM by position.

Browse raw data

Check SNPs

You can also check a particular SNP. For example, if I key in M89, my previous terminal SNP, I get the below result.

Check SNPs


You can also view the statistics for the detailed results based on your .BAM file.
Raw Data Statistics

Known SNPs Statistics

STR Statistics

Private SNPs Statistics

Raw Mt Statistics

Full Mt

What took me by complete surprise is that the .BAM file, which is supposed to be for Y-DNA, also contains my mtDNA results!
You can also click the download RSRS or rCRS CSV button to download the results in either RSRS or rCRS format to your hard disk.


Finally, there are groups where you can view and compare results of others belonging to the same haplogroup.
Available Groups
I don't see a group for my terminal SNP to join. For now, there are only 6 groups (as of March 2014); the largest is R1a with 102 members, and I believe it will grow slowly as time goes by.

Y-STR Comparison

SNP comparison


You can download my YFull Interpretation results here to see what you can get.


The YFull interpretation service is certainly a must for every Big Y customer, and I strongly recommend it. If you are more focused on your paternal ancestry line, I strongly recommend going for Big Y and using the YFull interpretation service.

Big Y currently costs US$ 695 and includes most of the mtDNA, 400+ Y-STRs and 50000 Y-SNPs. Bought separately, full mtDNA costs US$ 199 and 111 Y-STR costs US$ 359, while individual Y-SNP tests are provided only on request and are extremely costly, at around US$ 39 per SNP. Hence, confirming just 5 individual SNPs and doing the other available tests will easily exceed the cost of Big Y.

I am grateful to the YFull team for providing an excellent service free of cost.
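The comparison can be checked with a little arithmetic. This is only a sketch using the prices quoted in this post; the variable names are made up, and the per-SNP price is approximate.

```python
# Prices in US$ as quoted in this post (29-Mar-2014, no discounts).
BIG_Y = 695
FULL_MTDNA = 199
Y_STR_111 = 359
PRICE_PER_SNP = 39   # approximate price of one individual Y-SNP test
SNPS_TO_CONFIRM = 5

separate_total = FULL_MTDNA + Y_STR_111 + SNPS_TO_CONFIRM * PRICE_PER_SNP
print(separate_total, separate_total - BIG_Y)
```

Even with only 5 SNPs confirmed individually, the separate tests come to more than Big Y, and every further SNP widens the gap.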

Note: The costs are as of 29-Mar-2014 without any discounts or offers.

Wednesday, March 26, 2014

Genealogy website moved from WordPress to Blogger

After some deep thought over the past few weeks, I finally decided to move the Genetic Genealogy Tools website from WordPress to Blogger, and its hosting from Red Hat's OpenShift to Google. This change reduces server maintenance activities such as constant upgrades and security fixes. I also made Google Drive, which I originally used as an alternative download method, the primary one. This streamlines things and saves time by avoiding redundant uploads.

I removed some obsolete tools from the website, along with links in the tools section that were actually blog posts. Even though they are removed from the website, all obsolete tools and their source code can still be accessed from Google Drive.

From a visitor's perspective, nothing has really changed from the old website except the look and feel.