Sunday, 8 February 2015

Genetic genealogy needs horizontal pedigree charts

Making the most of your autosomal DNA ancestry test requires understanding some simple odds and finding a good way of visualizing how genetic match connections work.

The trick is to build a picture that fits in your brain and doesn't leave you feeling overwhelmed by a morass of potentially connecting pathways. I've got one and I'll share it with you below in the hopes that it works for you too.

The most basic, probably universal, chart for "family" looks something like this:

When visualizing "ancestry", a common approach builds on the standard family chart by adding to it vertically. This is the vertical pedigree chart, which looks something like this:

You may recognize that as the structure used by Ancestry, FamilyTreeDNA and others for tree display. The tendency for genealogy and genetic testing companies to use the vertical pedigree visualization is a damn shame.

I think it is the major limit on efficiently identifying the Most Recent Common Ancestor (MRCA) between genetic matches. You'll see why in a moment.

The alternative ancestry charting method is the horizontal pedigree chart:

Notice how:
  1. this is a much more space-efficient chart that is easy to display on a computer screen (it's basically a table), and
  2. each column is a nice, easy-to-read list of all the ancestors at one level (generation) of your tree.
GEDMATCH, to its credit, uses a horizontal pedigree chart, although it's not space efficient (it does not list many generations). Why am I going on about space efficiency and the benefits of listing names per generation?

Odds, that's why.

When you receive your autosomal test results, you typically get a list of 700-1000 other testers who share at least one DNA segment with you. Looking at your list of matches and the estimated relationship between you and each of them (provided by the testing company), you'll notice that you have a handful of relatively close matches, but the vast bulk of your matches, say 995 of your 1000, will be more distant than that.

Pretend, for a moment, that all the connecting relationships for the 1000 matches were already known. The average relationship across the group would probably be something like 5th or 6th cousins. So, what do you need to know in order to identify the Most Recent Common Ancestor (MRCA) between you and the vast majority of your matches: all these people who are, on average, your 5th cousins?

Odds are, you need to know the fourth-great-grandparents of each tester.

If you have two full fifth cousins and you take a list of the 64 fourth-great-grandparents for each, two names on both those lists will be the same.
(simulated tree)
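In code terms, that comparison is just the overlap of two lists. The Python sketch below treats each tester's fourth-great-grandparents as a set of names and intersects them. The function name and the names in the lists are made up for illustration, and real lists would need name clean-up (spelling variants, maiden names) that this ignores.

```python
def shared_ancestors(my_ancestors, match_ancestors):
    """Return the names that appear on both ancestor lists, sorted alphabetically."""
    return sorted(set(my_ancestors) & set(match_ancestors))

# Hypothetical, heavily truncated lists (complete lists would hold 64 names each).
mine = ["John Abbott", "Mary Baker", "Thomas Clark", "Ann Dunn"]
theirs = ["Peter Evans", "John Abbott", "Mary Baker", "Sarah Ford"]

print(shared_ancestors(mine, theirs))  # ['John Abbott', 'Mary Baker']
```

For full fifth cousins with complete lists, the result should be the one shared couple.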

So, in order to effectively use your test results 99.5% of the time, you need to have lists of fourth-, fifth-, and sixth-great-grandparents to compare. Unfortunately, none of the testing companies provide an easy way of doing this*.

None provide single-view horizontal pedigrees to the fourth-great-grandparent level (or beyond). Instead, the tree structures they provide for testers to add information to are difficult to access and use.

I estimate that 90% of the completed, already researched, genealogies in the testing pool are not available by clicking on a match's name. This is a massively wasted opportunity.

As this charting method shows, in terms of odds, most matches will resolve through a shared person or couple in your list of 64, 128, or 256 "lines" (i.e. the 4th-, 5th-, or 6th-great-grandparent level of your tree). The farther you complete your tree, the more known lines you have and the more information you have available to figure out how you relate to someone. Most people have no trouble understanding that they have a maternal and a paternal side, but the exponential expansion of lines out to the level of their fourth-great-grandparents is not yet part of how they see the process. Unless everyone is provided with a horizontal pedigree chart to complete to the relevant levels, efforts to identify MRCAs quickly stall.
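The 64/128/256 figures come from doubling the number of ancestors at each step back. A minimal Python sketch of that arithmetic follows (the function name is just an illustrative choice). It counts ancestor slots: with pedigree collapse, where one person fills more than one slot, the number of distinct people can be smaller.

```python
def ancestors_at_level(greats):
    """Number of ancestor slots at the n-th great-grandparent level.

    greats=0 means grandparents (4 slots), greats=1 great-grandparents (8),
    and so on: each level back doubles the count, giving 2 ** (greats + 2).
    """
    return 2 ** (greats + 2)

for n in (4, 5, 6):
    print(f"{n}th-great-grandparents: {ancestors_at_level(n)} lines")
# 4th-great-grandparents: 64 lines
# 5th-great-grandparents: 128 lines
# 6th-great-grandparents: 256 lines
```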

It gets trickier to identify the connecting relationship if fewer names are known (on either tree), but the same principle applies: use the testing company's estimate to work out the level of your tree and your match's tree that should contain an overlapping couple or person (half relationships can be considered by going out one level farther than the estimate predicts). If you can't find a match, look at any missing areas on either side and consider whether the DNA and the combined information from both of you provide a clue about who the missing people could be.
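The "estimate the level" step can be written as a small lookup. This Python sketch assumes the usual convention that full nth cousins share a couple at the (n-1)th-great-grandparent level (first cousins share grandparents), and it simply encodes this post's rule of thumb of also checking one level farther out for half relationships. `levels_to_search` is a hypothetical helper name, not a feature of any testing site.

```python
def levels_to_search(cousin_degree, consider_half=False):
    """Which great-grandparent levels to compare for an estimated nth-cousin match.

    Full nth cousins share a couple at the (n-1)th-great-grandparent level
    (1st cousins share grandparents, i.e. 0 "greats"). Following this post's
    rule of thumb, half relationships are also checked one level farther out.
    Returns (greats, number_of_lines) pairs.
    """
    greats = cousin_degree - 1
    levels = [greats]
    if consider_half:
        levels.append(greats + 1)
    return [(g, 2 ** (g + 2)) for g in levels]

print(levels_to_search(5))                      # [(4, 64)]
print(levels_to_search(5, consider_half=True))  # [(4, 64), (5, 128)]
```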

This is how genetic genealogy can break through brick walls.

A seven- to nine-generation horizontal pedigree model provides an easy way of working with a complex situation. For full fifth-cousin matches, there are 32 potential pathways on your side and 32 potential pathways on your match's side (because your 64 fourth-great-grandparents form 32 couples, and the two sides of the complete path between you and the match will connect at a couple). While this means that there are over one thousand (32 × 32 = 1,024) potential pathways to investigate, odds that can seem overwhelming, checking two reasonably complete lists of 32 pairs of fourth-great-grandparents to find a common pair is not that hard.
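Here is a rough Python sketch of why the list comparison stays manageable. It assumes each generation is stored in ahnentafel order, so entries pair up as husband then wife (even number, then odd number); the helper names and the sample names are hypothetical.

```python
def couples(generation):
    """Group one generation, listed in ahnentafel order, into (husband, wife) pairs.

    In ahnentafel order each even-numbered man is followed by his wife (the
    next, odd number), so list positions 0-1, 2-3, ... form couples.
    """
    return {(generation[i], generation[i + 1]) for i in range(0, len(generation), 2)}

def common_couples(mine, theirs):
    """Couples that appear in both testers' lists for the same generation."""
    return couples(mine) & couples(theirs)

# 32 couples per side give 32 * 32 possible pairings to check one by one...
print(32 * 32)  # 1024
# ...but a set intersection compares the two lists in a single step.
mine = ["A. Abbott", "B. Baker", "C. Clark", "D. Dunn"]
theirs = ["E. Evans", "F. Ford", "A. Abbott", "B. Baker"]
print(common_couples(mine, theirs))  # {('A. Abbott', 'B. Baker')}
```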

So, in summary: to succeed at genetic genealogy you need to have a model of your tree and your matches' trees that allows you to easily identify the overlapping ancestors, namely shared fourth-, fifth-, and sixth-great-grandparents. Horizontal pedigree charts which run at least to the fourth-great-grandparent level allow you to do that efficiently and with an awareness of what is missing. Other methods are not as easy or effective.


A second reason why genetic genealogy needs horizontal pedigree charts is substantially more obvious than the one outlined above: they can provide a spatial representation of ancestry composition. Testing companies that provide ancestry composition estimates do not provide a charting tool that reveals regional contributions to the tester's DNA, but the horizontal pedigree chart can easily do this as well:
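The arithmetic behind such a regional chart is simple. On average, each of the 2^g ancestors in generation g contributes 1/2^g of your autosomal DNA, so a region's expected share is the fraction of that generation born there. The Python sketch below assumes hypothetical birthplaces and gives expectations only: actual inheritance varies because autosomal DNA recombines, and this is not how testing companies compute their ancestry composition estimates.

```python
from collections import Counter

def expected_composition(regions):
    """Expected share of autosomal DNA per region, from one generation of ancestors.

    Each ancestor in a generation of size 2**g contributes, on average, 1 / 2**g
    of your autosomal DNA, so a region's expected share is the fraction of that
    generation's ancestors from the region. Listing unknown ancestors as
    "Unknown" keeps the gaps visible.
    """
    counts = Counter(regions)
    total = len(regions)
    return {region: count / total for region, count in counts.items()}

# Hypothetical great-grandparent generation (8 ancestors).
great_grandparents = ["Ireland", "Ireland", "Scotland", "Ireland",
                      "France", "France", "Unknown", "Scotland"]
print(expected_composition(great_grandparents))
# {'Ireland': 0.375, 'Scotland': 0.25, 'France': 0.25, 'Unknown': 0.125}
```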

And finally, completing such a chart would give testers something to do during the long wait between sending off the kit and receiving their results.

Updated: Template - this is an Excel file I use (it is bigger than the chart above and set up to print on 11 x 17 paper at a copy shop). It is also expandable: you can copy the table into a new worksheet, and each person in the last column then becomes the base person of their own table, assigned the ahnentafel number next to their name.
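If you expand the template this way, you need to translate numbers in the new worksheet back into the main table's numbering. A minimal Python sketch follows, assuming the standard ahnentafel convention (the father of n is 2n and the mother is 2n + 1). The helper names and the choice of ancestor 64 are illustrative only; the template's actual layout may differ.

```python
def generation(n):
    """Generation of ahnentafel number n (1 = base person, 2-3 = parents, ...)."""
    return n.bit_length() - 1

def to_main_table(base, local):
    """Translate a sub-table ahnentafel number into the main table's numbering.

    `base` is the main-table number of the person heading the sub-table
    (local number 1). Since the father of n is 2n and the mother is 2n + 1,
    `local` sits g generations above the base person, and its main-table
    number is base * 2**g plus its offset within that generation.
    """
    g = generation(local)
    return base * 2 ** g + (local - 2 ** g)

# Suppose ancestor 64 is copied into a new worksheet as its base person.
print(to_main_table(64, 1))  # 64  (the base person)
print(to_main_table(64, 2))  # 128 (his father)
print(to_main_table(64, 3))  # 129 (his mother)
print(to_main_table(64, 7))  # 259 (his mother's mother)
```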

*Note for clarity: Apparently AncestryDNA does have a pedigree view option (I am not sure how many generations it shows on one screen). As a Canadian, I had used AncestryDNA for haplotype testing many years ago, and those accounts, deleted by the company last year, did not have a pedigree tree view (or trees at all, if I remember correctly). Apparently those who can order the autosomal test (in the US and Ireland) do have access to this.

Updated 2015-02-09 with template (see bottom). 2015-02-10: template link updated and switched to view-only sharing, as someone was editing the template with their own information. Please let me know in the comments if the viewable template cannot be downloaded, thx.

Monday, 2 February 2015

Haplotype (PART 1): What's it good for?

Not much.

I kid, but haplotype results are not good for as much as most people initially assume. A common assumption seems to be that the result reflects one half of your ancestry (your maternal or paternal "side") but in fact it represents only a minuscule amount of your overall heritage.

If you made this mistake, don't worry, so has pretty much everyone who ever received results from a commercial testing company. Haplotype results are over-hyped and tend to dominate ancestry DNA reports (for reasons I won't get into here).

With this post and a couple of follow-up posts covering investigations I am working on, I hope that I can help you navigate through the hype, bringing haplotypes back down to earth in the area where they are actually useful: using their logical inheritance pattern to prove or disprove theories of relationships for which no documentation can be found.

The simplest way to understand a "haplogroup" is as a grouping of Y-DNA results or mitochondrial DNA results based on shared variants in the 59 million bases of a male's Y chromosome or the 16.6 thousand bases of anyone's mitochondrial DNA. (This is how the term haplogroup is used in genetic genealogy; biologists use it for all sorts of stuff.)

Every individual tested who shares the exact same variants at key locations on the relevant chromosome is assigned the same haplotype. They all inherited these variants from ancestral lines that lead back to the same person a very, very long time ago. Likely tens of thousands of years ago, given the resolution of most basic tests (if you've used a high resolution STR marker test you probably already know this stuff so I don't feel bad about making a slight oversimplification about timelines).
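As a toy illustration of that simplified definition, the Python sketch below groups testers whose results show exactly the same set of key variants. The tester names and variant labels are invented. Real haplogroup assignment places results on a phylogenetic tree and has to cope with missing calls, so treat this only as a picture of "same variants, same group".

```python
from collections import defaultdict

def group_by_variants(testers):
    """Group testers whose results show exactly the same set of key variants.

    `testers` maps each tester's name to the variants found at the key
    positions; order within a set does not matter. Returns lists of names,
    one list per distinct variant set, in order of first appearance.
    """
    groups = defaultdict(list)
    for name, variants in testers.items():
        groups[frozenset(variants)].append(name)
    return list(groups.values())

# Hypothetical testers and variant labels.
testers = {
    "Tester 1": {"V1", "V2", "V3"},
    "Tester 2": {"V1", "V2"},
    "Tester 3": {"V3", "V2", "V1"},
}
print(group_by_variants(testers))  # [['Tester 1', 'Tester 3'], ['Tester 2']]
```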

Everyone in a group got their variants from the same person many thousands of years ago, and still has the same sequence of variants, because of two simple facts: Y-DNA gets passed down from father to son without recombining, and mitochondrial DNA is the non-nuclear DNA inherent in the egg cell itself (the sperm's mitochondrial DNA does not survive fertilization), so it also passes intact, but from mothers to all of their children. You, your mom, her mom, her mom's mom, and so on, in a straight line of maternal relationships, all have the same mtDNA haplotype. There is a small mutation rate, but don't worry about it until you've become more expert in using haplotypes.
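Those two inheritance paths map neatly onto ahnentafel numbers, assuming the standard convention that the father of n is 2n and the mother is 2n + 1. In this Python sketch (function names are illustrative), the direct paternal line is 1, 2, 4, 8, ... and the direct maternal line is 1, 3, 7, 15, ...

```python
def paternal_line(generations):
    """Ahnentafel numbers on the direct father-to-father line (the Y-DNA path).

    The father of n is 2n, so the line from the base person (1) is 2**g.
    This path only carries a Y chromosome to a male base person.
    """
    return [2 ** g for g in range(generations + 1)]

def maternal_line(generations):
    """Ahnentafel numbers on the direct mother-to-mother line (the mtDNA path).

    The mother of n is 2n + 1, so the line runs 1, 3, 7, 15, ... = 2**(g + 1) - 1.
    """
    return [2 ** (g + 1) - 1 for g in range(generations + 1)]

# Out to the fourth-great-grandparents (6 generations back from you).
print(paternal_line(6))  # [1, 2, 4, 8, 16, 32, 64]
print(maternal_line(6))  # [1, 3, 7, 15, 31, 63, 127]
```

Of the 64 fourth-great-grandparents (numbers 64 to 127), only number 64 (for a male tester) and number 127 sit on these lines.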

So, how does that type define you?

It really doesn't reflect much about your complete genetic make-up.

Categorizing 16.6 thousand base pairs out of a total of about 3 billion means categorizing a measly 0.0005% or so of who you are, genetically. Women don't even have a Y chromosome, and for men it is only about 2% of their total genetic make-up. However, as insignificant as these percentages are, these are the only groupings of DNA that can be made to reflect ancient ancestry and follow a specific, logical inheritance pattern, because all your other DNA emerged from mixing events every single time one of your ancestors was conceived. So, the fact that these pieces of DNA do not change and are associated with specific people in every ancestry chart is pretty cool, and it can be a powerful tool in certain investigations.
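For anyone who wants to check those percentages, here is the arithmetic as a short Python sketch. It uses the rounded sizes given in this post (about 3 billion bases for the genome, 16.6 thousand for mtDNA, 59 million for the Y chromosome), not precise reference-genome lengths.

```python
GENOME_BP = 3_000_000_000  # approximate genome size, as used above
MTDNA_BP = 16_600          # mitochondrial DNA, rounded as in the text
Y_BP = 59_000_000          # Y chromosome, as given in the text

def percent_of_genome(bases, genome=GENOME_BP):
    """Percentage of the genome that a stretch of `bases` base pairs represents."""
    return 100 * bases / genome

print(f"mtDNA: {percent_of_genome(MTDNA_BP):.5f}%")  # mtDNA: 0.00055%
print(f"Y-DNA: {percent_of_genome(Y_BP):.1f}%")      # Y-DNA: 2.0%
```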

So, your haplotype does not really define much about your overall genetic identity. Does it tell you much about your relationships with genetic matches on autosomal tests?

I'll answer that in PART 2...