Wednesday, March 26, 2008

The limits of unlinked SNPs for learning about demography

The best way to learn about demography from population genetic data is to look at multiple unlinked regions (a common theme over at the evolgen blog). The distribution of frequencies of neutral alleles at SNPs in a population (the site frequency spectrum) is informative about population history. For example, an excess of low-frequency mutations is consistent with recent population growth: the increase in population size introduces many new mutations, but these have not yet had time to drift to higher frequencies.
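As a rough illustration of the growth signal just described, here is a minimal Python sketch. It assumes the derived allele at each SNP has already been identified (e.g. with an outgroup) and is recorded as a count out of n sampled chromosomes; the function names and the example counts are hypothetical. It builds the unfolded site frequency spectrum and compares it with the standard constant-size neutral expectation E[ξ_i] = θ/i, using Watterson's estimator for θ. More observed singletons than this expectation is the kind of excess that recent growth would produce.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n):
    """Unfolded SFS: entry i-1 is the number of SNPs whose derived allele is
    carried by exactly i of the n sampled chromosomes (1 <= i <= n-1).
    Counts of 0 or n (monomorphic in the sample) are ignored."""
    counts = Counter(derived_counts)
    return [counts.get(i, 0) for i in range(1, n)]


def constant_size_expectation(sfs, n):
    """Expected SFS under a constant-size neutral model, E[xi_i] = theta / i,
    with theta estimated by Watterson's estimator S / a_n."""
    segregating = sum(sfs)                       # S, number of segregating sites
    a_n = sum(1.0 / i for i in range(1, n))      # harmonic number a_n
    theta_w = segregating / a_n
    return [theta_w / i for i in range(1, n)]


if __name__ == "__main__":
    n = 10
    # Hypothetical derived-allele counts at unlinked SNPs (illustrative only).
    derived = [1, 1, 1, 1, 2, 1, 3, 1, 5, 2]
    sfs = site_frequency_spectrum(derived, n)
    expected = constant_size_expectation(sfs, n)
    print("observed singletons:", sfs[0])
    print("expected singletons (constant size):", round(expected[0], 2))
```

Because θ is estimated from the same data, the expected spectrum always sums to the observed number of segregating sites; only its shape across frequency classes carries the demographic signal.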

A number of papers have made use of the frequency spectrum of unlinked SNPs to learn about demography. A technical but elegant article by Myers et al shows that, while the site frequency spectrum at unlinked SNPs is informative about demography, it cannot help you choose between certain demographic histories. This is not a question of imperfect knowledge of the site frequency spectrum (which more data would solve). Rather, as Myers et al formally show, for any particular demographic history there is a large family of other histories that give rise to the same site frequency spectrum. As they explain: 'Informally, changes in population size at some past time are canceled out by other changes in the opposite direction'. I think this lack of information arises because each unlinked SNP only tells you where a single mutation falls on the genealogy at that site. Across many sites you learn about the expected amount of time spent in different parts of the genealogy (loosely worded, as there is in fact a set of genealogies, one per unlinked site). In hindsight, it is therefore perhaps not surprising that these data are not sufficient to learn everything about population size changes.
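To make that interpretation concrete, here is a minimal Python sketch of a standard coalescent result (not code from Myers et al). Under the infinite-sites model, the expected unfolded spectrum is a linear function of the expected times E[T_k] that the sample genealogy spends with exactly k lineages, k = 2..n. Whatever the population size history is, it enters the expected spectrum at sample size n only through these n-1 numbers, so any two histories that agree on them give identical expected spectra. The function names are hypothetical; time is measured in coalescent units (a pair of lineages coalesces at rate 1 under constant size) and θ is the scaled mutation rate.

```python
from math import comb


def expected_sfs(expected_times, n, theta=1.0):
    """Expected unfolded SFS under the infinite-sites coalescent.

    expected_times[k] is E[T_k], the expected time during which the genealogy
    has exactly k lineages (k = 2..n). A branch present while there are k
    lineages subtends i of the n samples with probability
    C(n-i-1, k-2) / C(n-1, k-1), and mutations fall on each lineage at rate
    theta/2. Note the history enters only through E[T_k].
    """
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(2, n + 1):
            # comb returns 0 when k-2 > n-i-1, i.e. no such branch exists.
            p_i_given_k = comb(n - i - 1, k - 2) / comb(n - 1, k - 1)
            total += (theta / 2) * k * expected_times[k] * p_i_given_k
        sfs.append(total)
    return sfs


def constant_size_times(n):
    """E[T_k] = 2 / (k (k-1)) for a constant-size population."""
    return {k: 2.0 / (k * (k - 1)) for k in range(2, n + 1)}


if __name__ == "__main__":
    n = 6
    print([round(x, 4) for x in expected_sfs(constant_size_times(n), n)])
```

With constant-size times this recovers the familiar E[ξ_i] = θ/i. Any other history would be fed in the same way, by first working out its E[T_k] values, which is why the spectrum alone cannot distinguish histories that happen to share them.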

This problem can be circumvented by using patterns of linkage disequilibrium within genomic regions, which add information about the structure of coalescent trees across the genome. I've often wondered along similar lines how much we can learn about population histories from population genetic data (even a whole genome's worth), and what the fundamental limits might be.

Reference:
Myers S, Fefferman C, Patterson N.
Can one learn history from the allelic spectrum?
Theor Popul Biol. 2008 Jan 3
