Two recent papers use high-density SNP chips to show, counter-intuitively, that you can locate the geographic origin of a person to within a few hundred miles using only genetic data (Novembre et al., hat tip to gnxp), and that you can detect whether a person has contributed to the pooled data of a genome-wide association (GWA) scan (Homer et al., hat tip to gnxp and The Spittoon). Both of these papers testify to the power of large amounts of data for detecting very subtle signals. I think we are currently not used to how tiny effects can be multiplied across the hundreds of thousands of SNPs in these studies. It will be interesting to see what other counter-intuitive results emerge, especially as we move towards whole-genome resequencing. It is interesting to note that the current concern about learning whether someone has been involved in a GWA scan (see Spittoon) should decrease as the size of these studies increases. The method relies on the statistical fluctuations of the pooled allele frequencies away from the population mean frequency due to sampling, and these fluctuations shrink as sample sizes grow. However, resequencing studies might once again be a cause for concern, as rare variants will be diagnostic of a person's presence in a study.
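To put a rough number on how quickly those sampling fluctuations shrink, here is a minimal sketch. It assumes simple binomial sampling of 2N chromosomes from a population with allele frequency p, which is my simplification and not the actual test statistic used by Homer et al.; the function name `pool_freq_sd` is made up for illustration.

```python
import math

def pool_freq_sd(p, n_individuals):
    """Binomial standard deviation of the allele frequency in a pool
    of n_individuals diploid people (assumed model, 2N chromosomes)."""
    n_chromosomes = 2 * n_individuals
    return math.sqrt(p * (1.0 - p) / n_chromosomes)

# The fluctuation a single person's genotype has to be detected against
# shrinks like 1/sqrt(N) as the study gets larger.
for n in (100, 1000, 10000):
    print(n, pool_freq_sd(0.3, n))
```

Under this assumption a hundred-fold larger study has ten-fold smaller fluctuations at each SNP, so each person's contribution to the pool is correspondingly harder to pick out.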
Updated: link.
Sunday, August 31, 2008
Counter-intuitive results using SNP chips
hot motif
A paper just out in Nature Genetics (Myers et al) extends what we know about how local sequence determines recombination hotspot activity (hotspots are 1-2kb regions where recombination happens far more frequently than in the surrounding region).
The location of recombination hotspots along the genome can be inferred from linkage disequilibrium (LD), as LD represents the joint action of genetic drift and recombination over thousands of generations (i.e. meioses). What came as a real surprise to (I think) everyone is how good these predictions have turned out to be (e.g. McVean et al., Crawford et al., see also my previous post). The good performance of these methods, combined with large genotyping data sets such as Perlegen and the HapMap, has allowed tens of thousands of putative hotspots to be identified from LD. These methods are not perfect and there will be many false positives and negatives, but this large set of hotspots allowed researchers to identify the relatively subtle signal of a recombination-promoting motif (a 7mer, Myers et al. 2006). This new paper further extends this motif to a degenerate 13mer. I view the success of these LD-based methods and the discovery of the motif as one of the real success stories of empirical population genetics. Population genetic analyses have really led the way in giving a glimpse of something unknown about the mechanism of recombination. We don't know what sequence motifs promote recombination (or even if one exists) in mouse or S. cerevisiae, where far more mechanistic work has been performed (we do know about a motif in S. pombe, Steiner and Smith).
Further, the authors show that the motif is involved in non-allelic (ectopic) recombination events. This is not entirely surprising, as a motif that promotes recombination is bound to get it wrong some of the time, but they are still really nice examples.
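As a rough illustration of why LD carries information about recombination, here is a small sketch using the classic approximation E[r^2] ≈ 1/(1 + 4·Ne·c) for two loci separated by recombination fraction c in a population of effective size Ne. This is a textbook approximation I am using for illustration, not the coalescent-based methods of McVean et al. or Crawford et al., and the parameter values below are assumptions chosen only to make the contrast visible.

```python
def expected_r2(ne, c):
    """Approximate expected LD (r^2) between two loci under drift and
    recombination: E[r^2] ~ 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * ne * c)

NE = 10000          # assumed effective population size
BACKGROUND = 1e-5   # assumed recombination fraction across a cold interval
HOTSPOT = 1e-3      # assumed recombination fraction across a hotspot

print("across background:", expected_r2(NE, BACKGROUND))
print("across a hotspot: ", expected_r2(NE, HOTSPOT))
```

With these made-up values, markers either side of a hotspot are expected to show far weaker LD than markers spanning a similar stretch of cold sequence, which is the kind of contrast the LD-based methods exploit.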
References
Myers et al. A common sequence motif associated with recombination hot spots and genome instability in humans. Nat Genet. 2008 Aug 24
Crawford et al. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet. 2004 Jul;36(7):700-6.
McVean et al. The fine-scale structure of recombination rate variation in the human genome. Science. 2004 Apr 23;304(5670):581-4
Myers et al. A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005 Oct 14;310(5746):321-4
Wednesday, August 20, 2008
A couple of articles on forensic DNA matches
A Freakonomics blog piece on the FBI's DNA match probabilities (via Genome-technology). There is also some interesting correspondence (1,2,3) in Nature Reviews Genetics on the reliability and use of Low Copy Number DNA forensic profiling (i.e. using trace amounts of DNA), which briefly discusses the difficulties posed by contamination and allele drop-out. When I get a minute I'll have to go and read the statistical background and recommendations on this.
Sunday, August 17, 2008
flipping inversion
A new paper, just out in Nature Genetics, by Zody and Jiang et al. looks at the evolutionary history of the ~1Mb 17q inversion. This inversion was first described by Stefansson et al., who found that the normal allele H1 was present in many populations but that the inverted allele H2 seemed to have increased rapidly in frequency in Europeans, perhaps due to positive selection. In support of this, they found that the inverted allele H2 was associated with a higher birth-rate in modern-day individuals in Iceland. Further sequence analysis by Stefansson et al. suggested that the two alleles (H1 and H2) diverged ~2.5-3 million years ago, which is old for a human allele and surprising given the low frequency of H2 in Africa. Since then the region has been implicated in various diseases. Zody and Jiang et al. reconstruct the history of the region and surprisingly suggest that the H2 allele is actually the ancestral state, despite the fact that it is present at low frequency world-wide. They also find that similar inversions are polymorphic in chimpanzee and orangutan, suggesting that the region is subject to recurrent inversions.
References:
Evolutionary toggling of the MAPT 17q21.31 inversion region.
Zody and Jiang et al.
Nature Genetics
A common inversion under selection in Europeans.
Stefansson et al.
Nature Genetics
Friday, August 15, 2008
More on right-handed snakes
I just spotted that the author of the right-handed snakes paper I wrote about a while ago (see here) has videos on his website of the snakes and snails in action.
Thursday, August 7, 2008
fine-scale recombination and biased gene conversion
Two new papers mapping fine-scale recombination rates:
The first (Mancera et al., see here for a commentary from Michael Lichten) maps recombination events in yeast tetrads using very high density genotyping. One of the most interesting points from an evolutionary perspective is that they claim to find direct evidence of biased gene conversion (repairing of G/C-A/T heterozygotes in gene conversion tracts in favour of GC). Biased gene conversion is a much discussed force that could potentially explain the patterns of base composition in genomes (e.g. in humans see Duret and Arndt, Spencer et al. and Dreszer et al.) and is a potential confounder of divergence-based tests of selection (Galtier and Duret), but it actually has relatively little experimental support (see Buard and De Massy for discussion). So it is nice to see it being confirmed. I do worry slightly about genotyping error as a confounder here, since a genotyping error would look like a short gene conversion tract. Thus if genotyping errors were for some reason biased to call GC over AT, they could produce this effect. I don't have any real sense of whether this could be a problem.
The second paper is a really nice application of sperm-typing in humans from Alec Jeffreys' group (Webb et al.), looking for meiotic recombination hotspots (1-2kb segments where recombination frequently happens) at places where linkage disequilibrium (LD) breaks down very rapidly. They confirm that all of these locations appear to be true hotspots, but find that the intensities of the hotspots are not well predicted by inferences from LD (which is not too surprising). This further confirms the utility of LD analyses in identifying hotspots of recombination in humans.
They find that some of the weaker hotspots are polymorphic between individuals (i.e. individuals vary in the intensity of recombination at a hotspot), and that some of this polymorphism is experiencing biased gene conversion. Now this is a different type of biased gene conversion from that discussed above. This form of biased gene conversion occurs because the two different alleles stimulate recombination (i.e. crossover accompanied by gene conversion) in cis at different rates (rather than because of a bias in the repair of gene conversion), so the alleles are lost to conversion at different rates. Because the chromosome that initiates recombination is the one repaired by gene conversion, an allele that stimulates recombination in cis is undertransmitted in heterozygotes. This means that alleles that promote hotspots are driven from the population, which leads to a paradox: why are there hotspots at all? (This was pointed out by Rosie Redfield and colleagues, and termed the hotspot paradox; she recently posted about this on her blog.)
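The undertransmission argument can be made concrete with a toy deterministic model. This is only a sketch under my own assumptions, not anything taken from Webb et al.: random mating, an infinite population, no fitness effect of the hotspot, and a single made-up parameter `d`, the fraction by which the hotspot-promoting allele is undertransmitted from heterozygotes (so a heterozygote passes it on with probability (1 - d)/2). Under those assumptions the allele frequency p changes as p' = p - d·p·(1 - p).

```python
def next_freq(p, d):
    """One generation of drive against a hotspot-promoting allele.
    Heterozygotes (frequency 2p(1-p)) transmit it with probability
    (1 - d) / 2, so p' = p^2 + p(1-p)(1-d) = p - d*p*(1-p)."""
    return p - d * p * (1.0 - p)

def generations_until(p0, d, threshold, max_gens=100000):
    """Generations for the allele to fall from p0 to the threshold
    (stops at max_gens if the threshold is never reached)."""
    p, gens = p0, 0
    while p > threshold and gens < max_gens:
        p = next_freq(p, d)
        gens += 1
    return gens

# d values here are hypothetical, chosen only to show the trend.
for d in (0.001, 0.01, 0.05):
    print(d, generations_until(0.5, d, 0.05))
```

Even a small transmission bias steadily pushes the hotspot-promoting allele out of the population in this model, which is the heart of the paradox: without some opposing force, hot alleles should not persist.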