Saturday, December 15, 2007

RE why-simulations-are-important

Thank you for commenting on my post, John. I appreciate you discussing your paper with me; I understand that this is a very busy time for you.

My criticism of Hawks et al is that neither Wang et al nor Hawks et al did sensible simulations. This is a concern because the false positive rate and power of the test are key to Hawks et al's statements about the number of sweeps. Most previous selection-scan papers used empirical cutoffs and shied away from making strong statements about the true number of selective events. They often simply performed simulations to show that their tails are likely to be enriched for true positives (though there are also plenty of papers which do not even do this). This is not because researchers are not interested in the rate of adaptation (or whether it has changed), or lack the 'mathematics' to understand the concept, but because it is a very tough statistical problem, one which I don't think that Wang et al (or the subsequent analysis by Hawks et al) solved.

I know that Wang et al did simulations. However, I would question the relevance of those simulations. For example, one of the simulations permuted the SNPs on a chromosome, which does not simulate under a sensible null because it removes the correlation between neighboring SNPs (which is exactly what can generate false positives). In other simulations they generated bottleneck data via an ad-hoc resampling procedure, but I am not convinced that this is a sensible procedure, as there is little to show that it is an appropriate null (the concordance of D' is not sufficient, and to my admittedly tired eye none of the simulations look much like the data outside of the tails, which is somewhat worrying). Thus I do not know whether the cutoff proposed by Wang et al is appropriately conservative. Now, nobody has shown that the cutoff is inappropriate, but previously no one was asked to place a very strong belief in the cutoff. Thus the Wang et al paper is a potentially interesting source of loci worth further investigation (as are many other scans of selection), but I've not seen any evidence to convince me that the majority (or even a reasonable fraction) of the Wang et al candidates are true sweeps. I'm not saying that I disbelieve in tests of selection, but making high profile statements about the rate of selected mutations (and how that has changed through time) requires good evidence.
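To illustrate the point about permutation, here is a toy sketch. It is not the procedure or data of Wang et al: the AR(1) series is only an assumed stand-in for a per-SNP statistic whose neighbours are correlated, and the names, the correlation of 0.9 and the window width of 20 are all made up for illustration. It shows that shuffling removes the correlation between neighbours, and with it the runs of extreme values that a windowed scan would pick up.

```python
import random


def ar1_series(n, rho, rng):
    """Toy per-SNP statistic with correlated neighbours (AR(1), unit variance)."""
    x = [rng.gauss(0, 1)]
    scale = (1 - rho ** 2) ** 0.5
    for _ in range(n - 1):
        x.append(rho * x[-1] + scale * rng.gauss(0, 1))
    return x


def lag1_autocorrelation(x):
    """Sample correlation between each value and its right-hand neighbour."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var


def max_window_mean(x, width):
    """Largest mean over any run of `width` consecutive values."""
    window_sum = sum(x[:width])
    best = window_sum
    for i in range(width, len(x)):
        window_sum += x[i] - x[i - width]
        best = max(best, window_sum)
    return best / width


rng = random.Random(1)  # fixed seed so the sketch is repeatable
original = ar1_series(20_000, rho=0.9, rng=rng)  # assumed, illustrative values
permuted = original[:]
rng.shuffle(permuted)  # the "permute the SNPs" null

print("lag-1 autocorrelation, original:", round(lag1_autocorrelation(original), 3))
print("lag-1 autocorrelation, permuted:", round(lag1_autocorrelation(permuted), 3))
print("max 20-SNP window mean, original:", round(max_window_mean(original, 20), 3))
print("max 20-SNP window mean, permuted:", round(max_window_mean(permuted, 20), 3))
```

In this toy setting the permuted series has the same marginal distribution as the original but far less extreme windows, so a cutoff calibrated on the permuted series would look more conservative than it really is.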

If no power simulations have been performed, it is hard to judge whether the method has good power. Poor power would not automatically imply that there are many more selective events than given by your cutoff, merely that the test recovers only a small, biased subset of the true events (which might themselves be few in number). The tail of the distribution of the test will (like that of all tests) contain false positives and true positives, and the ratio of these two quantities depends on the false positive rate, the true number of selective sweeps and the power of the test to detect these sweeps. The absence of good simulations means that people are unable to judge whether the 0.5% tail of the test contains mostly false positives or mostly true positives (or somewhere in between). The statement that the tail (thousands of putative sweeps) contains mostly true positives is obviously only true if there are thousands of sweeps. If, on the other hand, there are only a few detectable sweeps, the tail will contain mostly false positives. Now you might argue that there are thousands of sweeps (and I might be inclined to agree with you), but in the absence of good simulations or evidence this is just an assertion.
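The arithmetic behind this can be sketched in a few lines. Everything here is an assumption for illustration: the function name, the 100,000 loci, the power of 0.5 and the 0.5% false positive rate are made-up numbers, not estimates from Wang et al or Hawks et al.

```python
def expected_tail_composition(n_loci, n_sweeps, power, false_positive_rate):
    """Expected true positives, false positives and fraction true above a cutoff.

    n_loci: number of loci tested (assumed, illustrative)
    n_sweeps: number of loci that are truly under selection
    power: probability that a true sweep passes the cutoff
    false_positive_rate: probability that a neutral locus passes the cutoff
    """
    if not 0 <= n_sweeps <= n_loci:
        raise ValueError("n_sweeps must lie between 0 and n_loci")
    true_pos = n_sweeps * power
    false_pos = (n_loci - n_sweeps) * false_positive_rate
    total = true_pos + false_pos
    frac_true = true_pos / total if total > 0 else float("nan")
    return true_pos, false_pos, frac_true


# Same test, same cutoff, different (unknown) true numbers of sweeps.
for n_sweeps in (10, 100, 1000, 5000):
    tp, fp, frac = expected_tail_composition(
        n_loci=100_000, n_sweeps=n_sweeps, power=0.5, false_positive_rate=0.005
    )
    print(f"{n_sweeps:>5} sweeps: {tp:7.1f} true, {fp:6.1f} false, "
          f"fraction true = {frac:.3f}")
```

With these made-up numbers the tail is about 1% true positives if there are 10 real sweeps and about half true positives if there are 1000, which is why the composition of the tail cannot be read off the cutoff alone.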

If you want to show that there is an absence of older sweeps in the data, a natural null to simulate from would be one with a constant influx of selected mutations. The method of Wang et al (or more likely a combination of methods, to improve power across the frequency range) could then be applied to the simulated data, to see whether the pattern found is very different from that observed in the real data. If it were (for a wide range of demographic parameters), then this would offer a reasonable demonstration of the hypothesis.

Now, on population genetic theory grounds, I'm inclined to believe the assertion that the rate of evolution might well have changed throughout human history. But I'm not convinced that Hawks et al show that human evolution has sped up, though I'm quite willing to believe that the assertion might be proved correct.
