Sunday, April 6, 2008

Common variants, when do we stop looking?

Just a few thoughts on genome-wide association studies, prompted by Genetic Future's recent posts on the low returns of some genome scans (here and here). Meta-analysis of combined studies will get us a long way towards finding small-effect alleles without the expense of typing additional cases (and we've seen quite a bit of this already), as will methods for studying epistatic interactions. So people will definitely squeeze the current data sets more. But my question/thought is: when do we stop looking for common variants for a particular common disease by increasing our sample size? In one sense this is a silly question, because the answer is mainly determined by practical constraints like funding and the ease of phenotyping cases. I also suspect the answer is that we'll keep on doing genome-wide association studies until resequencing is cheap enough to become a common tool, and then rare variants will be popular. But when should we stop in theory? Thinking about this might help us weigh the merits of different studies and study designs.

I think the answer depends strongly on the reasons for doing genome-wide association studies in the first place. As I see it there are two main reasons: predicting disease risk and understanding the pathways involved in the disease (though obviously these are not distinct aims).

If you are interested in predicting disease risk from a person's genotype, you need to ask: 'if I increase the sample size dramatically, will I get much better at predicting disease risk?' The answer is perhaps no: most of the known common variants are not very predictive, so the ones discovered next will be even less helpful in predicting risk. There will be a vast number of tiny-effect loci, but it seems to me that we rapidly hit diminishing returns for predicting risk.
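To put rough numbers on the diminishing returns, here is a minimal Python sketch (my own back-of-the-envelope illustration, not a calculation from any particular study). It assumes a simple additive model for a quantitative trait (or disease liability), a single 1-df association test, and the standard normal approximation that the required sample size is about (z_alpha + z_power)^2 divided by the fraction of variance the locus explains. The function name `required_sample_size`, the significance threshold of 5e-8, the 80% power target, and the variance fractions in the loop are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.8):
    """Approximate N needed to detect a locus explaining the given
    fraction of trait variance with a 1-df association test.

    Uses the normal approximation N ~ (z_alpha + z_power)^2 / q^2,
    which is reasonable when q^2 (variance_explained) is small.
    The defaults for alpha and power are illustrative assumptions.
    """
    if not 0 < variance_explained < 1:
        raise ValueError("variance_explained must be between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = std_normal.inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 / variance_explained)


if __name__ == "__main__":
    # Made-up per-locus variance fractions, largest first.
    for q2 in (0.01, 0.005, 0.001, 0.0005, 0.0001):
        n = required_sample_size(q2)
        print(f"variance explained {q2:.4%}: need roughly {n:,} individuals")
```

Under these assumptions the required sample size scales inversely with the variance explained, so a locus explaining a tenth as much variance needs roughly ten times the sample while adding only a tenth as much to prediction.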

If, on the other hand, you want to learn about the pathways involved in the disease (for drug targeting, etc.), then perhaps the size of the effect is not important: just finding a new region will be informative about some part of the pathway (if you can understand what the region is telling you). One perhaps serious wrinkle on this is: 'are tiny-effect loci really informative about the pathways?' The effect size may be small because the effect of the allele is very remote in the network from the main pathways, in which case it might be very hard to work your way back to understanding something new about the disease.

An additional benefit of finding a tiny-effect common allele is that the region containing it might also be the target of large-effect rare variants. One view might be that, by discovering tiny-effect common alleles, people are preparing the ground (i.e. finding candidate loci) for resequencing studies of rare variants. Obviously the resequencing will be done genome-wide, but researchers will up-weight interesting rare variants in regions where weak-effect loci are already known. If so, this will be a funny twist, because genes involved in Mendelian diseases (rare, strong-effect mutations) were originally seen as candidate locations for common disease variation.

I think the exciting thing is we really don't know what we will learn, nor when to stop. The great thing about the WTCCC (and other major efforts) is that by concentrating a lot of effort on a few diseases, we might quickly learn what does and doesn't work.


Daniel said...

Great post, G - I particularly like the point about using genes identified in GWAS to prioritise variants in sequencing-based studies. Figuring out which variants are functional is going to be a real challenge in the sequencing era, so any extra sources of information will be useful.

I'd add one further motivation for continuing the GWAS effort, even in the face of diminishing returns: it provides a short-term incentive for the collection of the massive sets of DNA samples and phenotype information that will be needed to make sense of sequencing data in a few years' time.

G said...


Although using common variation to prioritise rare variants is perhaps less helpful if the aim is to characterise the pathway. Then again, large-effect mutations are perhaps easier to analyse functionally.

I agree that GWAS are accumulating great resources for resequencing. However, I imagine that legally getting a patient's consent for a GWAS is easier than for a resequencing study. But I'm sure where possible researchers are getting permission for both.