Wednesday, December 10, 2008

Natural variation in Nature

Nature has a whole slew of reviews on the current progress and prospects of mapping the genetic determinants of phenotypic variation:
Association mapping in humans (here)
Mapping behavioral traits in mouse (here)
Genetic variation in malaria (here)
Mapping in plants (here)
and how we should use natural variation to learn about biological systems (here)
Quite a fun reading list.

Saturday, December 6, 2008

Using admixed populations to separate cis and trans effects

A new article by Price et al. looks at the effects of cis- and trans-acting variation on gene expression. A number of studies have approached the genetics of gene expression in humans by doing GWA mapping of the genetic determinants of the expression level of a gene (usually of many genes measured on a microarray). However, this approach is strongly biased towards finding signals in cis. The cis region around a gene contains a small fraction of the SNPs in the genome, so tests for cis effects suffer a much smaller multiple-testing correction than tests for trans effects, which must be significant beyond a multiple-testing threshold for the entire genome.
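
To put rough numbers on this asymmetry, here is a minimal Python sketch of Bonferroni thresholds. The SNP counts (100 SNPs in a cis window, 500,000 genome-wide) are hypothetical round numbers rather than figures from Price et al., and a real expression study would also have to correct for the many genes tested.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests

ALPHA = 0.05
CIS_SNPS = 100          # hypothetical: SNPs within a cis window around one gene
GENOME_SNPS = 500_000   # hypothetical: SNPs on a genome-wide chip

cis_t = bonferroni_threshold(ALPHA, CIS_SNPS)
trans_t = bonferroni_threshold(ALPHA, GENOME_SNPS)
print(f"cis threshold:   {cis_t:.1e}")
print(f"trans threshold: {trans_t:.1e}")
print(f"a trans signal needs a p-value {cis_t / trans_t:,.0f}x smaller")
```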

Price et al. cleverly circumvent this by looking at the recently admixed African American population. African Americans have on average 20% European ancestry and 80% African ancestry. Because this admixture is recent, there have been only a few generations of recombination, and so the genome of an African American can be thought of as a mosaic made up of big blocks of alternating African and European ancestry. So at any location in the genome, African Americans differ in whether they have locally inherited 0, 1 or 2 chromosomes from African ancestors. African Americans also vary in their genome-wide admixture proportion. Price et al. use this fact to look at the trans effects of ancestry, by looking at the correlation between genome-wide ancestry proportion and the expression level of genes. They contrast this with the effect of cis ancestry (i.e. 0, 1 or 2 African alleles at a site) to obtain an estimate of the variance explained by cis and trans effects. Somewhat surprisingly (at least if you read mostly human genetics papers), they find that only 12% of heritable variation in gene expression level is explained by cis effects. This kind of result has also been seen in Drosophila, where trans effects make up the bulk of within-species variation but contribute less to between-species differences (see the evolgen post on this topic: Slightly Deleterious in Trans).
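
A toy simulation may help show why global and local ancestry can be pulled apart. This is a minimal sketch under strong simplifying assumptions (one gene, made-up effect sizes, genome-wide ancestry drawn from a Beta distribution with mean 0.8, local ancestry drawn as Binomial(2, genome-wide ancestry)); it is not Price et al.'s actual method, which estimated variance components across many genes. Expression is regressed jointly on local African ancestry at the gene and on genome-wide African ancestry, so the two coefficients separate the cis and trans contributions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Hypothetical genome-wide African ancestry proportions (mean 0.8).
global_anc = rng.beta(8, 2, size=n)
# Local ancestry at the gene: number of African-derived copies (0, 1 or 2),
# drawn independently given each person's genome-wide proportion.
local_anc = rng.binomial(2, global_anc)

# Hypothetical effect sizes, arbitrary expression units.
CIS_EFFECT = 0.3    # per African-derived copy at the gene itself
TRANS_EFFECT = 1.0  # per unit of genome-wide African ancestry
expression = (CIS_EFFECT * local_anc + TRANS_EFFECT * global_anc
              + rng.normal(0.0, 1.0, size=n))

# Ordinary least squares with an intercept, fitting both ancestries jointly.
X = np.column_stack([np.ones(n), local_anc, global_anc])
coef, *_ = np.linalg.lstsq(X, expression, rcond=None)
intercept, cis_hat, trans_hat = coef
print(f"cis estimate {cis_hat:.2f}, trans estimate {trans_hat:.2f}")
```

Because local ancestry scatters around each person's genome-wide average, the two predictors are correlated but not collinear, which is what lets both effects be estimated.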

See also Gene expression

Wednesday, September 17, 2008

A NYT Q&A session with Amy Harmon

Just a quick link to a NYT Q&A session with Amy Harmon, who won the Pulitzer for her DNA Age series. Her recent article on the challenges of teaching evolution in the States has generated a lot of the questions, which focus on the media and creationism.

Sunday, August 31, 2008

Counter-intuitive results using SNP chips

Two recent papers use high-density SNP chips to show, counter-intuitively, that you can locate the geographic origin of a person to within a few hundred miles using only genetic data (Novembre et al., hat tip to gnxp), and that you can detect whether a person has contributed to the pooled data of a genome-wide association (GWA) scan (Homer et al., hat tip to gnxp and The Spitoon). Both of these papers testify to the power of large amounts of data for detecting very subtle signals. I think we are not yet used to how tiny effects can be multiplied across the hundreds of thousands of SNPs in these studies. It will be interesting to see what other counter-intuitive results emerge, especially as we move towards whole-genome resequencing. It is interesting to note that the current concern about learning whether someone has been involved in a GWA scan (see The Spitoon) should decrease as the size of these studies increases. The statistical fluctuations away from the population mean frequency due to sampling, which this method relies upon, will decrease with larger sample sizes. However, resequencing studies might once again be a cause for concern, as rare variants will be diagnostic of a person's presence in a study.
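
To see how tiny per-SNP signals add up, here is a toy Python simulation in the spirit of the pooled-data membership test. It uses a simplified linear statistic, not Homer et al.'s exact D statistic: for each SNP it multiplies the individual's deviation from the reference frequency by the pool's deviation from it, then forms a t-like statistic across SNPs. Assumptions: independent SNPs, Hardy-Weinberg genotypes, a reference panel whose frequencies equal the true population frequencies, and made-up pool sizes and SNP counts.

```python
import numpy as np

def membership_t(y, ref_freq, pool_freq):
    """t-like statistic: does the individual's deviation from the reference
    frequencies line up with the pool's deviation? Positive => likely in pool."""
    d = (y - ref_freq) * (pool_freq - ref_freq)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

def simulate(pool_size, n_snps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    ref = rng.uniform(0.05, 0.95, n_snps)  # reference frequencies, assumed exact
    # Individuals are scored as allele frequencies 0, 0.5 or 1 (Hardy-Weinberg).
    member = rng.binomial(2, ref) / 2
    outsider = rng.binomial(2, ref) / 2
    # Allele counts of the other pool members, drawn directly as one binomial.
    others = rng.binomial(2 * (pool_size - 1), ref)
    pool_freq = (2 * member + others) / (2 * pool_size)
    return membership_t(member, ref, pool_freq), membership_t(outsider, ref, pool_freq)

for size in (100, 1000):
    t_in, t_out = simulate(size)
    print(f"pool of {size:>4}: member t = {t_in:5.1f}, non-member t = {t_out:5.1f}")
```

In this toy model the member's statistic grows roughly like the square root of (number of SNPs / pool size), so it is large for small pools and fades as pools grow, while a non-member's statistic stays near zero.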

Updated: link.

hot motif

A paper just out in Nature Genetics (Myers et al) extends what we know about how local sequence determines recombination hotspot activity (hotspots are 1-2kb regions where recombination happens far more frequently than in the surrounding region).
The location of recombination hotspots along the genome can be inferred from linkage disequilibrium (LD), as LD represents the joint action of genetic drift and recombination over thousands of generations (i.e. meioses). What came as a real surprise to (I think) everyone is how good these predictions have turned out to be (e.g. McVean et al., Crawford et al.; see also my previous post). The good performance of these methods, combined with large genotyping data sets such as Perlegen and the HapMap, has allowed tens of thousands of putative hotspots to be identified from LD. These methods are not perfect and there will be many false positives and negatives, but this large set of hotspots allowed researchers to identify the relatively subtle signal of a recombination-promoting motif (a 7mer, Myers et al. 2006). This new paper further extends this motif to a degenerate 13mer. I view the success of these LD-based methods and the discovery of the motif as one of the real success stories of empirical population genetics. Population genetic analyses have really led the way in giving a glimpse of something unknown about the mechanism of recombination. We don't know what sequence motifs promote recombination (or even if one exists) in mouse or S. cerevisiae, where far more mechanistic work has been performed (we do know about a motif in S. pombe, Steiner and Smith).
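
As an illustration of what searching for a degenerate motif involves, here is a small Python sketch that counts matches on both strands. The 13mer used, CCNCCNTNNCCNC (N = any base), is the degenerate motif as I understand it from Myers et al. 2008, so check it against the paper; the example sequences are made up. A real analysis would compare motif rates in LD-inferred hotspots with those in matched coldspots.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def motif_regex(motif):
    """Translate a motif (only A/C/G/T/N handled here) into a regex; the
    zero-width lookahead lets overlapping matches be counted."""
    return re.compile("(?=" + "".join(IUPAC[b] for b in motif) + ")")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def count_motif(seq, motif):
    """Count matches on the forward strand plus matches on the reverse strand."""
    seq = seq.upper()
    pattern = motif_regex(motif)
    forward = len(pattern.findall(seq))
    reverse = len(pattern.findall(reverse_complement(seq)))
    return forward + reverse

MOTIF = "CCNCCNTNNCCNC"
hot_like = "AAACCTCCATAGCCACGGG"   # made-up sequence containing one match
cold_like = "ATATATATATATATATATA"  # made-up sequence with no match
print(count_motif(hot_like, MOTIF), count_motif(cold_like, MOTIF))
```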

Further, the authors show that the motif is involved in non-allelic (ectopic) recombination events. This is not entirely surprising, as a motif that promotes recombination is bound to get it wrong some of the time, but they are still really nice examples.

Myers et al. A common sequence motif associated with recombination hot spots and genome instability in humans. Nat Genet. 2008 Aug 24

Crawford et al. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet. 2004 Jul;36(7):700-6.

McVean et al. The fine-scale structure of recombination rate variation in the human genome. Science. 2004 Apr 23;304(5670):581-4

Myers et al. A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005 Oct 14;310(5746):321-4

Wednesday, August 20, 2008

A couple of articles on forensic DNA matches

A Freakonomics blog piece on the FBI's DNA match probabilities (via Genome-technology). There is also some interesting correspondence (1,2,3) in Nature Reviews Genetics on the reliability and use of Low Copy Number DNA forensic profiling (i.e. using trace amounts of DNA), which briefly discusses the difficulties posed by contamination and allele drop-out. When I get a minute I'll have to go and read the statistical background and recommendations on this.
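
For readers unfamiliar with how match probabilities are built up, here is a minimal Python sketch of the standard product rule: under Hardy-Weinberg and linkage equilibrium, a heterozygous genotype at a locus has frequency 2pq and a homozygous one p², and the per-locus frequencies are multiplied across loci. The allele frequencies are hypothetical, the corrections used in real casework (e.g. for population substructure) are omitted, and allele drop-out, which is what makes low copy number profiles hard to interpret, is ignored entirely.

```python
from math import prod

def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p**2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q

def random_match_probability(profile):
    """Product rule across loci, assuming the loci are independent."""
    return prod(genotype_frequency(*alleles) for alleles in profile)

# Hypothetical 4-locus profile: each entry holds one allele frequency (homozygote)
# or the two allele frequencies (heterozygote) of the observed genotype.
profile = [(0.1, 0.2), (0.05,), (0.3, 0.1), (0.25, 0.15)]
rmp = random_match_probability(profile)
print(f"random match probability ~ {rmp:.2e} (about 1 in {1 / rmp:,.0f})")
```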

Sunday, August 17, 2008

flipping inversion

A new paper, just out in Nature Genetics, by Zody and Jiang et al. looks at the evolutionary history of the ~1Mb 17q inversion. This inversion was first described by Stefansson et al., who found that the normal allele, H1, was present in many populations, but that the inverted allele, H2, seemed to have increased rapidly in frequency in Europeans, perhaps due to positive selection. In support of this, they found that H2 was associated with a higher birth rate in modern-day individuals in Iceland. Further sequence analysis by Stefansson et al. suggested that the two alleles (H1 and H2) diverged ~2.5-3 million years ago, which is old for a human allele and surprising given the low frequency of H2 in Africa. Since then, the region has been implicated in various diseases. Zody and Jiang et al. reconstruct the history of the region and, surprisingly, suggest that H2 is actually the ancestral state, despite the fact that it is present at low frequency worldwide. They also find that similar inversions are polymorphic in chimpanzee and orangutan, suggesting that the region is subject to recurrent inversions.
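
Why is ~2.5-3 million years "old for a human allele"? Two back-of-the-envelope formulas make this concrete: the split time of two lineages is roughly T = d / (2μ) for per-site divergence d and per-site, per-year mutation rate μ, while the expected time to the most recent common ancestor of a neutral sample is about 4Ne generations. The inputs below (d, μ, Ne = 10,000, 25-year generations) are illustrative assumptions, not the values used by Stefansson et al.

```python
def divergence_time_years(d, mu_per_year):
    """Time since two lineages split: divergence accumulates along both branches."""
    return d / (2 * mu_per_year)

def expected_tmrca_years(ne, generation_years):
    """Approximate expected TMRCA of a large neutral sample (about 4*Ne generations)."""
    return 4 * ne * generation_years

split = divergence_time_years(d=0.005, mu_per_year=1e-9)   # hypothetical inputs
typical = expected_tmrca_years(ne=10_000, generation_years=25)
print(f"H1/H2-style split: {split / 1e6:.1f} My; typical neutral TMRCA: {typical / 1e6:.1f} My")
```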

Evolutionary toggling of the MAPT 17q21.31 inversion region.
Zody and Jiang et al.
Nature Genetics

A common inversion under selection in Europeans.
Stefansson et al.
Nature Genetics

Friday, August 15, 2008

More on right-handed snakes

I just spotted that the author of the right-handed snakes paper I wrote about a while ago (see here) has videos on his website of the snakes and snails in action.

Thursday, August 7, 2008

fine-scale recombination and biased gene conversion

Two new papers mapping fine-scale recombination rates:

The first (Mancera et al.; see here for a commentary from Michael Lichten) maps recombination events in yeast tetrads using very high density genotyping. One of the most interesting points from an evolutionary perspective is that they claim to find direct evidence of biased gene conversion (G/C-A/T heterozygous sites in gene conversion tracts being repaired in favour of GC). Biased gene conversion is a much-discussed force that could potentially explain the patterns of base composition in genomes (e.g. in humans see Duret and Arndt, Spencer et al. and Dreszer et al.) and is a potential confounder of divergence-based tests of selection (Galtier and Duret), but it actually has relatively little experimental support (see Buard and De Massy for discussion). So it is nice to see it being confirmed. I do worry slightly about genotyping error as a confounder here: a genotyping error would look like a short gene conversion tract. Thus if genotyping errors were for some reason biased towards calling GC over AT, they could produce this effect. I don't have any real sense of whether this could be a problem.
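
A minimal Python sketch of how one might ask whether conversion tracts favour GC: an exact two-sided binomial test of the number of G/C-A/T heterozygous sites resolved towards GC against the 50:50 expected without bias, plus a one-line model of the genotyping-error worry, in which a fraction of apparent tracts are errors that call GC with some probability. The counts and error parameters are hypothetical, not Mancera et al.'s data.

```python
from math import comb

def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    no more likely than the observed count k (small tolerance for round-off)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-7)))

def apparent_gc_fraction(true_gc, error_fraction, error_gc_bias):
    """Expected GC fraction among apparent tracts when a fraction of them are
    genotyping errors that call GC with probability error_gc_bias."""
    return (1 - error_fraction) * true_gc + error_fraction * error_gc_bias

# Hypothetical: 60 of 100 informative sites resolved towards GC.
print(f"p = {binom_test_two_sided(60, 100):.3f}")
# Hypothetical: no true bias, but 20% of 'tracts' are errors that favour GC 3:1.
print(f"apparent GC fraction = {apparent_gc_fraction(0.5, 0.2, 0.75):.2f}")
```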

The second paper is a really nice application of sperm typing in humans from Alec Jeffreys' group (Webb et al.), looking for meiotic recombination hotspots (1-2kb segments where recombination frequently happens) at places where linkage disequilibrium (LD) breaks down very rapidly. They confirm that all of these locations appear to be true hotspots, but that the intensity of the hotspots is not well predicted by inferences from LD (which is not too surprising). This further confirms the utility of LD analyses in identifying hotspots of recombination in humans.
They find that some of the weaker hotspots are polymorphic between individuals (i.e. individuals vary in the intensity of recombination at a hotspot), and that some of this polymorphism is experiencing biased gene conversion. Now this is a different type of biased gene conversion from that discussed above. This form of biased gene conversion occurs because the two different alleles stimulate recombination (i.e. crossover accompanied by gene conversion) in cis at different rates (rather than a bias in the repair of gene conversion), so the alleles are lost due to conversion at different rates. Because the chromosome that initiates recombination is the one repaired by gene conversion, an allele that stimulates recombination in cis is undertransmitted in heterozygotes. This means that alleles that promote hotspots are driven from the population, which leads to the paradox of why hotspots exist at all (this was pointed out by Rosie Redfield and colleagues, who termed it the hotspot paradox; she recently posted about this on her blog).
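
The paradox follows from a simple deterministic recursion. Suppose a "hot" allele is transmitted by heterozygotes with probability 1/2 - δ because it is preferentially converted. With random mating and no other forces, the next generation's frequency is p' = p² + 2p(1 - p)(1/2 - δ) = p - 2δp(1 - p). A minimal Python sketch, with a hypothetical δ, showing the hot allele being steadily driven out:

```python
def next_freq(p, delta):
    """One generation of transmission distortion against the hot allele:
    homozygotes always transmit it, heterozygotes with probability 1/2 - delta."""
    return p * p + 2 * p * (1 - p) * (0.5 - delta)

def trajectory(p0, delta, generations):
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], delta))
    return freqs

traj = trajectory(p0=0.9, delta=0.01, generations=2000)   # hypothetical delta
print(f"start {traj[0]:.2f}, after 500 gens {traj[500]:.3f}, after 2000 gens {traj[-1]:.2e}")
```

Nothing in this recursion restores the hot allele, which is why something else (new hot alleles arising, or selection for recombination) is needed to explain why hotspots persist.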

Tuesday, July 29, 2008

Lightweight males

Males leave more than sperm when they copulate with a female: they also leave a bunch of seminal proteins that are involved in sperm competition and sexual conflict. The problem in studying this system is that it is hard to distinguish between proteins already present in the female and those deposited by the male during copulation. A new paper (Findlay et al.) uses a clever technique to get around this: they raise the Drosophila females on food containing heavy nitrogen (15N), so that male and female proteins can be distinguished using mass spectrometry. It seems like a great technique and will be applicable to many situations where people need to work out which individual produced which protein. There is also a news piece on the article here.
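
Why does heavy nitrogen work as a label? Each 15N atom is about 0.997 Da heavier than a 14N atom, so a peptide from a fully labelled female protein is shifted, relative to the same unlabelled male-derived peptide, by roughly 0.997 Da times its number of nitrogen atoms. A minimal Python sketch, assuming complete labelling (in practice labelling may be incomplete) and using the nitrogen content of each amino-acid residue; the example peptide is arbitrary.

```python
# Nitrogen atoms per amino-acid residue (backbone N plus side-chain N).
NITROGENS = {
    "A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "E": 1, "Q": 2, "G": 1, "H": 3, "I": 1,
    "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1, "T": 1, "W": 2, "Y": 1, "V": 1,
}
N15_MINUS_N14 = 0.997035  # Da, mass difference between 15N and 14N

def nitrogen_count(peptide):
    """Total nitrogen atoms; the peptide termini add no extra nitrogen."""
    return sum(NITROGENS[aa] for aa in peptide)

def full_label_shift(peptide):
    """Mass shift (Da) of a fully 15N-labelled peptide relative to the unlabelled one."""
    return nitrogen_count(peptide) * N15_MINUS_N14

peptide = "PEPTIDEK"   # arbitrary example sequence
print(f"{peptide}: {nitrogen_count(peptide)} N atoms, shift = {full_label_shift(peptide):.3f} Da")
```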

Proteomics Reveals Novel Drosophila Seminal Fluid Proteins Transferred at Mating.
Geoffrey D. Findlay, Xianhua Yi, Michael J. MacCoss, Willie J. Swanson. 2008. PLoS Biology

Sunday, July 27, 2008

Happy birthday to PLoS Genetics

Happy birthday to PLoS Genetics. There's an article giving various facts and thanks here. PLoS Genetics has quickly become a great journal for evolutionary genetics and genetics in general. I think it ranks very favorably alongside journals like Genetics, Genome Research and Nature Genetics (though it carries fewer of the big/successful association studies than the latter).

Sunday, July 6, 2008

Why do range expansions run out of steam?

I've not posted for a while as I've been away at conferences and I've been trying to catch up with the work I've missed. I'll try to keep my posts more regular, and I hope I've not dropped off people's lists (if I was on them to begin with).

What stops a species from extending its range indefinitely? Why does a species not adapt and spread to every environment it encounters? This question is of great evolutionary and ecological importance, and it is not just an academic one, as shown by the changing range of mosquitoes due to climate change.

A number of factors are likely to play a role in stopping the spread of a species (and I strongly suspect that I've missed a few). One of the major reasons is that populations at the species margin can be prevented from adapting to new environments, because gene flow into the low-density margin from the rest of the species range can swamp or hinder adaptation to a new environment (see Kirkpatrick and Barton). Likewise, if the species is excluded from a new range by a different organism filling the ecological niche that the species would occupy, gene flow within each species can stop the two species from co-occurring by preventing character displacement (see Case and Taper).
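
Gene swamping can be illustrated with a much simpler model than Kirkpatrick and Barton's (which deals with clines along a continuous environmental gradient). This sketch assumes a single marginal population with haploid selection s favouring a locally adapted allele, receiving a fraction m of migrants each generation from a core where the allele is absent; the parameter values are hypothetical. Near p = 0 the allele grows by a factor (1 - m)(1 + s) per generation, so it can only spread when that exceeds 1, roughly when s > m.

```python
def step(p, s, m):
    """Haploid selection (advantage s), then one-way migration from the core,
    where the locally adapted allele is absent."""
    after_selection = p * (1 + s) / (1 + s * p)
    return (1 - m) * after_selection

def run(s, m, p0=0.01, generations=5000):
    p = p0
    for _ in range(generations):
        p = step(p, s, m)
    return p

def equilibrium(s, m):
    """Non-zero equilibrium if the allele can invade, otherwise 0."""
    return max(0.0, ((1 - m) * (1 + s) - 1) / s)

for m in (0.02, 0.2):   # hypothetical migration rates, with s = 0.1
    print(f"m = {m}: simulated {run(0.1, m):.3f}, predicted {equilibrium(0.1, m):.3f}")
```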

Finally, populations at the margins of a species range/expansion can have low levels of genetic diversity, meaning that they lack standing genetic diversity which would allow them to adapt to new environments. This last point has received support from observations in many species that 'neutral' (i.e. at random loci) genetic diversity is lower in the margins of a species range than in the centre, but few of these studies have shown that this compromises the ability of the populations at the species margin to adapt (Eckert et al.).

A new study by Pujol and Pannell starts to correct this shortcoming by showing that in a species of plant, populations at the edge of the species range have (compared to plants from the main species range) lower neutral diversity, lower variability in a potentially important adaptive trait, and a slower response to artificial selection on this trait. Obviously, this result is for just one plant species, so more work is needed to show that this is a general effect, but it certainly indicates that low diversity at the edge of a species range can be important in slowing the rate of spread of a species.
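
The link between reduced variation and a slower response to selection is captured by the breeder's equation, R = h²S, where h² = V_A / V_P is the narrow-sense heritability and S is the selection differential. A minimal Python sketch with hypothetical variance components (not Pujol and Pannell's estimates), assuming V_P = V_A + V_E with the same environmental variance and the same selection differential in both populations:

```python
def response_to_selection(v_additive, v_environment, selection_differential):
    """Breeder's equation: R = h^2 * S, with h^2 = V_A / (V_A + V_E)."""
    h2 = v_additive / (v_additive + v_environment)
    return h2 * selection_differential

S = 2.0    # hypothetical selection differential, trait units
V_E = 1.0  # hypothetical environmental variance, assumed equal in both populations
core = response_to_selection(1.0, V_E, S)
margin = response_to_selection(0.25, V_E, S)
print(f"core response {core:.2f}, marginal response {margin:.2f} per generation")
```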

I wonder if the lack of genetic variation in the marginal populations could be self-reinforcing (this may have been pointed out before). The reduced response to selection could leave these marginal populations more vulnerable to demographic fluctuations due to environmental change. If so, these marginal populations might frequently undergo bottlenecks, further reducing their diversity. Such cycles might keep diversity continuously low at the species margins, stopping the spread of the species into new environments.
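
The feedback suggested here can be sketched with the standard expectation for loss of heterozygosity by drift: H_t = H_0 multiplied, over each generation i, by (1 - 1/(2N_i)), where N_i is the effective size in that generation. This Python sketch uses hypothetical population sizes, ignores mutation and migration (both of which would replenish diversity), and compares a stable marginal population with one that crashes to a small size every tenth generation.

```python
def heterozygosity_after(h0, sizes):
    """Expected heterozygosity after drift through a sequence of per-generation
    effective sizes (diploid; no mutation or migration)."""
    h = h0
    for n in sizes:
        h *= 1 - 1 / (2 * n)
    return h

GENERATIONS = 100
stable = [500] * GENERATIONS
# Hypothetical: a crash to 10 individuals every tenth generation.
fluctuating = [10 if g % 10 == 0 else 500 for g in range(GENERATIONS)]

print(f"stable:      H = {heterozygosity_after(0.5, stable):.3f}")
print(f"fluctuating: H = {heterozygosity_after(0.5, fluctuating):.3f}")
```

The few crash generations dominate the loss, which is why occasional bottlenecks could matter so much for marginal populations.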

Friday, April 25, 2008

Would a gene by any other name smell as sweet?

I was thinking about blogging about the paper on metal tolerance evolution in Arabidopsis halleri via a cis-regulatory change (Hanikenne et al.), but I see that gnxp has already done so here. So, in the vein of the 'cis-regulatory vs protein' evolution debate, I thought I would point people towards a recent paper (Scalliet et al.; see here for a commentary) looking at a phenotypic change involving a coding change in roses. The plant in question is the Chinese rose, which is apparently where the garden-variety hybrid tea rose (Chinese x European) gets its scent from. The final part of the pathway underlying the rose's scent involves two genes (OOMT1 and OOMT2), and the authors identify a single residue underlying the crucial difference in the specificity of the two proteins. OOMT1 appears to have arisen by a gene duplication in Chinese roses, as it is absent in other roses. So here is a case of recent gene duplication followed by protein divergence underlying a novel phenotype.

Interestingly, the Arabidopsis halleri paper identifies both a change in copy number and a cis-regulatory mutation underlying a phenotype. Changes in copy number can be considered regulatory mutations (as they can change the expression level of a protein), and can be selected for because of this; a recent example is the amylase copy number variation (Perry et al., see the commentary on this paper by Coyne and Hoekstra). Subsequent selection pressures may favour changes in the function of the duplicates, through amino-acid substitutions or changes in where the two duplicates are expressed. Thus it is very likely that evolution proceeds by a combination of regulatory (cis and copy number) and protein changes (including mutations in trans factors etc.).

Evolution of metal hyperaccumulation required cis-regulatory changes and triplication of HMA4
Hanikenne et al. Nature 2008

Evolution of Protein Expression: New Genes for a New Diet
Jerry A. Coyne, Hopi E. Hoekstra. Current Biology 2007

Diet and the evolution of human amylase gene copy number variation
Perry et al 2007. Nature Genetics

Plant biology: Scent of a rose
Shadan S. Nature (News and Views) 2008

Scent evolution in Chinese roses
Scalliet et al. PNAS 2008

Monday, April 21, 2008

Sines of expansion

Cavalli-Sforza and colleagues used Principal Component Analysis (PCA) to summarize general spatial patterns of human allele frequencies across continents into maps. They interpreted peaks in these PCA maps as indicating sources of colonization, e.g. Neolithic expansions. However, a new paper by Novembre and Stephens (Novembre and Stephens 2008) questions this interpretation. They show that many forms of spatial covariance of allele frequencies between nearby populations can generate characteristic peaks in PCA maps. For example, simple isolation-by-distance or stepping-stone models also give rise to peaks in PCA maps, despite having homogeneous migration. These peaks arise because PCA of data whose covariance decreases with spatial distance produces principal components that are sines and cosines.
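
The sines-and-cosines result can be checked numerically in the simplest case. On a ring of demes with homogeneous migration, the covariance between two demes depends only on the distance between them, so the covariance matrix is circulant and its eigenvectors (the principal axes) are exactly discrete sines and cosines. A minimal numpy sketch, assuming a hypothetical covariance that decays geometrically with distance (summed around the ring); Novembre and Stephens analyse more general habitats, and real analyses first centre the data, which removes the constant leading axis seen here.

```python
import numpy as np

def ring_covariance(n_demes, decay):
    """Covariance among demes on a ring that decays as exp(-distance/decay),
    summed over the ways of going around the ring (a periodic kernel)."""
    r = np.exp(-1.0 / decay)
    j = np.arange(n_demes)
    first_row = (r**j + r**(n_demes - j)) / (1 - r**n_demes)
    offsets = (j[None, :] - j[:, None]) % n_demes  # circulant: depends on b - a only
    return first_row[offsets]

def leading_axes(cov, k):
    """Eigenvectors of the covariance matrix, largest eigenvalue first."""
    values, vectors = np.linalg.eigh(cov)
    order = np.argsort(values)[::-1][:k]
    return values[order], vectors[:, order]

def fraction_in_fourier_mode(v, n_demes, mode):
    """Share of a unit vector lying in the span of cos/sin with `mode` cycles per ring."""
    x = 2 * np.pi * mode * np.arange(n_demes) / n_demes
    basis = np.column_stack([np.cos(x), np.sin(x)])
    basis /= np.linalg.norm(basis, axis=0)
    return float(np.sum((basis.T @ v) ** 2))

n = 24
values, axes = leading_axes(ring_covariance(n, decay=3.0), k=5)
for i in range(1, 5):
    mode = (i + 1) // 2  # axes 1-2 complete one cycle, axes 3-4 two cycles
    print(f"axis {i}: {fraction_in_fourier_mode(axes[:, i], n, mode):.4f} in mode {mode}")
```

Every axis after the first is a pure sinusoid, so its "peaks" on a map reflect the wave, not any source of expansion.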

The paper has some really pretty examples; I particularly enjoy the fact that these patterns also appear in Greenish warblers (a famous example of a ring species, i.e. isolation by distance). These results do not say that human populations did not expand out of particular regions, just that PCA maps are not the best tool to judge this. The authors also note that this does not invalidate the use of PCA to correct for structure in association studies, and in fact might aid in their interpretation in epidemiological models.

Interpreting principal component analyses of spatial population genetic variation
John Novembre, Matthew Stephens. Nature Genetics
Published online: 20 April 2008 doi:10.1038/ng.139

Saturday, April 12, 2008

Mapping cellular susceptibility to HIV

A really neat paper (Loeuillet et al.) in PLoS Biology identifies a candidate SNP for cellular susceptibility to the HIV-1 virus. The paper adds to our growing knowledge of the genetics of HIV susceptibility (see a review of the paper by David Goldstein).

Rather than investigating this in patients with the disease, the paper initially measures how susceptible different cell lines are to HIV. The paper uses the CEPH cell lines (immortalized lymphoblastoid B cells) to do initial linkage mapping, and identifies a broad candidate region. They followed this up using the CEU HapMap cell lines to confirm and fine-map the variant. The authors then confirmed that the SNP association was also present in the more biologically relevant CD4+ T-cells. Finally, on the association front, they showed that the SNP is associated with disease progression in patients.

The use of the HapMap and CEPH cell lines to map variants affecting cellular phenotypes is a really interesting approach, and one which I'm sure we will see a lot more of in the future. At least some cellular phenotypes are likely to be easier to map, as they probably have a simpler genetic basis than complex diseases. I'm slightly surprised that the authors did not do a genome-wide association study of this cellular trait (after all, they are HapMap cell lines) and instead restricted themselves to an association study in the region of significant linkage. Obviously a GWAS would have to meet genome-wide significance, but the region of significant linkage could have been up-weighted or considered separately in this analysis.
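
One standard way to up-weight a prior linkage region is a weighted Bonferroni procedure: test i is declared significant if p_i ≤ α w_i / m, where the m non-negative weights average to one, which keeps the family-wise error rate at α. A minimal Python sketch, with hypothetical SNP counts and an arbitrary 20-fold up-weighting of SNPs inside the linkage region:

```python
def weighted_bonferroni_thresholds(n_in_region, n_total, upweight, alpha=0.05):
    """Per-SNP p-value thresholds when SNPs in a prior (e.g. linkage) region get
    `upweight` times the weight of the rest; weights average to 1 over all tests."""
    n_out = n_total - n_in_region
    # Solve upweight*w*n_in + w*n_out = n_total for the baseline weight w.
    w_out = n_total / (upweight * n_in_region + n_out)
    w_in = upweight * w_out
    return alpha * w_in / n_total, alpha * w_out / n_total

inside, outside = weighted_bonferroni_thresholds(
    n_in_region=2_000, n_total=500_000, upweight=20)   # hypothetical numbers
plain = 0.05 / 500_000
print(f"plain Bonferroni {plain:.1e}; linkage region {inside:.1e}; elsewhere {outside:.1e}")
```

The price of the more lenient threshold inside the linkage region is a slightly stricter one everywhere else.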

Hat tip to Tree of Life.

Loeuillet C, Deutsch S, Ciuffi A, Robyr D, Taffé P, Muñoz M, Beckmann JS, Antonarakis SE, Telenti A.
In vitro whole-genome analysis identifies a susceptibility locus for HIV-1. PLoS Biol. 2008 Feb;6(2):e32
Goldstein DB.
Genomics and biology come together to fight HIV
PLoS Biol. 2008 Mar 25;6(3):e76

Sunday, April 6, 2008

Common variants, when do we stop looking?

Just a few thoughts on genome-wide association studies, prompted by Genetic Future's recent posts on the low returns of some genome scans (here and here). Meta-analysis of combined studies will get us a long way towards finding small-effect alleles without the expense of typing additional cases (and we've seen quite a bit of this already), as will methods for studying epistatic interactions. So people will definitely squeeze the current data sets more. But my question/thought is: when do we stop looking for common variants for a particular common disease by increasing our sample size? Now this is a silly question, because the answer is mainly determined by practical constraints like funding and the ease of phenotyping cases. Also, I suspect the answer is that we'll keep on doing genome-wide association studies until resequencing is cheap enough to become a common tool, and then rare variants will be popular. But theoretically, when should we stop? Thinking about this might help us weigh the merits of different studies and study designs.

I think the answer depends strongly on the reasons for doing genome-wide association studies in the first place. I think there are two main reasons: predicting disease risk and understanding the pathways involved in the disease (though obviously these are not distinct aims).

If you are interested in predicting disease risk from a person's genotype, you need to ask: 'if I increase my sample size dramatically, will I get much better at predicting disease risk?' The answer may well be no: most of the known common variants are not very predictive, so the ones discovered next will be even less helpful in predicting risk. There will be a vast number of tiny-effect loci, but it seems to me that we rapidly hit diminishing returns for predicting risk.
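
A rough way to see the diminishing returns: for a quantitative trait (or, approximately, a disease on the liability scale), a locus explaining a fraction q of the variance gives a 1-df association test with non-centrality of about Nq/(1 - q), so the sample size needed for a given power scales like 1/q while each new locus adds only q to the variance explained. A minimal Python sketch, assuming a genome-wide threshold of 5 × 10⁻⁸, 80% power and a hypothetical list of loci of shrinking effect:

```python
from statistics import NormalDist

def required_sample_size(var_explained, alpha=5e-8, power=0.8):
    """Approximate N for a 1-df test: N * q / (1 - q) = (z_{1-alpha/2} + z_{power})**2."""
    z = NormalDist()
    ncp = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return ncp * (1 - var_explained) / var_explained

# Hypothetical loci, each explaining a smaller share of variance than the last.
loci = [0.01, 0.003, 0.001, 0.0003, 0.0001]
cumulative = 0.0
for q in loci:
    cumulative += q
    print(f"q = {q:.4f}: N ~ {required_sample_size(q):>9,.0f}; cumulative variance {cumulative:.4f}")
```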

If on the other hand you want to learn about the pathways involved in the disease (for drug targeting...etc), then perhaps the size of the effect is not important, just finding a new region will be informative about some part of the pathway (if you can understand what the region is telling you). One perhaps serious wrinkle on this is: 'are tiny effect loci really informative about the pathways'. The effect size may be small because the effect of the allele is very remote in the network from the main pathways, in which case it might be very hard to work your way back to understanding something new about the disease.

An additional benefit of finding a tiny-effect common allele is that the region containing it might be the target of large-effect rare variants. One view might be that, by discovering tiny-effect common alleles, people are preparing the ground (i.e. finding candidate loci) for resequencing studies of rare variants. Obviously the resequencing will be done genome-wide, but researchers will up-weight interesting rare variants in loci where weak-effect variants are already known. If so, this will be a funny twist, because genes involved in Mendelian diseases (rare, strong-effect mutations) were originally seen as candidate locations for common disease variation.

I think the exciting thing is we really don't know what we will learn, nor when to stop. The great thing about the WTCCC (and other major efforts) is that by concentrating a lot of effort on a few diseases, we might quickly learn what does and doesn't work.

Saturday, April 5, 2008

"Exons, Schmexons"

A summary by PZ Myers of Coyne and Wray's keynote speeches on evo-devo. It sounds like it would have been fun to see, particularly the dueling t-shirts (one is quoted in the title of this post). I think that Coyne is right that the only real way to know where selected changes occur, and what type of mutations they are, is to do very detailed follow-up work.

I thought I would give a link to a relatively recent paper by Wray's group looking for positive selection in promoters using the human, chimp and macaque sequences (Haygood et al). Their main point (at least in the coding vs noncoding debate) is that many promoters seem to undergo positive selection compared to exons (especially in interesting categories of genes). The paper looks for positive selection by looking for promoter regions that have significantly more substitutions than nearby intronic regions.

I've not read it in a while, so I'll avoid commenting on the technical details. However, a lack of genes with d_N/d_S>1 is not proof that genes do not often experience positive selection, just that d_N/d_S>1 is a pretty crappy measure of positive selection. The problem is that d_N incorporates all of the selection against amino-acid changes plus any weak signal of positive selection. For a gene to meet the d_N/d_S>1 criterion it has to have had a whole bunch of amino-acid changes; if positive selection on a gene often involves just a few amino-acid changes, it will not satisfy d_N/d_S>1. Promoter sequences, on the other hand, could be made up of a mixture of near-neutrally evolving regions plus a small number of more constrained regions. A few additional substitutions in the promoter due to positive selection could easily tip the balance to make the promoter 'rapidly evolving' (i.e. faster than the nearby intron), because the promoter's rate of substitution was not that different from the intronic rate anyway. That's not to say that the promoters found to be rapidly evolving are not interesting, just that the results should not be taken to mean that there is more positive selection on promoters than exons, as this is like comparing apples to oranges.
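
The asymmetry can be made concrete with a toy calculation (all numbers hypothetical). Suppose a fraction f of sites in a region evolve neutrally and the rest are completely constrained, and that A adaptive substitutions occur on top. If E substitutions would be expected were every site neutral (estimated from synonymous sites for a gene, or from the nearby intron for a promoter), the rate ratio is roughly f + A/E. A mostly constrained protein then needs many adaptive changes to exceed 1, while a mostly unconstrained promoter needs only a few.

```python
def rate_ratio(neutral_fraction, adaptive_subs, expected_if_neutral):
    """Substitutions relative to the neutral expectation: f + A / E."""
    return neutral_fraction + adaptive_subs / expected_if_neutral

def adaptive_subs_to_exceed_one(neutral_fraction, expected_if_neutral):
    """Smallest whole number of adaptive substitutions giving a ratio above 1."""
    a = 0
    # Small tolerance so that floating-point round-off cannot count 1.0 as "above 1".
    while rate_ratio(neutral_fraction, a, expected_if_neutral) <= 1 + 1e-12:
        a += 1
    return a

E = 10  # hypothetical: substitutions expected if every site were neutral
print("protein (f = 0.2):", adaptive_subs_to_exceed_one(0.2, E))
print("promoter (f = 0.8):", adaptive_subs_to_exceed_one(0.8, E))
```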

Haygood R, Fedrigo O, Hanson B, Yokoyama KD, Wray GA.
Promoter regions of many neural- and nutrition-related genes have experienced positive selection during human evolution.
Nat Genet. 2007

Wednesday, April 2, 2008

Ancient mtDNA: clocking up the mutations

A quick post about Hay et al., which I spotted thanks to Pondering Pikaia (Nature news also has a piece on it). I think that this paper is interesting, but there are some issues in the interpretation.

The paper estimates the mtDNA mutation rate of the tuatara (an ancient lineage of reptile living in New Zealand), and finds that the mtDNA mutation rate is one of the highest estimated so far. The usual way of estimating mutation rates is to compare sequence divergence between two species (or a set of species) where the divergence time of the pair of species (or of nodes in the tree relating the species) is known with some precision (this technique is not without its problems, and a lot of work in molecular phylogenetics is devoted to solving them). However, if you don't have a set of relatively close species, or they lack a fossil record, then you are usually out of luck and the mutation rate cannot be estimated. To get around this, the authors use patterns of mtDNA diversity in extant and ancient mtDNA to estimate the mutation rate. Usually patterns of population genetic diversity are not in themselves informative about mutation rates, because the diversity within a species is determined by the product of the mutation rate and the effective population size (i.e. the mutation rate and the effective population size are confounded). High diversity within a species could be due to a high mutation rate and a low effective population size, or a low mutation rate and a high effective population size. The inclusion of ancient DNA samples can potentially resolve this confounding, because the temporal spacing between the samples gives information about the rate of coalescence (i.e. the effective population size). This information in turn provides a way of disentangling the effective population size from the mutation rate (see Drummond et al.). The authors use this idea to estimate the mtDNA mutation rate, and have used it before for ancient Adélie penguin mtDNA (Lambert et al.).
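
To see why temporally spaced samples break the confounding, consider a toy haploid coalescent, a deliberate simplification of the Bayesian approach of Drummond et al. For a modern sequence and one sampled Δt generations ago, the modern lineage must first travel back Δt generations, after which the two lineages coalesce after an exponential time with mean N. The expected number of differences is therefore μ(Δt + 2N) = θ + μΔt with θ = 2Nμ: the slope against sample age estimates μ, and the intercept then gives N. The Python sketch below simulates independent pairs under hypothetical values of N, μ (per sequence per generation) and sample ages, and assumes a constant-size panmictic population.

```python
import numpy as np

def simulate_pair_differences(n_pop, mu, sample_ages, pairs_per_age, seed=1):
    """Differences between a modern sequence and one sampled `age` generations ago,
    under a constant-size haploid coalescent (pairs simulated independently)."""
    rng = np.random.default_rng(seed)
    ages, diffs = [], []
    for age in sample_ages:
        t_coal = rng.exponential(n_pop, pairs_per_age)  # time above the older sample
        branch_length = age + 2 * t_coal                 # total length of both lineages
        diffs.append(rng.poisson(mu * branch_length))
        ages.append(np.full(pairs_per_age, age))
    return np.concatenate(ages), np.concatenate(diffs)

TRUE_N, TRUE_MU = 2000, 0.005   # hypothetical: theta = 2 * N * mu = 20
ages, diffs = simulate_pair_differences(TRUE_N, TRUE_MU, [0, 500, 1000, 2000], 5000)

# E[diffs] = theta + mu * age. With modern samples only (age 0), just theta is
# identifiable; the spread of sample ages is what separates mu from N.
slope, intercept = np.polyfit(ages, diffs, 1)
mu_hat = slope
n_hat = intercept / (2 * mu_hat)
print(f"mu estimate {mu_hat:.4f} (true {TRUE_MU}), N estimate {n_hat:.0f} (true {TRUE_N})")
```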

However, as with all population genetic analyses, things are not quite that simple. Violations of the simple coalescent model used to estimate the population size, and hence the mutation rate, could be problematic. Population size changes (such as a bottleneck) or selection could alter the rate of coalescence through time, and so potentially confound the estimation of the effective population size and therefore of the mutation rate. Population structure is another potential problem: if the present-day samples and ancient DNA samples are not all drawn from the same panmictic population, this would also violate the model assumptions and so potentially bias the estimate of the mutation rate. The authors sample both extant and ancient specimens from around New Zealand, so population structure may not be a problem, though this cannot be assumed. The authors also look at whether exponential population growth changes their results and find that it does not, but it is not clear that other plausible demographic models could not cause biases.

Estimates of mutation rate from this kind of analysis are clearly not problem-free, and so may need to be treated with some caution. If you are somewhat skeptical of estimating population demography from mtDNA sequences (and you should be), then you should be somewhat skeptical of estimating mutation rates via this technique. I think that the idea behind this study and the results are really neat, but for the moment I would regard this study as supporting/suggesting a hypothesis that the mutation rate is high, rather than definitively showing that this is the case.


Hay JM, Subramanian S, Millar CD, Mohandesan E, Lambert DM. Rapid molecular evolution in a living fossil. Trends Genet. 2008 Mar;24(3):106-9

Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data.
Genetics. 2002 Jul;161(3):1307-20.

Lambert DM, Ritchie PA, Millar CD, Holland B, Drummond AJ, Baroni C. Rates of evolution in ancient DNA from Adélie penguins.
Science. 2002 Mar 22;295(5563):2270-3.

Wednesday, March 26, 2008

The limits of unlinked SNPs for learning about demography

The best way to learn about demography from population genetic data is to look at multiple unlinked regions (a common theme over at the evolgen blog). The distribution of neutral allele frequencies across SNPs in a population (the site frequency spectrum) is informative about population history. For example, an excess of low-frequency mutations is consistent with recent population growth, as the increase in population size introduces new mutations that have not yet had time to drift to higher frequencies.
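
A minimal sketch of how the site frequency spectrum is tabulated, and of one classic summary of it, Tajima's D, which turns negative when rare variants are in excess relative to the constant-size neutral expectation E[xi_i] = theta / i. It assumes each SNP is already summarized by its derived-allele count in a sample of n chromosomes (ancestral state known); the function names and the example spectra are hypothetical.

```python
import math
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """sfs[i-1] = number of SNPs whose derived allele is carried by i of the
    n sampled chromosomes, for i = 1..n-1."""
    counts = Counter(derived_counts)
    return [counts.get(i, 0) for i in range(1, n)]


def tajimas_d(sfs, n):
    """Tajima's D from an unfolded SFS (Tajima 1989 normalization)."""
    S = sum(sfs)
    if S == 0:
        return 0.0
    # Mean pairwise differences: a SNP with i derived copies differs in
    # i * (n - i) of the n * (n - 1) / 2 pairs of chromosomes.
    pi = sum(x * i * (n - i) for i, x in enumerate(sfs, start=1)) / (n * (n - 1) / 2)
    a1 = sum(1.0 / k for k in range(1, n))
    a2 = sum(1.0 / k ** 2 for k in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Compare pi with Watterson's estimator S / a1.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    n = 10
    neutral = [10.0 / i for i in range(1, n)]    # expected SFS with theta = 10
    growth_like = [30, 5, 3, 2, 2, 1, 1, 1, 1]   # hypothetical excess of singletons
    print(tajimas_d(neutral, n), tajimas_d(growth_like, n))
```

Feeding in the exact neutral expectation gives D of (numerically) zero, while a spectrum skewed towards singletons, as expected after recent growth, gives a negative value.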

A number of papers have used the frequency spectrum of unlinked SNPs to learn about demography. A technical but elegant article by Myers et al. shows that, while informative about demography, the site frequency spectrum at unlinked SNPs cannot help you choose between certain demographic histories. This is not a question of imperfect knowledge of the site frequency spectrum (which more data would solve): as Myers et al. formally show, for any particular demographic model there is a large family of demographic histories that give rise to the same site frequency spectrum. They explain: 'Informally, changes in population size at some past time are canceled out by other changes in the opposite direction'. I think that this lack of information comes from the fact that each unlinked SNP only tells you about the placement of a single mutation on the genealogy at that site, so over sites you learn only about the expected amount of time in different parts of the genealogy (loosely worded, as there is in fact a set of genealogies). It is therefore perhaps not surprising in hindsight that these data are not sufficient to learn everything about population size changes.

This problem can be circumvented by using patterns of linkage disequilibrium within genomic regions, which carry additional information about the patterns of coalescent trees across the genome. I've often wondered along similar lines about how much we can learn about population histories from population genetic data (even a whole genome's worth) and what the fundamental limits might be.

Myers S, Fefferman C, Patterson N.
Can one learn history from the allelic spectrum?
Theor Popul Biol. 2008 Jan 3

Sunday, March 23, 2008

Mutating males

A short but nice article by Doris Bachtrog looks at whether there is a faster mutation rate in males than in females in Drosophila. Studies of a number of vertebrates have shown a faster rate of substitution on the Y than on the X at putatively neutral sites, suggesting a faster male mutation rate (as under neutrality the substitution rate equals the mutation rate). This higher rate of mutation in males is thought to be due to the larger number of cell divisions in the male germ line compared to the female germ line. Previous work has not seen this effect in Drosophila. Bachtrog used a set of orthologous genes on the recently formed neo-sex chromosomes of D. miranda (I discussed some of her work on this system in a previous post). Because these chromosomes have only recently become sex-linked, the proportion of generations their genes spend in the male and female germ lines has changed, making them an excellent system for comparing male and female mutation rates. To look at putatively neutral changes, Bachtrog looks at changes in short introns and at synonymous changes in genes (where the changes are classified by their direction with respect to codon bias). She finds a consistently lower rate of substitution at sites on the neo-X chromosome compared to the neo-Y, suggesting that in D. miranda there is a higher rate of mutation in males.
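
The logic of converting an X versus Y comparison into a male:female mutation ratio can be sketched with the standard relationship usually attributed to Miyata and colleagues. Write alpha = mu_male / mu_female. A Y-linked site is always in males, while an X-linked site spends 1/3 of generations in males, so the Y:X substitution-rate ratio is r = 3 * alpha / (2 + alpha), which rearranges to alpha = 2r / (3 - r). This is an illustrative sketch, not necessarily the calculation in the paper; it assumes neutral sites, equal generation times in both sexes, and (for neo-sex chromosomes) that only substitutions since the fusion are counted.

```python
def male_bias_from_y_x_ratio(r):
    """Estimate alpha = mu_male / mu_female from r = (Y rate) / (X rate).

    Y: always in males        -> rate proportional to alpha
    X: 1/3 in males, 2/3 in females -> rate proportional to (2 + alpha) / 3
    so r = 3 * alpha / (2 + alpha), i.e. alpha = 2 * r / (3 - r).
    """
    if not 0 <= r < 3:
        raise ValueError("r must lie in [0, 3)")
    return 2 * r / (3 - r)


# Equal rates on neo-X and neo-Y imply no male bias; a neo-Y rate 1.5x the
# neo-X rate would imply a male mutation rate about twice the female rate.
print(male_bias_from_y_x_ratio(1.0), male_bias_from_y_x_ratio(1.5))
```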

I guess I'm still slightly concerned that the higher rate of change on the neo-Y could reflect relaxed constraint, but as the effect is seen across a range of different putatively neutral sites, a higher mutation rate seems the most parsimonious conclusion. It would be interesting to know whether this higher male mutation rate is true of the entire Drosophila clade, or if it is restricted to certain species due to differences in the number of germ-line cell divisions. I wonder whether whole-genome resequencing (or sequencing of a reduced representation) of mutation accumulation lines could be used to look into this. One issue with this experimental approach would be that generation time in the lab might differ from that in the wild, but it could offer a complementary way to look at this problem.

Bachtrog D. Evidence for male-driven evolution in Drosophila.
Mol Biol Evol. 2008 Apr;25(4):617-9

Friday, March 21, 2008

parasite induced mimicry in ants

Just a quick note to point people towards a paper (Yanoviak et al.) that gives an amazing example of a parasite inducing mimicry in its host. The nematode parasite is transmitted to ant larvae via bird droppings, which are fed to the larvae. The parasite turns the ant's abdomen bright berry-red, and also makes the ant hold its abdomen up in the air. This attracts birds to eat the berry-like, parasite-infected ants, completing the cycle.

There's a more extensive discussion of the article at This week in Evolution.

Parasite-Induced Fruit Mimicry in a Tropical Canopy Ant
Yanoviak, Kaspari, Dudley, Poinar. Am Nat 2008. Vol. 171, pp. 536–544.

Wednesday, March 19, 2008

Tan is the new black

There's a really interesting paper by Sean Carroll's group in Cell (Jeong et al.) on pigmentation differences between two closely related species of Drosophila, D. santomea and D. yakuba. D. santomea has a small range and is restricted to high altitudes on the island of São Tomé, while D. yakuba is more widespread and lives at lower altitudes on São Tomé. The species can hybridize and form a natural hybrid zone; in fact the mtDNA has introgressed between the species (Bachtrog et al.; Llopart et al.). Unlike the rest of the melanogaster clade, D. santomea lacks abdominal pigmentation. Carroll's group looks at one of the genes involved in this change in detail.

A couple of previous QTL studies of the abdominal pigmentation differences between these two species had identified a QTL close to the candidate gene tan on the X chromosome, while another small QTL maps near the yellow gene, also on the X chromosome. The authors show that tan, in combination with yellow, produces the abdominal pigmentation in D. melanogaster. No coding differences are found between the tan genes of D. santomea and D. yakuba, suggesting that regulatory changes are prime candidates for the difference in pigmentation. They then show that yellow and tan are expressed in the abdomen of D. yakuba but not in D. santomea. Replacing the X chromosome of D. yakuba with that of D. santomea removes the tan expression pattern but not the yellow pattern, suggesting that the difference in tan expression is controlled in cis (and that of yellow in trans). The authors identify a cis-regulatory module for tan that controls its abdominal expression in D. melanogaster, and then show that this D. melanogaster module can create the pigmentation pattern in male D. santomea. A couple of changes have occurred in the D. santomea sequence of the module at otherwise conserved sites, and the authors show that these sites are responsible for the reduced abdominal expression of tan.

At this point the authors decided to look at polymorphism and divergence data for this module in D. santomea. This is when the story gets really interesting. The authors, I guess, hoped to find the signal of a sweep in D. santomea around this region, with all individuals fixed for the mutations that inactivated the cis module. What they found instead was that the changes were not fixed in the population, and that there appear to be three distinct inactivating mutations at the cis module. They confirmed that the two newly discovered mutations (both deletions) remove the abdominal expression of tan, and so are likely to remove the pigmentation as well. Thus D. santomea has three different mutations at the same locus resulting in the same phenotype, which is a pretty incredible case of parallel mutation. The authors argue that this is likely the result of selection rather than simply neutrality following relaxed constraint, and I find that pretty convincing. There is no observed polymorphism for pigmentation in D. santomea, suggesting that these three mutations combined have removed pigmentation. It seems unlikely that this small mutational target (the module) experienced three neutral mutations that have essentially removed pigmentation.

I wonder if one of the mate-choice QTLs between D. santomea and D. yakuba maps to the same location as the QTL that led to the identification of tan (I've not checked the coordinates of the QTLs in Moehring et al.). Interestingly, the yellow gene (the other pigmentation gene with reduced expression) seems to show a signal of introgression between D. santomea and D. yakuba (Llopart et al.). I wonder if introgression at yellow has prevented the accumulation of strong cis mutations there, meaning that its expression had to be reduced by a trans effect.

My only minor quibble with this otherwise great paper was the stridency about cis-regulatory evolution. This paper is another really pretty example of cis-regulatory evolution, but to my mind it in no way settles the debate about protein versus cis-regulatory evolution (Hoekstra and Coyne). This case is another loss-of-function mutation; I would like to see more gain-of-function mutations in cis before making up my mind that cis-regulatory evolution predominates. I also feel that follow-up work is somewhat biased towards cis-regulatory changes. The authors do not follow up the reduction in yellow expression, which operates in trans, so the paper sets aside a trans effect (which admittedly may be a cis effect at another upstream gene). Also, the authors do not seem to show that the cis-regulatory module of tan (a pleiotropic gene) is itself free of pleiotropic effects (a prerequisite for the argument that cis-regulatory changes free evolution from pleiotropic constraint).

Jeong S, Rebeiz M, Andolfatto P, Werner T, True J, Carroll SB. The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell. 2008 Mar 7;132(5):783-93.

Moehring AJ, Llopart A, Elwyn S, Coyne JA, Mackay TF. The genetic basis of prezygotic reproductive isolation between Drosophila santomea and D. yakuba due to mating preference. Genetics. 2006 May;173(1):215-23.

Llopart A, Lachaise D, Coyne JA. Multilocus analysis of introgression between two sympatric sister species of Drosophila: Drosophila yakuba and D. santomea. Genetics. 2005 Sep;171(1):197-210.

Bachtrog D, Thornton K, Clark A, Andolfatto P.
Extensive introgression of mitochondrial DNA relative to nuclear genes in the Drosophila yakuba species group.
Evolution Int J Org Evolution. 2006 Feb;60(2)

Friday, March 7, 2008

The descent of Y

One of the most obvious features of our genome is the large difference between the X and Y chromosomes. The Y chromosome is small compared to its partner the X, much of its DNA is repetitive, and it codes for few genes.

Much of the degeneration of the Y is thought to be due to the lack of recombination on the Y chromosome. Recombination is thought to be suppressed initially between the homologous chromosomes around the sex-determining locus, and then to extend as sex-specific coadapted complexes of genes linked to the sex-determining locus build up (which recombination would break apart). The lack of recombination means that the fates of mutations occurring on the Y chromosome are coupled: a beneficial mutation drags along the deleterious mutations on its background as it sweeps to fixation (hitchhiking). Deleterious variation also builds up because, in the absence of recombination, haplotypes lacking deleterious mutations cannot be recreated once they are lost by genetic drift (Muller's ratchet).
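
Muller's ratchet is easy to see in a toy simulation. The sketch below is a minimal asexual Wright-Fisher model, assuming multiplicative fitness (1 - s)^k for a genome carrying k deleterious mutations, Poisson(u) new mutations per genome per generation, and no recombination or back mutation. Population size, u, s and the function names are arbitrary illustrative choices, not values from any study.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small means."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def mullers_ratchet(pop_size=200, generations=300, u=0.1, s=0.02, seed=1):
    """Asexual Wright-Fisher population in which each genome is represented
    only by its count of deleterious mutations. Returns the minimum count
    (the least-loaded class) in every generation."""
    rng = random.Random(seed)
    loads = [0] * pop_size
    min_history = []
    for _ in range(generations):
        # Selection: parents sampled with weights (1 - s) ** k.
        weights = [(1 - s) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Offspring inherit the parent's load plus new mutations. Loads never
        # decrease, so once the least-loaded class is lost it cannot be rebuilt.
        loads = [k + poisson(u, rng) for k in parents]
        min_history.append(min(loads))
    return min_history


history = mullers_ratchet()
print(history[::50])  # each increase is one "click" of the ratchet
```

Without recombination the minimum load can only ratchet upwards; allowing recombinant offspring to combine chunks of two parents' genomes is what would let low-load genomes be reassembled.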

However, studying the formation of the human Y chromosome is difficult. The event happened millions of years ago, and while we can learn something about it by looking across different mammalian species, we are still very limited in what we can say. A different approach to learning about sex chromosomes is to look not at old sex chromosomes (where much of the action occurred long ago), but to study new sex chromosomes or new additions to sex chromosomes. This is one of the wonderful things about biology: no matter how strange the event, it is often occurring in multiple species independently as we speak. Thus if you want to learn about something that happened in the history of one species, looking for it happening currently in another is a great tactic. As sequencing and other resources become cheaper, this approach will become even easier.

Young sex chromosomes have been identified in a number of species (see here). These are associated with locally suppressed recombination around the sex-determination locus, causing these regions to start to degenerate (this paper is also a good introduction to the evolutionary dynamics of sex chromosomes).

Another example of the degeneration of the Y chromosome due to lack of recombination comes from fusions between autosomes and existing sex chromosomes. These fusions happen relatively often (fusions are a relatively common chromosomal abnormality in humans) and sometimes they survive and are fixed in the population. A number of fusions between sex chromosomes and autosomes are known in Drosophila. These vary in age, and so provide a good system for studying this problem. When an autosome becomes fused to the Y chromosome (a neo-Y), it ceases to recombine, as it is transmitted only through males (which have no recombination in Drosophila). The neo-X, the homolog of the neo-Y, continues to experience recombination. A recent paper studies the decay of the neo-Y of Drosophila miranda, a fusion believed to have formed about a million years ago. The authors sequenced ~2.5 Mb of the neo-sex chromosomes. They find that half the genes on the neo-Y are pseudogenized, due to the build-up of deleterious mutations, while their counterparts on the neo-X are much more conserved. The neo-Y is also rapidly accumulating transposable elements: around 20% of the neo-Y is made up of transposable elements, compared to ~1% of the neo-X. Thus the degeneration of the neo-Y is incredibly rapid. This suggests that the homologous genes on the neo-X will also have to evolve rapidly to cope with the loss of their partner. For example, if the neo-Y copy of a gene is non-functional, the copy on the neo-X will have to be up-regulated to compensate for this loss. Thus dosage compensation will have to evolve quickly on the neo-X. It will be really great to learn more about the evolution of sex chromosomes from these young sex chromosome systems as more of them get sequenced.


Genomic degradation of a young Y chromosome in Drosophila miranda.
Bachtrog D, Hom E, Wong KM, Maside X, de Jong P. Genome Biology 2008

Steps in the evolution of heteromorphic sex chromosomes.
Charlesworth D, Charlesworth B, Marais G. Heredity. 2005

Sunday, March 2, 2008

right-handed snakes

A paper published last year gives a wonderful example of co-evolution. I'm not sure if it got much coverage (but may well be mistaken), but it really is great.

Snails often have clockwise shells (which is itself a wonderful example of the evolution of asymmetry). The authors show that snakes that prey on these snails have evolved to have more teeth on their right mandible than on their left mandible. The authors also show that these snakes have a much easier time eating snails with clockwise shells than those with anti-clockwise shells. I remember seeing a video of this at a conference I was attending, and thinking at the time that it was the coolest thing ever.

The mutation in snails that flips the spiral of their shells is thought to be one of the only clear cases of speciation caused by a single locus, as left-handed snails have a lot of trouble mating with right-handed snails (see here). But why do left-handed snails arise and spread if they cannot breed with right-handed snails? Perhaps predation by right-handed predators offers a mechanism that would favor left-handed snail species.

Masaki Hoso, Takahiro Asami, Michio Hori. Right-handed snakes: convergent evolution of asymmetry for functional specialization. Biology Letters. 2007
Davison A, Chiba S, Barton NH, Clarke B. Speciation and gene flow between snails of opposite chirality. PLoS Biology 2005.

Thursday, February 14, 2008

Don't go marrying your cousin quite yet

I just read the deCODE paper on reproductive success and relatedness (here). You can read other comments on it at Gene Expression (here and here) and over at John Hawks. Overall I think it is a really interesting finding. One of the impressive things is that the number of grandchildren a couple produced still falls as the couple become more distantly related, even at the level of fifth and sixth cousins. As the authors note, this is pretty convincing evidence that this relationship is not due to a conscious choice by couples ('Relationships at this genealogical distance are rarely known to the couples or their families and acquaintances in their social environment'), and so they conclude that the phenomenon is probably biological.

However, I wonder whether there could be non-conscious, non-biological forces that give rise to this phenomenon. For example, in small communities there are fewer people to choose a partner from, so you are more likely to choose a relative (just by accident). In larger towns you have more choice and so are likely to pick someone less related to you (again just by chance). If the number of children born per family is higher in smaller communities, then closely related couples will have more children purely because they are over-represented in small communities. Does this make sense, or have I missed something? The authors adjust for the spatial component by including the location of the couple (one of 21 counties) as a fixed effect in the model, but this level of spatial structure might be crude compared to the information needed to detect an effect of town size. It is also a shame that they don't have marriage dates for the couples, as it would be interesting to know whether the relationship between family size and relatedness is due to marrying earlier or to a higher birthrate.
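
The confounding argument above can be checked with a toy simulation. Everything here is invented: two hypothetical community types, kinship drawn from a normal distribution (clipped at zero) with a higher mean in small communities, and family size drawn as Poisson with a higher mean in small communities but independent of kinship within each community type. The sketch only shows that pooling across community types can produce a kinship/family-size correlation that is absent within each type; it says nothing about whether this is what happens in Iceland.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small means."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_per_type=5000, seed=0):
    """Hypothetical community types: (mean kinship, sd of kinship, mean children).
    Family size depends only on community type, never on kinship."""
    rng = random.Random(seed)
    types = {"small": (0.010, 0.003, 4.0), "large": (0.003, 0.002, 3.0)}
    data = {}
    for name, (k_mean, k_sd, kids_mean) in types.items():
        kinship = [max(0.0, rng.gauss(k_mean, k_sd)) for _ in range(n_per_type)]
        children = [poisson(kids_mean, rng) for _ in range(n_per_type)]
        data[name] = (kinship, children)
    return data


data = simulate()
pooled_kin = data["small"][0] + data["large"][0]
pooled_kids = data["small"][1] + data["large"][1]
pooled_r = pearson(pooled_kin, pooled_kids)
within_r = {name: pearson(k, c) for name, (k, c) in data.items()}
print(f"pooled r = {pooled_r:.3f}; within-community r = {within_r}")
```

Adjusting for community type (here, simply computing the correlation within each type) removes the spurious association, which is why the resolution of the spatial covariate matters.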

If this phenomenon has a biological basis, that would be really interesting; as noted elsewhere, it could be due to subtly higher attractiveness between closer relatives (obviously not too close) or a higher birthrate due to reduced genetic incompatibilities between mother and child. Even if it is not biological, it is an interesting observation. It would also be great to see it replicated in different human populations, and I wonder whether it could be replicated in macaques or some other model organism. I look forward to seeing more papers looking at this.

Thursday, February 7, 2008

KITLG and regulatory evolution

I thought I would post a few notes on the KITLG stickleback paper. Sticklebacks are a great system because many populations of marine stickleback have become isolated in freshwater lakes. A number of groups have done a lot of great work with them, and the availability of a good genetic map and a draft genome is really accelerating their development as a model system.

The paper is really nice and maps a major-effect allele underlying losses of (gill and ventral) pigment in a freshwater population to the KITLG gene. This gene is known to affect skin pigmentation (along with spatial learning) in mice. The really great thing about the paper is that they also find that the gene affects skin colour in humans. The authors use admixture mapping in African Americans to show that the gene has a sizable effect on the difference in skin pigmentation between Africans and Europeans (and presumably East Asians). The gene has a strong signal of a sweep in East Asians and Europeans. This is another great example of how a small set of genes seems, time and time again, to be the target of selection for changes in pigmentation.

I have a couple of comments on the paper; a number of these are acknowledged by the authors, but I thought I would highlight them here.

The KITLG gene is expressed at different levels in the ocean and freshwater populations. This difference in KITLG expression level is controlled in cis: allele-specific RT-PCR in F1 hybrids between the ocean and freshwater populations reveals that, within the same hybrid fish, the allele from the freshwater population is expressed at a lower level in the gills and skin than the allele from the ocean population.

The authors suggest that the change at the KITLG gene in humans is also regulatory, but this seems somewhat speculative. The SNP they use for admixture mapping is in a conserved element. However, admixture mapping (unlike association mapping) can usually only narrow a signal down to a fairly broad region, as it relies upon the recombination events in the tens of generations since Africans arrived in America. Thus the authors cannot be sure that changes within the gene itself are not responsible for the effect on pigmentation.
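
A back-of-the-envelope calculation shows why so few generations of recombination leave such broad signals. Under a simple model, assumed here for illustration, with a single admixture pulse g generations ago, ancestry switches along a chromosome occur at roughly g breakpoints per Morgan, and each new segment is drawn from the other source with probability (1 - m), where m is the proportion contributed by the source of interest. Tracts from that source are then approximately exponential with mean 1 / (g * (1 - m)) Morgans. The generation counts below are illustrative guesses, and real African American admixture was not a single pulse.

```python
def mean_tract_length_cm(generations, admixture_fraction):
    """Approximate mean length (cM) of ancestry tracts from a source that
    contributed `admixture_fraction` of the genome in a single pulse
    `generations` ago. Tracts end at switches to the other ancestry, which
    occur at about generations * (1 - fraction) per Morgan."""
    return 100.0 / (generations * (1.0 - admixture_fraction))


# ~20% European / ~80% African ancestry, as in the Price et al. discussion.
for g in (6, 8, 10):
    eur = mean_tract_length_cm(g, 0.2)
    afr = mean_tract_length_cm(g, 0.8)
    print(f"g={g}: European tracts ~{eur:.1f} cM, African tracts ~{afr:.1f} cM")
```

Tracts of tens of centimorgans typically span many genes, so an admixture peak alone cannot distinguish a regulatory SNP from a coding change nearby.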

Also, I felt that the authors were slightly quick to rule out coding changes in stickleback KITLG as contributing to functional divergence. Obviously the expression level has changed, but that does not automatically mean that the change is functional. The change in expression is not a complete loss of expression, so it is unclear how protein levels are affected. The marine and freshwater fish also differ at 2 amino acid sites within the KITLG gene. The authors argue that these sites are not strong candidates for functional changes as they occur at non-conserved sites. But I find that this contrasts with the authors' view that regulatory evolution is a common path for genes with highly pleiotropic effects because 'only mutations that may be compatible with viability and fitness may be regulatory mutations'. The KITLG protein sequences of the freshwater and ocean stickleback do differ, and these changes (perhaps in combination with the regulatory changes) might underlie the changes in pigmentation. It seems slightly premature to suggest that this gene is a solid case for regulatory evolution (and the authors do acknowledge that).

Also, the authors' own data suggest that this regulatory change in stickleback has pleiotropic effects. The change in KITLG expression in freshwater stickleback lowers expression in the gills (and the ventral skin) but raises it in the brain. While the change in regulation in multiple tissues could be the result of multiple mutations, a parsimonious explanation is that a change to a single cis-regulatory module causes these changes in different tissues. Thus while it is appealing that cis-regulatory modules free gene expression from the pleiotropy that coding changes might suffer, it is premature to conclude that such changes are easier to select upon than coding changes (especially as the gene in question has coding changes).

As I say, I really like the paper, but I felt that it slightly oversells the regulatory evolution aspect, which is interesting in light of the recent debate on the evidence for regulatory evolution (see here).

Miller CT, Beleza S, Pollen AA, Schluter D, Kittles RA, Shriver MD, Kingsley DM.
cis-Regulatory changes in Kit ligand expression and parallel evolution of pigmentation in sticklebacks and humans.
Cell. 2007 Dec 14;131(6):1179-89

Thursday, January 24, 2008

More on the pacific islanders

Thought I would pop a couple of pictures up from the Pacific island paper, and briefly point out another paper on the topic. One of the debates that the paper sought to resolve is "Are Polynesians more closely related to Asian/Taiwanese populations or to Melanesians?"

First, a picture of the area (taken and cropped from the PLoS Genetics paper, citation at the bottom), as I had trouble remembering which populations are where.

The Polynesian/Micronesian populations (on this map) are the Maori, Samoans and Micronesians. Previous mtDNA work had suggested that the Polynesians were closely related to each other, supporting the 'express train to Polynesia' proposal: that these people moved rapidly (by boat) out from Taiwan through the surrounding islands (with little contact with the people of these islands) before reaching their current locations. Others had suggested, on the basis of the Y chromosome, a 'slow boat to Polynesia', with the Polynesian populations mixing with Melanesians along the way.

Here's a picture (from the paper) of the output of the program Structure (the top panel shows individual ancestry; the bottom panel shows ancestry averaged within each population, mainly to aid the eye). Here Structure is modelling the Micronesian/Polynesian individuals (the last columns) as mixtures of the 8 populations (solid colors: purple, Taiwanese; turquoise, East Asian; green, Kuot (Papua New Guinea)).

I've shown this picture (from the supplementary files of the paper) rather than one from the main text (shown just below) as it more clearly shows (well apart from the blurriness, sorry) that Polynesians are more closely related to aboriginal Taiwanese (and not the rest of Asia) with only a fraction of Melanesian admixture. There are also trees in the paper (constructed from Fst) that support the conclusions of this picture (they also show that the Maori seem to have been through quite a bottleneck).

Another paper (in AJHG this time) looks at a large number of microsatellites in Polynesians (also with data from Melanesians and Han Chinese). This paper also estimates that while the Polynesians are closer to mainland Asians (Han in their sample) than to Melanesians, Melanesians made a substantial genetic contribution to the Polynesians. They perform this analysis in a more formal population genetics setting than the PLoS Genetics paper, estimating the proportions of the Polynesian population's ancestry contributed by the two parental populations (Han and Melanesian) in a framework that allows for genetic drift. This is in agreement with the PLoS Genetics paper, though I do wonder about the use of the Han as the parental population, as the results of the PLoS Genetics paper indicate that aboriginal Taiwanese are closer to the Polynesians than the rest of Asia is.
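
For intuition about what "ancestral proportions contributed by two parental populations" means, here is the simplest possible estimator: a least-squares fit of p_admixed = m * p_source1 + (1 - m) * p_source2 across loci. This is deliberately much cruder than the drift-aware framework the paper used, since it treats the parental frequencies as known exactly and ignores drift since admixture; the allele frequencies in the example are made up.

```python
def admixture_proportion(p_admixed, p_source1, p_source2):
    """Least-squares estimate of m in
    p_admixed = m * p_source1 + (1 - m) * p_source2, pooled over loci.
    Ignores genetic drift and sampling error in the parental frequencies."""
    num = sum((h - b) * (a - b) for h, a, b in zip(p_admixed, p_source1, p_source2))
    den = sum((a - b) ** 2 for a, b in zip(p_source1, p_source2))
    return num / den


# Hypothetical allele frequencies at four loci in the two parental populations.
p1 = [0.1, 0.5, 0.9, 0.3]
p2 = [0.6, 0.2, 0.4, 0.8]
# An admixed population formed with 30% from source 1 and no subsequent drift.
ph = [0.3 * a + 0.7 * b for a, b in zip(p1, p2)]
print(admixture_proportion(ph, p1, p2))
```

If the chosen parental population (e.g. Han) is only a proxy for the true source (e.g. aboriginal Taiwanese), its allele frequencies enter this estimator directly, which is one concrete way the choice of parental population could shift the estimate.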

Friedlaender JS, Friedlaender FR, Reed FA, Kidd KK, Kidd JR, Chambers GK, Lea RA, Loo JH, Koki G, Hodgson JA, Merriwether DA, Weber JL.
The Genetic Structure of Pacific Islanders.
PLoS Genet. 2008 Jan 18;4(1):e19

Sunday, January 20, 2008

expressing one's self

For most genes in our genome, both maternal and paternal copies are expressed. One notable exception to this is the X chromosome in females. To compensate for the fact that males have only a single X chromosome, female cells express only one X chromosome (at least for the majority of genes on the X) by inactivating the other X. The choice of which X chromosome to inactivate (maternal or paternal) is random (in most mammals), and this choice is made early in development. Daughter cells inherit the choice of silenced X from their progenitor cell. This is why calico cats are calico (they are all female, apart from rare XXY male calicos): the progenitor cells of different patches have chosen different X chromosomes to express, resulting in different colors.

There are other genes on the autosomes which express only one copy of the gene (mono-allelic expression). Some of these genes are imprinted, in that the decision of which copy to inactivate is not random but determined by the parent of origin, for example there are a set of genes for which only the copy inherited from the mother is expressed. However, for a number of autosomal genes the choice of which copy to express is random. Many of these randomly inactivated genes reside in particular gene families, e.g. olfactory receptor genes and antigen-specific receptors.

There's a paper out in Science that sets out to identify novel mono-allelically expressed genes. Cell lines (immortalized cells) are usually polyclonal (i.e. not derived from a single cell), so the signal of random inactivation would be lost by averaging over the different choices made by different clones. To overcome this, the authors created clonal cell lines from single cells, so that the choice of which copy of a gene to inactivate is the same for all of the cells in a line.

The authors then used SNP genotyping chips to study RNA expression in these cell lines (a really neat idea). Usually SNP genotyping chips are used to detect which alleles an individual's DNA carries at around 500,000 SNPs. Imagine an individual who is heterozygous at a SNP within a gene. Transcripts are produced from both copies of the gene, so for a normally expressed gene the RNA transcripts average out to roughly 50% one allele and 50% the other. For mono-allelically expressed genes, only one copy of the gene is being expressed, and so only one of the alleles is present in the RNA transcripts from the gene. By converting the RNA from their cell lines into DNA and typing this DNA on the SNP chip, the authors can detect genes where only one of the alleles in a heterozygote is being expressed. The authors then look for genes where the allele expressed flips between different clonal cell lines, implying that each clone chose at random which allele to express.
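
The classification logic can be sketched as follows. It assumes that, for each clonal line, we have the fraction of the cDNA signal at a heterozygous SNP coming from allele A. The 0.2/0.8 thresholds and the function names are hypothetical placeholders, not the cut-offs used in the paper, whose calling procedure was surely more careful.

```python
def call_allelic_state(fraction_a, low=0.2, high=0.8):
    """Classify one clone's expression at a heterozygous SNP from the fraction
    of cDNA signal carrying allele A (thresholds are arbitrary placeholders)."""
    if fraction_a <= low:
        return "B"
    if fraction_a >= high:
        return "A"
    return "biallelic"


def random_monoallelic(fractions_by_clone, **thresholds):
    """Flag a gene as randomly monoallelic if some clones express only allele A
    and others only allele B, i.e. the choice flips between clonal lines.
    Consistent A-only expression in every clone could instead reflect
    imprinting or a strong cis effect, so it is not flagged here."""
    states = {call_allelic_state(f, **thresholds) for f in fractions_by_clone}
    return "A" in states and "B" in states


print(random_monoallelic([0.95, 0.05, 0.5]))  # flips between clones
print(random_monoallelic([0.5, 0.55, 0.45]))  # biallelic everywhere
```

Allowing a "biallelic" call alongside the mono-allelic ones mirrors the observation that many of these genes express both alleles in some cell lines.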

Now this approach is limited, as only genes containing SNPs that happen to be heterozygous in the cell lines are informative. Thus the authors can study only ~4,000 genes. But they find that nearly 10% of these genes show mono-allelic expression. If correct, this is a pretty stunning finding, as it implies that around 10% of genes in the genome are expressed in a mono-allelic manner. These mono-allelically expressed genes are often involved in interactions between cells. The authors also find that many of these genes do not 'perfectly' express only one of the alleles: in some cell lines, many of them express both alleles.

How this expression is coordinated between the two copies of the gene is unknown; clearly there must be a diverse set of mechanisms that choose which copy to express. Understanding these different mechanisms is bound to lead to some really interesting biology. The authors note that, unlike X chromosome inactivation, this is not a chromosome-wide choice of which copy to express, as they see genes next to one another expressing the copies from different chromosomes.

I'm interested in how and why such expression mechanisms evolve. It would be great to see some comparative work in macaque (and/or chimp), showing whether mono-allelic expression of particular genes is conserved or if this is just a temporary evolutionary state for many genes. I'm also interested in the selective cost of mono-allelic expression. If only a single copy of the gene is expressed, then the gene is essentially haploid, unmasking recessive deleterious mutations within it. This is perhaps mostly compensated for by the fact that different cell lineages make different choices of which allele to express, so the individual (who is a mosaic of these choices) might usually not be affected by the deleterious mutations, as the cells expressing the non-mutant allele can compensate. But some mutations will presumably be deleterious enough that once unmasked they cannot be compensated for (e.g. gain-of-function mutations). It is therefore somewhat surprising that the organism is forgoing one of the main supposed advantages of diploidy (the masking of recessive mutations, see here).

What advantage could an organism derive from this mono-allelic expression? Mono-allelic expression can be involved in creating cellular diversity. For example, only a single olfactory receptor gene out of a family of ~1,000 is expressed in a given neuron, so each neuron has only a single olfactory receptor. Therefore, the expression of only a single copy of an olfactory receptor gene might be a by-product of switching off all but one olfactory receptor gene (see here).
The sheer number of mono-allelically expressed genes (many imperfectly so) suggests that this is perhaps not what is happening here. One idea is that mono-allelic expression is simply a way of controlling (i.e. reducing) gene expression. It will be very interesting to see what comes of further investigations of this kind.

Thursday, January 17, 2008

Genetic structure of Pacific island populations

A new paper looking at the genetic structure of Pacific Islanders has just come out. The authors type nearly a thousand markers in 952 individuals from 41 Pacific populations. I've not had a chance to read the paper in any depth, but it looks really interesting. Studies of single loci such as the Y chromosome or mtDNA offer a very noisy view of human history, as chance events in the history of the maternal or paternal line can distort the picture that the Y or mtDNA gives us. This kind of study (with many unlinked markers) offers a huge advantage over mtDNA or Y chromosome studies, as it represents a truly genomic view of population history. This paper also ties in nicely with the recent paper on Native American ancestry (which used the same set of microsatellite markers). Combined with the spate of papers on finer-scale structure within continents (e.g. here and here), this is a great time for learning about the genetic history of populations.

Tuesday, January 1, 2008

Re: Another comment on Hawks et al.

Thanks for commenting, John.
I think that people do not doubt that the effective population size of humans has increased, what is debatable is when and by how much.

I stand by my comment that effective population sizes cannot be estimated from archaeological data. The only way to truly estimate the population genetic parameter Ne is from population genetic data. It is not enough to do some calculations suggesting that Ne can be smaller than the census size by only some limited factor. Any such calculation can only take approximate account of some of the many factors that reduce Ne below the census size. The effective population size of many large cosmopolitan species is often far, far smaller than anyone would have predicted (which is why people don't predict it, they estimate it). Archaeological data (and simulations based on them) are useful for understanding which factors may have played a role in reducing the effective population size, but cannot be taken as a proxy for it.
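One textbook illustration of why census-based calculations can badly overestimate Ne: when census size fluctuates across generations, long-term Ne is approximately the harmonic mean of the per-generation sizes, which is dominated by the smallest values. The sketch below uses made-up sizes purely for illustration, and `harmonic_mean_ne` is a hypothetical helper. It captures only this one factor and ignores others (variance in offspring number, unequal sex ratio, population structure) that reduce Ne further.

```python
def harmonic_mean_ne(census_sizes):
    """Approximate long-term Ne for a population whose size changes each generation.

    Assumes each generation is otherwise an ideal Wright-Fisher population,
    so the only factor reducing Ne here is fluctuation in size.
    """
    if not census_sizes or any(n <= 0 for n in census_sizes):
        raise ValueError("need a non-empty list of positive population sizes")
    return len(census_sizes) / sum(1.0 / n for n in census_sizes)


# Hypothetical history: nine generations at 10,000 and one crash to 100.
sizes = [10_000] * 9 + [100]
arithmetic = sum(sizes) / len(sizes)
ne = harmonic_mean_ne(sizes)
print(f"arithmetic mean: {arithmetic:.0f}, harmonic-mean Ne: {ne:.0f}")
# arithmetic mean: 9010, harmonic-mean Ne: 917
```

A single bad generation in ten pulls Ne roughly tenfold below the average census size, and a calculation that misses such episodes will overestimate Ne.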

I would be interested to see the recent references that show conclusively that Ne increased rapidly around 30-50 kya. I think that this is actually a controversial point, as many recent analyses of population genetic data do not support long-term growth (even in African populations, where the signal is not complicated by the bottleneck); most analyses reject strong growth occurring 30-50 kyrs ago (see here, here, here and here). African populations (Bantu) show signals of moderate recent growth, i.e. an excess of singletons, but not enough to be compatible with strong long-term growth. European populations will probably also show signals of very recent growth once nuclear resequencing sample sizes are large enough to detect the signal in very low-frequency polymorphisms (which have been recently introduced). Now, undoubtedly these papers suffer from flaws in their assumptions, but they suggest that you cannot assume that the effective population size was increasing dramatically tens of thousands of years ago.
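As a rough illustration of what an "excess of singletons" means: under a constant-size neutral model, the expected number of sites where the derived allele appears i times in a sample of n chromosomes is θ/i, so the expected number of singletons is θ itself. Watterson's estimator θ_W = S / a_n, with a_n = Σ_{i=1}^{n-1} 1/i and S the number of segregating sites, estimates θ, so the ratio of observed singletons to θ_W is near 1 under constant size and above 1 when rare variants are over-represented, as expected after recent growth. This sketch assumes ancestral states are known (an unfolded spectrum) and uses made-up counts; `singleton_excess` is a hypothetical helper, and a ratio above 1 is also consistent with other causes (e.g. purifying selection or sequencing error), so it is not by itself a test for growth.

```python
def singleton_excess(sfs):
    """Observed singletons divided by their constant-size neutral expectation.

    sfs[i - 1] is the number of SNPs whose derived allele appears in exactly
    i of n sampled chromosomes, for i = 1 .. n - 1 (unfolded spectrum).
    """
    if len(sfs) < 1 or sum(sfs) == 0:
        raise ValueError("need at least one segregating site and n >= 2")
    n = len(sfs) + 1
    segregating_sites = sum(sfs)
    a_n = sum(1.0 / i for i in range(1, n))
    theta_w = segregating_sites / a_n  # Watterson's estimator of theta
    # Under the standard neutral model E[xi_i] = theta / i, so E[xi_1] = theta.
    return sfs[0] / theta_w


# Hypothetical spectra for n = 4 chromosomes.
constant_like = [12, 6, 4]  # proportional to theta / i with theta = 12
growth_like = [20, 3, 2]    # rare variants over-represented
print(round(singleton_excess(constant_like), 3))  # 1.0
print(round(singleton_excess(growth_like), 3))    # 1.467
```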

The reason (I think) why a number of analyses ignore recent growth (and use a constant population size of 10,000) is not that the authors do not appreciate that effective population sizes have increased; it is just that the increase is thought to be too recent (i.e. not tens of thousands of years ago) to affect estimates of bottlenecks and similar parameters.