Thursday, January 24, 2008

More on the Pacific islanders

Thought I would pop a couple of pictures up from the Pacific island paper, and briefly point out another paper on the topic. One of the debates that the paper sought to resolve is "Are Polynesians more closely related to Asian/Taiwanese populations or to Melanesians?"

First, a picture of the area (taken and cropped from the PLoS Genetics paper, citation at the bottom), as I had trouble remembering which populations are where.

The Polynesian/Micronesian populations (on this map) are the Maori, Samoans and the Micronesians. Previous mtDNA work had suggested that the Polynesians were closely related to each other, supporting the proposal of an 'express train to Polynesia': that these people moved rapidly (by boat) out from Taiwan and around the surrounding islands (with little contact with the people of these islands) before reaching their current locations. Others had suggested, on the basis of the Y chromosome, a 'slow boat to Polynesia', with the Polynesian populations mixing with Melanesians along the way.

Here's a picture (from the paper) of the output of the program Structure (the top panel is individual ancestry; the bottom panel is ancestry averaged within each population, mainly to aid the eye). In this case the Structure analysis is modeling the Micro/Polynesian individuals (the last columns) as mixtures of the 8 populations (solid colors; purple: Taiwanese, turquoise: East Asian, green: Kuot (Papua New Guinea)).

I've shown this picture (from the supplementary files of the paper) rather than one from the main text (shown just below) as it more clearly shows (well, apart from the blurriness, sorry) that Polynesians are more closely related to aboriginal Taiwanese (and not the rest of Asia), with only a fraction of Melanesian admixture. There are also trees in the paper (constructed from Fst) that support the conclusions of this picture (they also show that the Maori seem to have been through quite a bottleneck).
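For readers who have not met Fst, here is a minimal sketch of the textbook definition at a single biallelic locus, Fst = (H_T - H_S) / H_T, for equally sized populations. This is only an illustration of the quantity the trees are built from; it is not the estimator used in the paper, and the function name and allele frequencies are made up.

```python
def fst_biallelic(freqs):
    """Wright's Fst at one biallelic locus, given the allele frequency in
    each population (populations weighted equally; no sample-size correction)."""
    k = len(freqs)
    # Mean expected heterozygosity within populations
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    # Expected heterozygosity of the pooled population
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("locus is not polymorphic in the pooled sample")
    return (h_t - h_s) / h_t

# Made-up frequencies: two strongly differentiated populations
fst_far = fst_biallelic([0.2, 0.8])
# Made-up frequencies: two very similar populations
fst_near = fst_biallelic([0.45, 0.55])
```

In practice Fst is averaged over many loci and corrected for sample size before being turned into a distance for tree building.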

There is also another paper (in ASHG this time) looking at a large number of microsatellites in Polynesians (also with data from Melanesians and Han Chinese). This paper also estimates that, while the Polynesians are closer to mainland Asians (Han in their sample) than to Melanesians, Melanesians made a substantial genetic contribution to the Polynesians. They perform this analysis in a more formal population genetics setting than the PLoS Genetics paper, estimating the ancestral proportions of the Polynesian population contributed by the two parental populations (Han and Melanesian) in a framework that allows for genetic drift. This is in agreement with the PLoS Genetics paper, though I do wonder about the use of the Han as the parental population, as the results of the PLoS Genetics paper indicate that aboriginal Taiwanese are closer to the Polynesians than the rest of Asia.
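To make "ancestral proportions" concrete, here is a minimal sketch of the simplest possible estimator: a least-squares fit of the admixed population's allele frequencies as a mixture of the two parental frequencies. It ignores genetic drift entirely, so it is not the method of either paper; the function name and the frequencies are invented for illustration.

```python
def admixture_proportion(p_mix, p_a, p_b):
    """Least-squares estimate of the fraction m of ancestry from parental
    population A, assuming p_mix = m * p_a + (1 - m) * p_b at every locus
    (no drift, no sampling error)."""
    num = sum((pm - pb) * (pa - pb) for pm, pa, pb in zip(p_mix, p_a, p_b))
    den = sum((pa - pb) ** 2 for pa, pb in zip(p_a, p_b))
    if den == 0:
        raise ValueError("parental populations have identical frequencies")
    return num / den

# Made-up allele frequencies at four loci, purely for illustration
p_asian = [0.9, 0.2, 0.6, 0.1]
p_melanesian = [0.1, 0.8, 0.2, 0.7]
# Construct a population that is exactly 80% A and 20% B
p_polynesian = [0.8 * a + 0.2 * b for a, b in zip(p_asian, p_melanesian)]

m = admixture_proportion(p_polynesian, p_asian, p_melanesian)
```

The choice of parental populations matters here: if the true source is closer to aboriginal Taiwanese than to Han, using Han frequencies as `p_a` would bias an estimate like this one.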

Friedlaender JS, Friedlaender FR, Reed FA, Kidd KK, Kidd JR, Chambers GK, Lea RA, Loo JH, Koki G, Hodgson JA, Merriwether DA, Weber JL.
The Genetic Structure of Pacific Islanders.
PLoS Genet. 2008 Jan 18;4(1):e19

Sunday, January 20, 2008

expressing one's self

For most genes in our genome both maternal and paternal copies are expressed. One notable exception to this is the X chromosome in females. To compensate for the fact that males have only a single X chromosome, female cells express only one X chromosome (at least for the majority of genes on the X) and inactivate the other. The choice of which X chromosome to inactivate (maternal or paternal) is random (in most mammals), and this choice is made early on in development. Daughter cells inherit the choice of silenced X chromosome from their progenitor cell. This is why calico cats (which are all female, apart from rare XXY males) are calico: the progenitor cells of different patches of cells have chosen different X's to express, resulting in different colors.

There are other genes on the autosomes which express only one copy of the gene (mono-allelic expression). Some of these genes are imprinted, in that the decision of which copy to inactivate is not random but determined by the parent of origin; for example, there is a set of genes for which only the copy inherited from the mother is expressed. However, for a number of autosomal genes the choice of which copy to express is random. Many of these randomly inactivated genes reside in particular gene families, e.g. olfactory receptor genes and antigen-specific receptors.

There's a paper out in Science that sets out to identify novel mono-allelic expression genes. Cell lines (immortalized cells) are usually poly-clonal (i.e. not derived from a single cell) and so the signal of random inactivation would be lost due to averaging over the different choices made by different clones. To overcome this the authors created clonal cell lines from single cells, thus the choice of which copy of the gene to inactivate is the same for all of the cells in the cell line.

The authors then used SNP genotyping chips to study RNA expression in these cell lines (a really neat idea). Usually SNP genotyping chips are used to detect which alleles an individual's DNA carries at 500 thousand SNPs. Imagine an individual who is a heterozygote at a SNP within a gene. RNA transcripts are produced by transcribing from the DNA, and each transcript carries one or other of the alleles at the SNP, so that the RNA from this gene averages out to 50% one allele and 50% the other. For mono-allelically expressed genes, only one of the copies of the gene is being expressed and so only one of the alleles is present in the RNA transcripts from the gene. By converting the RNA from these cell lines into DNA and typing this DNA on the SNP chip, the authors can detect genes where only one of the alleles in a heterozygote is being expressed in the RNA. The authors look for genes where the choice of allele expressed flipped between the different cell lines, implying that the different individuals (cell lines) are randomly choosing which allele to express.
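The logic of that screen can be sketched in a few lines. This is only a toy version under my own assumptions: the signal values, the 0.9 cutoff and the function names are all hypothetical and are not taken from the paper.

```python
def classify_expression(ref_signal, alt_signal, threshold=0.9):
    """Classify expression at a heterozygous SNP from the cDNA signal for
    each allele. `threshold` is an arbitrary, hypothetical cutoff."""
    total = ref_signal + alt_signal
    if total <= 0:
        return "no call"
    ref_fraction = ref_signal / total
    if ref_fraction >= threshold:
        return "mono-allelic (ref)"
    if ref_fraction <= 1 - threshold:
        return "mono-allelic (alt)"
    return "bi-allelic"

def is_random_monoallelic(calls):
    """A gene looks randomly mono-allelic if some clones express only the
    ref allele while other clones express only the alt allele."""
    return "mono-allelic (ref)" in calls and "mono-allelic (alt)" in calls

# Made-up (ref, alt) signals for one gene in three clonal cell lines
clones = {
    "clone_1": (980.0, 20.0),
    "clone_2": (15.0, 985.0),
    "clone_3": (510.0, 490.0),
}
calls = [classify_expression(ref, alt) for ref, alt in clones.values()]
flipped = is_random_monoallelic(calls)
```

A gene where every clone expresses the same single allele would not pass `is_random_monoallelic`, since that pattern is also what imprinting or a non-expressed (e.g. null) allele would produce.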

Now this approach is limited, as only genes containing SNPs that happen to be heterozygous in the cell lines can be informative. Thus the authors can study only 4000 genes. But they find that nearly 10% of these genes show mono-allelic expression. If correct, this is a pretty stunning finding, as it implies that around 10% of genes in the genome are expressed in a mono-allelic manner. These mono-allelically expressed genes are often involved in interactions between cells. The authors also find that many of these genes are not 'perfectly' expressing only one of the alleles: in some cell lines, many of the genes express both alleles.

How this expression is co-ordinated between the two copies of the gene is unknown; clearly there must be a set of diverse mechanisms that choose which copy to express. Starting to understand these different mechanisms is bound to lead to some really interesting biology. The authors note that, unlike X chromosome inactivation, this is not a chromosome-wide choice of which copy of the chromosome to express, as they see genes that are next to one another expressing the copy of the gene from different chromosomes.

I'm interested in how and why such expression mechanisms evolve. It would be great to see some comparative work in macaque (and/or chimp), showing whether mono-allelic expression of particular genes is conserved or if this is just a temporary evolutionary state for many genes. I'm also interested in the selective cost of mono-allelic expression. If only a single copy of the gene is expressed then the gene is essentially haploid, unmasking recessive deleterious mutations within it. Now this is perhaps mostly compensated for by the fact that different cell lineages make different choices of which allele to express, so the individual (who is a mosaic of these choices) might usually not be affected by the deleterious mutations, as the cells in which the non-deleterious allele is expressed can compensate. But some mutations will presumably be deleterious enough that once unmasked they cannot be compensated for (e.g. gain-of-function mutations). It is therefore somewhat surprising that the organism is forgoing one of the main supposed advantages of diploidy (the shielding of recessive mutations, see here).

What advantage could an organism derive from this mono-allelic expression? Mono-allelic expression can be involved in creating cellular diversity. For example, only a single olfactory receptor gene of the family of ~1,000 olfactory receptors is expressed in a given neuron, so each neuron has only a single olfactory receptor. Therefore, the expression of only a single copy of an olfactory receptor gene might be a side product of switching off all but one olfactory receptor gene (see here). The sheer number of mono-allelically expressed genes (many imperfectly so) suggests that this is perhaps not what is happening here. One idea is that this mono-allelic expression is simply a way of controlling (i.e. reducing) gene expression. It will be very interesting to see what comes of further investigations of this kind.

Thursday, January 17, 2008

genetic structure of Pacific island populations

A new paper looking at the genetic structure of Pacific Islanders has just come out. The authors type nearly a thousand markers on 952 individuals from 41 Pacific populations. I've not had a chance to read the paper in any depth, but it looks really interesting. Studies of single loci such as the Y or mtDNA offer a very noisy view of human history, as chance events in the history of the maternal or paternal line can distort the view of history that the Y or mtDNA give us. This kind of study (with many unlinked markers) offers a huge advantage over mtDNA or Y chromosome studies as it represents a truly genomic view of population history. This paper also nicely ties in with the recent paper on Native American ancestry (using the same set of microsatellite markers). Combined with the spate of papers on finer scale structure within continents (e.g. here and here), this is a great time for learning about the genetic history of populations.

Tuesday, January 1, 2008

RE Another comment on Hawks et al

Thanks for commenting John.
I think that people do not doubt that the effective population size of humans has increased, what is debatable is when and by how much.

I stand by my comment that effective population sizes can not be estimated from archaeological data. The only way to truly estimate the population genetic parameter Ne is from population genetic data. It is not enough to do some calculations to suggest that Ne could only be some fraction smaller than the census size. Any such calculation can only take approximate account of some of the many factors contributing towards the reduction in Ne from the census size. The effective population size of many large cosmopolitan species is often far, far smaller than anyone would have predicted (which is why people don't predict it, they estimate it). Archaeological data (and simulations based on them) are of use in trying to understand what factors may have played a role in the reduction of the effective population size, but can not be taken as a proxy for it.
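As one concrete example of a factor that pulls Ne below the census size, here is a minimal sketch of the standard textbook result that, when population size fluctuates between generations, the long-term Ne is approximately the harmonic mean of the per-generation sizes. The numbers are invented; this accounts for only one of the many factors mentioned above, which is exactly why it cannot stand in for an estimate from genetic data.

```python
def harmonic_mean_ne(sizes):
    """Approximate long-term effective size over a series of generations
    as the harmonic mean of the per-generation sizes."""
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty list of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)

# Made-up sizes: one bad generation among three large ones
sizes = [10000, 10000, 100, 10000]

arithmetic_mean = sum(sizes) / len(sizes)
ne = harmonic_mean_ne(sizes)
```

A single generation at 100 drags the harmonic mean far below the arithmetic mean of the same four sizes, which is the sense in which brief crashes dominate Ne.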

I would be interested to know the recent references that show conclusively that Ne increased rapidly around 30-50 kyr ago. I think that this is actually a controversial point, as many recent analyses of population genetic data do not support long-term growth (even in African populations, where the signal is not complicated by the bottleneck); most analyses reject strong growth occurring 30-50 kyr ago (see here, here, here and here). African populations (Bantu) show signals of moderate recent growth, i.e. an excess of singletons, but not enough to be compatible with strong long-term growth. European populations will probably also show signals of very recent growth, once sample sizes for resequenced nuclear DNA are large enough to find the signal in very low frequency polymorphisms (that have been recently introduced). Now undoubtedly these papers suffer from flaws in their assumptions, but they suggest that you cannot assume that the effective population size was increasing dramatically tens of thousands of years ago.
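For anyone wondering what "an excess of singletons" is measured against: under the standard neutral model with constant population size, the expected number of sites where the derived allele appears i times in a sample of n chromosomes is proportional to 1/i. Here is a minimal sketch of that comparison; the observed counts are made up and the function names are my own, and real analyses fit explicit growth models rather than looking at this one number.

```python
def expected_sfs_fractions(n):
    """Expected fraction of segregating sites with derived allele count
    i = 1..n-1 in a sample of n chromosomes, under the standard neutral
    constant-size model (proportional to 1/i)."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

def singleton_excess(observed_counts):
    """Observed minus expected fraction of singletons, where
    observed_counts[i-1] is the number of sites with derived count i."""
    n = len(observed_counts) + 1
    expected = expected_sfs_fractions(n)[0]
    observed = observed_counts[0] / sum(observed_counts)
    return observed - expected

# Made-up unfolded site frequency spectrum for a sample of 10 chromosomes
observed = [60, 15, 8, 5, 4, 3, 2, 2, 1]

fractions = expected_sfs_fractions(10)
excess = singleton_excess(observed)
```

A positive `excess` is in the direction expected under population growth, since recent growth adds many young, rare variants; how large it is tells you something about how strong and how old that growth can have been.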

The reason (I think) why a number of analyses ignore recent growth (and use a constant population size of 10,000) is not that the authors do not appreciate that effective population sizes have increased; it is just that the increase is thought to be too recent (i.e. not tens of thousands of years ago) to affect the estimates of bottlenecks or such.