Gene-history correlation and population structure
Correlation of gene histories in the human genome determines the patterns of
genetic variation (haplotype structure) and is crucial to understanding genetic
factors in common diseases. We derive closed analytical expressions for the
correlation of gene histories in established demographic models for genetic
evolution and show how to extend the analysis to more realistic (but more
complicated) models of demographic structure. We identify two contributions to
the correlation of gene histories in divergent populations: linkage
disequilibrium, and differences in the demographic history of individuals in
the sample. These two factors contribute to correlations at different length
scales: the former at small, and the latter at large scales. We show that
recent mixing events in divergent populations limit the range of correlations
and compare our findings to empirical results on the correlation of gene
histories in the human genome.
Comment: Revised and extended version: 26 pages, 5 figures, 1 table
The variant call format and VCFtools
Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in compressed form and can be indexed for fast retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
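To make the record layout concrete, here is a minimal Python sketch of reading one VCF data line. It assumes the standard eight tab-separated fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function name `parse_vcf_line` and the sample record are illustrative only and are not part of VCFtools or its Perl API.

```python
# Names of the eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse one VCF data line (not a '#' header line) into a dict.

    Hypothetical helper for illustration; genotype columns, if present,
    are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based position on the reference
    record["ALT"] = record["ALT"].split(",")  # several ALT alleles are comma-separated

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info = {}
    for item in record["INFO"].split(";"):
        if item == ".":                       # "." marks a missing value
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True    # bare flags become True
    record["INFO"] = info
    return record


example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
rec = parse_vcf_line(example)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"])
```

Values in INFO are left as strings here; a fuller parser would use the types declared in the file's header lines.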
Localizing Recent Adaptive Evolution in the Human Genome
Identifying genomic locations that have experienced selective sweeps is an important first step toward understanding the molecular basis of adaptive evolution. Using statistical methods that account for the confounding effects of population demography, recombination rate variation, and single-nucleotide polymorphism ascertainment, while also providing fine-scale estimates of the position of the selected site, we analyzed a genomic dataset of 1.2 million human single-nucleotide polymorphisms genotyped in African-American, European-American, and Chinese samples. We identify 101 regions of the human genome with very strong evidence (p < 10^-5) of a recent selective sweep and where our estimate of the position of the selective sweep falls within 100 kb of a known gene. Within these regions, genes of biological interest include genes in pigmentation pathways, components of the dystrophin protein complex, clusters of olfactory receptors, genes involved in nervous system development and function, immune system genes, and heat shock genes. We also observe consistent evidence of selective sweeps in centromeric regions. In general, we find that recent adaptation is strikingly pervasive in the human genome, with as much as 10% of the genome affected by linkage to a selective sweep.
Blood ties: ABO is a trans-species polymorphism in primates
The ABO histo-blood group, the critical determinant of transfusion
incompatibility, was the first genetic polymorphism discovered in humans.
Remarkably, ABO antigens are also polymorphic in many other primates, with the
same two amino acid changes responsible for A and B specificity in all species
sequenced to date. Whether this recurrence of A and B antigens is the result of
an ancient polymorphism maintained across species or due to numerous, more
recent instances of convergent evolution has been debated for decades, with a
current consensus in support of convergent evolution. We show instead that
genetic variation data in humans and gibbons as well as in Old World Monkeys
are inconsistent with a model of convergent evolution and support the
hypothesis of an ancient, multi-allelic polymorphism of which some alleles are
shared by descent among species. These results demonstrate that the ABO
polymorphism is a trans-species polymorphism among distantly related species
and has remained under balancing selection for tens of millions of years. It
is, to date, the only such example in Hominoids and Old World Monkeys outside
of the Major Histocompatibility Complex.
Comment: 45 pages, 4 Figures, 4 Supplementary Figures, 5 Supplementary Tables
Genome-wide fine-scale recombination rate variation in Drosophila melanogaster
Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots, each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than in the North American population. The correlation between various genomic features (including recombination rates, diversity, divergence, GC content, gene content, and sequence quality) is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity.
Transplanting the leafy liverwort Herbertus hutchinsiae: A suitable conservation tool to maintain oceanic-montane liverwort-rich heath?
Thanks to the relevant landowners and managers for permission to carry out the experiments, Chris Preston for helping to obtain the liverwort distribution records and the distribution map, Gordon Rothero and Dave Horsfield for advice on choosing experimental sites and Alex Douglas for statistical advice. Juliane Geyer’s help with fieldwork was greatly appreciated. This study was made possible by a NERC PhD studentship and financial support from the Royal Botanic Garden Edinburgh and Scottish Natural Heritage.
Peer reviewed. Postprint.
Whole-genome sequencing of spermatocytic tumors provides insights into the mutational processes operating in the male germline
Adult male germline stem cells (spermatogonia) proliferate by mitosis and, after puberty, generate spermatocytes that undertake meiosis to produce haploid spermatozoa. Germ cells are under evolutionary constraint to curtail mutations and maintain genome integrity. Despite constant turnover, spermatogonia very rarely form tumors, so-called spermatocytic tumors (SpT). In line with the previous identification of FGFR3 and HRAS selfish mutations in a subset of cases, candidate gene screening of 29 SpTs identified an oncogenic NRAS mutation in two cases. To gain insight into the etiology of SpT and the properties of the male germline, we performed whole-genome sequencing of five tumors (4/5 with matched normal tissue). The acquired single nucleotide variant load was extremely low (~0.2 per Mb), with an average of 6 (2±9) no
Recombination rate and selection strength in HIV intra-patient evolution
The evolutionary dynamics of HIV during the chronic phase of infection is
driven by the host immune response and by selective pressures exerted through
drug treatment. To understand and model the evolution of HIV quantitatively,
the parameters governing genetic diversification and the strength of selection
need to be known. While mutation rates can be measured in single replication
cycles, the relevant effective recombination rate depends on the probability of
coinfection of a cell with more than one virus and can only be inferred from
population data. However, most population genetic estimators for recombination
rates assume absence of selection and are hence of limited applicability to
HIV, since positive and purifying selection are important in HIV evolution.
Here, we estimate the rate of recombination and the distribution of selection
coefficients from time-resolved sequence data tracking the evolution of HIV
within single patients. By examining temporal changes in the genetic
composition of the population, we estimate the effective recombination rate to be
r=1.4e-5 recombinations per site and generation. Furthermore, we provide
evidence that selection coefficients of at least 15% of the observed
non-synonymous polymorphisms exceed 0.8% per generation. These results provide
a basis for a more detailed understanding of the evolution of HIV. A
particularly interesting case is evolution in response to drug treatment, where
recombination can facilitate the rapid acquisition of multiple resistance
mutations. With the methods developed here, more precise and more detailed
studies will be possible as soon as data with higher time resolution and
greater sample sizes are available.
Comment: to appear in PLoS Computational Biology
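As a rough illustration of what the quoted estimate implies, the Python sketch below converts the per-site rate r = 1.4e-5 into the chance that two sites are separated by at least one recombination event. It assumes crossovers occur as a Poisson process at a uniform rate along the genome and ignores selection; those simplifications and the function name `prob_recombined` are assumptions made here, not the estimation method of the paper.

```python
import math

# Effective recombination rate quoted in the abstract (per site per generation).
R_PER_SITE = 1.4e-5


def prob_recombined(distance_bp, generations, r=R_PER_SITE):
    """Probability of at least one crossover between two sites.

    Assumes crossovers form a Poisson process with expected count
    r * distance_bp * generations (an illustrative simplification).
    """
    expected_events = r * distance_bp * generations
    return 1.0 - math.exp(-expected_events)


# Two sites 1000 bases apart, followed over 1 and over 100 generations.
print(prob_recombined(1000, 1))
print(prob_recombined(1000, 100))
```

For short distances and few generations the result is close to r times distance times generations, so the rate can be read as a per-base, per-generation probability.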
The geography of recent genetic ancestry across Europe
The recent genealogical history of human populations is a complex mosaic
formed by individual migration, large-scale population movements, and other
demographic events. Population genomics datasets can provide a window into this
recent history, as rare traces of recent shared genetic ancestry are detectable
due to long segments of shared genomic material. We make use of genomic data
for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of
recent genealogical ancestry over the past three thousand years at a
continental scale. We detected 1.9 million shared genomic segments, and used
the lengths of these to infer the distribution of shared ancestors across time
and geography. We find that a pair of modern Europeans living in neighboring
populations share around 10-50 genetic common ancestors from the last 1500
years, and upwards of 500 genetic ancestors from the previous 1000 years. These
numbers drop off exponentially with geographic distance, but since genetic
ancestry is rare, individuals from opposite ends of Europe are still expected
to share millions of common genealogical ancestors over the last 1000 years.
There is substantial regional variation in the number of shared genetic
ancestors: especially high numbers of common ancestors between many eastern
populations likely date to the Slavic and/or Hunnic expansions, while much
lower levels of common ancestry in the Italian and Iberian peninsulas may
indicate weaker demographic effects of Germanic expansions into these areas
and/or more stably structured populations. Recent shared ancestry in modern
Europeans is ubiquitous, and clearly shows the impact of both small-scale
migration and large historical events. Population genomic datasets have
considerable power to uncover recent demographic history, and will allow a much
fuller picture of the close genealogical kinship of individuals across the
world.
Comment: Full size figures available from
http://www.eve.ucdavis.edu/~plralph/research.html; or html version at
http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm
Inference of population splits and mixtures from genome-wide allele frequency data
Many aspects of the historical relationships between populations in a species
are reflected in genetic data. Inferring these relationships from genetic data,
however, remains a challenging task. In this paper, we present a statistical
model for inferring the patterns of population splits and mixtures in multiple
populations. In this model, the sampled populations in a species are related to
their common ancestor through a graph of ancestral populations. Using
genome-wide allele frequency data and a Gaussian approximation to genetic
drift, we infer the structure of this graph. We applied this method to a set of
55 human populations and a set of 82 dog breeds and wild canids. In both
species, we show that a simple bifurcating tree does not fully describe the
data; in contrast, we infer many migration events. While some of the migration
events that we find have been detected previously, many have not. For example,
in the human data we infer that Cambodians trace approximately 16% of their
ancestry to a population ancestral to other extant East Asian populations. In
the dog data, we infer that both the boxer and basenji trace a considerable
fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to
domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese)
result from admixture between modern toy breeds and "ancient" Asian breeds.
Software implementing the model described here, called TreeMix, is available at
http://treemix.googlecode.com
Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15
figures. This is an updated version of the preprint available at
http://precedings.nature.com/documents/6956/version/
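The idea of a Gaussian approximation to genetic drift can be sketched in Python as follows. Under the usual approximation, an allele at frequency p0 that drifts for t generations in a population with 2N gene copies has a frequency distributed roughly as a Gaussian with mean p0 and variance p0(1 - p0)t/(2N), provided t/(2N) is small. The code compares that variance with a Wright-Fisher simulation. It is an illustration under these stated assumptions, not the TreeMix implementation, and all function names are hypothetical.

```python
import random


def drift_variance_gaussian(p0, generations, two_n):
    """Variance under the Gaussian approximation: p0 * (1 - p0) * t / (2N)."""
    return p0 * (1.0 - p0) * generations / two_n


def simulate_wright_fisher(p0, generations, two_n, rng):
    """Allele frequency after binomial resampling of 2N copies each generation."""
    p = p0
    for _ in range(generations):
        copies = sum(1 for _ in range(two_n) if rng.random() < p)
        p = copies / two_n
    return p


def empirical_variance(p0, generations, two_n, replicates, seed=1):
    """Sample variance of the final frequency across independent replicates."""
    rng = random.Random(seed)
    finals = [simulate_wright_fisher(p0, generations, two_n, rng)
              for _ in range(replicates)]
    mean = sum(finals) / replicates
    return sum((x - mean) ** 2 for x in finals) / (replicates - 1)


# Illustrative parameters: p0 = 0.5, t = 5 generations, 2N = 100 copies.
approx = drift_variance_gaussian(0.5, 5, 100)
observed = empirical_variance(0.5, 5, 100, replicates=2000)
print(approx, observed)
```

The two numbers should be close when t/(2N) is small. The approximation degrades as drift accumulates or as p0 approaches 0 or 1.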