Decoding coalescent hidden Markov models in linear time
In many areas of computational biology, hidden Markov models (HMMs) have been
used to model local genomic features. In particular, coalescent HMMs have been
used to infer ancient population sizes, migration rates, divergence times, and
other parameters such as mutation and recombination rates. As more loci,
sequences, and hidden states are added to the model, however, the runtime of
coalescent HMMs can quickly become prohibitive. Here we present a new algorithm
for reducing the runtime of coalescent HMMs from quadratic in the number of
hidden time states to linear, without making any additional approximations. Our
algorithm can be incorporated into various coalescent HMMs, including the
popular method PSMC for inferring variable effective population sizes. Here we
implement this algorithm to speed up our demographic inference method diCal,
which is equivalent to PSMC when applied to a sample of two haplotypes. We
demonstrate that the linear-time method can reconstruct a population size
change history more accurately than the quadratic-time method, given similar
computation resources. We also apply the method to data from the 1000 Genomes
project, inferring a high-resolution history of size changes in the European
population.
Comment: 18 pages, 5 figures. To appear in the Proceedings of the 18th Annual
International Conference on Research in Computational Molecular Biology
(RECOMB 2014). The final publication is available at link.springer.co
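To make the quadratic-versus-linear distinction in the abstract above concrete, here is a minimal sketch of one forward-algorithm step for an HMM with K hidden states. It is not the authors' algorithm and not diCal or PSMC code. The dense step is the standard O(K^2) update. The structured step assumes, purely for illustration, a hypothetical transition matrix of the form trans[i][j] = stay[i] * [i == j] + leave[i] * enter[j], under which the same update costs O(K). All function and parameter names are made up for this example.

```python
def forward_step_dense(alpha, trans, emit):
    """One forward step with a dense K x K transition matrix: O(K^2) work."""
    k = len(alpha)
    return [
        emit[j] * sum(alpha[i] * trans[i][j] for i in range(k))
        for j in range(k)
    ]


def forward_step_structured(alpha, stay, leave, enter, emit):
    """Same step in O(K), ASSUMING trans[i][j] = stay[i]*(i == j) + leave[i]*enter[j].

    This diagonal-plus-rank-one structure is a hypothetical stand-in; the
    transition structure exploited in the paper is not reproduced here.
    """
    # Total probability mass leaving its current state, computed once.
    mass = sum(a * l for a, l in zip(alpha, leave))
    return [e * (a * s + mass * q)
            for a, s, q, e in zip(alpha, stay, enter, emit)]


def build_dense(stay, leave, enter):
    """Expand the assumed structure into an explicit K x K matrix."""
    k = len(stay)
    return [
        [(stay[i] if i == j else 0.0) + leave[i] * enter[j] for j in range(k)]
        for i in range(k)
    ]


if __name__ == "__main__":
    # Illustrative numbers only; each row of the expanded matrix sums to 1.
    stay, leave, enter = [0.9, 0.8, 0.7], [0.1, 0.2, 0.3], [0.5, 0.3, 0.2]
    alpha, emit = [0.2, 0.5, 0.3], [0.9, 0.1, 0.4]
    dense = forward_step_dense(alpha, build_dense(stay, leave, enter), emit)
    fast = forward_step_structured(alpha, stay, leave, enter, emit)
    print(dense)
    print(fast)
```

Both functions return the same forward vector up to floating-point rounding. The point of the sketch is only that structure in the transition matrix can remove the inner sum over source states.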
Inference of Population History using Coalescent HMMs: Review and Outlook
Studying how diverse human populations are related is of historical and
anthropological interest, in addition to providing a realistic null model for
testing for signatures of natural selection or disease associations.
Furthermore, understanding the demographic histories of other species is
playing an increasingly important role in conservation genetics. A number of
statistical methods have been developed to infer population demographic
histories using whole-genome sequence data, with recent advances focusing on
allowing for more flexible modeling choices, scaling to larger data sets, and
increasing statistical power. Here we review coalescent hidden Markov models, a
powerful class of population genetic inference methods that can effectively
utilize linkage disequilibrium information. We highlight recent advances, give
advice for practitioners, point out potential pitfalls, and present possible
future research directions.
Comment: 12 pages, 2 figures
The landscape of nucleotide diversity in Drosophila melanogaster is shaped by mutation rate variation
What shapes the distribution of nucleotide diversity along the genome? Attempts to answer this question have sparked debate about the roles of neutral stochastic processes and natural selection in molecular evolution. However, the mechanisms of evolution do not act in isolation, and integrative models that simultaneously consider the influence of multiple factors on diversity are lacking; without them, confounding factors lurk in the estimates. Here we present a new statistical method that jointly infers the genomic landscapes of genealogies, recombination rates and mutation rates. In doing so, our model captures the effects of genetic drift, linked selection and local mutation rates on patterns of genomic variation. We then formalize a causal model of how these micro-evolutionary mechanisms interact, and cast it as a linear regression to estimate their individual contributions to levels of diversity along the genome. Our analyses reclaim the well-established signature of linked selection in Drosophila melanogaster, but we estimate that the mutation landscape is the major driver of the genome-wide distribution of diversity in this species. Furthermore, our simulation results suggest that in many evolutionary scenarios the mutation landscape will be a crucial factor shaping diversity, depending notably on the genomic window size. We argue that incorporating mutation rate variation into the null model of molecular evolution will lead to more realistic inferences in population genomics.
Investigating the genetic diversity, population structure and archaic admixture history in worldwide human populations using high-coverage genomes
I present an analysis of 929 high-coverage (>30x) genomes from the Human Genome Diversity Project (HGDP) panel, a collection of cell lines from 54 populations across the world. Some data processing steps were necessary for downstream analysis, including lifting over resources from a different reference genome assembly, annotating the genome, and statistical phasing. Genome-wide genetic diversity agrees with previous studies using SNP arrays and microsatellites, yet haplotype information reveals fine-scale structure and recent demographic history that vary between populations.
This dataset also provides a valuable opportunity to explore the diversity and distribution of archaic segments in modern human populations. I implemented a hidden Markov model to detect such segments, based on patterns of allele-sharing with sequenced archaic genomes and a sub-Saharan African control panel. I also compared several variants of the model and different training methods using simulated data. Applying the model to the HGDP dataset using two Neanderthal genomes and one Denisova genome, I detected variation in the level of archaic ancestry across continental regions, populations, and individuals within each population. I further compared Neanderthal and Denisovan segments regarding their lengths, genomic distribution, divergence to the archaic genomes, nucleotide diversity, and haplotype networks to shed light on the structure of the admixture events. Neanderthal segments from all non-African populations appear largely homogeneous after accounting for the recent demographic history of modern human populations, which is consistent with a single admixture event that happened before they diverged from each other. In contrast, a distinct separation exists between Denisovan haplotypes recovered from Oceania and those from East/South Asia, whilst the complicated structure in the latter cannot be explained by a single source of gene flow. Therefore I propose that more than one episode of admixture with different Denisova groups occurred in the ancestral population of present-day East Asian, South Asian and American populations after the separation from the ancestors of present-day Oceanians, and that a separate admixture event occurred between the ancestors of Oceanians and the Denisova population.
Gates Cambridg
Inference of transitions to self-fertilization using haplotype genomic variation
Mating systems play an essential role in the evolution of natural populations. The reproductive mode of a population affects the evolutionary forces and recombination. Shifts in mating systems change major evolutionary traits of natural populations and affect the life-history cycle on many different levels. Among all transitions of mating schemes, a shift from outcrossing to selfing is one of the major shifts in plants. Such shifts have repeatedly occurred on the phylogenetic level. Despite their importance, there have been no published tools to estimate such transitions in natural populations using genetic data on a genome-wide level. Existing estimates rely on estimating the loss-of-function mutations of causal loci. However, such estimates require knowledge of the underlying genetic mechanism that induces the shift from outcrossing to selfing, and are therefore restricted to very few species.
In this study, we investigated the genetic consequences of shifts from outcrossing to selfing (Chapter 1). We used extensive simulations of the forward-in-time Wright-Fisher model and the backward-in-time coalescent model. We found the previously described theoretical work on implementing partial selfing in the coalescent to suffice in simulating transitions to selfing. We developed an Approximate Bayesian Computation approach (tsABC) to identify and estimate the date of transitions from outcrossing to selfing using a pairwise comparison of genomes (Chapter 2). Finally, in collaboration with Thibaut Sellinger, we introduced the modified PSMC’ (teSMC) to estimate piecewise-constant selfing rates through time jointly with piecewise-constant population sizes for single-population demographies and analyzed its accuracy (Chapter 3). Taken together, we provide not only an approximate Bayesian but also a maximum likelihood approach to identify and estimate transitions to selfing for single populations. We found tsABC to be a versatile tool to identify and estimate transitions to selfing. Under carefully made assumptions for the proposed models, transitions to selfing can be detected under a broad range of scenarios. Moreover, the model assumed in the teSMC method improved the estimates of demography and detected transitions to selfing with at least as much power as tsABC. The automated parametrization of teSMC allows users with little expertise in scripting to use it.
We used both methods to estimate the transition from outcrossing to selfing for three genetic clusters of Arabidopsis thaliana. Our results were consistent with each other and existing estimates from the literature.
With our study, we not only contributed to the understanding of evolutionary processes that formed the genetic diversity of natural populations but also provided two powerful tools to investigate the demographic history of natural populations in the context of transitions to selfing. Recombination provides a molecular clock on a separate time scale compared to mutation that interacts with all the four evolutionary forces at various levels. Eventually, that will contribute to understanding the functions of genes and their relationship and interaction with the bearing individual, the population, and the environment. Taken together, selfing as a breeding scheme or reproductive strategy is a crucial trait that interferes with and connects evolutionary forces, adaptive potential, and life-history traits of natural populations.
Cancer evolution: mathematical models and computational inference.
Cancer is a somatic evolutionary process characterized by the accumulation of mutations, which contribute to tumor growth, clinical progression, immune escape, and drug resistance development. Evolutionary theory can be used to analyze the dynamics of tumor cell populations and to make inference about the evolutionary history of a tumor from molecular data. We review recent approaches to modeling the evolution of cancer, including population dynamics models of tumor initiation and progression, phylogenetic methods to model the evolutionary relationship between tumor subclones, and probabilistic graphical models to describe dependencies among mutations. Evolutionary modeling helps to understand how tumors arise and will also play an increasingly important prognostic role in predicting disease progression and the outcome of medical interventions, such as targeted therapy.
FM would like to acknowledge the support of The University of Cambridge, Cancer Research UK and Hutchison Whampoa Limited.
This is the final published version. It first appeared at http://sysbio.oxfordjournals.org/content/early/2014/10/07/sysbio.syu081.short?rss=1