3,419 research outputs found

    Coalescent simulation in continuous space

    Get PDF
    Coalescent simulation has become an indispensable tool in population genetics and many complex evolutionary scenarios have been incorporated into the basic algorithm. Despite many years of intense interest in spatial structure, however, there are no available methods to simulate the ancestry of a sample of genes that occupy a spatial continuum. This is mainly due to the severe technical problems encountered by the classical model of isolation by distance. A recently introduced model solves these technical problems and provides a solid theoretical basis for the study of populations evolving in continuous space. We present a detailed algorithm to simulate the coalescent process in this model, and provide an efficient implementation of a generalised version of this algorithm as a freely available Python module

    Coalescent simulation in continuous space:Algorithms for large neighbourhood size

    Get PDF
    Many species have an essentially continuous distribution in space, in which there are no natural divisions between randomly mating subpopulations. Yet, the standard approach to modelling these populations is to impose an arbitrary grid of demes, adjusting deme sizes and migration rates in an attempt to capture the important features of the population. Such indirect methods are required because of the failure of the classical models of isolation by distance, which have been shown to have major technical flaws. A recently introduced model of extinction and recolonisation in two dimensions solves these technical problems, and provides a rigorous technical foundation for the study of populations evolving in a spatial continuum. The coalescent process for this model is simply stated, but direct simulation is very inefficient for large neighbourhood sizes. We present efficient and exact algorithms to simulate this coalescent process for arbitrary sample sizes and numbers of loci, and analyse these algorithms in detail

    How to Couple from the Past Using a Read-Once Source of Randomness

    Full text link
    We give a new method for generating perfectly random samples from the stationary distribution of a Markov chain. The method is related to coupling from the past (CFTP), but only runs the Markov chain forwards in time, and never restarts it at previous times in the past. The method is also related to an idea known as PASTA (Poisson arrivals see time averages) in the operations research literature. Because the new algorithm can be run using a read-once stream of randomness, we call it read-once CFTP. The memory and time requirements of read-once CFTP are on par with the requirements of the usual form of CFTP, and for a variety of applications the requirements may be noticeably less. Some perfect sampling algorithms for point processes are based on an extension of CFTP known as coupling into and from the past; for completeness, we give a read-once version of coupling into and from the past, but it remains unpractical. For these point process applications, we give an alternative coupling method with which read-once CFTP may be efficiently used.Comment: 28 pages, 2 figure

    A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

    Full text link
    An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To achieve this, two inferential challenges need to be addressed: (1) population data are exchangeable, calling for methods that efficiently exploit the symmetries of the data, and (2) computing likelihoods is intractable as it requires integrating over a set of correlated, extremely high-dimensional latent variables. These challenges are traditionally tackled by likelihood-free methods that use scientific simulators to generate datasets and reduce them to hand-designed, permutation-invariant summary statistics, often leading to inaccurate inference. In this work, we develop an exchangeable neural network that performs summary statistic-free, likelihood-free inference. Our framework can be applied in a black-box fashion across a variety of simulation-based tasks, both within and outside biology. We demonstrate the power of our approach on the recombination hotspot testing problem, outperforming the state-of-the-art.Comment: 9 pages, 8 figure

    Inference of Population History using Coalescent HMMs: Review and Outlook

    Full text link
    Studying how diverse human populations are related is of historical and anthropological interest, in addition to providing a realistic null model for testing for signatures of natural selection or disease associations. Furthermore, understanding the demographic histories of other species is playing an increasingly important role in conservation genetics. A number of statistical methods have been developed to infer population demographic histories using whole-genome sequence data, with recent advances focusing on allowing for more flexible modeling choices, scaling to larger data sets, and increasing statistical power. Here we review coalescent hidden Markov models, a powerful class of population genetic inference methods that can effectively utilize linkage disequilibrium information. We highlight recent advances, give advice for practitioners, point out potential pitfalls, and present possible future research directions.Comment: 12 pages, 2 figure

    Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories

    Full text link
    Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state-of-the-art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change-point models or Gaussian process priors. Change-point models suffer from computational issues when the number of change-points is unknown and needs to be estimated. Gaussian process-based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally-adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log-transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change-point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state-of-the-art methods.Comment: 36 pages, including supplementary informatio
    corecore