
    An Efficient Bayesian Inference Framework for Coalescent-Based Nonparametric Phylodynamics

    Phylodynamics focuses on the problem of reconstructing past population size dynamics from current genetic samples taken from the population of interest. This technique has been extensively used in many areas of biology, but is particularly useful for studying the spread of quickly evolving infectious disease agents, e.g., influenza virus. Phylodynamic inference uses a coalescent model that defines a probability density for the genealogy of randomly sampled individuals from the population. When we assume that such a genealogy is known, the coalescent model, equipped with a Gaussian process prior on the population size trajectory, allows for nonparametric Bayesian estimation of population size dynamics. While this approach is quite powerful, large data sets collected during infectious disease surveillance challenge the state of the art of Bayesian phylodynamics and demand a computationally more efficient inference framework. To satisfy this demand, we provide a computationally efficient Bayesian inference framework based on Hamiltonian Monte Carlo for coalescent process models. Moreover, we show that by splitting the Hamiltonian function we can further improve the efficiency of this approach. Using several simulated and real datasets, we show that our method provides accurate estimates of population size dynamics and is substantially faster than alternative methods based on the elliptical slice sampler and the Metropolis-adjusted Langevin algorithm.

    Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories

    Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state-of-the-art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change-point models or Gaussian process priors. Change-point models suffer from computational issues when the number of change-points is unknown and needs to be estimated. Gaussian process-based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log-transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change-point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state-of-the-art methods. Comment: 36 pages, including supplementary information

    Fitting stochastic epidemic models to gene genealogies using linear noise approximation

    Phylodynamics is a set of population genetics tools that aim at reconstructing the demographic history of a population based on molecular sequences of individuals sampled from the population of interest. One important task in phylodynamics is to estimate changes in (effective) population size. When applied to infectious disease sequences, such estimation of population size trajectories can provide information about changes in the number of infections. To model changes in the number of infected individuals, current phylodynamic methods use non-parametric approaches, parametric approaches, and stochastic modeling in conjunction with likelihood-free Bayesian methods. The first class of methods yields results that are hard to interpret epidemiologically. The second class of methods provides estimates of important epidemiological parameters, such as infection and removal/recovery rates, but ignores variation in the dynamics of infectious disease spread. The third class of methods is the most advantageous statistically, but relies on computationally intensive particle filtering techniques that limit its applications. We propose a Bayesian model that combines phylodynamic inference and stochastic epidemic models, and achieves computational tractability by using a linear noise approximation (LNA), a technique that allows us to approximate probability densities of stochastic epidemic model trajectories. LNA opens the door for using modern Markov chain Monte Carlo tools to approximate the joint posterior distribution of the disease transmission parameters and of high dimensional vectors describing unobserved changes in the stochastic epidemic model compartment sizes (e.g., numbers of infectious and susceptible individuals). We apply our estimation technique to Ebola genealogies estimated using viral genetic data from the 2014 epidemic in Sierra Leone and Liberia. Comment: 43 pages, 6 figures in the main text

    Understanding past population dynamics: Bayesian coalescent-based modeling with covariates

    Effective population size characterizes the genetic variability in a population and is a parameter of paramount importance in population genetics. Kingman's coalescent process enables inference of past population dynamics directly from molecular sequence data, and researchers have developed a number of flexible coalescent-based models for Bayesian nonparametric estimation of the effective population size as a function of time. A major goal of demographic reconstruction is understanding the association between the effective population size and potential explanatory factors. Building upon Bayesian nonparametric coalescent-based approaches, we introduce a flexible framework that incorporates time-varying covariates through Gaussian Markov random fields. To approximate the posterior distribution, we adapt efficient Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. Incorporating covariates into the demographic inference framework enables the modeling of associations between the effective population size and covariates while accounting for uncertainty in population histories. Furthermore, it can lead to more precise estimates of population dynamics. We apply our model to four examples. We reconstruct the demographic history of raccoon rabies in North America and find a significant association with the spatiotemporal spread of the outbreak. Next, we examine the effective population size trajectory of the DENV-4 virus in Puerto Rico along with viral isolate count data and find similar cyclic patterns. We compare the population history of the HIV-1 CRF02_AG clade in Cameroon with HIV incidence and prevalence data and find that the effective population size is more reflective of incidence rate. Finally, we explore the hypothesis that the population dynamics of musk ox during the Late Quaternary period were related to climate change.

    Inference for Nonlinear Epidemiological Models Using Genealogies and Time Series

    Phylodynamics, the field aiming to quantitatively integrate the ecological and evolutionary dynamics of rapidly evolving populations like those of RNA viruses, increasingly relies upon coalescent approaches to infer past population dynamics from reconstructed genealogies. As sequence data have become more abundant, these approaches are beginning to be used on populations undergoing rapid and rather complex dynamics. In such cases, the simple demographic models that current phylodynamic methods employ can be limiting. First, these models are not ideal for yielding biological insight into the processes that drive the dynamics of the populations of interest. Second, these models differ in form from mechanistic and often stochastic population dynamic models that are currently widely used when fitting models to time series data. As such, their use does not allow for both genealogical data and time series data to be considered in tandem when conducting inference. Here, we present a flexible statistical framework for phylodynamic inference that goes beyond these current limitations. The framework we present employs a recently developed method known as particle MCMC to fit stochastic, nonlinear mechanistic models for complex population dynamics to gene genealogies and time series data in a Bayesian framework. We demonstrate our approach using a nonlinear Susceptible-Infected-Recovered (SIR) model for the transmission dynamics of an infectious disease and show through simulations that it provides accurate estimates of past disease dynamics and key epidemiological parameters from genealogies with or without accompanying time series data.

    Locally adaptive Bayesian birth-death model successfully detects slow and rapid rate shifts

    Birth-death processes have given biologists a model-based framework to answer questions about changes in the birth and death rates of lineages in a phylogenetic tree. Therefore, birth-death models are central to macroevolutionary as well as phylodynamic analyses. Early approaches to studying temporal variation in birth and death rates using birth-death models faced difficulties due to the restrictive choices of birth and death rate curves through time. Sufficiently flexible time-varying birth-death models are still lacking. We use a piecewise-constant birth-death model, combined with both Gaussian Markov random field (GMRF) and horseshoe Markov random field (HSMRF) prior distributions, to approximate arbitrary changes in birth rate through time. We implement these models in the widely used statistical phylogenetic software platform RevBayes, allowing us to jointly estimate birth-death process parameters, phylogeny, and nuisance parameters in a Bayesian framework. We test both GMRF-based and HSMRF-based models on a variety of simulated diversification scenarios, and then apply them to both a macroevolutionary and an epidemiological dataset. We find that both models are capable of inferring variable birth rates and correctly rejecting variable models in favor of effectively constant models. In general, the HSMRF-based model has higher precision than its GMRF counterpart, with little to no loss of accuracy. Applied to a macroevolutionary dataset of the Australian gecko family Pygopodidae (where birth rates are interpretable as speciation rates), the GMRF-based model detects a slow decrease whereas the HSMRF-based model detects a rapid speciation-rate decrease in the last 12 million years. Applied to an infectious disease phylodynamic dataset of sequences from HIV subtype A in Russia and Ukraine (where birth rates are interpretable as the rate of accumulation of new infections), our models detect a strongly elevated rate of infection in the 1990s.
Author summary: Both the growth of groups of species and the spread of infectious diseases through populations can be modeled as birth-death processes. Birth events correspond either to speciation or infection, and death events to extinction or becoming noninfectious. The rates of birth and death may vary over time, and by examining this variation researchers can pinpoint important events in the history of life on Earth or in the course of an outbreak. Time-calibrated phylogenies track the relationships between a set of species (or infections) and the times of all speciation (or infection) events, and can thus be used to infer birth and death rates. We develop two phylogenetic birth-death models with the goal of discerning signal of rate variation from noise due to the stochastic nature of birth-death models. Using a variety of simulated datasets, we show that one of these models can accurately infer slow and rapid rate shifts without sacrificing precision. Using real data, we demonstrate that our new methodology can be used for simultaneous inference of phylogeny and rates through time.

    The effects of sampling strategy on the quality of reconstruction of viral population dynamics using Bayesian skyline family coalescent methods: A simulation study

    The ongoing large-scale increase in the total amount of genetic data for viruses and other pathogens has led to a situation in which it is often not possible to include every available sequence in a phylogenetic analysis and expect the procedure to complete in reasonable computational time. This raises questions about how a set of sequences should be selected for analysis, particularly if the data are used to infer more than just the phylogenetic tree itself. The design of sampling strategies for molecular epidemiology has been a neglected field of research. This article describes a large-scale simulation exercise that was undertaken to select an appropriate strategy when using the GMRF skygrid, one of the Bayesian skyline family of coalescent methods, in order to reconstruct past population dynamics. The simulated scenarios were intended to represent sampling for the population of an endemic virus across multiple geographical locations. Large phylogenies were simulated under a coalescent or structured coalescent model and sequences simulated from these trees; the resulting datasets were then downsampled for analyses according to a variety of schemes. Variation in results between different replicates of the same scheme was not insignificant, and as a result, we recommend that where possible analyses are repeated with different datasets in order to establish that elements of a reconstruction are not simply the result of the particular set of samples selected. We show that an individual stochastic choice of sequences can introduce spurious behaviour in the median line of the skygrid plot and that even marginal likelihood estimation can suggest complicated dynamics that were not in fact present. We recommend that the median line should not be used to infer historical events on its own. 
Sampling sequences with uniform probability with respect to both time and spatial location (deme) never performed worse than sampling with probability proportional to the effective population size at that time and in that location, and frequently was superior. As a result, we recommend this approach in the design of future studies. We also confirm that the inclusion of many recent sequences from a single geographical location in an analysis tends to result in a spurious bottleneck effect in the reconstruction and caution against interpreting this as genuine.

    Phylodynamic modelling of foot-and-mouth disease virus sequence data

    The under-reporting of cases of infectious diseases is a substantial impediment to the control and management of infectious diseases in both epidemic and endemic contexts. Information about infectious disease dynamics can be recovered from sequence data using time-varying coalescent approaches, and phylodynamic models have been developed in order to reconstruct demographic changes of the numbers of infected hosts through time. In this study I have demonstrated the general concordance between empirically observed epidemiological incidence data and viral demography inferred through analysis of foot-and-mouth disease virus VP1 coding sequences belonging to the CATHAY topotype over large temporal and spatial scales. However a more precise and robust relationship between the effective population size

    Phylodynamic Methods for Infectious Disease Epidemiology

    In this dissertation, I present a general statistical framework for phylodynamic inference that can be used to estimate epidemiological parameters and reconstruct disease dynamics from pathogen genealogies. This framework can be used to fit a broad class of epidemiological models, including nonlinear stochastic models, to genealogies by relating the population dynamics of a pathogen to its genealogy using coalescent theory. By combining Markov chain Monte Carlo and particle filtering methods, efficient Bayesian inference of all parameters and unobserved latent variables is possible even when analytical likelihood expressions are not available under the epidemiological model. Through extensive simulations, I show that this method can be used to reliably estimate epidemiological parameters of interest as well as reconstruct past disease dynamics from genealogies, or jointly from genealogies and other common sources of epidemiological data like time series. I then extend this basic framework to include different types of host population structure, including models with spatial structure, multiple hosts or vectors, and different stages of infection. The latter is demonstrated by using a multistage model of HIV infection to estimate stage-specific transmission rates and incidence from HIV sequence data collected in Detroit, Michigan. Finally, to demonstrate how the approach can be used more generally, I consider the case of dengue virus in southern Vietnam. I show how earlier phylodynamic inference methods fail to reliably reconstruct the dynamics of dengue observed in hospitalization data, but by deriving coalescent models that take into consideration ecological complexities like seasonality, vector dynamics and spatial structure, accurate dynamics can be reconstructed from genealogies.
In sum, by extending phylodynamics to include more ecologically realistic and mechanistic models, this framework can provide more accurate estimates and give deeper insight into the processes driving infectious disease dynamics. Dissertation