1,382 research outputs found

    DM-PhyClus: A Bayesian phylogenetic algorithm for infectious disease transmission cluster inference

    Background. Conventional phylogenetic clustering approaches rely on arbitrary cutpoints applied a posteriori to phylogenetic estimates. Although, in practice, Bayesian and bootstrap-based clustering tend to lead to similar estimates, they often produce conflicting measures of confidence in clusters. The current study proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as DM-PhyClus, that identifies sets of sequences resulting from quick transmission chains, thus yielding easily interpretable clusters, without using any ad hoc distance or confidence requirement. Results. Simulations reveal that DM-PhyClus can outperform conventional clustering methods, as well as the Gap procedure, a pure distance-based algorithm, in terms of mean cluster recovery. We apply DM-PhyClus to a sample of real HIV-1 sequences, producing a set of clusters whose inference is in line with the conclusions of a previous thorough analysis. Conclusions. DM-PhyClus, by eliminating the need for cutpoints and producing sensible inference for cluster configurations, can facilitate transmission cluster detection. Future efforts to reduce incidence of infectious diseases, like HIV-1, will need reliable estimates of transmission clusters. It follows that algorithms like DM-PhyClus could serve to better inform public health strategies.

    Evolutionary Inference via the Poisson Indel Process

    We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classical evolutionary process, the TKF91 model, is a continuous-time Markov chain model comprising insertion, deletion and substitution events. Unfortunately, this model gives rise to an intractable computational problem: the computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a new stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The new model is closely related to the TKF91 model, differing only in its treatment of insertions, but the new model has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared to separate inference of phylogenies and alignments. Comment: 33 pages, 6 figures

    Sequential Monte Carlo with transformations

    This paper examines methodology for performing Bayesian inference sequentially on a sequence of posteriors on spaces of different dimensions. For this, we use sequential Monte Carlo samplers, introducing the innovation of using deterministic transformations to move particles effectively between target distributions with different dimensions. This approach, combined with adaptive methods, yields an extremely flexible and general algorithm for Bayesian model comparison that is suitable for use in applications where the acceptance rate in reversible jump Markov chain Monte Carlo is low. We use this approach on model comparison for mixture models, and for inferring coalescent trees sequentially, as data arrive.

    Inference of Infectious Disease Dynamics from Genetic Data via Sequential Monte Carlo

    When an epidemic moves through a population of hosts, the process of transmission may leave a signature in the genetic sequences of the pathogen. Patterns in pathogen sequences may therefore be a rich source of information on disease dynamics. Genetic sequences may replace or supplement other epidemiological observations. Furthermore, sequences may contain information not present in other datatypes, opening the possibility of inferences inaccessible by other means. The field of phylodynamic inference aims to reconstruct disease dynamics from pathogen genetic sequences. Although a wide variety of phylodynamic inference methods have been proposed, most methods for fitting mechanistic models of disease operate in two disjoint steps, first estimating the phylogeny of the pathogen and then fitting models of disease dynamics to properties of the estimated phylogeny. Logical inconsistency in demographic assumptions underlying the two stages of inference may create bias in resulting parameter estimates. Joint inference of disease dynamics and phylogeny ensures consistent assumptions, but few methods for joint inference are currently available. The central work of this thesis is a new method for joint inference of disease dynamics and phylogeny from pathogen genetic sequences. This likelihood-based method, which we call genPomp, allows for fitting mechanistic models of arbitrary complexity to genetic sequences. The organization of this thesis is as follows. In Chapter I, we present background on the field of phylodynamic inference. In Chapter II, we use simulation to study a two-stage inference approach proposed by Rasmussen et al. (2011). We find that errors in phylogenetic reconstruction may drive bias in two-stage phylodynamic inference. This result underscores the need for methodology for joint inference of the transmission model and the pathogen phylogeny. 
In Chapter III, we propose a flexible method for joint inference and demonstrate the feasibility of this method through simulation and a study on stage-specific infectiousness of HIV in Detroit, MI. This method comprises a class of algorithms that use sequential Monte Carlo to estimate and maximize likelihoods. In Appendix A we show theoretical support for our algorithms. In Chapter IV, we demonstrate the flexibility of our approach by developing a model of transmission of vancomycin-resistant Enterococcus in a hospital setting. To allow for fitting this model to patient-level data we developed a targeted proposal, detailed in Appendix B. We present exploratory analysis of a hospital outbreak at NIH that motivates the form of the model, and carry out a study on simulated data. Although some assumptions of the simulated example are unrealistic, these initial results will inform future efforts at fitting real data. In Chapter V, we summarize the progress represented in this thesis and consider possibilities for future work. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/146063/1/alxsmth_1.pd

    Protein folding and phylogenetic tree reconstruction using stochastic approximation Monte Carlo

    Recently, the stochastic approximation Monte Carlo algorithm has been proposed by Liang et al. (2005) as a general-purpose stochastic optimization and simulation algorithm. An annealing version of this algorithm was developed for real small protein folding problems. The numerical results indicate that it outperforms simulated annealing and conventional Monte Carlo algorithms as a stochastic optimization algorithm. We also propose one method for the use of secondary structures in protein folding. The predicted protein structures are rather close to the true structures. Phylogenetic trees have been used in biology for a long time to graphically represent evolutionary relationships among species and genes. An understanding of evolutionary relationships is critical to appropriate interpretation of bioinformatics results. The use of the sequential structure of phylogenetic trees in conjunction with stochastic approximation Monte Carlo was developed for phylogenetic tree reconstruction. The numerical results indicate that it is capable of escaping from local traps and converges much faster to the global likelihood maxima than other phylogenetic tree reconstruction methods, such as BAMBE and MrBayes.