23,121 research outputs found

    Expectation-maximization for evolutionary models

    Get PDF
    Phylogenetics plays a crucial role in enhancing our understanding of molecular sequence evolution and is regarded as one of the most effective tools for studying the origin of contagious diseases. This thesis explores the estimation of parameters of Markov processes on phylogenetic trees using the Expectation Maximization (E-M) method. Estimation of parameters on phylogenetic trees is usually performed under the assumption of time- homogeneous processes, E-M allows us to consider more general Markov processes. We implement the E-M algorithm with a view towards maximizing the likelihood of phylogenetic tree models, iteratively estimating hidden states and computing maximum likelihood estimators. To assess the performance of the E-M implementation, a DNA sequence alignment generator with a preassigned number of substitutions per edge is de- veloped, as existing software is restricted to time-reversible models and lacks optimization for nonhomogeneous data. Accuracy studies are conducted to estimate branch lengths and transition matrices parameters on a large set of simulated data, demonstrating low errors under certain sce- narios. Likelihood comparisons are made both for different initialization techniques and against existing software, such as IQ-tree, which is based on a continuous-time approach, while the E-M method opts for inferring on a Hidden Markov model. Convergence of the E-M algorithm is analyzed, and the trade-off between accuracy and execution time is explored, leading to future research directions

    A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization

    Full text link
    We show that a large class of Estimation of Distribution Algorithms, including, but not limited to, Covariance Matrix Adaption, can be written as a Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new coherent framework with which to reason about EDAs

    EM for phylogenetic topology reconstruction on non-homogeneous data

    Get PDF
    Background: The reconstruction of the phylogenetic tree topology of four taxa is, still nowadays, one of the main challenges in phylogenetics. Its difficulties lie in considering not too restrictive evolutionary models, and correctly dealing with the long-branch attraction problem. The correct reconstruction of 4-taxon trees is crucial for making quartet-based methods work and being able to recover large phylogenies. Results: In this paper we consider an expectation-maximization method for maximizing the likelihood of (time nonhomogeneous) evolutionary Markov models on trees. We study its success on reconstructing 4-taxon topologies and its performance as input method in quartet-based phylogenetic reconstruction methods such as QFIT and QuartetSuite. Our results show that the method proposed here outperforms neighbor-joining and the usual (time-homogeneous continuous-time) maximum likelihood methods on 4-leaved trees with among-lineage instantaneous rate heterogeneity, and perform similarly to usual continuous-time maximum-likelihood when data satisfies the assumptions of both methods. Conclusions: The method presented in this paper is well suited for reconstructing the topology of any number of taxa via quartet-based methods and is highly accurate, specially regarding largely divergent trees and time nonhomogeneous data.Comment: 1 main file: 6 Figures and 2 Tables. 1 Additional file with 2 Figures and 2 Tables. To appear in "BCM Evolutionary Biology

    The EM Algorithm and the Rise of Computational Biology

    Get PDF
    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma"; of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis.Comment: Published in at http://dx.doi.org/10.1214/09-STS312 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Likelihood-Based Inference for Discretely Observed Birth-Death-Shift Processes, with Applications to Evolution of Mobile Genetic Elements

    Full text link
    Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model evolutionary dynamics of transposable elements - important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a mutli-type branching process approximation to BDS processes and develop a corresponding expectation maximization (EM) algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply more broadly to multi-type branching processes where rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study intrapatient time evolution of IS6110 transposable element, a frequently used element during estimation of epidemiological clusters of Mycobacterium tuberculosis infections.Comment: 31 pages, 7 figures, 1 tabl
    • …
    corecore