Expectation-maximization for evolutionary models
Phylogenetics plays a crucial role in our understanding of molecular sequence evolution and is regarded as one of the most effective tools for studying the origin of contagious diseases. This thesis explores the estimation of parameters of Markov processes on phylogenetic trees using the Expectation-Maximization (E-M) method. Parameter estimation on phylogenetic trees is usually performed under the assumption of time-homogeneous processes; E-M allows us to consider more general Markov processes. We implement the E-M algorithm to maximize the likelihood of phylogenetic tree models, iteratively estimating the hidden states and computing maximum likelihood estimators. To assess the performance of the E-M implementation, we develop a DNA sequence alignment generator with a preassigned number of substitutions per edge, since existing software is restricted to time-reversible models and is not optimized for nonhomogeneous data. Accuracy studies estimating branch lengths and transition-matrix parameters on a large set of simulated data show low errors under certain scenarios. Likelihoods are compared both across different initialization techniques and against existing software such as IQ-tree, which is based on a continuous-time approach, whereas the E-M method performs inference on a hidden Markov model. Finally, the convergence of the E-M algorithm is analyzed and the trade-off between accuracy and execution time is explored, pointing to directions for future research.
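A minimal sketch of the E-step/M-step cycle on the smallest tree with a hidden node (three leaves joined at one unobserved internal node), assuming a general Markov model in which every edge has its own unconstrained 4×4 transition matrix, so neither time-homogeneity nor reversibility is imposed. This is not the thesis's implementation; `em_tripod`, `simulate_tripod` and the simulated parameters are hypothetical.

```python
import numpy as np

def em_tripod(X, k=4, n_iter=200, seed=0):
    """EM for a general Markov model on a 3-leaf star tree (hypothetical helper).

    X has shape (n_sites, 3) with integer states 0..k-1 at the leaves; the
    state at the internal node is hidden. Each edge has its own transition
    matrix, so neither time-homogeneity nor reversibility is assumed.
    """
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)  # distribution of the hidden internal node
    # Start close to the identity (few substitutions per edge), randomly perturbed.
    M = [0.7 * np.eye(k) + 0.3 * rng.dirichlet(np.ones(k), size=k) for _ in range(3)]
    logliks = []
    for _ in range(n_iter):
        # E-step: joint[s, u] = P(internal = u, observed leaves at site s)
        joint = pi[None, :]
        for j in range(3):
            joint = joint * M[j][:, X[:, j]].T
        site_lik = joint.sum(axis=1)
        logliks.append(np.log(site_lik).sum())
        post = joint / site_lik[:, None]  # P(internal = u | leaves at site s)
        # M-step: normalized expected counts
        pi = post.mean(axis=0)
        for j in range(3):
            counts = post.T @ np.eye(k)[X[:, j]]  # counts[u, x] for edge j
            M[j] = counts / counts.sum(axis=1, keepdims=True)
    return pi, M, logliks

def simulate_tripod(pi, M, n_sites, seed=1):
    """Sample the internal node, then each leaf through its own edge matrix."""
    rng = np.random.default_rng(seed)
    k = len(pi)
    hidden = rng.choice(k, size=n_sites, p=pi)
    X = np.empty((n_sites, len(M)), dtype=int)
    for j, Mj in enumerate(M):
        cum = Mj.cumsum(axis=1)[hidden]  # one row CDF per site
        draw = (rng.random(n_sites)[:, None] > cum).sum(axis=1)
        X[:, j] = np.minimum(draw, k - 1)  # guard against rounding in the CDF
    return X

# Hypothetical example: a different matrix on every edge.
true_pi = np.array([0.1, 0.2, 0.3, 0.4])
true_M = [w * np.eye(4) + (1 - w) / 4 for w in (0.9, 0.8, 0.7)]
X = simulate_tripod(true_pi, true_M, n_sites=2000)
pi_hat, M_hat, logliks = em_tripod(X)
```

The estimates are identified only up to a relabeling of the hidden state, so comparing them with the true matrices requires matching labels first. The log-likelihood is nondecreasing from one iteration to the next, which is the defining guarantee of EM.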
A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization
We show that a large class of Estimation of Distribution Algorithms (EDAs),
including, but not limited to, Covariance Matrix Adaptation, can be written as a
Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of
infinite samples. Because EM rests on a rigorous statistical foundation and has
been thoroughly analyzed, this connection provides a new coherent framework
with which to reason about EDAs.
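A minimal sketch of the correspondence, assuming a Gaussian search distribution and fitness weights w(x) = exp(-f(x)/T) (a choice made here for illustration; this is not CMA-ES): drawing samples from the current distribution and weighting them plays the role of a Monte Carlo E-step, and refitting the Gaussian by weighted maximum likelihood is the M-step. `gaussian_eda` and its parameters are hypothetical.

```python
import numpy as np

def gaussian_eda(f, mu0, n_samples=200, n_iter=150, temperature=10.0, seed=0):
    """Minimize f with a Gaussian EDA written as Monte Carlo EM (hypothetical helper).

    Each generation: sample from the current Gaussian and weight the samples by
    exp(-f / temperature) (Monte Carlo E-step), then refit the Gaussian by
    weighted maximum likelihood (M-step).
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float)
    d = mu.size
    cov = np.eye(d)
    for _ in range(n_iter):
        x = rng.multivariate_normal(mu, cov, size=n_samples)
        fx = np.array([f(xi) for xi in x])
        w = np.exp(-(fx - fx.min()) / temperature)  # shift by the min for stability
        w /= w.sum()
        mu = w @ x                                   # weighted MLE of the mean
        diff = x - mu
        cov = (w[:, None] * diff).T @ diff           # weighted MLE of the covariance
        cov = 0.5 * (cov + cov.T) + 1e-12 * np.eye(d)
    return mu, cov

target = np.array([1.0, -2.0])
mu_hat, cov_hat = gaussian_eda(lambda x: np.sum((x - target) ** 2), mu0=[2.0, 0.0])
```

With more samples per generation the weighted fit approaches the exact M-step; a larger `temperature` gives smaller but more stable steps, at the cost of more generations.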
EM for phylogenetic topology reconstruction on non-homogeneous data
Background: The reconstruction of the phylogenetic tree topology of four taxa
is still one of the main challenges in phylogenetics. The difficulties lie in
using evolutionary models that are not too restrictive and in correctly
handling the long-branch attraction problem. The correct reconstruction of
4-taxon trees is crucial for quartet-based methods to work and to recover
large phylogenies.
Results: In this paper we consider an expectation-maximization method for
maximizing the likelihood of (time nonhomogeneous) evolutionary Markov models
on trees. We study its success in reconstructing 4-taxon topologies and its
performance as the input method in quartet-based phylogenetic reconstruction
methods such as QFIT and QuartetSuite. Our results show that the proposed
method outperforms neighbor-joining and the usual (time-homogeneous,
continuous-time) maximum likelihood methods on 4-leaved trees with
among-lineage instantaneous rate heterogeneity, and performs similarly to the
usual continuous-time maximum likelihood when the data satisfy the assumptions
of both methods.
Conclusions: The method presented in this paper is well suited for
reconstructing the topology of any number of taxa via quartet-based methods and
is highly accurate, especially for strongly divergent trees and time
nonhomogeneous data.
Comment: 1 main file: 6 Figures and 2 Tables. 1 Additional file with 2 Figures
and 2 Tables. To appear in "BMC Evolutionary Biology"
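A minimal sketch of the topology-selection idea, assuming the general Markov model (an unconstrained transition matrix on each edge): fit each of the three possible unrooted quartet topologies by EM and keep the one with the highest log-likelihood. The helper names, the number of random starts and the simulated alignment are hypothetical; this is not the authors' implementation, and it leaves out the step in which QFIT or QuartetSuite combine quartets into a larger tree.

```python
import numpy as np

# The three unrooted quartet topologies, written as splits of leaves 0..3.
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def _normalize_rows(A):
    return A / A.sum(axis=1, keepdims=True)

def em_quartet(X, split, k=4, n_iter=150, seed=0):
    """Log-likelihood reached by EM for a general Markov model on one quartet.

    Hidden node u joins leaves a, b; hidden node v joins leaves c, d; the tree
    is rooted at u. Each of the five edges has its own transition matrix.
    """
    (a, b), (c, d) = split
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)
    Ma, Mb, Muv, Mc, Md = (0.7 * np.eye(k) + 0.3 * rng.dirichlet(np.ones(k), size=k)
                           for _ in range(5))
    onehot = np.eye(k)[X]                                # (n_sites, 4 leaves, k)
    for _ in range(n_iter):
        # E-step: joint[s, u, v] = P(u, v, observed leaves at site s)
        left = pi * Ma[:, X[:, a]].T * Mb[:, X[:, b]].T  # (n_sites, k) over u
        right = Mc[:, X[:, c]].T * Md[:, X[:, d]].T      # (n_sites, k) over v
        joint = left[:, :, None] * Muv[None] * right[:, None, :]
        site_lik = joint.sum(axis=(1, 2))
        loglik = np.log(site_lik).sum()                  # at the current parameters
        post = joint / site_lik[:, None, None]
        pu, pv = post.sum(axis=2), post.sum(axis=1)      # marginal posteriors
        # M-step: normalized expected counts
        pi = pu.mean(axis=0)
        Muv = _normalize_rows(post.sum(axis=0))
        Ma = _normalize_rows(pu.T @ onehot[:, a])
        Mb = _normalize_rows(pu.T @ onehot[:, b])
        Mc = _normalize_rows(pv.T @ onehot[:, c])
        Md = _normalize_rows(pv.T @ onehot[:, d])
    return loglik

def quartet_topology(X, n_starts=3):
    """Pick the split with the highest EM log-likelihood over several random starts."""
    scores = {s: max(em_quartet(X, s, seed=r) for r in range(n_starts)) for s in SPLITS}
    return max(scores, key=scores.get), scores

def simulate_quartet(n_sites, seed=1, k=4):
    """Hypothetical data on the split 01|23, with a different substitution level per edge."""
    rng = np.random.default_rng(seed)
    def edge(w):
        return w * np.eye(k) + (1 - w) / k
    def evolve(states, M):
        cum = M.cumsum(axis=1)[states]
        return np.minimum((rng.random(len(states))[:, None] > cum).sum(axis=1), k - 1)
    u = rng.integers(k, size=n_sites)
    v = evolve(u, edge(0.6))
    return np.column_stack([evolve(u, edge(0.8)), evolve(u, edge(0.7)),
                            evolve(v, edge(0.8)), evolve(v, edge(0.6))])

X = simulate_quartet(2000)
best, scores = quartet_topology(X)
```

Each topology is fitted from a few random starts because EM can stop at a local optimum; all three topologies have the same number of free parameters, so their maximized likelihoods can be compared directly.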
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
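Population genetics is one of the topics listed above; the textbook gene-counting EM for ABO blood-group allele frequencies is a compact illustration (an example chosen here, not necessarily one treated in the article). Phenotype A hides genotypes AA and AO, and phenotype B hides BB and BO, so the E-step splits the observed phenotype counts into expected genotype counts and the M-step simply counts alleles. `abo_em` and the phenotype counts in the usage line are hypothetical.

```python
import math

def abo_em(n_A, n_B, n_AB, n_O, n_iter=50):
    """Gene-counting EM for ABO allele frequencies p (A), q (B), r (O)."""
    N = n_A + n_B + n_AB + n_O
    p = q = r = 1.0 / 3.0
    logliks = []
    for _ in range(n_iter):
        # Observed-data log-likelihood (multinomial constant omitted)
        logliks.append(n_A * math.log(p * p + 2 * p * r)
                       + n_B * math.log(q * q + 2 * q * r)
                       + n_AB * math.log(2 * p * q) + n_O * math.log(r * r))
        # E-step: expected genotype counts hidden inside phenotypes A and B
        n_AA = n_A * p * p / (p * p + 2 * p * r)
        n_AO = n_A - n_AA
        n_BB = n_B * q * q / (q * q + 2 * q * r)
        n_BO = n_B - n_BB
        # M-step: allele counts among the 2N genes
        p = (2 * n_AA + n_AO + n_AB) / (2 * N)
        q = (2 * n_BB + n_BO + n_AB) / (2 * N)
        r = (2 * n_O + n_AO + n_BO) / (2 * N)
    return (p, q, r), logliks

(p, q, r), logliks = abo_em(n_A=186, n_B=38, n_AB=13, n_O=284)
```

The M-step keeps p + q + r = 1 automatically, because every one of the 2N genes is assigned to exactly one allele.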
Likelihood-Based Inference for Discretely Observed Birth-Death-Shift Processes, with Applications to Evolution of Mobile Genetic Elements
Continuous-time birth-death-shift (BDS) processes are frequently used in
stochastic modeling, with many applications in ecology and epidemiology. In
particular, such processes can model evolutionary dynamics of transposable
elements - important genetic markers in molecular epidemiology. Estimation of
the effects of individual covariates on the birth, death, and shift rates of
the process can be accomplished by analyzing patient data, but inferring these
rates in a discretely and unevenly observed setting presents computational
challenges. We propose a multi-type branching process approximation to BDS
processes and develop a corresponding expectation maximization (EM) algorithm,
where we use spectral techniques to reduce calculation of expected sufficient
statistics to low dimensional integration. These techniques yield an efficient
and robust optimization routine for inferring the rates of the BDS process, and
apply more broadly to multi-type branching processes where rates can depend on
many covariates. After rigorously testing our methodology in simulation
studies, we apply our method to study the intrapatient time evolution of the
IS6110 transposable element, which is frequently used to estimate
epidemiological clusters of Mycobacterium tuberculosis infections.
Comment: 31 pages, 7 figures, 1 table
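A minimal sketch of why EM is natural here, assuming constant per-copy rates with no covariates and interpreting a shift as a copy moving to a new position without changing the copy number: when the whole path is observed, the maximum likelihood estimate of each rate is the number of events of that type divided by the total copy-time, the integral of X(t) dt. With discrete observations, an EM algorithm replaces these statistics by their conditional expectations in the E-step (the step the abstract makes tractable with spectral techniques, not reproduced here) and the M-step is the same ratio. `simulate_bds`, `complete_data_mle` and the rates are hypothetical.

```python
import numpy as np

def simulate_bds(x0, rates, t_end, rng):
    """Gillespie simulation of a linear birth-death-shift process (hypothetical helper).

    Every copy independently duplicates (birth), is lost (death) or moves to a
    new position (shift; the count is unchanged). Returns the complete-data
    sufficient statistics: event counts and total copy-time.
    """
    lam, mu, nu = rates
    total = lam + mu + nu
    x, t = x0, 0.0
    counts = np.zeros(3)
    copy_time = 0.0
    while x > 0:
        wait = rng.exponential(1.0 / (total * x))  # time to the next event
        if t + wait >= t_end:
            copy_time += x * (t_end - t)
            break
        t += wait
        copy_time += x * wait
        event = rng.choice(3, p=np.array(rates) / total)
        counts[event] += 1
        x += (1, -1, 0)[event]
    return counts, copy_time

def complete_data_mle(counts, copy_time):
    # M-step: each rate = (expected) event count / (expected) total copy-time.
    return counts / copy_time

rng = np.random.default_rng(42)
true_rates = (0.10, 0.08, 0.05)  # hypothetical birth, death, shift rates per copy
stats = [simulate_bds(10, true_rates, 20.0, rng) for _ in range(200)]
counts = sum(s[0] for s in stats)
copy_time = sum(s[1] for s in stats)
est = complete_data_mle(counts, copy_time)
```

Covariate effects, as in the abstract, would turn the ratio into a weighted regression-type M-step; this sketch keeps the rates constant to show only the structure of the update.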