EM for phylogenetic topology reconstruction on non-homogeneous data
Background: Reconstructing the phylogenetic tree topology of four taxa
remains one of the main challenges in phylogenetics. The difficulty lies in
using evolutionary models that are not too restrictive while correctly
handling the long-branch attraction problem. The correct
reconstruction of 4-taxon trees is crucial for making quartet-based methods
work and being able to recover large phylogenies.
Results: In this paper we consider an expectation-maximization method for
maximizing the likelihood of (time nonhomogeneous) evolutionary Markov models
on trees. We study its success in reconstructing 4-taxon topologies and its
performance as an input method in quartet-based phylogenetic reconstruction
methods such as QFIT and QuartetSuite. Our results show that the method
proposed here outperforms neighbor-joining and the usual (time-homogeneous
continuous-time) maximum likelihood methods on 4-leaved trees with
among-lineage instantaneous rate heterogeneity, and performs similarly to the
usual continuous-time maximum likelihood when the data satisfy the assumptions
of both methods.
Conclusions: The method presented in this paper is well suited for
reconstructing the topology of any number of taxa via quartet-based methods and
is highly accurate, especially for highly divergent trees and time
nonhomogeneous data.
Comment: 1 main file: 6 Figures and 2 Tables. 1 Additional file with 2 Figures
and 2 Tables. To appear in "BMC Evolutionary Biology"
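The abstract above turns on choosing among the three possible unrooted topologies of four taxa. As a minimal sketch of that choice, the Python below enumerates the three quartet splits and picks one with the classical distance-based four-point condition. This is not the EM method described in the abstract; it is a much simpler stand-in used only to make the 4-taxon problem concrete. The function names and the toy distance table are hypothetical.

```python
def quartet_topologies(taxa):
    """Return the three unrooted binary topologies on four taxa.

    Each topology is written as a split: a pair of two-taxon cherries.
    """
    a, b, c, d = taxa
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def four_point_topology(taxa, dist):
    """Pick the split with the smallest within-cherry distance sum.

    `dist` maps frozenset({x, y}) to the distance between taxa x and y.
    For tree-additive distances, the true split gives the smallest sum.
    """
    def d(x, y):
        return dist[frozenset((x, y))]

    return min(quartet_topologies(taxa),
               key=lambda split: d(*split[0]) + d(*split[1]))


# Toy additive distances for the tree ((A,B),(C,D)) with all branch lengths 1.
toy_dist = {
    frozenset(("A", "B")): 2.0, frozenset(("C", "D")): 2.0,
    frozenset(("A", "C")): 3.0, frozenset(("A", "D")): 3.0,
    frozenset(("B", "C")): 3.0, frozenset(("B", "D")): 3.0,
}
best = four_point_topology(("A", "B", "C", "D"), toy_dist)
```

A quartet-based method would apply some such 4-taxon decision rule to many quartets and then assemble the results into a larger tree; a likelihood-based rule would replace the distance sum with a fitted likelihood per topology.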
Evolutionary Inference via the Poisson Indel Process
We address the problem of the joint statistical inference of phylogenetic
trees and multiple sequence alignments from unaligned molecular sequences. This
problem is generally formulated in terms of string-valued evolutionary
processes along the branches of a phylogenetic tree. The classical evolutionary
process, the TKF91 model, is a continuous-time Markov chain model comprised of
insertion, deletion and substitution events. Unfortunately, this model gives
rise to an intractable computational problem: the computation of the marginal
likelihood under the TKF91 model is exponential in the number of taxa. In this
work, we present a new stochastic process, the Poisson Indel Process (PIP), in
which the complexity of this computation is reduced to linear. The new model is
closely related to the TKF91 model, differing only in its treatment of
insertions, but the new model has a global characterization as a Poisson
process on the phylogeny. Standard results for Poisson processes allow key
computations to be decoupled, which yields the favorable computational profile
of inference under the PIP model. We present illustrative experiments in which
Bayesian inference under the PIP model is compared to separate inference of
phylogenies and alignments.
Comment: 33 pages, 6 figures
The Mathematics of Phylogenomics
The grand challenges in biology today are being shaped by powerful
high-throughput technologies that have revealed the genomes of many organisms,
global expression patterns of genes and detailed information about variation
within populations. We are therefore able to ask, for the first time,
fundamental questions about the evolution of genomes, the structure of genes
and their regulation, and the connections between genotypes and phenotypes of
individuals. The answers to these questions are all predicated on progress in a
variety of computational, statistical, and mathematical fields.
The rapid growth in the characterization of genomes has led to the
advancement of a new discipline called Phylogenomics. This discipline results
from the combination of two major fields in the life sciences: Genomics, i.e.,
the study of the function and structure of genes and genomes; and Molecular
Phylogenetics, i.e., the study of the hierarchical evolutionary relationships
among organisms and their genomes. The objective of this article is to offer
mathematicians a first introduction to this emerging field, and to discuss
specific mathematical problems and developments arising from phylogenomics.
Comment: 41 pages, 4 figures
Detection of recombination in DNA multiple alignments with hidden Markov models
Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation-maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.
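The HMM structure described in this abstract can be sketched as follows: hidden states are candidate tree topologies, and a single parameter controls how often the state changes between adjacent alignment columns. The Python below is an illustration under stated assumptions, not the paper's implementation. It assumes the chain stays in the same topology with probability `stay_prob` and otherwise switches uniformly to another one. It also takes the per-column emission probabilities as given numbers, whereas the abstract says they come from each tree's topology and branch lengths. All names are hypothetical.

```python
import math


def transition_matrix(n_states, stay_prob):
    """Assumed parameterisation: stay with probability `stay_prob`,
    otherwise switch to one of the other topologies uniformly.
    Requires n_states >= 2.
    """
    switch = (1.0 - stay_prob) / (n_states - 1)
    return [[stay_prob if i == j else switch for j in range(n_states)]
            for i in range(n_states)]


def forward_log_likelihood(emissions, trans, initial):
    """Scaled forward algorithm for an HMM.

    emissions[t][k] is P(alignment column t | topology k); here these
    are supplied directly rather than computed from a tree.
    Returns the log-likelihood of the whole alignment.
    """
    n = len(initial)
    alpha = [initial[k] * emissions[0][k] for k in range(n)]
    log_lik = 0.0
    for t in range(len(emissions)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][k] for i in range(n)) * emissions[t][k]
                     for k in range(n)]
        scale = sum(alpha)           # rescale to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Toy example: two topologies, two alignment columns, made-up emissions.
trans = transition_matrix(2, 0.9)
toy_emissions = [[0.5, 0.5], [0.2, 0.2]]
ll = forward_log_likelihood(toy_emissions, trans, [0.5, 0.5])
```

An EM procedure of the kind the abstract describes would alternate between a forward-backward pass like this one (the E-step) and re-estimation of the branch lengths and the recombination parameter (the M-step); only the forward pass is sketched here.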