Walks on SPR Neighborhoods
A nearest-neighbor-interchange (NNI) walk is a sequence of unrooted
phylogenetic trees, T_0, T_1, T_2, ..., where each consecutive pair of trees
differs by a single NNI move. We give tight bounds on the length of the shortest
NNI walks that visit all trees in a subtree-prune-and-regraft (SPR)
neighborhood of a given tree. For any unrooted, binary tree, T, on n leaves,
the shortest walk takes \Theta(n^2) more steps than the number of trees
in the SPR neighborhood. This answers Bryant's Second Combinatorial Conjecture
from the Phylogenetics Challenges List, the Isaac Newton Institute, 2011, and
the Penny Ante Problem List, 2009.
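To make the objects in this abstract concrete, the following is a minimal sketch that enumerates the NNI neighbors of a small unrooted binary tree by brute force and compares the count with 2(n-3). It also evaluates 2(n-3)(2n-7) for the size of the SPR neighborhood. Both counts are standard results from the phylogenetics literature and are not stated in the abstract. The tree encoding (adjacency sets, trees identified by their sets of leaf splits) and all function names are assumptions chosen for illustration, not the authors' construction.

```python
def caterpillar(n):
    """Adjacency sets of an unrooted binary caterpillar tree on n >= 4 leaves.

    Leaves are the ints 0..n-1; internal nodes are the strings "v0", "v1", ...
    """
    adj = {}

    def link(x, y):
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)

    spine = [f"v{i}" for i in range(n - 2)]
    for a, b in zip(spine, spine[1:]):
        link(a, b)
    link(spine[0], 0)
    link(spine[0], 1)
    for i in range(1, n - 3):
        link(spine[i], i + 1)
    link(spine[-1], n - 2)
    link(spine[-1], n - 1)
    return adj


def splits(adj):
    """Canonical form of a tree: the set of leaf bipartitions of its internal edges."""
    leaves = frozenset(x for x in adj if isinstance(x, int))
    out = set()
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) or isinstance(v, int):
                continue  # pendant edges give trivial splits
            # Collect the leaves reachable from v without crossing the edge (u, v).
            side, stack, seen = set(), [v], {u, v}
            while stack:
                x = stack.pop()
                if isinstance(x, int):
                    side.add(x)
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            side = frozenset(side)
            # Store the side that does not contain leaf 0, so both directions agree.
            out.add(side if 0 not in side else leaves - side)
    return frozenset(out)


def nni_neighbors(adj):
    """All distinct trees one NNI move away, each given by its split set."""
    result = set()
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) or isinstance(v, int) or not u < v:
                continue  # visit each internal edge once
            b = next(iter(adj[u] - {v}))  # one subtree hanging off u
            for c in adj[v] - {u}:        # swap it with either subtree at v
                new = {x: set(nb) for x, nb in adj.items()}
                new[u].remove(b); new[b].remove(u)
                new[v].remove(c); new[c].remove(v)
                new[u].add(c); new[c].add(u)
                new[v].add(b); new[b].add(v)
                result.add(splits(new))
    return result


def spr_neighborhood_size(n):
    """Number of trees at SPR distance 1 (standard formula, assumed here)."""
    return 2 * (n - 3) * (2 * n - 7)


if __name__ == "__main__":
    for n in range(4, 9):
        found = len(nni_neighbors(caterpillar(n)))
        print(n, found, 2 * (n - 3), spr_neighborhood_size(n))
```

The gap between the two counts grows quadratically in n, which gives a sense of scale for a walk that must reach every SPR neighbor using only NNI steps.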
Shrinkage Effect in Ancestral Maximum Likelihood
Ancestral maximum likelihood (AML) is a method that simultaneously
reconstructs a phylogenetic tree and ancestral sequences from extant data
(sequences at the leaves). The tree and ancestral sequences maximize the
probability of observing the given data under a Markov model of sequence
evolution, in which branch lengths are also optimized but constrained to take
the same value on any edge across all sequence sites. AML differs from the more
usual form of maximum likelihood (ML) in phylogenetics because ML averages over
all possible ancestral sequences. ML has long been known to be statistically
consistent -- that is, it converges on the correct tree with probability
approaching 1 as the sequence length grows. However, the statistical
consistency of AML has not been formally determined, despite informal remarks
in a literature that dates back 20 years. In this short note we prove a general
result that implies that AML is statistically inconsistent. In particular we
show that AML can `shrink' short edges in a tree, resulting in a tree that has
no internal resolution as the sequence length grows. Our results apply to any
number of taxa.
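The contrast between ML and AML described above can be written compactly. The notation below is an assumption introduced for illustration and does not come from the abstract: D_i is the leaf data at site i, a_i is an assignment of ancestral states at site i, T is the tree, t is the vector of branch lengths shared by all k sites.

```latex
% ML sums over ancestral states at each site; AML maximizes over them.
\[
  \text{ML:}\quad \max_{T,\,t}\ \prod_{i=1}^{k} \sum_{a_i} \Pr(D_i, a_i \mid T, t)
\]
\[
  \text{AML:}\quad \max_{T,\,t}\ \prod_{i=1}^{k} \max_{a_i} \Pr(D_i, a_i \mid T, t)
\]
```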
The Tightness of the Kesten-Stigum Reconstruction Bound of Symmetric Model with Multiple Mutations
It is well known that reconstruction problems, as an interdisciplinary
subject, have been studied in numerous contexts, including statistical physics,
information theory, and computational biology. We consider a
-state symmetric model, with two categories of states in each category,
and 3 transition probabilities: the probability to remain in the same state,
the probability to change states but remain in the same category, and the
probability to change categories. We construct a nonlinear second order
dynamical system based on this model and show that the Kesten-Stigum
reconstruction bound is not tight when .
Comment: Accepted, to appear in Journal of Statistical Physics
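As an illustration of the model described in this abstract, the sketch below builds a transition matrix with two categories and the three kinds of transition probability, then evaluates the Kesten-Stigum condition d * lambda^2 > 1 on a d-ary tree, where lambda is the second-largest eigenvalue in absolute value. The number of states per category, written `q` here, does not appear in the text above and is an assumption of this sketch, as are the function names and the closed-form eigenvalues used.

```python
def transition_matrix(q, p_cat, p_other):
    """2q x 2q channel: p_cat to each other state in the same category,
    p_other to each state in the other category, remainder on the diagonal."""
    p_same = 1.0 - (q - 1) * p_cat - q * p_other
    n = 2 * q
    return [
        [p_same if i == j else (p_cat if i // q == j // q else p_other)
         for j in range(n)]
        for i in range(n)
    ]


def matvec(M, x):
    """Plain matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def second_eigenvalue(q, p_cat, p_other):
    """Largest-magnitude eigenvalue other than the trivial eigenvalue 1."""
    p_same = 1.0 - (q - 1) * p_cat - q * p_other
    lam_between = p_same + (q - 1) * p_cat - q * p_other  # category 1 vs. category 2
    lam_within = p_same - p_cat                           # contrasts inside a category
    return max(lam_between, lam_within, key=abs)


def above_kesten_stigum(d, q, p_cat, p_other):
    """True when d * lambda^2 > 1, i.e. above the Kesten-Stigum threshold."""
    return d * second_eigenvalue(q, p_cat, p_other) ** 2 > 1


if __name__ == "__main__":
    q, p_cat, p_other = 2, 0.1, 0.05
    M = transition_matrix(q, p_cat, p_other)
    # Check the between-category eigenvector (+1 on category 1, -1 on category 2).
    x = [1.0] * q + [-1.0] * q
    print(matvec(M, x), second_eigenvalue(q, p_cat, p_other))
    print(above_kesten_stigum(2, q, p_cat, p_other))
```

Being above the threshold is sufficient for reconstruction. The abstract's claim concerns whether reconstruction can also succeed below it, which this sketch does not test.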
Phylogenetic mixtures: Concentration of measure in the large-tree limit
The reconstruction of phylogenies from DNA or protein sequences is a major
task of computational evolutionary biology. Common phenomena, notably
variations in mutation rates across genomes and incongruences between gene
lineage histories, often make it necessary to model molecular data as
originating from a mixture of phylogenies. Such mixed models play an
increasingly important role in practice. Using concentration of measure
techniques, we show that mixtures of large trees are typically identifiable. We
also derive sequence-length requirements for high-probability reconstruction.
Comment: Published at http://dx.doi.org/10.1214/11-AAP837 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org)