47 research outputs found

    Walks on SPR Neighborhoods

    A nearest-neighbor-interchange (NNI) walk is a sequence of unrooted phylogenetic trees, T_0, T_1, T_2, ..., where each consecutive pair of trees differs by a single NNI move. We give tight bounds on the length of the shortest NNI walks that visit all trees in a subtree-prune-and-regraft (SPR) neighborhood of a given tree. For any unrooted, binary tree T on n leaves, the shortest walk takes Θ(n^2) more steps than the number of trees in the SPR neighborhood. This answers Bryant's Second Combinatorial Conjecture from the Phylogenetics Challenges List, the Isaac Newton Institute, 2011, and the Penny Ante Problem List, 2009.

    Shrinkage Effect in Ancestral Maximum Likelihood

    Ancestral maximum likelihood (AML) is a method that simultaneously reconstructs a phylogenetic tree and ancestral sequences from extant data (sequences at the leaves). The tree and ancestral sequences maximize the probability of observing the given data under a Markov model of sequence evolution, in which branch lengths are also optimized but constrained to take the same value on any edge across all sequence sites. AML differs from the more usual form of maximum likelihood (ML) in phylogenetics because ML averages over all possible ancestral sequences. ML has long been known to be statistically consistent; that is, it converges on the correct tree with probability approaching 1 as the sequence length grows. However, the statistical consistency of AML has not been formally determined, despite informal remarks in a literature that dates back 20 years. In this short note, we prove a general result implying that AML is statistically inconsistent. In particular, we show that AML can 'shrink' short edges in a tree, resulting in a tree that has no internal resolution as the sequence length grows. Our results apply to any number of taxa.

    The Tightness of the Kesten-Stigum Reconstruction Bound of Symmetric Model with Multiple Mutations

    Reconstruction problems, an interdisciplinary subject, have been studied in numerous contexts, including statistical physics, information theory, and computational biology, to name a few. We consider a 2q-state symmetric model, with two categories of q states each, and 3 transition probabilities: the probability to remain in the same state, the probability to change states but remain in the same category, and the probability to change categories. We construct a nonlinear second-order dynamical system based on this model and show that the Kesten-Stigum reconstruction bound is not tight when q ≥ 4. Comment: Accepted, to appear in the Journal of Statistical Physics.
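
    As a rough illustration of the model described in this abstract, the sketch below builds the 2q-state transition matrix and evaluates the quantity d·λ^2 that the Kesten-Stigum bound compares against 1. It is not taken from the paper. The names p_same, p_within and p_across are hypothetical, the two change probabilities are assumed to be per target state (so each row sums to p_same + (q-1)·p_within + q·p_across = 1), and the d-ary tree with branching number d is an assumption the abstract does not state.

```python
def transition_matrix(q, p_same, p_within, p_across):
    """Return the 2q x 2q matrix as a list of rows.

    Assumption: states 0..q-1 form one category and q..2q-1 the other;
    p_within and p_across are probabilities per target state.
    """
    n = 2 * q
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(p_same)        # stay in the same state
            elif i // q == j // q:
                row.append(p_within)      # new state, same category
            else:
                row.append(p_across)      # new state, other category
        matrix.append(row)
    return matrix


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def nontrivial_eigenvalues(q, p_same, p_within, p_across):
    """Closed-form eigenvalues of the matrix other than the trivial 1.

    lam_within: eigenvectors that differ only inside one category.
    lam_across: eigenvector equal to +1 on one category, -1 on the other.
    """
    lam_within = p_same - p_within
    lam_across = p_same + (q - 1) * p_within - q * p_across
    return lam_within, lam_across


def kesten_stigum_value(d, q, p_same, p_within, p_across):
    """Return d * lambda^2, with lambda the largest nontrivial |eigenvalue|.

    Assumption: a d-ary tree. Values above 1 lie above the
    Kesten-Stigum threshold d * lambda^2 = 1.
    """
    lam = max(abs(x) for x in nontrivial_eigenvalues(q, p_same, p_within, p_across))
    return d * lam * lam
```

    For example, with q = 2, p_within = 0.1 and p_across = 0.05, the row-sum constraint forces p_same = 0.8, and kesten_stigum_value(2, 2, 0.8, 0.1, 0.05) evaluates the threshold quantity on a binary tree. The function only locates a parameter choice relative to the Kesten-Stigum threshold; it says nothing about whether the bound is tight there.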

    Phylogenetic mixtures: Concentration of measure in the large-tree limit

    The reconstruction of phylogenies from DNA or protein sequences is a major task of computational evolutionary biology. Common phenomena, notably variations in mutation rates across genomes and incongruences between gene lineage histories, often make it necessary to model molecular data as originating from a mixture of phylogenies. Such mixed models play an increasingly important role in practice. Using concentration of measure techniques, we show that mixtures of large trees are typically identifiable. We also derive sequence-length requirements for high-probability reconstruction. Comment: Published at http://dx.doi.org/10.1214/11-AAP837 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)