
3,049 research outputs found

Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective

Author: Kang Qiwen
Lin Bo
Monod Anthea
Yoshida Ruriko
Publication date: 13/08/2020

Phylogenetic trees are the fundamental mathematical representation of evolutionary processes in biology. As data objects, they are characterized by the challenges associated with "big data," as well as the complication that their discrete geometric structure results in a non-Euclidean phylogenetic tree space, which poses computational and statistical limitations. We propose and study a novel framework for analyzing sets of phylogenetic trees based on tropical geometry. In particular, we focus on characterizing our framework for statistical analyses of evolutionary biological processes represented by phylogenetic trees. Our setting exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics, as well as increased computational efficiency over the current state-of-the-art. We demonstrate our approach on seasonal influenza data. Comment: 28 pages, 5 figures, 1 table
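The abstract above does not state any formulas. As a rough illustration only, the sketch below assumes the standard tropical metric from tropical geometry, d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), applied to two trees encoded as vectors of pairwise leaf distances. The function name `tropical_distance` and the example vectors are hypothetical and are not taken from the paper.

```python
def tropical_distance(u, v):
    """Tropical metric between two equal-length vectors (assumed encoding:
    each vector lists the pairwise leaf distances of one tree)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    diffs = [a - b for a, b in zip(u, v)]
    # The metric is the spread of the coordinate-wise differences.
    return max(diffs) - min(diffs)


# Hypothetical example vectors, chosen only to exercise the function.
u = [0.0, 1.0, 3.0]
v = [0.0, 0.0, 0.0]
print(tropical_distance(u, v))

# Adding the same constant to every coordinate leaves the distance unchanged,
# which is why the metric is usually stated on vectors modulo such shifts.
shifted = [x + 5.0 for x in u]
print(tropical_distance(shifted, v))
```

Because only the largest and smallest differences matter, the cost is linear in the vector length.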

arXiv.org e-Print Archive

Computing the Distribution of a Tree Metric

Author: Bryant David
Steel Mike
Publication date: 05/10/2008

The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for twenty years, an algorithm that is explicitly polynomial time has yet to be described for computing this distribution (which is also the distribution of trees around a given tree under the popular Robinson-Foulds metric). In this paper we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in 'cherries' of the given tree. We also describe how our results can be used to derive normalization constants that are required in a recently-proposed maximum likelihood approach to supertree construction. Comment: 16 pages, 3 figures
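For readers unfamiliar with the RF distance mentioned above, here is a minimal sketch of the distance itself, not of the paper's algorithm for its distribution. It assumes trees are written as nested tuples of string leaf labels and treated as unrooted, and it counts the non-trivial leaf bipartitions found in exactly one of the two trees. The names `leaf_set`, `nontrivial_splits`, and `rf_distance` are illustrative.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def nontrivial_splits(tree):
    """Collect the bipartitions induced by internal edges.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the two sides of one edge are never counted twice.
    """
    leaves = leaf_set(tree)
    anchor = min(leaves)
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(leaves) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(t1, t2):
    """Number of non-trivial splits present in exactly one of the trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must have the same leaf labels")
    return len(nontrivial_splits(t1) ^ nontrivial_splits(t2))


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(rf_distance(t1, t2))  # prints 2
```

The two example trees share the split separating {d, e} from the rest and differ in one split each, which gives the distance of 2.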

arXiv.org e-Print Archive

CiteSeerX

UC Research Repository

On the inference of large phylogenies with long branches: How long is too long?

Author: Mossel Elchanan
Roch Sebastien
Sly Allan
Publication date: 01/01/2010

Recent work has highlighted deep connections between sequence-length requirements for high-probability phylogeny reconstruction and the related problem of the estimation of ancestral sequences. In [Daskalakis et al.'09], building on the work of [Mossel'04], a tight sequence-length requirement was obtained for the CFN model. In particular the required sequence length for high-probability reconstruction was shown to undergo a sharp transition (from $O(\log n)$ to $\mathrm{poly}(n)$, where $n$ is the number of leaves) at the "critical" branch length $\critmlq$ (if it exists) of the ancestral reconstruction problem. Here we consider the GTR model. For this model, recent results of [Roch'09] show that the tree can be accurately reconstructed with sequences of length $O(\log n)$ when the branch lengths are below $\critksq$, known as the Kesten-Stigum (KS) bound. Although for the CFN model $\critmlq = \critksq$, it is known that for the more general GTR models one has $\critmlq \geq \critksq$ with a strict inequality in many cases. Here, we show that this phenomenon also holds for phylogenetic reconstruction by exhibiting a family of symmetric models $Q$ and a phylogenetic reconstruction algorithm which recovers the tree from $O(\log n)$-length sequences for some branch lengths in the range $(\critksq, \critmlq)$. Second we prove that phylogenetic reconstruction under GTR models requires a polynomial sequence-length for branch lengths above $\critmlq$.

arXiv.org e-Print Archive

Springer - Publisher Connector

ScholarlyCommons@Penn

Tracing evolutionary links between species

Author: Steel Mike
Publication date: 01/01/2014

The idea that all life on earth traces back to a common beginning dates back at least to Charles Darwin's Origin of Species. Ever since, biologists have tried to piece together parts of this 'tree of life' based on what we can observe today: fossils, and the evolutionary signal that is present in the genomes and phenotypes of different organisms. Mathematics has played a key role in helping transform genetic data into phylogenetic (evolutionary) trees and networks. Here, I will explain some of the central concepts and basic results in phylogenetics, which benefit from several branches of mathematics, including combinatorics, probability and algebra. Comment: 18 pages, 6 figures (Invited review paper (draft version) for AMM)

arXiv.org e-Print Archive

CiteSeerX

Learning Latent Tree Graphical Models

Author: Anandkumar Animashree
Choi Myung Jin
Tan Vincent Y. F.
Willsky Alan S.
Publication date: 14/09/2010

We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world datasets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups dataset.
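The abstract does not define the "information distances" it relies on. The sketch below assumes the jointly Gaussian case, where such a distance is commonly taken to be the negative log of the absolute correlation, and shows why it is useful for tree learning: along a Markov chain the correlations multiply, so the distances add. The function name `information_distance` and the correlation values are hypothetical, and this is not the paper's recursive grouping or CLGrouping procedure.

```python
import math


def information_distance(rho):
    """Assumed Gaussian information distance: -log|rho| for correlation rho."""
    if not 0 < abs(rho) <= 1:
        raise ValueError("correlation must be nonzero with |rho| <= 1")
    return -math.log(abs(rho))


# Hypothetical chain X1 - X2 - X3: in a Gaussian Markov chain the end-to-end
# correlation is the product of the correlations along the path.
rho_12, rho_23 = 0.8, 0.5
rho_13 = rho_12 * rho_23

d_12 = information_distance(rho_12)
d_23 = information_distance(rho_23)
d_13 = information_distance(rho_13)

# Additivity along the path is what lets distance-based methods reason
# about tree structure from pairwise statistics alone.
print(math.isclose(d_13, d_12 + d_23))  # prints True
```

Additive distances of this kind are what allow sibling and parent relationships to be tested using only pairwise quantities among observed variables.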

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Caltech Authors