
21 research outputs found

The generalized Robinson-Foulds metric

Author: B.L. Allen
C. Finden
D. Bogdanowicz
D.E. Critchlow
D.F. Robinson
K. Dabrowski
L.A. Lewis
M. Deza
M.-Y. Kao
M.S. Bansal
O. Dubois
R.G. Downey
S.-J. Sul
T. Griebel
T. Munzner
T.M.W. Nye
Y. Lin
Publication date: 01/01/2013

The Robinson-Foulds (RF) metric is arguably the most widely used measure of phylogenetic tree similarity, despite its well-known shortcomings. For example, moving a single taxon in a tree can produce a tree at maximum distance from the original one, even though the two trees are identical once that single taxon is removed. To address this, we propose a natural extension of the RF metric that does not simply count identical clades but also takes similar clades into consideration. In contrast to previous approaches, our model requires the matching between clades to respect the structure of the two trees, a property that the classical RF metric also has. We show that computing this generalized RF metric is, unfortunately, NP-hard. We then present a simple Integer Linear Program for its computation and evaluate it by an all-against-all comparison of 100 trees from a benchmark data set. We find that matchings that respect the tree structure differ significantly from those that do not, underlining the importance of this natural condition. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).
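The shortcoming described in this abstract can be reproduced with the classical (rooted, unnormalized) RF distance, counted here as the number of clusters present in exactly one of the two trees. The sketch below is not the generalized metric or the Integer Linear Program from the paper; it only illustrates the classical behaviour. The nested-tuple tree representation and the helper names `clusters`, `rf_distance`, and `restrict` are hypothetical choices for illustration.

```python
from typing import FrozenSet, Optional, Set, Union

# Hypothetical representation: a rooted tree is a leaf label (str) or a
# tuple of subtrees, e.g. ((("a", "b"), "c"), "d").
Tree = Union[str, tuple]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial clusters: leaf sets of internal vertices, excluding the
    root cluster (shared by both trees) and singletons."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root_cluster = walk(tree)
    found.discard(root_cluster)
    return {c for c in found if len(c) > 1}


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Rooted RF distance: number of clusters found in exactly one tree."""
    return len(clusters(t1) ^ clusters(t2))


def restrict(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Delete leaves not in `keep` and suppress vertices left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


# A caterpillar on a..e, and the same tree with taxon "a" moved to the far end.
t1 = (((("a", "b"), "c"), "d"), "e")
t2 = (((("b", "c"), "d"), "e"), "a")
keep = {"b", "c", "d", "e"}
print(rf_distance(t1, t2))                                  # 6
print(rf_distance(restrict(t1, keep), restrict(t2, keep)))  # 0
```

Each 5-taxon rooted binary tree has 3 non-trivial clusters, so 6 is the largest possible value here, yet deleting the moved taxon makes the two trees identical.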

arXiv.org e-Print Archive

CiteSeerX

Crossref

VU Research Portal

CWI's Institutional Repository

A parsimony-based metric for phylogenetic trees

Author: Alberich
Allen
Bogdanowicz
Bonet
Bordewich
Bruen
Bryant
Caceres
Day
Ding
Erdös
Hickey
Humphries
Kubatko
Lin
Robinson
Semple
Steel
Taoyang Wu
Vincent Moulton
Waterman
Whelan
Publication venue: 'Elsevier BV'
Publication date: 06/03/2015

In evolutionary biology, various metrics have been defined and studied for comparing phylogenetic trees. Such metrics are used, for example, to compare competing evolutionary hypotheses or to help organize algorithms that search for optimal trees. Here we introduce a new metric $d_p$ on the collection of binary phylogenetic trees, each labeled by the same set of species. The metric is based on the so-called parsimony score, an important concept in phylogenetics that is commonly used to construct phylogenetic trees. Our main results include a characterization of the unit neighborhood of a tree in the $d_p$ metric and an explicit formula for its diameter, that is, a formula for the maximum possible value of $d_p$ over all possible pairs of trees labeled by the same set of species. We also show that $d_p$ is closely related to the well-known tree bisection and reconnection (TBR) and subtree prune and regraft (SPR) distances, a connection that will hopefully provide a useful new approach to understanding properties of these and related metrics.
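A brute-force sketch of a parsimony-based distance for very small trees follows. It assumes, since the abstract does not state it, that $d_p(T_1, T_2)$ is the maximum, over all characters $\chi$ on the species set, of $|ps(\chi, T_1) - ps(\chi, T_2)|$, where $ps$ is the parsimony score, computed here with Fitch's algorithm. The pair-of-subtrees tree representation and the function names are hypothetical, and the exhaustive enumeration is only practical for a handful of taxa.

```python
from itertools import product
from typing import Dict, List, Set, Tuple, Union

# Hypothetical representation: a binary tree is a leaf label (str) or a pair
# of subtrees. An unrooted tree may be rooted on any edge, because the Fitch
# parsimony score does not depend on where the root is placed.
Tree = Union[str, tuple]


def fitch_score(tree: Tree, character: Dict[str, int]) -> int:
    """Fitch's small-parsimony score: minimum number of state changes."""
    def walk(node: Tree) -> Tuple[Set[int], int]:
        if isinstance(node, str):
            return {character[node]}, 0
        left, right = node
        left_states, left_cost = walk(left)
        right_states, right_cost = walk(right)
        cost = left_cost + right_cost
        common = left_states & right_states
        if common:
            return common, cost
        return left_states | right_states, cost + 1  # one extra change

    return walk(tree)[1]


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def parsimony_distance(t1: Tree, t2: Tree) -> int:
    """Max |score(t1) - score(t2)| over all characters (assumed definition).

    Allowing n states for n taxa covers every partition of the taxa, so the
    search is exhaustive, but it visits n**n characters: small n only.
    """
    taxa = sorted(leaves(t1))
    if sorted(leaves(t2)) != taxa:
        raise ValueError("trees must share the same leaf set")
    best = 0
    for states in product(range(len(taxa)), repeat=len(taxa)):
        chi = dict(zip(taxa, states))
        best = max(best, abs(fitch_score(t1, chi) - fitch_score(t2, chi)))
    return best


ab_cd = (("a", "b"), ("c", "d"))
ac_bd = (("a", "c"), ("b", "d"))
print(parsimony_distance(ab_cd, ac_bd))  # 1
```

The two 4-taxon trees differ only in their central split. The binary character a=b=0, c=d=1 needs one change on `ab_cd` and two on `ac_bd`, and no character separates them by more.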

Crossref

University of East Anglia digital repository

Comparing and simplifying distinct-cluster phylogenetic networks

Author: Willson Stephen J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/08/2016

Phylogenetic networks are rooted acyclic directed graphs in which the leaves are identified with members of a set X of species. The cluster of a vertex is the set of leaves that are descendants of the vertex. A network is "distinct-cluster" if distinct vertices have distinct clusters. This paper focuses on the set DC(X) of distinct-cluster networks whose leaves are identified with the members of X. For a fixed X, a metric on DC(X) is defined. There is a "cluster-preserving" simplification process by which vertices or certain arcs may be removed without changing the clusters of any remaining vertices. Many of the resulting networks are uniquely determined regardless of the order of the simplifying operations. Comment: This is version 2. A previous version is already on arXiv.
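The definitions of a cluster and of the distinct-cluster property translate directly into code. The sketch below assumes a hypothetical encoding of a network as a map from each vertex to its children, with leaves named after species. It only checks the property; it does not implement the metric on DC(X) or the simplification process from the paper.

```python
from typing import Dict, FrozenSet, List

# Hypothetical encoding: a rooted acyclic network as vertex -> list of
# children. Vertices without children are leaves named after species in X.
Network = Dict[str, List[str]]


def vertex_clusters(children: Network) -> Dict[str, FrozenSet[str]]:
    """Map every vertex to its cluster: the set of leaves descending from it."""
    memo: Dict[str, FrozenSet[str]] = {}

    def cluster(v: str) -> FrozenSet[str]:
        # Memoized, so a reticulation vertex reached along several paths is
        # computed only once.
        if v not in memo:
            kids = children.get(v, [])
            memo[v] = (frozenset([v]) if not kids
                       else frozenset().union(*(cluster(k) for k in kids)))
        return memo[v]

    vertices = set(children) | {k for ks in children.values() for k in ks}
    return {v: cluster(v) for v in vertices}


def is_distinct_cluster(children: Network) -> bool:
    """True when no two distinct vertices share the same cluster."""
    cl = vertex_clusters(children)
    return len(set(cl.values())) == len(cl)


# Reticulation vertex h has two parents (u and v) and two leaf children.
net_ok = {"r": ["u", "v"], "u": ["a", "h"], "v": ["h", "d"], "h": ["b", "c"]}
# Here h has a single child b, so h and b share the cluster {b}.
net_bad = {"r": ["u", "v"], "u": ["a", "h"], "v": ["h", "c"], "h": ["b"]}
print(is_distinct_cluster(net_ok), is_distinct_cluster(net_bad))  # True False
```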

arXiv.org e-Print Archive

Springer - Publisher Connector

GLProbs: Aligning multiple sequences adaptively

Author: Cheung DWL
Lam TW
Ting HF
Wang YD
YE Y
Yiu SM
Zhan Q
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015


HKU Scholars Hub

The generalized Robinson-Foulds distance for phylogenetic trees

Author: Llabrés Segura Mercè
Rosselló Llompart Francesc A.
Valiente Feruglio Gabriel Alejandro
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/12/2021

The Robinson-Foulds (RF) distance, one of the most widely used metrics for comparing phylogenetic trees, has the advantage of being intuitive, with a natural interpretation in terms of common splits, and it can be computed in linear time. However, it has a very low resolution, and it may become trivial for phylogenetic trees with overlapping taxa, that is, phylogenetic trees that share some but not all of their leaf labels. In this article, we study the properties of the Generalized Robinson-Foulds (GRF) distance, a recently proposed metric for comparing any structures that can be described by multisets of multisets of labels, when applied to rooted phylogenetic trees with overlapping taxa, which are described by sets of clusters, that is, by sets of sets of labels. We show that the GRF distance has a very high resolution, that it can also be computed in linear time, and that it is not (uniformly) equivalent to the RF distance. This research was partially supported by the Spanish Ministry of Science, Innovation and Universities and the European Regional Development Fund through project PGC2018-096956-B-C43 (FEDER/MICINN/AEI), and by the Agency for Management of University and Research Grants (AGAUR) through grant 2017-SGR-786 (ALBCOM). Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Bayesian Model-building in Phylogenetics

Author: Nelson Bradley
Publication venue: LSU Digital Commons
Publication date: 01/01/2014

DNA sequencing costs have decreased dramatically over recent decades, resulting in a flood of phylogenetic information available to researchers. While it is often assumed that additional data will lead to more accurate conclusions, the added data also raise a number of problems for phylogeneticists, ranging from mundane computational issues such as data management to complex statistical problems such as obtaining a single species tree from multiple conflicting gene trees. Developing new methods to make better use of existing data and to probe the causes of conflicting signal will be necessary to confidently resolve phylogenies in the genomic era. Here, we examine two current problems in statistical phylogenetics and attempt to address them in a Bayesian framework. The first problem involves inflated tree lengths in Bayesian phylogenies, which can be an order of magnitude longer than maximum likelihood estimates. We developed EmpPrior, a program that queries TreeBASE for datasets similar to the focal data and then estimates parameters from each dataset to inform priors on the focal data. This approach greatly improves the tree length credible intervals in four exemplar datasets and, when combined with other approaches such as the use of a compound Dirichlet prior on tree length, can nearly eliminate the problem of inflated trees. The second problem involves incongruence between morphological and molecular phylogenies in squamates. Here, we use posterior prediction with inferential test statistics to investigate whether systematic error may be biasing inference in the molecular data. While we detected some model violation in most of the 44 genes, the genes with the most model violation were more distant from the molecular phylogeny. This suggests that model violation is not a major source of error in the molecular data. Hence, the source of incongruence between the molecular and morphological squamate topologies remains unknown.
In both problems, we found that incorporating tools such as informed priors and posterior prediction from the Bayesian statistical literature into phylogenetic analyses can improve results and help uncover why different datasets lead to conflicting topologies. As phylogenetic datasets continue to grow, using methodological best practices will only become more important if we want to have confidence in our conclusions.
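The two prior-related ideas in this abstract can be sketched as follows, under loudly stated assumptions. First, fitting a Gamma distribution to tree-length estimates from similar datasets by the method of moments is a hypothetical stand-in, not necessarily what EmpPrior does. Second, the compound Dirichlet prior is shown in a simplified form: total tree length drawn from a Gamma distribution, then split among branches by a flat Dirichlet. The function names and all numbers are illustrative only.

```python
import random
from statistics import mean, variance
from typing import List, Sequence, Tuple


def gamma_params_from_estimates(tree_lengths: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical method-of-moments Gamma(shape, rate) fit to tree lengths
    estimated from similar datasets."""
    m = mean(tree_lengths)
    v = variance(tree_lengths)  # sample variance; needs at least 2 values
    return m * m / v, m / v


def sample_branch_lengths(n_branches: int, shape: float, rate: float,
                          rng: random.Random) -> Tuple[float, List[float]]:
    """One draw from a simplified compound Dirichlet prior: total tree length
    ~ Gamma(shape, rate), branch proportions ~ Dirichlet(1, ..., 1)."""
    total = rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale, not a rate
    # Normalized Gamma(1, 1) draws are Dirichlet(1, ..., 1) proportions.
    weights = [rng.gammavariate(1.0, 1.0) for _ in range(n_branches)]
    norm = sum(weights)
    return total, [total * w / norm for w in weights]


shape, rate = gamma_params_from_estimates([1.0, 2.0, 3.0])  # shape 4.0, rate 2.0
# An unrooted binary tree on 5 taxa has 2 * 5 - 3 = 7 branches.
total, lengths = sample_branch_lengths(7, shape, rate, random.Random(42))
print(total, sum(lengths))
```

Because the prior is placed on the total length rather than on each branch independently, adding branches does not by itself inflate the expected tree length, which is the motivation usually given for this construction.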

Louisiana State University

Can we identify genes with increased phylogenetic reliability?

Author: Brown J M
Doyle V P
Naylor G J
Young R E
Publication venue: LSU Digital Commons
Publication date: 01/01/2015

© The Author(s) 2015. Topological heterogeneity among gene trees is widely observed in phylogenomic analyses, and some of this variation is likely caused by systematic error in gene tree estimation. Systematic error can be mitigated by improving models of sequence evolution to account for all evolutionary processes relevant to each gene, or by identifying those genes whose evolution best conforms to existing models. However, the best method for identifying such genes is not well established. Here, we ask whether filtering genes according to their clock-likeness or posterior predictive effect size (PPES, an inference-based measure of model violation) improves phylogenetic reliability and congruence. We compared these approaches to each other, and to the common practice of filtering based on rate of evolution, using two different metrics. First, we compared gene-tree topologies to accepted reference topologies. Second, we examined topological similarity among gene trees in filtered sets. Our results suggest that filtering genes based on clock-likeness and PPES can yield a collection of genes with more reliable phylogenetic signal. For the two exemplar data sets we explored, from yeast and amniotes, clock-likeness and PPES outperformed rate-based filtering in both congruence and reliability.
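Filtering genes by PPES can be sketched as below. The abstract does not give the formula, so this sketch assumes PPES is the absolute difference between the observed test statistic and the median of its posterior predictive distribution, divided by that distribution's standard deviation. The gene names and statistic values are hypothetical placeholders, not data from the study.

```python
from statistics import median, stdev
from typing import Dict, List, Sequence, Tuple


def ppes(observed: float, predicted: Sequence[float]) -> float:
    """Assumed effect size: |observed - median(predicted)| / stdev(predicted)."""
    return abs(observed - median(predicted)) / stdev(predicted)


def filter_genes(gene_stats: Dict[str, Tuple[float, Sequence[float]]],
                 keep: int) -> List[str]:
    """Return the `keep` genes with the smallest effect sizes, i.e. the genes
    whose data appear least in conflict with the fitted model."""
    scores = {gene: ppes(obs, pred) for gene, (obs, pred) in gene_stats.items()}
    return sorted(scores, key=scores.__getitem__)[:keep]


# Hypothetical (observed statistic, posterior predictive draws) per gene.
stats = {
    "g1": (3.0, [1.0, 2.0, 3.0]),
    "g2": (2.0, [1.0, 2.0, 3.0]),
    "g3": (10.0, [1.0, 2.0, 3.0]),
}
print(filter_genes(stats, 2))  # ['g2', 'g1']
```

Unlike a posterior predictive p-value, which saturates near 0 once violation is clear, an effect size keeps distinguishing mildly from severely violating genes, which makes it convenient for ranking.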

Louisiana State University

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY