Search CORE

138 research outputs found

Cache Oblivious Algorithms for Computing the Triplet Distance Between Trees

Author: Mampentzidis Konstantinos
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th Annual European Symposium on Algorithms (ESA 2017)
Publication date: 01/01/2017
Field of study

We study the problem of computing the triplet distance between two rooted unordered trees with n labeled leafs. Introduced by Dobson 1975, the triplet distance is the number of leaf triples that induce different topologies in the two trees. The current theoretically best algorithm is an O(nlogn) time algorithm by Brodal et al. [SODA 2013]. Recently Jansson et al. proposed a new algorithm that, while slower in theory, requiring O(n log^3 n) time, in practice it outperforms the theoretically faster O(n log n) algorithm. Both algorithms do not scale to external memory. We present two cache oblivious algorithms that combine the best of both worlds. The first algorithm is for the case when the two input trees are binary trees and the second a generalized algorithm for two input trees of arbitrary degree. Analyzed in the RAM model, both algorithms require O(n log n) time, and in the cache oblivious model O(n/B log_{2}(n/M)) I/Os. Their relative simplicity and the fact that they scale to external memory makes them achieve the best practical performance. We note that these are the first algorithms that scale to external memory, both in theory and practice, for this problem

Dagstuhl Research Online Publication Server

Optimal Subtree Prune and Regraft for Quartet Score in Sub-Quadratic Time

Author: Arasti Shayesteh
Mirarab Siavash
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Publication date: 01/01/2023
Field of study

Finding a tree with the minimum total distance to a given set of trees (the median tree) is increasingly needed in phylogenetics. Defining tree distance as the number of induced four-taxon unrooted (i.e., quartet) trees with different topologies, the median of a set of gene trees is a statistically consistent estimator of the species tree under several models of gene tree species tree discordance. Because of this, median trees defined with quartet distance are widely used in practice for species tree inference. Nevertheless, the problem is NP-Hard and the widely-used solutions are heuristics. In this paper, we pave the way for a new type of heuristic solution to this problem. We show that the optimal place to add a subtree of size m onto a tree with n leaves can be found in time that grows quasi-linearly with n and is nearly independent of m. This algorithm can be used to perform subtree prune and regraft (SPR) moves efficiently, which in turn enables the hill-climbing heuristic search for the optimal tree. In exploratory experiments, we show that our algorithm can improve the quartet score of trees obtained using the existing widely-used methods

Dagstuhl Research Online Publication Server

On Two Measures of Distance Between Fully-Labelled Trees

Author: Bernardini Giulia
Bonizzoni Paola
Gawrychowski Pawe?
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Publication date: 01/01/2020
Field of study

The last decade brought a significant increase in the amount of data and a variety of new inference methods for reconstructing the detailed evolutionary history of various cancers. This brings the need of designing efficient procedures for comparing rooted trees representing the evolution of mutations in tumor phylogenies. Bernardini et al. [CPM 2019] recently introduced a notion of the rearrangement distance for fully-labelled trees motivated by this necessity. This notion originates from two operations: one that permutes the labels of the nodes, the other that affects the topology of the tree. Each operation alone defines a distance that can be computed in polynomial time, while the actual rearrangement distance, that combines the two, was proven to be NP-hard. We answer two open question left unanswered by the previous work. First, what is the complexity of computing the permutation distance? Second, is there a constant-factor approximation algorithm for estimating the rearrangement distance between two arbitrary trees? We answer the first one by showing, via a two-way reduction, that calculating the permutation distance between two trees on n nodes is equivalent, up to polylogarithmic factors, to finding the largest cardinality matching in a sparse bipartite graph. In particular, by plugging in the algorithm of Liu and Sidford [ArXiv 2020], we obtain an ??(n^{4/3+o(1}) time algorithm for computing the permutation distance between two trees on n nodes. Then we answer the second question positively, and design a linear-time constant-factor approximation algorithm that does not need any assumption on the trees

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Trieste

Dagstuhl Research Online Publication Server

A practical ( log ) time algorithm for computing the triplet distance on binary trees

Author
Publication venue: BioMed Central
Publication date
Field of study

Springer - Publisher Connector

A practical O(n log2 n) time algorithm for computing the triplet distance on binary trees

Author: Brodal Gerth Stølting
Fagerberg Rolf
Mailund Thomas
Pedersen Christian N. S.
Sand Andreas
Publication venue
Publication date: 01/01/2013
Field of study

Crossref

University of Southern Denmark Research Output

Tracing evolutionary links between species

Author: Steel Mike
Publication venue
Publication date: 01/01/2014
Field of study

The idea that all life on earth traces back to a common beginning dates back at least to Charles Darwin's {\em Origin of Species}. Ever since, biologists have tried to piece together parts of this `tree of life' based on what we can observe today: fossils, and the evolutionary signal that is present in the genomes and phenotypes of different organisms. Mathematics has played a key role in helping transform genetic data into phylogenetic (evolutionary) trees and networks. Here, I will explain some of the central concepts and basic results in phylogenetics, which benefit from several branches of mathematics, including combinatorics, probability and algebra.Comment: 18 pages, 6 figures (Invited review paper (draft version) for AMM

arXiv.org e-Print Archive

CiteSeerX