An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings
A widely used method for determining the similarity of two labeled trees is
to compute a maximum agreement subtree of the two trees. Previous work on this
similarity measure is only concerned with the comparison of labeled trees of
two special kinds, namely, uniformly labeled trees (i.e., trees with all their
nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled
trees with distinct symbols for distinct leaves). This paper presents an
algorithm for comparing trees that are labeled in an arbitrary manner. In
addition to this generality, this algorithm is faster than the previous
algorithms.
Another contribution of this paper is on maximum weight bipartite matchings.
We show how to speed up the best known matching algorithms when the input
graphs are node-unbalanced or weight-unbalanced. Based on these enhancements,
we obtain an efficient algorithm for a new matching problem called the
hierarchical bipartite matching problem, which is at the core of our maximum
agreement subtree algorithm.
Comment: To appear in Journal of Algorithms
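To illustrate why bipartite matching sits at the core of maximum agreement subtree computation, here is a minimal sketch of the textbook recursion for two rooted evolutionary trees (distinct leaf labels). It is not the algorithm of the paper: the tree encoding (nested tuples with string leaves), the function names `leaves` and `mast`, and the brute-force matching over permutations are all assumptions made for this example, and the brute force is only practical for small node degrees.

```python
from functools import lru_cache
from itertools import permutations


def is_leaf(tree):
    # Hypothetical encoding: a leaf is a string, an internal node is a tuple of subtrees.
    return isinstance(tree, str)


@lru_cache(maxsize=None)
def leaves(tree):
    """Set of leaf labels below `tree`."""
    if is_leaf(tree):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


@lru_cache(maxsize=None)
def mast(u, v):
    """Number of leaves in a maximum agreement subtree of rooted trees u and v."""
    if is_leaf(u):
        return 1 if u in leaves(v) else 0
    if is_leaf(v):
        return 1 if v in leaves(u) else 0
    # Case 1: the agreement subtree lives entirely below one child of u or of v.
    best = max(mast(child, v) for child in u)
    best = max(best, max(mast(u, child) for child in v))
    # Case 2: children of u are paired with children of v. This is a maximum
    # weight bipartite matching with edge weights mast(c, d); it is solved here
    # by brute force, where a fast matching algorithm would be used in practice.
    small, large = (u, v) if len(u) <= len(v) else (v, u)
    for chosen in permutations(large, len(small)):
        best = max(best, sum(mast(a, b) for a, b in zip(small, chosen)))
    return best


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "b"), "d"), "c")
print(mast(t1, t2))
```

In this example the two trees agree on the leaves a, b, c but not on all four leaves, so the recursion selects the three-leaf subtree.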
A Fast Quartet Tree Heuristic for Hierarchical Clustering
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree
from the 3(n choose 4) weighted quartet topologies on n objects, where
optimality means that the summed weight of the embedded quartet topologies is
optimal (so it can be the case that the optimal tree embeds all quartets as
nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized
hill climbing, for approximating the optimal weight tree, given the quartet
topology weights. The method repeatedly transforms a dendrogram, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. The problem and the solution heuristic have been
extensively used for general hierarchical clustering of nontree-like
(non-phylogeny) data in various domains and across domains with heterogeneous
data. We also present a greatly improved heuristic, reducing the running time
by a factor of order a thousand to ten thousand. All this is implemented and
available, as part of the CompLearn package. We compare performance and running
time of the original and improved versions with those of UPGMA, BioNJ, and NJ,
as implemented in the SplitsTree package on genomic data for which the latter
are optimized.
Keywords: Data and knowledge visualization, Pattern
matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering,
Global optimization, Quartet tree, Randomized hill-climbing
Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with
arXiv:cs/0606048 in cs.D
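The quartet tree objective and the hill-climbing idea can be sketched in a few lines. The following is a simplified illustration, not the CompLearn implementation: it assumes that "optimal" means maximal summed weight, it keeps the tree shape fixed and uses only leaf swaps as the mutation step, and every name in it (`distances`, `embedded_topology`, `score`, `hill_climb`, the slot labels `s0` to `s4`) is hypothetical. The embedded topology of a quartet is found with the standard fact that, in a tree, the embedded pairing xy|zw is the one minimizing the summed path lengths d(x,y) + d(z,w).

```python
import random
from collections import deque
from itertools import combinations


def distances(adj, source):
    """Edge-count distances from `source` to every node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def embedded_topology(dist, slot_of, quartet):
    """Return the pairing xy|zw of `quartet` that the tree embeds."""
    a, b, c, d = quartet
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def total(pairing):
        (p, q), (r, s) = pairing
        return dist[slot_of[p]][slot_of[q]] + dist[slot_of[r]][slot_of[s]]

    left, right = min(pairings, key=total)
    return frozenset([frozenset(left), frozenset(right)])


def score(adj, slot_of, weight):
    """Summed weight of all quartet topologies embedded in the tree."""
    dist = {slot: distances(adj, slot) for slot in slot_of.values()}
    return sum(weight.get(embedded_topology(dist, slot_of, q), 0.0)
               for q in combinations(sorted(slot_of), 4))


def hill_climb(adj, slot_of, weight, steps=100, seed=0):
    """Randomized hill climbing: swap two leaves, keep the swap only if the score improves."""
    rng = random.Random(seed)
    current = dict(slot_of)
    best = score(adj, current, weight)
    objects = sorted(current)
    for _ in range(steps):
        a, b = rng.sample(objects, 2)
        current[a], current[b] = current[b], current[a]
        candidate = score(adj, current, weight)
        if candidate > best:
            best = candidate
        else:
            current[a], current[b] = current[b], current[a]  # undo the swap
    return current, best


# Ternary tree with three internal nodes (u, v, w) and five leaf slots.
adj = {
    "u": ["s0", "s1", "v"], "v": ["u", "s2", "w"], "w": ["v", "s3", "s4"],
    "s0": ["u"], "s1": ["u"], "s2": ["v"], "s3": ["w"], "s4": ["w"],
}
target = {"a": "s0", "b": "s1", "c": "s2", "d": "s3", "e": "s4"}
target_dist = {s: distances(adj, s) for s in target.values()}
# Toy weights: 1.0 for each topology embedded by the target tree, 0.0 otherwise.
weight = {embedded_topology(target_dist, target, q): 1.0
          for q in combinations(sorted(target), 4)}
start = dict(target, b="s2", c="s1")  # scrambled starting placement
final, best = hill_climb(adj, start, weight)
```

Because a swap is accepted only when the score strictly improves, the score never decreases along the run, which mirrors the monotonic approximation described in the abstract. A leaf-swap-only search can still stall in a local optimum.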
Cavity Matchings, Label Compressions, and Unrooted Evolutionary Trees
We present an algorithm for computing a maximum agreement subtree of two
unrooted evolutionary trees. It takes O(n^{1.5} log n) time for trees with
unbounded degrees, matching the best known time complexity for the rooted case.
Our algorithm allows the input trees to be mixed trees, i.e., trees that may
contain directed and undirected edges at the same time. Our algorithm adopts a
recursive strategy exploiting a technique called label compression. The
backbone of this technique is an algorithm that computes the maximum weight
matchings over many subgraphs of a bipartite graph as fast as it takes to
compute a single matching.
A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest
We give a 2-approximation algorithm for the Maximum Agreement Forest problem
on two rooted binary trees. This NP-hard problem has been studied extensively
in the past two decades, since it can be used to compute the Subtree
Prune-and-Regraft (SPR) distance between two phylogenetic trees. Our result
improves on the very recent 2.5-approximation algorithm due to Shi, Feng, You
and Wang (2015). Our algorithm is the first approximation algorithm for this
problem that uses LP duality in its analysis
Reconciling taxonomy and phylogenetic inference: formalism and algorithms for describing discord and inferring taxonomic roots
Although taxonomy is often used informally to evaluate the results of
phylogenetic inference and find the root of phylogenetic trees, algorithmic
methods to do so are lacking. In this paper we formalize these procedures and
develop algorithms to solve the relevant problems. In particular, we introduce
a new algorithm that solves a "subcoloring" problem for expressing the
difference between the taxonomy and phylogeny at a given rank. This algorithm
improves upon the current best algorithm in terms of asymptotic complexity for
the parameter regime of interest; we also describe a branch-and-bound algorithm
that saves orders of magnitude in computation on real data sets. We also
develop a formalism and an algorithm for rooting phylogenetic trees according
to a taxonomy. All of these algorithms are implemented in freely available
software.
Comment: Version submitted to Algorithms for Molecular Biology. A number of
fixes from previous version
Improved Bounds for the Excluded-Minor Approximation of Treedepth
Treedepth, a more restrictive graph width parameter than treewidth and pathwidth, plays a major role in the theory of sparse graph classes. We show that there exists a constant C such that for all integers a, b >= 2 and every graph G, if the treedepth of G is at least Cab log a, then the treewidth of G is at least a or G contains a subcubic (i.e., of maximum degree at most 3) tree of treedepth at least b as a subgraph.
As a direct corollary, we obtain that every graph of treedepth Omega(k^3 log k) is either of treewidth at least k, contains a subdivision of a full binary tree of depth k, or contains a path of length 2^k. This improves the bound of Omega(k^5 log^2 k) of Kawarabayashi and Rossman [SODA 2018].
We also show an application for approximation algorithms of treedepth: given a graph G of treedepth k and treewidth t, one can in polynomial time compute a treedepth decomposition of G of width O(kt log^{3/2} t). This improves upon a bound of O(kt^2 log t) stemming from a tradeoff between known results.
The main technical ingredient in our result is a proof that every tree of treedepth d contains a subcubic subtree of treedepth at least d * log_3 ((1+sqrt{5})/2).
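For readers unfamiliar with treedepth, a small brute-force sketch based on its standard recursive characterization may help: a single vertex has treedepth 1, a disconnected graph has the maximum treedepth of its components, and a connected graph has 1 plus the minimum treedepth over all single-vertex deletions. The code below is an exponential-time illustration for tiny graphs only and has nothing to do with the approximation algorithm of the paper. The helper names `components`, `treedepth` and `path` are assumptions of this example.

```python
import math
from functools import lru_cache


def components(vertices, adj):
    """Connected components of the subgraph induced by `vertices`."""
    remaining = set(vertices)
    parts = []
    while remaining:
        start = remaining.pop()
        comp = {start}
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y in remaining:
                    remaining.remove(y)
                    comp.add(y)
                    stack.append(y)
        parts.append(frozenset(comp))
    return parts


def treedepth(adj):
    """Exact treedepth of a small non-empty graph given as an adjacency dict."""

    @lru_cache(maxsize=None)
    def td(vertices):
        if len(vertices) == 1:
            return 1
        parts = components(vertices, adj)
        if len(parts) > 1:
            return max(td(part) for part in parts)
        # Connected: delete the best possible root vertex and recurse.
        return 1 + min(td(vertices - {v}) for v in vertices)

    return td(frozenset(adj))


def path(n):
    """Path graph on vertices 0..n-1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


# Constant from the subcubic-subtree bound: log_3 of the golden ratio.
ratio = math.log((1 + math.sqrt(5)) / 2, 3)
print(treedepth(path(7)), treedepth(path(8)), round(ratio, 3))
```

A path on n vertices has treedepth ceil(log2(n + 1)), which is why a path of length 2^k appears among the obstructions to small treedepth. The constant log_3((1+sqrt{5})/2) is roughly 0.438, so the subcubic subtree retains a bit under half of the original treedepth.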
- …