Towards a Taxonomically Intelligent Phylogenetic Database
This note outlines some of the key intellectual obstacles that stand in the way of creating a usable phylogenetic database. These challenges include the need to accommodate multiple taxonomic names and classifications, and the need for tools to query trees in biologically meaningful ways. Until these problems are addressed, and a taxonomically intelligent phylogenetic database created, much of our phylogenetic knowledge will languish in the pages of journals.
A heuristic approach for multiple restricted multiplication
Published version
An O(n^3)-Time Algorithm for Tree Edit Distance
The {\em edit distance} between two ordered trees with vertex labels is the
minimum cost of transforming one tree into the other by a sequence of
elementary operations consisting of deleting and relabeling existing nodes, as
well as inserting new nodes. In this paper, we present a worst-case
$O(n^3)$-time algorithm for this problem, improving the previous best
$O(n^3\log n)$-time algorithm~\cite{Klein}. Our result requires a novel
adaptive strategy for deciding how a dynamic program divides into subproblems
(which is interesting in its own right), together with a deeper understanding
of the previous algorithms for the problem. We also prove the optimality of our
algorithm among the family of \emph{decomposition strategy} algorithms--which
also includes the previous fastest algorithms--by tightening the known lower
bound of $\Omega(n^2\log^2 n)$~\cite{Touzet} to $\Omega(n^3)$, matching our
algorithm's running time. Furthermore, we obtain matching upper and lower
bounds of $\Theta(nm^2(1+\log\frac{n}{m}))$ when the two trees have
different sizes $m$ and~$n$, where $m<n$.
Comment: 10 pages, 5 figures, 5 .tex files where TED.tex is the main one
A new balance index for phylogenetic trees
Several indices that measure the degree of balance of a rooted phylogenetic
tree have been proposed so far in the literature. In this work we define and
study a new index of this kind, which we call the total cophenetic index: the
sum, over all pairs of different leaves, of the depth of their least common
ancestor. This index makes sense for arbitrary trees, can be computed in linear
time, and has a larger range of values and a greater resolution power than
other indices such as Colless' or Sackin's. We compute its maximum and minimum
values for arbitrary and binary trees, as well as exact formulas for its
expected value for binary trees under the Yule and the uniform models of
evolution. As a byproduct of this study, we obtain an exact formula for the
expected value of the Sackin index under the uniform model, a result that seems
to be new in the literature.
Comment: 24 pages, 2 figures, preliminary version presented at the JBI 201
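The linear-time claim follows from a counting identity. A pair of leaves lies below a non-root node $v$ exactly when $v$ is an ancestor of (or equal to) their least common ancestor. Each pair therefore contributes the depth of that ancestor, and the index equals $\sum_{v \neq r} \binom{\kappa_v}{2}$, where $\kappa_v$ is the number of leaves below $v$ and $r$ is the root. The same counting gives the Sackin index as $\sum_{v \neq r} \kappa_v$. The sketch below computes both in one post-order pass. It assumes a hypothetical children-dictionary representation of a rooted tree, and the function names are illustrative.

```python
from math import comb


def leaf_counts(children: dict, root) -> dict:
    """Map each node to the number of leaves below it (iterative post-order).

    `children` maps an internal node to its list of children; any node
    without an entry (or with an empty list) is treated as a leaf.
    """
    counts = {}
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if not kids:
            counts[node] = 1  # a leaf
        elif expanded:
            counts[node] = sum(counts[k] for k in kids)  # all children done
        else:
            stack.append((node, True))  # revisit once the children are counted
            stack.extend((k, False) for k in kids)
    return counts


def total_cophenetic(children: dict, root) -> int:
    """Sum over non-root nodes v of C(kappa_v, 2), kappa_v = leaves below v."""
    counts = leaf_counts(children, root)
    return sum(comb(k, 2) for v, k in counts.items() if v != root)


def sackin(children: dict, root) -> int:
    """Sum of leaf depths, equal to the sum of kappa_v over non-root nodes."""
    counts = leaf_counts(children, root)
    return sum(k for v, k in counts.items() if v != root)
```

On the four-leaf caterpillar `{"r": ["a", "u"], "u": ["b", "w"], "w": ["c", "d"]}` with root `"r"`, `total_cophenetic` returns 4 and `sackin` returns 9.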
Faster Algorithms for the Maximum Common Subtree Isomorphism Problem
The maximum common subtree isomorphism problem asks for the largest possible
isomorphism between subtrees of two given input trees. This problem is a
natural restriction of the maximum common subgraph problem, which is NP-hard in general graphs. Restricting the problem to trees makes polynomial-time
algorithms possible and is of fundamental importance for approaches on more
general graph classes. Various variants of this problem in trees have been
intensively studied. We consider the general case, where trees are neither
rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on
the mapped vertices and edges. For trees of order $n$ and maximum degree $\Delta$
our algorithm achieves a running time of $O(n^2\Delta)$ by
exploiting the structure of the matching instances arising as subproblems. Thus
our algorithm outperforms the best previously known approaches. No faster
algorithm is possible for trees of bounded degree and for trees of unbounded
degree we show that a further reduction of the running time would directly
improve the best known approach to the assignment problem. Combining a
polynomial-delay algorithm for the enumeration of all maximum common subtree
isomorphisms with central ideas of our new algorithm leads to an improvement of
its running time from $O(n^6+Tn^2)$ to $O(n^3+Tn\Delta)$,
where $n$ is the order of the larger tree, $T$ is the number of different
solutions, and $\Delta$ is the minimum of the maximum degrees of the input
trees. Our theoretical results are supplemented by an experimental evaluation
on synthetic and real-world instances.
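Here is a small sketch of the subproblem structure for the unweighted case. Every mapped vertex scores 1, whereas the paper allows a weight function on vertices and edges. Fix a pair of vertices mapped to each other. The best extension pairs up their remaining neighbors through a maximum-weight bipartite matching. This sketch solves that matching by brute force over bitmasks, which is exponential in the degree, rather than exploiting the structure of the matching instances as the abstract describes. It is therefore not the authors' algorithm. The adjacency-dictionary input and the name `max_common_subtree` are hypothetical.

```python
from functools import lru_cache


def max_common_subtree(adj1: dict, adj2: dict) -> int:
    """Vertex count of a largest common subtree of two unrooted trees.

    adj1 and adj2 map each vertex (hashable, not None) to its neighbor list.
    Unit weights; the child matching is brute force, so this is only
    practical for small degrees.
    """

    @lru_cache(maxsize=None)
    def best(u, pu, v, pv) -> int:
        # Largest common subtree mapping u to v, using only the parts of
        # the trees that hang away from the parents pu and pv.
        kids1 = [c for c in adj1[u] if c != pu]
        kids2 = [d for d in adj2[v] if d != pv]

        @lru_cache(maxsize=None)
        def match(i: int, used: int) -> int:
            # Max total weight pairing kids1[i:] with kids2 not in bitmask `used`.
            if i == len(kids1):
                return 0
            result = match(i + 1, used)  # leave kids1[i] unmatched
            for j, d in enumerate(kids2):
                if not used & (1 << j):
                    gain = best(kids1[i], u, d, v)
                    result = max(result, gain + match(i + 1, used | (1 << j)))
            return result

        return 1 + match(0, 0)

    # Try every pair of vertices as the mapped "roots" of the common subtree.
    return max(best(u, None, v, None) for u in adj1 for v in adj2)
```

For a three-vertex path and a star with three leaves, `max_common_subtree` returns 3, since the path embeds in the star through its center.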