    Edit Distance between Unrooted Trees in Cubic Time

    Edit distance between trees is a natural generalization of the classical edit distance between strings, in which the allowed elementary operations are contraction, uncontraction and relabeling of an edge. Demaine et al. [ACM Trans. on Algorithms, 6(1), 2009] showed how to compute the edit distance between rooted trees on n nodes in O(n^3) time. However, generalizing their method to unrooted trees seems quite problematic, and the most efficient known solution remains the earlier O(n^3 log n) time algorithm by Klein [ESA 1998]. Given the lack of progress on improving this complexity, it might appear that unrooted trees are simply more difficult than rooted trees. We show that this is, in fact, not the case, and edit distance between unrooted trees on n nodes can be computed in O(n^3) time. A significantly faster solution is unlikely to exist, as Bringmann et al. [SODA 2018] proved that the complexity of computing the edit distance between rooted trees cannot be decreased to O(n^{3-epsilon}) unless some popular conjecture fails, and the lower bound easily extends to unrooted trees. We also show that for two unrooted trees of size m and n, where m <= n, our algorithm can be modified to run in O(nm^2(1+log(n/m))) time. This, again, matches the complexity achieved by Demaine et al. for rooted trees, who also showed that this is optimal if we restrict ourselves to the so-called decomposition algorithms.
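
    For readers new to the problem, the following is a minimal sketch of the textbook forest recursion for rooted, ordered trees. It is not the algorithm from the abstract above: it assumes node labels with unit-cost delete, insert and relabel operations (rather than edge contractions), it handles rooted trees only, and it makes no attempt to reach the O(n^3) bound. The tuple encoding and the names `forest_dist` and `tree_edit_distance` are illustrative choices.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple of
# trees; a forest is a tuple of trees. Tuples are hashable, so forests can be
# used directly as memoization keys.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(kids) for _, kids in forest)


@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert every remaining node of f2
    if not f2:
        return size(f1)  # delete every remaining node of f1
    (a, kids1), rest1 = f1[-1], f1[:-1]
    (b, kids2), rest2 = f2[-1], f2[:-1]
    return min(
        # delete the rightmost root of f1; its children take its place
        forest_dist(rest1 + kids1, f2) + 1,
        # insert the rightmost root of f2
        forest_dist(f1, rest2 + kids2) + 1,
        # match the two rightmost roots, relabeling if the labels differ
        forest_dist(rest1, rest2) + forest_dist(kids1, kids2) + (a != b),
    )


def tree_edit_distance(t1, t2):
    """Edit distance between two rooted, ordered, node-labeled trees."""
    return forest_dist((t1,), (t2,))
```

    For example, `tree_edit_distance(("a", (("b", ()), ("c", ()))), ("a", (("b", ()),)))` evaluates to 1, since deleting the leaf `c` suffices.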

    Faster Algorithms for the Maximum Common Subtree Isomorphism Problem

    The maximum common subtree isomorphism problem asks for the largest possible isomorphism between subtrees of two given input trees. This problem is a natural restriction of the maximum common subgraph problem, which is NP-hard in general graphs. Confining to trees renders polynomial time algorithms possible and is of fundamental importance for approaches on more general graph classes. Various variants of this problem in trees have been intensively studied. We consider the general case, where trees are neither rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on the mapped vertices and edges. For trees of order $n$ and maximum degree $\Delta$ our algorithm achieves a running time of $\mathcal{O}(n^2\Delta)$ by exploiting the structure of the matching instances arising as subproblems. Thus our algorithm outperforms the best previously known approaches. No faster algorithm is possible for trees of bounded degree and for trees of unbounded degree we show that a further reduction of the running time would directly improve the best known approach to the assignment problem. Combining a polynomial-delay algorithm for the enumeration of all maximum common subtree isomorphisms with central ideas of our new algorithm leads to an improvement of its running time from $\mathcal{O}(n^6+Tn^2)$ to $\mathcal{O}(n^3+Tn\Delta)$, where $n$ is the order of the larger tree, $T$ is the number of different solutions, and $\Delta$ is the minimum of the maximum degrees of the input trees. Our theoretical results are supplemented by an experimental evaluation on synthetic and real-world instances.

    Subcubic Algorithm for (Unweighted) Unrooted Tree Edit Distance

    The tree edit distance problem is a natural generalization of the classic string edit distance problem. Given two ordered, edge-labeled trees $T_1$ and $T_2$, the edit distance between $T_1$ and $T_2$ is defined as the minimum total cost of operations that transform $T_1$ into $T_2$. In one operation, we can contract an edge, split a vertex into two or change the label of an edge. For the weighted version of the problem, where the cost of each operation depends on the type of the operation and the label on the edge involved, $\mathcal{O}(n^3)$ time algorithms are known for both rooted and unrooted trees. The existence of a truly subcubic $\mathcal{O}(n^{3-\epsilon})$ time algorithm is unlikely, as it would imply a truly subcubic algorithm for the APSP problem. However, recently Mao (FOCS'21) showed that if we assume that each operation has a unit cost, then the tree edit distance between two rooted trees can be computed in truly subcubic time. In this paper, we show how to adapt Mao's algorithm to make it work for unrooted trees and we show an $\widetilde{\mathcal{O}}(n^{(7\omega + 15)/(2\omega + 6)}) \leq \mathcal{O}(n^{2.9417})$ time algorithm for the unweighted tree edit distance between two unrooted trees, where $\omega \leq 2.373$ is the matrix multiplication exponent. It is the first known subcubic algorithm for unrooted trees. The main idea behind our algorithm is the fact that to compute the tree edit distance between two unrooted trees, it is enough to compute the tree edit distance between an arbitrary rooting of the first tree and every rooting of the second tree.
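
    As a quick arithmetic check of the exponent quoted above, the short sketch below evaluates $(7\omega + 15)/(2\omega + 6)$ exactly for the stated bound $\omega \leq 2.373$. The function name `unrooted_exponent` is an illustrative choice, not something from the paper.

```python
from fractions import Fraction


def unrooted_exponent(omega):
    """Exponent (7*omega + 15) / (2*omega + 6) from the abstract's running time."""
    return (7 * omega + 15) / (2 * omega + 6)


# Bound on the matrix multiplication exponent quoted in the abstract.
omega = Fraction(2373, 1000)
exponent = unrooted_exponent(omega)

# Exact rational arithmetic avoids rounding questions near the 4th decimal.
print(float(exponent))
```

    The printed value lies just below 2.9417, in line with the abstract. Since $7\omega + 15 < 3(2\omega + 6)$ is equivalent to $\omega < 3$, the exponent stays below 3 for any $\omega < 3$ and would drop to exactly 2.9 if $\omega$ were 2.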

    Algorithms for efficient phylogenetic tree construction

    The rapidly increasing amount of available genomic sequence data provides an abundance of potential information for phylogenetic analyses. Many models and methods have been developed to build evolutionary trees based on this information. A common feature of most of these models is that they start out with fragments of the genome, called genes. Depending on the genes and species, and the methods used to perform the phylogenetic analyses, one typically ends up with a large number of phylogenetic trees which may not agree with one another. Simply put, the problem now is the following: given several discordant phylogenetic trees as input, infer the (presumably) correct phylogeny. This thesis seeks to address some of the methodological and algorithmic challenges posed by this problem. In particular, we present two new algorithms related to inferring phylogenetic trees in the presence of gene duplication, and introduce a new distance measure for comparing phylogenetic trees.

    How to compare arc-annotated sequences: The alignment hierarchy

    We describe a new unifying framework to express comparison of arc-annotated sequences, which we call alignment of arc-annotated sequences. We first prove that this framework encompasses the main existing models, which allows us to deduce complexity results for several cases from the literature. We also show that this framework gives rise to new relevant problems that have not been studied yet. We provide a thorough analysis of these novel cases by proposing two polynomial time algorithms and an NP-completeness proof. This leads to an almost exhaustive study of alignment of arc-annotated sequences.

    Decomposition algorithms for the tree edit distance problem

    We study the behavior of dynamic programming methods for the tree edit distance problem, such as [P. Klein, Computing the edit-distance between unrooted ordered trees, in: Proceedings of the 6th European Symposium on Algorithms, 1998, pp. 91–102; K. Zhang, D. Shasha, SIAM J. Comput. 18 (6) (1989) 1245–1262]. We show that those two algorithms may be described as decomposition strategies. We introduce the general framework of cover strategies, and we provide an exact characterization of the complexity of cover strategies. This analysis allows us to define a new tree edit distance algorithm that is optimal for cover strategies.