57 research outputs found
New and Improved Algorithms for Unordered Tree Inclusion
The tree inclusion problem is, given two node-labeled trees P and T (the "pattern tree" and the "text tree"), to locate every minimal subtree in T (if any) that can be obtained by applying a sequence of node insertion operations to P. Although the ordered tree inclusion problem is solvable in polynomial time, the unordered tree inclusion problem is NP-hard. The currently fastest algorithm for the latter is from 1995 and runs in O(poly(m,n) * 2^{2d}) = O^*(2^{2d}) time, where m and n are the sizes of the pattern and text trees, respectively, and d is the maximum outdegree of the pattern tree. Here, we develop a new algorithm that improves the exponent 2d to d by considering a particular type of ancestor-descendant relationships and applying dynamic programming, thus reducing the time complexity to O^*(2^d). We then study restricted variants of the unordered tree inclusion problem where the number of occurrences of different node labels and/or the input trees' heights are bounded. We show that although the problem remains NP-hard in many such cases, it can be solved in polynomial time for c = 2 and in O^*(1.8^d) time for c = 3 if the leaves of P are distinctly labeled and each label occurs at most c times in T. We also present a randomized O^*(1.883^d)-time algorithm for the case that the heights of P and T are one and two, respectively.
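To make the problem statement concrete, here is a minimal Python sketch of a decision procedure for unordered tree inclusion. It is not the O^*(2^d) algorithm of the paper, and it is not claimed to match the 1995 algorithm in detail; it is a plain subset dynamic program over the children of each pattern node, exponential in the pattern outdegree. It only decides whether P is included somewhere in T rather than locating every minimal subtree. The tuple encoding of trees and all function names (tree, rooted, forest_sets, subtree_sets, included) are assumptions made for illustration.

```python
from functools import lru_cache


def tree(label, *children):
    """Hypothetical encoding: a tree is a hashable (label, children) tuple."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def rooted(p, t):
    """True if p is included in t with the root of p mapped to the root of t."""
    if p[0] != t[0]:
        return False
    kids = p[1]
    full = (1 << len(kids)) - 1
    # Every child subtree of p must land, disjointly, below the root of t.
    return full in forest_sets(kids, t[1])


@lru_cache(maxsize=None)
def forest_sets(kids, forest):
    """Bitmasks of subsets of `kids` that can be placed in the given forest."""
    reachable = {0}
    for t in forest:
        options = subtree_sets(kids, t)
        # Combine with a disjoint subset placed inside the next tree.
        reachable = {a | b for a in reachable for b in options if a & b == 0}
    return frozenset(reachable)


@lru_cache(maxsize=None)
def subtree_sets(kids, t):
    """Bitmasks of subsets of `kids` that can be placed inside the subtree t."""
    # Case 1: the root of t is unused (deleted), so use its child forest.
    sets = set(forest_sets(kids, t[1]))
    # Case 2: the root of t is the image of a single pattern child; no sibling
    # can share this subtree, since images of siblings must be incomparable.
    for i, k in enumerate(kids):
        if rooted(k, t):
            sets.add(1 << i)
    return frozenset(sets)


def included(p, t):
    """True if p is included in the subtree rooted at some node of t."""
    return rooted(p, t) or any(included(p, c) for c in t[1])


P = tree("a", tree("b"), tree("c"))
T_yes = tree("a", tree("x", tree("c"), tree("b")))  # delete x to obtain P
T_no = tree("a", tree("b", tree("c")))              # b is an ancestor of c
print(included(P, T_yes), included(P, T_no))
```

The second example is rejected because b and c are siblings in P, so their images in T must not be in an ancestor-descendant relationship.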
A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures
[Background] Measuring similarities between tree-structured data is important for analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparison of tree-structured data. However, it is known that computation of the edit distance for rooted unordered trees is NP-hard. Furthermore, there is almost no available software tool that can compute the exact edit distance for unordered trees. [Results] In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem and then efficient solvers for the maximum clique problem are applied. We applied the proposed method to similar structure search for glycan structures. The result suggests that our proposed method can efficiently compute the edit distance for moderate-size unordered trees. It also suggests that the proposed method has accuracy comparable to that of the edit distance for ordered trees and of an existing method for glycan search. [Conclusions] The proposed method is simple but useful for computation of the edit distance between unordered trees. The object code is available upon request.
Exact algorithms for computing the tree edit distance between unordered trees
This paper presents a fixed-parameter algorithm for the tree edit distance problem for unordered trees under the unit cost model that works in O(2.62^k⋅poly(n)) time and O(n^2) space, where the parameter k is the maximum bound of the edit distance and n is the maximum size of input trees. This paper also presents polynomial-time algorithms for the case where the maximum degree of the largest common subtree is bounded by a constant.
A clique-based method using dynamic programming for computing edit distance between unordered trees
Many kinds of tree-structured data, such as RNA secondary structures, have become available due to the progress of techniques in the field of molecular biology. To analyze the tree-structured data, various measures for computing the similarity between them have been developed and applied. Among them, tree edit distance is one of the most widely used measures. However, the tree edit distance problem for unordered trees is NP-hard. Therefore, it is required to develop efficient algorithms for the problem. Recently, a practical method called the clique-based algorithm has been proposed, but it is not fast for large trees. This article presents an improved clique-based method for the tree edit distance problem for unordered trees. The improved method is obtained by introducing a dynamic programming scheme and heuristic techniques to the previous clique-based method. To evaluate the efficiency of the improved method, we applied the method to comparison of real tree-structured data such as glycan structures. For large tree structures, the improved method is much faster than the previous method. In particular, for hard instances, the improved method achieved more than 100 times speed-up.
On the parameterized complexity of associative and commutative unification
This article studies the parameterized complexity of the unification problem with associative, commutative, or associative-commutative functions with respect to the parameter “number of variables”. It is shown that if every variable occurs only once then both the associative and the associative-commutative unification problems can be solved in polynomial time, but that in the general case, both problems are W[1]-hard even when one of the two input terms is variable-free. For commutative unification, an algorithm whose time complexity depends exponentially on the number of variables is presented; moreover, if a certain conjecture is true then the special case where one input term is variable-free belongs to FPT. Some related results are also derived for a natural generalization of the classic string and tree edit distance problems that allows variables.