
    A constrained edit distance algorithm between semi-ordered trees

    In this paper, we propose a formal definition of a new class of trees called semi-ordered trees and a polynomial dynamic programming algorithm to compute a constrained edit distance between such trees. The core of the method relies on an approach similar to those used to compare unordered [Kaizhong Zhang, A constrained edit distance between unordered labeled trees, Algorithmica 15 (1996) 205–222] and ordered trees [Kaizhong Zhang, Algorithms for the constrained editing distance between ordered labeled trees and related problems, Pattern Recognition 28 (3) (1995) 463–474]. The method is currently applied to evaluate the similarity between architectures of apple trees [Vincent Segura, Aida Ouangraoua, Pascal Ferraro, Evelyne Costes, Comparison of tree architecture using tree edit distances: Application to two-year-old apple tree, Euphytica 161 (2007) 155–164].
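
    For readers new to tree edit distances, the sketch below computes the plain (unconstrained) unit-cost edit distance between two ordered labeled trees using the classical forest recursion on rightmost roots. It is a stand-in for illustration only: it is neither Zhang's constrained distance nor the semi-ordered algorithm proposed here, and the `(label, children)` tuple encoding and the function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children), children is a tuple of trees;
# a forest is a tuple of trees.

def leaf(label):
    return (label, ())

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests (rightmost-root recursion)."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (fl, fc), (gl, gc) = f[-1], g[-1]  # rightmost trees of each forest
    return min(
        forest_dist(f[:-1] + fc, g) + 1,  # delete the root of f's rightmost tree
        forest_dist(f, g[:-1] + gc) + 1,  # insert the root of g's rightmost tree
        # map the two roots onto each other (relabel cost 1 if labels differ)
        forest_dist(fc, gc) + forest_dist(f[:-1], g[:-1]) + int(fl != gl),
    )

def tree_dist(t1, t2):
    return forest_dist((t1,), (t2,))
```

    Because sibling order matters here, `tree_dist(("a", (leaf("b"), leaf("c"))), ("a", (leaf("c"), leaf("b"))))` is 2, whereas an unordered distance would be 0. The memoized recursion is meant for small inputs, not as an efficient implementation.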

    Taming Horizontal Instability in Merge Trees: On the Computation of a Comprehensive Deformation-based Edit Distance

    Comparative analysis of scalar fields in scientific visualization often involves distance functions on topological abstractions. This paper focuses on the merge tree abstraction (representing the nesting of sub- or superlevel sets) and proposes the application of the unconstrained deformation-based edit distance. Previous approaches for merge trees often suffer from instability: small perturbations in the data can lead to large distances between the abstractions. While some existing methods can handle so-called vertical instability, the unconstrained deformation-based edit distance addresses both vertical and horizontal instabilities, the latter also called saddle swaps. We establish that computing this distance is NP-complete and provide an integer linear program formulation for its computation. Experimental results on the TOSCA shape matching ensemble provide evidence for the stability of the proposed distance. We thereby showcase the potential of handling saddle swaps for the comparison of scalar fields through merge trees.
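
    The merge tree abstraction and a saddle swap can be seen on a 1D example. The sketch below, a hypothetical helper `merge_tree_1d` that assumes a scalar field sampled at consecutive points on a line, sweeps the values upward with union-find and records each event where two sublevel-set components merge. It only builds the abstraction; it does not compute the deformation-based edit distance or its ILP formulation.

```python
def merge_tree_1d(f):
    """Sublevel-set merge events of a 1D scalar field sampled at indices 0..n-1.

    Returns a list of (saddle_index, left_node, right_node); a node is named by the
    index of the vertex that created it (a local minimum or an earlier saddle).
    """
    n = len(f)
    parent = list(range(n))  # union-find parent pointers
    top = list(range(n))     # current merge-tree node of each component root
    active = [False] * n     # vertices already swept (inside the sublevel set)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    events = []
    for v in sorted(range(n), key=lambda i: (f[i], i)):  # sweep values upward
        active[v] = True
        left = find(v - 1) if v > 0 and active[v - 1] else None
        right = find(v + 1) if v + 1 < n and active[v + 1] else None
        if left is not None and right is not None:
            # v joins two components, so it is a saddle (merge node)
            events.append((v, top[left], top[right]))
            parent[left] = v
            parent[right] = v
            top[v] = v
        elif left is not None:
            parent[v] = left   # v extends the component on its left
        elif right is not None:
            parent[v] = right  # v extends the component on its right
        # otherwise v is a local minimum and starts a new component (a leaf)
    return events
```

    With `f = [0, 5, 1, 5.1, 2]` the minima at indices 0 and 2 merge first, giving `[(1, 0, 2), (3, 1, 4)]`. Raising index 1 and lowering index 3 by 0.1 (`[0, 5.1, 1, 5, 2]`) makes the minima at 2 and 4 merge first instead, giving `[(3, 2, 4), (1, 0, 3)]`: the branching changes although the data barely moved, which is a horizontal instability.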

    New and Improved Algorithms for Unordered Tree Inclusion

    The tree inclusion problem is, given two node-labeled trees P and T (the "pattern tree" and the "text tree"), to locate every minimal subtree in T (if any) that can be obtained by applying a sequence of node insertion operations to P. Although the ordered tree inclusion problem is solvable in polynomial time, the unordered tree inclusion problem is NP-hard. The currently fastest algorithm for the latter is from 1995 and runs in O(poly(m,n) * 2^{2d}) = O^*(2^{2d}) time, where m and n are the sizes of the pattern and text trees, respectively, and d is the maximum outdegree of the pattern tree. Here, we develop a new algorithm that improves the exponent 2d to d by considering a particular type of ancestor-descendant relationships and applying dynamic programming, thus reducing the time complexity to O^*(2^d). We then study restricted variants of the unordered tree inclusion problem where the number of occurrences of different node labels and/or the input trees' heights are bounded. We show that although the problem remains NP-hard in many such cases, it can be solved in polynomial time for c = 2 and in O^*(1.8^d) time for c = 3 if the leaves of P are distinctly labeled and each label occurs at most c times in T. We also present a randomized O^*(1.883^d)-time algorithm for the case that the heights of P and T are one and two, respectively.

    A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures

    [Background] Measuring similarities between tree-structured data is important for the analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparing tree-structured data. However, computing the edit distance for rooted unordered trees is known to be NP-hard. Furthermore, almost no software tools are available that can compute the exact edit distance for unordered trees. [Results] In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem, and efficient solvers for the maximum clique problem are then applied. We applied the proposed method to similar-structure search for glycan structures. The results suggest that our method can efficiently compute the edit distance for moderate-size unordered trees. They also suggest that its accuracy is comparable to that of the edit distance for ordered trees and of an existing method for glycan search. [Conclusions] The proposed method is simple but useful for computing the edit distance between unordered trees. The object code is available upon request.

    A similarity measure on tree structured business data

    In many business situations, product or user profile data are so complex that they need to be described using tree structures. Evaluating the similarity between tree-structured data is essential in many applications, such as recommender systems. To evaluate the similarity between two trees, conceptually corresponding nodes should be identified by constructing an edit distance mapping between them. Sometimes, the intension of one concept includes the intensions of several other concepts. In that situation, a one-to-many mapping should be constructed from the point of view of structures. This paper proposes a tree similarity measure model that can construct this kind of mapping. The similarity measure model takes into account all the information on the nodes’ concepts, weights, and values. The conceptual similarity and the value similarity between two trees are evaluated based on the constructed mapping, and the final similarity measure is assessed as a weighted sum of their conceptual and value similarities. The effectiveness of the proposed similarity measure model is shown by an illustrative example and is also demonstrated by applying it to a recommender system.

    Alignment Distance of Regular Tree Languages


    The Weight Function in the Subtree Kernel is Decisive

    Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function from a theoretical perspective and in real data applications. We establish on a two-class stochastic model that the performance of the subtree kernel improves when the weight of leaves vanishes, which motivates the definition of a new weight function that is learned from the data rather than fixed by the user, as is usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees that is particularly suitable for tuning parameters. We show through eight real-data classification problems the effectiveness of our approach, in particular for small datasets, which also underlines the importance of the weight function. Finally, a visualization tool for the significant features is derived. Comment: 36 pages
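
    A minimal sketch of a subtree kernel on unordered trees, assuming that every complete subtree (a node together with all its descendants) is a feature and that its weight depends only on its height: K(T1, T2) = Σ_s w(s) N_1(s) N_2(s), where N_i(s) counts the occurrences of subtree s in T_i. Subtrees are identified by canonical strings built from sorted children, which assumes labels contain no parentheses or commas. The height-based weight is an illustrative assumption, not the learned weight function proposed in the paper, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: a tree is (label, children), children is a tuple of trees.

def leaf(label):
    return (label, ())

def canon(tree, counts, heights):
    """Canonical string of an unordered tree; records each complete subtree it contains."""
    label, children = tree
    kids = sorted(canon(c, counts, heights) for c in children)  # ignore sibling order
    key = label + "(" + ",".join(kids) + ")"
    counts[key] += 1
    heights[key] = 1 + max((heights[k] for k in kids), default=-1)  # leaves have height 0
    return key

def subtree_kernel(t1, t2, weight):
    """Sum over shared subtrees s of weight(height(s)) * N1(s) * N2(s)."""
    c1, c2, heights = Counter(), Counter(), {}
    canon(t1, c1, heights)
    canon(t2, c2, heights)
    return sum(weight(heights[s]) * c1[s] * c2[s] for s in c1.keys() & c2.keys())
```

    With a constant weight, `a(b, c)` and `a(c, b)` share three subtrees, so the kernel is 3. Passing `lambda h: 0.0 if h == 0 else 1.0` sets the weight of leaves to zero, the regime the abstract reports as beneficial on its stochastic model, and the same pair then scores 1.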

    A generalized Robinson-Foulds distance for labeled trees

    The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e. edges shared between the two trees. We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf
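
    As a reference point for the unlabeled metric, the sketch below computes a Robinson-Foulds distance on rooted trees as the size of the symmetric difference of their leaf clusters, using nested tuples of leaf names as an assumed encoding. It is a simple set-based version rather than the linear-time algorithm, it ignores internal node labels and the flip operation, and it does not use the pylabeledrf package. Some authors divide this count by two, or use the bipartitions of unrooted trees instead of clusters.

```python
def clusters(tree):
    """Leaf sets below each internal node of a rooted tree given as nested tuples."""
    found = set()
    def leaves(t):
        if isinstance(t, str):
            return frozenset([t])  # a leaf: its own (trivial) cluster, not recorded
        s = frozenset().union(*(leaves(c) for c in t))
        found.add(s)
        return s
    leaves(tree)
    return found

def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: clusters present in exactly one of the trees."""
    return len(clusters(t1) ^ clusters(t2))
```

    For example, `(("A", "B"), ("C", "D"))` and `(("A", "C"), ("B", "D"))` share only the root cluster, so their distance is 4, while `(("A", "B"), ("C", "D"))` and `((("A", "B"), "C"), "D")` differ in two clusters.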