
    A distance for partially labeled trees

    In a number of practical situations, data have structure, and the relations among their component parts need to be encoded with suitable data models. Trees are commonly used to represent data for which hierarchical relations can be defined. This is the case in fields such as image analysis, natural language processing, protein structure, and music retrieval, to name a few. In those cases, procedures for comparing trees are highly relevant. An approximate tree edit distance algorithm has been introduced for working with trees labeled only at the leaves. In this paper, it has been applied to handwritten character recognition, achieving accuracies comparable to those of the most comprehensive search method while being as efficient as the fastest. This work is supported by the Spanish Ministry projects DRIMS (TIN2009-14247-C02) and Consolider Ingenio 2010 (MIPRCV, CSD2007-00018), partially supported by EU ERDF and the Pascal Network of Excellence.

    Two Metrics on Rooted Unordered Trees with Labels

    The early development of a zygote can be mathematically described by a developmental tree. To compare developmental trees of different species, we need to define distances on trees. If children cells after a division are not distinguishable, developmental trees are represented by the space of rooted trees with possibly repeated labels, where all vertices are unordered. On this space, we define two metrics: the best-match metric and the left-regular metric, which show some advantages over existing methods. If children cells after a division are partially distinguishable, developmental trees are represented by the space of rooted trees with possibly repeated labels, where vertices can be ordered or unordered. This space cannot have a metric. Instead, we define a semimetric, which is a variant of the best-match metric. To compute the best-match distance between two trees, the expected time complexity and worst-case time complexity are both $\mathcal{O}(n^2)$, where $n$ is the tree size. To compute the left-regular distance between two trees, the expected time complexity is $\mathcal{O}(n)$, and the worst-case time complexity is $\mathcal{O}(n\log n)$.
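
    For readers comparing the two notions mentioned in this abstract, the block below restates the usual textbook axioms. It is a reminder under one common convention, not the paper's own definitions: the symbols $d$, $\mathcal{T}$, $S$, $T$, $U$ are introduced here for illustration, and "semimetric" is taken to mean a distance that may fail the triangle inequality (some authors use the word differently).

```latex
% Metric on a space of trees \mathcal{T}: for all S, T, U in \mathcal{T}
\begin{align*}
  & d(S,T) \ge 0, \\
  & d(S,T) = 0 \iff S = T, \\
  & d(S,T) = d(T,S), \\
  & d(S,U) \le d(S,T) + d(T,U) \quad \text{(triangle inequality)}.
\end{align*}
% Semimetric (convention assumed here): the first three axioms hold,
% but the triangle inequality is not required.
```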

    Online Diversity Control in Symbolic Regression via a Fast Hash-based Tree Similarity Measure

    Diversity represents an important aspect of genetic programming, being directly correlated with search performance. When considered at the genotype level, diversity often requires expensive tree distance measures which have a negative impact on the algorithm's runtime performance. In this work we introduce a fast, hash-based tree distance measure to massively speed up the calculation of population diversity during the algorithmic run. We combine this measure with the standard GA and the NSGA-II genetic algorithms to steer the search towards higher diversity. We validate the approach on a collection of benchmark problems for symbolic regression where our method consistently outperforms the standard GA as well as NSGA-II configurations with different secondary objectives. Comment: 8 pages, conference, submitted to congress on evolutionary computatio
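
    To make the idea of a hash-based tree distance concrete, here is a minimal sketch. It is not the measure defined in the paper: it assumes each subtree is hashed bottom-up, that the children of commutative operators are sorted before hashing, and that two trees are compared with a Sørensen-Dice coefficient over their multisets of subtree hashes. The tree encoding `(label, children)` and the names `COMMUTATIVE`, `collect_hashes`, `hash_similarity`, and `hash_distance` are all hypothetical.

```python
import hashlib
from collections import Counter

# Assumption: these operator labels are treated as commutative.
COMMUTATIVE = {"+", "*"}


def collect_hashes(node, bag):
    """Hash the subtree rooted at `node` and record every subtree hash in `bag`.

    A node is a pair (label, children), where children is a tuple of nodes.
    """
    label, children = node
    child_hashes = [collect_hashes(child, bag) for child in children]
    if label in COMMUTATIVE:
        # Sorting makes the hash independent of argument order.
        child_hashes.sort()
    payload = label + "(" + ",".join(child_hashes) + ")"
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    bag[digest] += 1
    return digest


def hash_similarity(a, b):
    """Sorensen-Dice coefficient over the multisets of subtree hashes."""
    bag_a, bag_b = Counter(), Counter()
    collect_hashes(a, bag_a)
    collect_hashes(b, bag_b)
    shared = sum((bag_a & bag_b).values())  # multiset intersection size
    total = sum(bag_a.values()) + sum(bag_b.values())
    return 2.0 * shared / total


def hash_distance(a, b):
    """Distance in [0, 1]; 0 means identical multisets of subtree hashes."""
    return 1.0 - hash_similarity(a, b)


x = ("x", ())
y = ("y", ())
print(hash_distance(("+", (x, y)), ("+", (y, x))))  # commutative reordering
print(hash_distance(("+", (x, y)), ("-", (x, y))))  # different root operator
```

    Each tree is traversed once, so one pairwise comparison costs roughly linear time in the number of nodes (ignoring the sort of commutative children), which is the kind of saving the abstract alludes to relative to edit-distance-style measures.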

    Homeomorphic Alignment of Weighted Trees

    Motion capture, a currently active research area, requires estimating the pose of the subject. For this purpose, we match the tree representation of the skeleton of the 3D shape to a pre-specified tree model. Unfortunately, the tree representation can contain vertices that split limbs into multiple parts, which prevents a good match by the usual methods. To solve this problem, we propose a new alignment that takes into account the homeomorphism between trees, rather than the isomorphism used in prior works. We then develop several computationally efficient algorithms to achieve real-time motion capture.

    Quantifying the degree of self-nestedness of trees. Application to the structural analysis of plants

    17 pages. In this paper we are interested in the problem of approximating trees by trees with a particular self-nested structure. Self-nested trees are such that all their subtrees of a given height are isomorphic. We show that these trees present remarkable compression properties, with high compression rates. In order to measure how far a tree is from being a self-nested tree, we then study how to quantify the degree of self-nestedness of any tree. For this, we define a measure of the self-nestedness of a tree by constructing a self-nested tree that minimizes the distance of the original tree to the set of self-nested trees that embed the initial tree. We show that this measure can be computed in polynomial time and depict the corresponding algorithm. The distance to this nearest embedding self-nested tree (NEST) is then used to define compression coefficients that reflect the compressibility of a tree. To illustrate this approach, we then apply these notions to the analysis of plant branching structures. Based on a database of simulated theoretical plants in which different levels of noise have been introduced, we evaluate the method and show that the NESTs of such branching structures restore partly or completely the original, noiseless, branching structures. The whole approach is then applied to the analysis of a real plant (a rice panicle) whose topological structure was completely measured. We show that the NEST of this plant may be interpreted in biological terms and may be used to reveal important aspects of the plant growth.
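
    The definition of a self-nested tree given in this abstract (all subtrees of a given height are isomorphic) can be checked directly. The sketch below only tests that property; it does not compute the NEST or the compression coefficients from the paper. It assumes unlabeled, unordered rooted trees encoded as nested Python lists (a node is the list of its children), and the function names are hypothetical.

```python
def canonical(node, forms_by_height):
    """Return (height, canonical form) of the subtree rooted at `node`.

    A node is a list of child nodes; a leaf is the empty list.
    Two unordered subtrees are isomorphic exactly when their canonical
    forms (sorted nested tuples) are equal.
    """
    results = [canonical(child, forms_by_height) for child in node]
    height = 1 + max((h for h, _ in results), default=-1)  # a leaf has height 0
    form = tuple(sorted(f for _, f in results))
    forms_by_height.setdefault(height, set()).add(form)
    return height, form


def is_self_nested(root):
    """True if all subtrees of any given height are isomorphic."""
    forms_by_height = {}
    canonical(root, forms_by_height)
    return all(len(forms) == 1 for forms in forms_by_height.values())


chain = [[[]]]                 # a path with three vertices
mixed = [[[]], [[], []]]       # two height-1 subtrees with 1 and 2 leaves
print(is_self_nested(chain))   # True
print(is_self_nested(mixed))   # False
```

    Because every height admits a single subtree shape, a self-nested tree can be stored as one shape per height, which is one way to read the "remarkable compression properties" mentioned above.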

    Detection of Common Subtrees with Identical Label Distribution

    Frequent pattern mining is a relevant method for analysing structured data such as sequences, trees, or graphs. It consists of identifying characteristic substructures of a dataset. This paper deals with a new type of pattern for tree data: common subtrees with identical label distribution. Their detection is far from obvious, since the underlying isomorphism problem is graph isomorphism complete. An elaborate search algorithm is developed and analysed from both theoretical and numerical perspectives. Based on this, the enumeration of patterns is performed through a new lossless compression scheme for trees, called DAG-RW, whose complexity is investigated as well. The method shows very good properties, both in terms of computation times and analysis of real datasets from the literature. Compared to other substructures, such as topological subtrees and labelled subtrees, for which the isomorphism problem is linear, the patterns found provide a more parsimonious representation of the data. Comment: 40 page

    Polynomial-time metrics for attributed trees

    We address the problem of comparing attributed trees and propose four novel distance measures centered around the notion of a maximal similarity common subtree. The proposed measures are general and defined on trees endowed with either symbolic or continuous-valued attributes and can be applied to rooted as well as unrooted trees. We prove that our measures satisfy the metric constraints and provide a polynomial-time algorithm to compute them. This is a remarkable and attractive property, since the computation of traditional edit-distance-based metrics is, in general, NP-complete, at least in the unordered case. We experimentally validate the usefulness of our metrics on shape matching tasks and compare them with (an approximation of) edit-distance.