A distance for partially labeled trees
In many practical situations, data are structured, and the relations among their component parts need to be encoded with suitable data models. Trees are commonly used to represent data for which hierarchical relations can be defined. This is the case in fields such as image analysis, natural language processing, protein structure, and music retrieval, to name a few. In those cases, procedures for comparing trees are highly relevant. An approximate tree edit distance algorithm has been introduced for working with trees labeled only at the leaves. In this paper, it is applied to handwritten character recognition, providing accuracies comparable to those of the most comprehensive search method while being as efficient as the fastest. This work is supported by the Spanish Ministry projects DRIMS (TIN2009-14247-C02) and Consolider Ingenio 2010 (MIPRCV, CSD2007-00018), partially supported by EU ERDF and the Pascal Network of Excellence.
Two Metrics on Rooted Unordered Trees with Labels
The early development of a zygote can be mathematically described by a
developmental tree. To compare developmental trees of different species, we
need to define distances on trees. If children cells after a division are not
distinguishable, developmental trees are represented by the space of rooted
trees with possibly repeated labels, where all vertices are unordered. On this
space, we define two metrics: the best-match metric and the left-regular
metric, which show some advantages over existing methods. If children cells
after a division are partially distinguishable, developmental trees are
represented by the space of rooted trees with possibly repeated labels, where
vertices can be ordered or unordered. This space cannot have a metric. Instead,
we define a semimetric, which is a variant of the best-match metric. To compute
the best-match distance between two trees, the expected time complexity and
worst-case time complexity are both $O(n^2)$, where $n$ is the tree
size. To compute the left-regular distance between two trees, the expected time
complexity is $O(n)$, and the worst-case time complexity is $O(n \log n)$.
Online Diversity Control in Symbolic Regression via a Fast Hash-based Tree Similarity Measure
Diversity represents an important aspect of genetic programming, being
directly correlated with search performance. When considered at the genotype
level, diversity often requires expensive tree distance measures which have a
negative impact on the algorithm's runtime performance. In this work we
introduce a fast, hash-based tree distance measure to massively speed-up the
calculation of population diversity during the algorithmic run. We combine this
measure with the standard GA and the NSGA-II genetic algorithms to steer the
search towards higher diversity. We validate the approach on a collection of
benchmark problems for symbolic regression where our method consistently
outperforms the standard GA as well as NSGA-II configurations with different
secondary objectives. Comment: 8 pages, conference, submitted to the Congress on
Evolutionary Computation.
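The idea of a hash-based tree similarity can be illustrated with a minimal sketch. It is not the measure from the paper: it assumes ordered trees written as nested tuples `(label, child_1, ..., child_k)`, hashes every subtree once, and scores two trees by the Sørensen-Dice overlap of their subtree-hash multisets. The names `hash_subtrees` and `tree_similarity` are hypothetical.

```python
from collections import Counter


def hash_subtrees(tree, bag):
    """Return the hash of `tree` and count the hash of every subtree in `bag`.

    A tree is a nested tuple: (label, child_1, ..., child_k).
    """
    label, *children = tree
    # Child order is preserved, so trees are treated as ordered (an assumption).
    h = hash((label, tuple(hash_subtrees(child, bag) for child in children)))
    bag[h] += 1
    return h


def tree_similarity(a, b):
    """Sørensen-Dice overlap of the two multisets of subtree hashes, in [0, 1]."""
    bag_a, bag_b = Counter(), Counter()
    hash_subtrees(a, bag_a)
    hash_subtrees(b, bag_b)
    shared = sum((bag_a & bag_b).values())  # multiset intersection
    return 2 * shared / (sum(bag_a.values()) + sum(bag_b.values()))


# Two expression trees that differ in a single leaf: x + x*y versus x + x*z.
expr_a = ("+", ("x",), ("*", ("x",), ("y",)))
expr_b = ("+", ("x",), ("*", ("x",), ("z",)))
print(tree_similarity(expr_a, expr_b))
```

Each tree is traversed once, so the cost is roughly linear in tree size, which is what makes a measure of this kind attractive for computing diversity over a whole population.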
Homeomorphic Alignment of Weighted Trees
Motion capture, a currently active research area, requires estimating the pose of the subject. For this purpose, we match the tree representation of the skeleton of the 3D shape to a pre-specified tree model. Unfortunately, the tree representation can contain vertices that split limbs into multiple parts, which prevents a good match by the usual methods. To solve this problem, we propose a new alignment that takes into account the homeomorphism between trees, rather than the isomorphism used in prior works. We then develop several computationally efficient algorithms for reaching real-time motion capture.
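To make the difference between isomorphism and homeomorphism concrete, the sketch below merges every vertex that has exactly one child into that child and sums the edge weights. This is the standard graph-theoretic smoothing step, not the alignment algorithm of the paper. The representation `(edge_weight, [children])` and the name `contract_chains` are assumptions made for illustration.

```python
def contract_chains(node):
    """Merge each vertex that has exactly one child into that child.

    A node is (edge_weight, children), where edge_weight is the weight of the
    edge to its parent (use 0.0 for the root) and children is a list of nodes.
    """
    weight, children = node
    children = [contract_chains(child) for child in children]
    if len(children) == 1:
        # This vertex only splits a limb in two: fuse the two edges into one.
        child_weight, grandchildren = children[0]
        return (weight + child_weight, grandchildren)
    return (weight, children)


# A limb of length 1.0 + 2.0 split by an extra vertex, next to an unsplit limb.
skeleton = (0.0, [(1.0, [(2.0, [])]), (1.5, [])])
print(contract_chains(skeleton))
```

After contraction, two trees that differ only in how their limbs are subdivided have the same shape, so an ordinary isomorphism-based matcher could compare them.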
Quantifying the degree of self-nestedness of trees. Application to the structural analysis of plants
In this paper we are interested in the problem of approximating trees by trees with a particular self-nested structure. Self-nested trees are such that all their subtrees of a given height are isomorphic. We show that these trees present remarkable compression properties, with high compression rates. In order to measure how far a tree is from being a self-nested tree, we then study how to quantify the degree of self-nestedness of any tree. For this, we define a measure of the self-nestedness of a tree by constructing a self-nested tree that minimizes the distance of the original tree to the set of self-nested trees that embed the initial tree. We show that this measure can be computed in polynomial time and depict the corresponding algorithm. The distance to this nearest embedding self-nested tree (NEST) is then used to define compression coefficients that reflect the compressibility of a tree. To illustrate this approach, we then apply these notions to the analysis of plant branching structures. Based on a database of simulated theoretical plants in which different levels of noise have been introduced, we evaluate the method and show that the NESTs of such branching structures restore partly or completely the original, noiseless, branching structures. The whole approach is then applied to the analysis of a real plant (a rice panicle) whose topological structure was completely measured. We show that the NEST of this plant may be interpreted in biological terms and may be used to reveal important aspects of the plant growth.
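The definition of a self-nested tree (all subtrees of a given height are isomorphic) can be checked directly. The sketch below assumes unlabeled, unordered trees written as nested lists of children, with a leaf written as `[]`. It only tests the property and does not compute the NEST. The name `is_self_nested` is hypothetical.

```python
def is_self_nested(tree):
    """Return True if all subtrees of the same height are isomorphic.

    A tree is a list of child trees; a leaf is the empty list [].
    """
    forms_by_height = {}

    def visit(node):
        results = [visit(child) for child in node]
        height = 1 + max((h for h, _ in results), default=-1)
        # Sorting the child forms gives a canonical form for unordered trees.
        canon = tuple(sorted(form for _, form in results))
        forms_by_height.setdefault(height, set()).add(canon)
        return height, canon

    visit(tree)
    return all(len(forms) == 1 for forms in forms_by_height.values())


# Both height-1 subtrees have two leaves, so the first tree is self-nested.
nested = [[[], []], [[], []], []]
# Here the height-1 subtrees have two leaves and one leaf respectively.
not_nested = [[[], []], [[]]]
print(is_self_nested(nested), is_self_nested(not_nested))
```

Because every height contributes a single shape, a self-nested tree can be stored as one shape per height, which is the intuition behind the compression properties mentioned in the abstract.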
Detection of Common Subtrees with Identical Label Distribution
Frequent pattern mining is a relevant method to analyse structured data, like
sequences, trees or graphs. It consists in identifying characteristic
substructures of a dataset. This paper deals with a new type of patterns for
tree data: common subtrees with identical label distribution. Their detection
is far from obvious, since the underlying isomorphism problem is
graph-isomorphism-complete. An elaborate search algorithm is developed and analysed
from both theoretical and numerical perspectives. Based on this, the
enumeration of patterns is performed through a new lossless compression scheme
for trees, called DAG-RW, whose complexity is investigated as well. The method
shows very good properties, both in terms of computation times and analysis of
real datasets from the literature. Compared to other substructures like
topological subtrees and labelled subtrees for which the isomorphism problem is
linear, the patterns found provide a more parsimonious representation of the
data. Comment: 40 pages
Polynomial-time metrics for attributed trees
We address the problem of comparing attributed trees and propose four novel distance measures centered around the
notion of a maximal similarity common subtree. The proposed measures are general and defined on trees endowed with either
symbolic or continuous-valued attributes and can be applied to rooted as well as unrooted trees. We prove that our measures satisfy the metric constraints and provide a polynomial-time algorithm to compute them. This is a remarkable and attractive property, since the computation of traditional edit-distance-based metrics is, in general, NP-complete, at least in the unordered case. We experimentally validate the usefulness of our metrics on shape matching tasks and compare them with (an approximation of) edit distance.