Revisiting the tree edit distance and its backtracing: A tutorial
Almost 30 years ago, Zhang and Shasha (1989) published a seminal paper
describing an efficient dynamic programming algorithm computing the tree edit
distance, that is, the minimum number of node deletions, insertions, and
replacements that are necessary to transform one tree into another. Since then,
the tree edit distance has been widely applied, for example in biology and
intelligent tutoring systems. However, the original paper of Zhang and Shasha
can be challenging to read for newcomers and it does not describe how to
efficiently infer the optimal edit script. In this contribution, we provide a
comprehensive tutorial to the tree edit distance algorithm of Zhang and Shasha.
We further prove metric properties of the tree edit distance, and describe
efficient algorithms to infer the cheapest edit script, as well as a summary of
all cheapest edit scripts between two trees.
Comment: Supplementary material for the ICML 2018 paper: Tree Edit Distance
Learning via Adaptive Symbol Embeddings
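The definition in the abstract above (minimum number of node deletions, insertions, and replacements) can be illustrated with a minimal Python sketch. It is not the Zhang and Shasha algorithm itself, only a memoized version of the standard forest recursion that such algorithms build on. The nested-tuple tree encoding, the function names, and the unit costs are assumptions made for illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is a tuple (label, children), where children is a
# tuple of trees; a forest is a tuple of trees.

def forest_size(forest):
    """Number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return forest_size(g)  # insert every node of g
    if not g:
        return forest_size(f)  # delete every node of f
    (f_label, f_children), (g_label, g_children) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + f_children, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + g_children) + 1
    # Map the two rightmost roots onto each other (free if labels agree).
    replace = (forest_distance(f[:-1], g[:-1])
               + forest_distance(f_children, g_children)
               + (0 if f_label == g_label else 1))
    return min(delete, insert, replace)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

# Example trees: a(b, c), a(b), and a(b(c)).
leaf_b, leaf_c = ("b", ()), ("c", ())
t_abc = ("a", (leaf_b, leaf_c))
t_ab = ("a", (leaf_b,))
t_chain = ("a", (("b", (leaf_c,)),))
```

For example, `tree_edit_distance(t_abc, t_ab)` removes the single leaf `c`, and `tree_edit_distance(t_chain, ("a", (leaf_c,)))` removes the inner node `b` so that `c` moves up to become a child of `a`.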
Automatic Wrapper Adaptation by Tree Edit Distance Matching
Information distributed through the Web keeps growing faster day by day,
and for this reason several techniques for extracting Web data have been suggested
in recent years. Extraction tasks are often performed by so-called wrappers,
procedures that extract information from Web pages, e.g. by implementing logic-based
techniques. Many fields of application today require highly robust
wrappers, so as not to compromise information assets or the reliability of the
extracted data.
Unfortunately, a wrapper may fail to extract data from a Web page if
the page's structure changes, sometimes even slightly. Such failures call for new
techniques that automatically adapt the wrapper to the new structure
of the page. In this work we present a novel approach to automatic wrapper
adaptation based on measuring the similarity of trees through
improved tree edit distance matching techniques.
Tree Edit Distance Learning via Adaptive Symbol Embeddings
Metric learning has the aim to improve classification accuracy by learning a
distance measure which brings data points from the same class closer together
and pushes data points from different classes further apart. Recent research
has demonstrated that metric learning approaches can also be applied to trees,
such as molecular structures, abstract syntax trees of computer programs, or
syntax trees of natural language, by learning the cost function of an edit
distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree.
However, learning such costs directly may yield an edit distance which violates
metric axioms, is challenging to interpret, and may not generalize well. In
this contribution, we propose a novel metric learning approach for trees which
we call embedding edit distance learning (BEDL) and which learns an edit
distance indirectly by embedding the tree nodes as vectors, such that the
Euclidean distance between those vectors supports class discrimination. We
learn such embeddings by reducing the distance to prototypical trees from the
same class and increasing the distance to prototypical trees from different
classes. In our experiments, we show that BEDL improves upon the
state-of-the-art in metric learning for trees on six benchmark data sets,
ranging from computer science over biomedical data to a natural-language
processing data set containing over 300,000 nodes.
Comment: Paper at the International Conference on Machine Learning (2018),
2018-07-10 to 2018-07-15 in Stockholm, Sweden
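The idea of deriving edit costs from node embeddings can be sketched as follows. This is an illustrative sketch, not the BEDL implementation: the embedding values are made up, the names `EMBEDDING`, `GAP`, and `cost` are hypothetical, and placing the empty symbol at the origin is an assumption that the abstract does not state.

```python
import math

# Hypothetical 2-D embeddings for three node labels (values are made up).
EMBEDDING = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 1.0)}
GAP = (0.0, 0.0)  # assumption: the empty symbol is embedded at the origin

def cost(x, y):
    """Cost of editing label x into label y; None stands for 'no node'.

    cost(x, None) is a deletion, cost(None, y) is an insertion, and
    cost(x, y) is a replacement.
    """
    u = EMBEDDING[x] if x is not None else GAP
    v = EMBEDDING[y] if y is not None else GAP
    return math.dist(u, v)  # Euclidean distance between the two vectors
```

Because every cost is a Euclidean distance between fixed vectors, the costs are symmetric, zero for identical labels, and satisfy the triangle inequality, so the cost function cannot violate these metric axioms however the vectors are chosen.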
Learning Stochastic Tree Edit Distance
Pages 42-53. Trees provide a suitable structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, or conversion of tree-structured documents. In this context, many applications require the calculation of similarities between pairs of trees. The most studied distance is likely the tree edit distance, whose computational complexity has been improved over the last decade. However, this classic edit distance usually relies on edit costs that are fixed a priori and often difficult to tune, which leaves little room for tackling complex problems. In this paper, we focus on learning a stochastic tree edit distance. We use an adaptation of the expectation-maximization algorithm to learn the primitive edit costs. We carried out several series of experiments that confirm the benefit of learning a tree edit distance rather than imposing edit costs a priori.
Tree edit distance as a baseline approach for paraphrase representation
Finding an adequate paraphrase representation formalism is a challenging issue in Natural Language Processing. In this paper, we analyse the performance of Tree Edit Distance as a paraphrase representation baseline. Our experiments using Edit Distance Textual Entailment Suite show that, as Tree Edit Distance consists of a purely syntactic approach, paraphrase alternations not based on structural reorganizations do not find an adequate representation. They also show that there is much scope for better modelling of the way trees are aligned
An O(n^3)-Time Algorithm for Tree Edit Distance
The edit distance between two ordered trees with vertex labels is the
minimum cost of transforming one tree into the other by a sequence of
elementary operations consisting of deleting and relabeling existing nodes, as
well as inserting new nodes. In this paper, we present a worst-case
$O(n^3)$-time algorithm for this problem, improving the previous best
$O(n^3 \log n)$-time algorithm [Klein]. Our result requires a novel
adaptive strategy for deciding how a dynamic program divides into subproblems
(which is interesting in its own right), together with a deeper understanding
of the previous algorithms for the problem. We also prove the optimality of our
algorithm among the family of decomposition strategy algorithms, which
also includes the previous fastest algorithms, by tightening the known lower
bound of $\Omega(n^2 \log^2 n)$ [Touzet] to $\Omega(n^3)$, matching our
algorithm's running time. Furthermore, we obtain matching upper and lower
bounds of $\Theta(n m^2 (1 + \log \frac{n}{m}))$ when the two trees have
different sizes $m$ and $n$, where $m < n$.
Comment: 10 pages, 5 figures, 5 .tex files where TED.tex is the main one
Subcubic algorithm for (Unweighted) Unrooted Tree Edit Distance
The tree edit distance problem is a natural generalization of the classic
string edit distance problem. Given two ordered, edge-labeled trees $T_1$ and
$T_2$, the edit distance between $T_1$ and $T_2$ is defined as the minimum
total cost of operations that transform $T_1$ into $T_2$. In one operation, we
can contract an edge, split a vertex into two, or change the label of an edge.
For the weighted version of the problem, where the cost of each operation
depends on the type of the operation and the label on the edge involved,
$O(n^3)$-time algorithms are known for both rooted and unrooted
trees. The existence of a truly subcubic $O(n^{3-\varepsilon})$-time
algorithm is unlikely, as it would imply a truly subcubic algorithm for the
APSP problem. However, recently Mao (FOCS'21) showed that if we assume that
each operation has a unit cost, then the tree edit distance between two rooted
trees can be computed in truly subcubic time. In this paper, we show how to
adapt Mao's algorithm to make it work for unrooted trees, and we give a
truly subcubic-time algorithm for the unweighted tree edit distance
between two unrooted trees, with a running time that depends on the matrix
multiplication exponent $\omega$. It is the first known subcubic algorithm for unrooted
trees. The main idea behind our algorithm is the fact that to compute the tree
edit distance between two unrooted trees, it is enough to compute the tree edit
distance between an arbitrary rooting of the first tree and every rooting of
the second tree.
Comment: 20 pages
Comparing similar ordered trees in linear-time
We describe a linear-time algorithm for comparing two similar ordered rooted trees with node labels. The method for comparing trees is the usual tree edit distance. We show that an optimal mapping that uses at most $k$ insertions or deletions can be constructed in $O(nk^3)$ time, where $n$ is the size of the trees. The approach is inspired by the Zhang–Shasha algorithm for tree edit distance, combined with an adequate pruning of the search space based on the tree edit graph.