Approximating the Graph Edit Distance with Compact Neighborhood Representations
The graph edit distance is used for comparing graphs in various domains. Due
to its high computational complexity, it is primarily approximated. Widely used
heuristics search for an optimal assignment of vertices based on the distance
between local substructures. While faster ones only consider vertices and their
incident edges, leading to poor accuracy, other approaches require
computationally intense exact distance computations between subgraphs. Our new
method abstracts local substructures to neighborhood trees and compares them
using efficient tree matching techniques. This results in a ground distance for
mapping vertices that yields high quality approximations of the graph edit
distance. By limiting the maximum tree height, our method supports steering
between more accurate results and faster execution. We thoroughly analyze the
running time of the tree matching method and propose several techniques to
accelerate computation in practice. We use compressed tree representations,
recognize redundancies by tree canonization and exploit them via caching.
Experimentally, we show that our method provides a significantly improved
trade-off between running time and approximation quality compared to existing
state-of-the-art approaches.
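The abstract above mentions neighborhood trees of bounded height, tree canonization, and caching. The following is a minimal sketch of that general idea, not the authors' implementation: it unfolds the neighborhood of a vertex into a tree of a given height and encodes it as a canonical string, so that identical trees produce identical strings and repeated subtrees are served from a cache. The function name `make_canonizer`, the adjacency-dictionary input, and the string encoding are all assumptions made for illustration; the paper's compressed representation and tree matching step are not reproduced here.

```python
from functools import lru_cache

def make_canonizer(adj, labels):
    """Return a cached function canon(v, height).

    adj    -- dict: vertex -> list of neighboring vertices (hypothetical format)
    labels -- dict: vertex -> label string
    """
    @lru_cache(maxsize=None)          # caching: each (vertex, height) is computed once
    def canon(v, height):
        if height == 0:
            return labels[v]
        # Sorting the children's encodings makes the string independent of
        # neighbor order, so isomorphic neighborhood trees get equal strings.
        children = sorted(canon(u, height - 1) for u in adj[v])
        return labels[v] + "(" + ",".join(children) + ")"
    return canon

# Small example: a path graph with labels C - O - C.
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "C", 1: "O", 2: "C"}
canon = make_canonizer(adj, labels)
```

Here `canon(0, 1)` and `canon(2, 1)` yield the same string, which signals that the two end vertices have identical height-1 neighborhoods. A larger `height` distinguishes more structure at a higher cost, which mirrors the accuracy versus speed trade-off described in the abstract.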
Upper Bounding the Graph Edit Distance Based on Rings and Machine Learning
The graph edit distance (GED) is a flexible distance measure which is widely
used for inexact graph matching. Since its exact computation is NP-hard,
heuristics are used in practice. A popular approach is to obtain upper bounds
for GED via transformations to the linear sum assignment problem with
error-correction (LSAPE). Typically, local structures and distances between
them are employed for carrying out this transformation, but machine learning
techniques have recently been used as well. In this paper, we formally define a
unifying framework LSAPE-GED for transformations from GED to LSAPE. We also
introduce rings, a new kind of local structure designed for graphs where most
information resides in the topology rather than in the node labels.
Furthermore, we propose two new ring-based heuristics, RING and RING-ML, which
instantiate LSAPE-GED using the traditional and the machine-learning-based
approach for transforming GED to LSAPE, respectively. Extensive experiments
show that using rings for upper bounding GED significantly improves the state
of the art on datasets where most information resides in the graphs'
topologies. This closes the gap between fast but rather inaccurate LSAPE-based
heuristics and more accurate but significantly slower GED algorithms based on
local search.
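To illustrate why a transformation to an assignment problem gives an upper bound for GED, here is a hedged sketch under simplifying assumptions: unit edit costs, simple undirected graphs with unlabeled edges, and a local cost that only looks at node labels and degrees (not the rings proposed in the paper). The assignment is found by brute force over all node maps, which stands in for a real LSAPE solver and is only feasible for tiny graphs. All function names and the graph encoding are hypothetical.

```python
from itertools import permutations

def degrees(n, edges):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def induced_cost(labels1, edges1, labels2, edges2, node_map):
    """Unit cost of the edit path induced by node_map.

    node_map[i] is the node of graph 2 assigned to node i of graph 1,
    or None if node i is deleted. Edges are frozensets of two nodes.
    """
    inverse = {j: i for i, j in enumerate(node_map) if j is not None}
    cost = 0
    for i, j in enumerate(node_map):
        if j is None or labels1[i] != labels2[j]:
            cost += 1                                   # node deletion or relabeling
    cost += len(labels2) - len(inverse)                 # node insertions
    for u, v in edges1:
        a, b = node_map[u], node_map[v]
        if a is None or b is None or frozenset((a, b)) not in edges2:
            cost += 1                                   # edge deletion
    for a, b in edges2:
        if (a not in inverse or b not in inverse
                or frozenset((inverse[a], inverse[b])) not in edges1):
            cost += 1                                   # edge insertion
    return cost

def lsape_node_map(labels1, edges1, labels2, edges2):
    """Brute-force minimizer of a simple local assignment cost (assumption)."""
    n1, n2 = len(labels1), len(labels2)
    deg1, deg2 = degrees(n1, edges1), degrees(n2, edges2)

    def local_cost(node_map):
        total, used = 0.0, set()
        for i, j in enumerate(node_map):
            if j is None:
                total += 1 + deg1[i] / 2                # delete node and its edges
            else:
                used.add(j)
                total += (labels1[i] != labels2[j]) + abs(deg1[i] - deg2[j]) / 2
        total += sum(1 + deg2[j] / 2 for j in range(n2) if j not in used)
        return total

    targets = list(range(n2)) + [None] * n1
    return list(min(permutations(targets, n1), key=local_cost))

# Graph 1: path A - B - C. Graph 2: single edge A - B.
labels1, edges1 = ["A", "B", "C"], {frozenset((0, 1)), frozenset((1, 2))}
labels2, edges2 = ["A", "B"], {frozenset((0, 1))}
node_map = lsape_node_map(labels1, edges1, labels2, edges2)
upper_bound = induced_cost(labels1, edges1, labels2, edges2, node_map)
```

Every node map induces a valid edit path, so `induced_cost` is never below the true GED regardless of how the map was chosen. A better local cost only makes the bound tighter, which is where richer local structures come in.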
Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction
Error-tolerant recognition enables the recognition of strings that deviate
mildly from any string in the regular set recognized by the underlying finite
state recognizer. Such recognition has applications in error-tolerant
morphological processing, spelling correction, and approximate string matching
in information retrieval. After a description of the concepts and algorithms
involved, we give examples from two applications: In the context of
morphological analysis, error-tolerant recognition allows misspelled input word
forms to be corrected and morphologically analyzed concurrently. We present an
application of this to error-tolerant analysis of agglutinative morphology of
Turkish words. The algorithm can be applied to morphological analysis of any
language whose morphology is fully captured by a single (and possibly very
large) finite state transducer, regardless of the word formation processes and
morphographemic phenomena involved. In the context of spelling correction,
error-tolerant recognition can be used to enumerate correct candidate forms
from a given misspelled string within a certain edit distance. Again, it can be
applied to any language with a word list comprising all inflected forms, or
whose morphology is fully described by a finite state transducer. We present
experimental results for spelling correction for a number of languages. These
results indicate that such recognition works very efficiently for candidate
generation in spelling correction for many European languages, such as English,
Dutch, French, German, and Italian (among others), with very large word lists of root
and inflected forms (some containing well over 200,000 forms), generating all
candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a
SparcStation 10/41. For spelling correction in Turkish, error-tolerant …
Comment: Replaces 9504031. Gzipped, uuencoded PostScript file. To appear in
Computational Linguistics, Volume 22, No. 1, 1996. Also available as
ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
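The candidate generation described in the abstract above (enumerating correct forms within a given edit distance of a misspelled string) can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the word list is stored in a plain trie rather than a finite state recognizer, the distance is ordinary Levenshtein distance without transpositions, and the names `build_trie` and `candidates` are hypothetical. The key shared idea is a depth-first traversal that abandons a branch as soon as no extension can stay within the distance threshold.

```python
def build_trie(words):
    """Nested-dict trie; the key "$" marks end of word (assumed not in the alphabet)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def candidates(trie, query, max_dist):
    """Return sorted (word, distance) pairs with Levenshtein distance <= max_dist."""
    results = []
    first_row = list(range(len(query) + 1))

    def walk(node, prefix, prev_row):
        # prev_row[k] is the distance between prefix and query[:k].
        if node.get("$") and prev_row[-1] <= max_dist:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.items():
            if ch == "$":
                continue
            row = [prev_row[0] + 1]
            for k in range(1, len(query) + 1):
                row.append(min(row[k - 1] + 1,                          # skip a query char
                               prev_row[k] + 1,                         # extra char in word
                               prev_row[k - 1] + (query[k - 1] != ch))) # match or substitute
            # Cut-off: if every entry exceeds the threshold, no longer
            # word below this node can come back within range.
            if min(row) <= max_dist:
                walk(child, prefix + ch, row)

    walk(trie, "", first_row)
    return sorted(results)

trie = build_trie(["cat", "cart", "card", "dog"])
matches = candidates(trie, "crt", 1)
```

With this word list, the query "crt" at threshold 1 reaches both "cat" (one substitution) and "cart" (one insertion), while the branches for "card" and "dog" are pruned before they are fully explored.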