62 research outputs found
Approximating the Graph Edit Distance with Compact Neighborhood Representations
The graph edit distance is used for comparing graphs in various domains. Due
to its high computational complexity it is primarily approximated. Widely-used
heuristics search for an optimal assignment of vertices based on the distance
between local substructures. While faster ones only consider vertices and their
incident edges, leading to poor accuracy, other approaches require
computationally intense exact distance computations between subgraphs. Our new
method abstracts local substructures to neighborhood trees and compares them
using efficient tree matching techniques. This results in a ground distance for
mapping vertices that yields high quality approximations of the graph edit
distance. By limiting the maximum tree height, our method supports steering
between more accurate results and faster execution. We thoroughly analyze the
running time of the tree matching method and propose several techniques to
accelerate computation in practice. We use compressed tree representations,
recognize redundancies by tree canonization and exploit them via caching.
Experimentally we show that our method provides a significantly improved
trade-off between running time and approximation quality compared to existing
state-of-the-art approaches.
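The canonization and caching steps mentioned in the abstract above can be illustrated with a minimal sketch. It assumes an unfolding-tree definition of a neighborhood tree (the root's neighbors become its children, recursively, up to a height limit) and a parenthesized canonical string. Neither is taken from the paper, the function name `canonical_neighborhood_trees` is hypothetical, and the sketch only detects identical trees rather than computing a tree matching distance.

```python
from functools import lru_cache


def canonical_neighborhood_trees(adj, labels, height):
    """Map every vertex to a canonical string of its height-limited neighborhood tree.

    adj:    dict vertex -> list of neighboring vertices
    labels: dict vertex -> label string (assumed to contain no parentheses)
    Assumption: the tree is the unfolding of the graph around the root vertex.
    """

    @lru_cache(maxsize=None)  # caching: each (vertex, height) subtree is built once
    def canon(v, h):
        if h == 0:
            return f"({labels[v]})"
        # Sorting the children's strings makes the encoding independent of
        # neighbor order, so isomorphic trees get the same string.
        children = sorted(canon(u, h - 1) for u in adj[v])
        return f"({labels[v]}" + "".join(children) + ")"

    return {v: canon(v, height) for v in adj}


# Path graph C - O - C: both end vertices have identical neighborhood trees.
example_adj = {0: [1], 1: [0, 2], 2: [1]}
example_labels = {0: "C", 1: "O", 2: "C"}
trees_h1 = canonical_neighborhood_trees(example_adj, example_labels, height=1)
trees_h2 = canonical_neighborhood_trees(example_adj, example_labels, height=2)
```

Vertices with equal canonical strings have identical trees, so any distance between a pair of such strings needs to be computed only once. Raising `height` gives a more discriminative but more expensive representation.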
GEDLIB: A C++ Library for Computing the Graph Edit Distance
The graph edit distance (GED) is a flexible graph dissimilarity measure widely used within the structural pattern recognition field. In this paper, we present GEDLIB, a C++ library for exactly or approximately computing GED. Many existing algorithms for GED are already implemented in GEDLIB. Moreover, GEDLIB is designed to be easily extensible: for implementing new edit cost functions and GED algorithms, it suffices to implement abstract classes contained in the library. For implementing these extensions, the user has access to a wide range of utilities, such as deep neural networks, support vector machines, mixed integer linear programming solvers, a blackbox optimizer, and solvers for the linear sum assignment problem with and without error-correction.
A Hungarian Algorithm for Error-Correcting Graph Matching
Bipartite graph matching algorithms have become increasingly popular for solving error-correcting graph matching problems and for approximating the graph edit distance of two graphs. However, the memory requirements and execution times of this method are proportional to (n + m)^2 and (n + m)^3, respectively, where n and m are the orders of the graphs. Subsequent developments reduced these complexities. However, these improvements are valid only under some constraints on the parameters of the graph edit distance. We propose in this paper a new formulation of the bipartite graph matching algorithm designed to efficiently solve the associated graph edit distance problem. The resulting algorithm requires O(nm) memory space and O(min(n, m)^2 max(n, m)) execution time.
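To make the memory figures above concrete, the following sketch builds two cost matrix layouts for the same tiny instance and checks that they give the same optimal assignment cost. The square (n + m) x (n + m) layout is the classical one. The reduced (n + 1) x (m + 1) layout, with one extra column for deletions and one extra row for insertions, is an assumption chosen to match the O(nm) memory bound. Both solvers are brute-force enumerations for illustration only, not the Hungarian-style algorithm of the paper, and all function names are hypothetical.

```python
from itertools import permutations, product
import math

INF = math.inf


def square_cost_matrix(sub, dele, ins):
    """Classical (n+m) x (n+m) layout: substitution block, deletion and
    insertion diagonals, and a zero block in the bottom-right corner."""
    n, m = len(dele), len(ins)
    C = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(m):
            C[i][j] = sub[i][j]
        for k in range(n):
            C[i][m + k] = dele[i] if i == k else INF
    for l in range(m):
        for j in range(m):
            C[n + l][j] = ins[j] if l == j else INF
    return C


def reduced_cost_matrix(sub, dele, ins):
    """Assumed (n+1) x (m+1) layout: last column = deletion, last row = insertion."""
    rows = [list(sub[i]) + [dele[i]] for i in range(len(dele))]
    rows.append(list(ins) + [0.0])
    return rows


def solve_square(C):
    """Brute-force minimum-cost perfect assignment (tiny instances only)."""
    size = len(C)
    return min(sum(C[i][p[i]] for i in range(size)) for p in permutations(range(size)))


def solve_reduced(R):
    """Brute force: each row picks a distinct real column or the deletion column;
    every real column left unused is inserted."""
    n, m = len(R) - 1, len(R[0]) - 1
    best = INF
    for choice in product(range(m + 1), repeat=n):
        used = [j for j in choice if j < m]
        if len(used) != len(set(used)):
            continue  # two rows mapped to the same real column
        cost = sum(R[i][choice[i]] for i in range(n))
        cost += sum(R[n][j] for j in range(m) if j not in used)
        best = min(best, cost)
    return best


# Tiny instance with n = 2 and m = 3 (illustrative costs, not from the paper).
sub = [[0.0, 2.0, 2.0], [2.0, 0.0, 2.0]]
dele = [1.0, 1.0]
ins = [1.0, 1.0, 1.0]
C = square_cost_matrix(sub, dele, ins)
R = reduced_cost_matrix(sub, dele, ins)
cost_square = solve_square(C)
cost_reduced = solve_reduced(R)
```

For this instance the square matrix stores 25 entries and the reduced one 12, while both formulations yield the same optimal cost.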
Upper Bounding the Graph Edit Distance Based on Rings and Machine Learning
The graph edit distance (GED) is a flexible distance measure which is widely
used for inexact graph matching. Since its exact computation is NP-hard,
heuristics are used in practice. A popular approach is to obtain upper bounds
for GED via transformations to the linear sum assignment problem with
error-correction (LSAPE). Typically, local structures and distances between
them are employed for carrying out this transformation, but recently also
machine learning techniques have been used. In this paper, we formally define a
unifying framework LSAPE-GED for transformations from GED to LSAPE. We also
introduce rings, a new kind of local structures designed for graphs where most
information resides in the topology rather than in the node labels.
Furthermore, we propose two new ring based heuristics RING and RING-ML, which
instantiate LSAPE-GED using the traditional and the machine learning based
approach for transforming GED to LSAPE, respectively. Extensive experiments
show that using rings for upper bounding GED significantly improves the state
of the art on datasets where most information resides in the graphs'
topologies. This closes the gap between fast but rather inaccurate LSAPE based
heuristics and more accurate but significantly slower GED algorithms based on
local search.
LIPIcs, Volume 274, ESA 2023, Complete Volume
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Metric Selection and Metric Learning for Matching Tasks
A quarter of a century after the world-wide web was born, we have grown accustomed to having easy access to a wealth of data sets and open-source software. The value of these resources is restricted if they are not properly integrated and maintained. A lot of this work boils down to matching: finding existing records about entities and enriching them with information from a new data source. In the realm of code, this means integrating new code snippets into a code base while avoiding duplication.
In this thesis, we address two such matching problems. First, we leverage the diverse and mature set of string similarity measures in an iterative semi-supervised learning approach to string matching. It is designed to query a user to make a sequence of decisions on specific cases of string matching. We show that we can find almost optimal solutions after only a small amount of such input. The low labelling complexity of our algorithm is due to addressing the cold start problem that is inherent to Active Learning, both by ranking queries by variance before enough supervision information has arrived and by a self-regulating mechanism that counteracts initial biases.
Second, we address the matching of code fragments for deduplication. Programming code is not only a tool, but also a resource that itself demands maintenance. Code duplication is a frequent problem arising especially from modern development practice. There are many reasons to detect and address code duplicates, for example to keep a clean and maintainable codebase. In such more complex data structures, string similarity measures are inadequate. In their stead, we study a modern supervised Metric Learning approach to model code similarity with Neural Networks. We find that in such a model representing the elementary tokens with a pretrained word embedding is the most important ingredient. Our results show both qualitatively (by visualization) that relatedness is modelled well by the embeddings and quantitatively (by ablation) that the encoded information is useful for the downstream matching task.
As a non-technical contribution, we unify the common challenges arising in supervised learning approaches to Record Matching, Code Clone Detection and generic Metric Learning tasks. We give a novel account of string similarity measures from a psychological standpoint and document a longstanding naming conflict in string similarity measures. Finally, we point out the overlap of the latest research in Code Clone Detection with the field of Natural Language Processing.
Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS'09)
The Symposium on Theoretical Aspects of Computer Science (STACS) is held alternately in France and in Germany. The conference of February 26-28, 2009, held in Freiburg, is the 26th in this series. Previous meetings took place in Paris (1984), Saarbrücken (1985), Orsay (1986), Passau (1987), Bordeaux (1988), Paderborn (1989), Rouen (1990), Hamburg (1991), Cachan (1992), Würzburg (1993), Caen (1994), München (1995), Grenoble (1996), Lübeck (1997), Paris (1998), Trier (1999), Lille (2000), Dresden (2001), Antibes (2002), Berlin (2003), Montpellier (2004), Stuttgart (2005), Marseille (2006), Aachen (2007), and Bordeaux (2008). ..
On the power of message passing for learning on graph-structured data
This thesis proposes novel approaches for machine learning on irregularly structured input data such as graphs, point clouds and manifolds. Specifically, we break with the regularity restriction of conventional deep learning techniques and propose solutions for designing, implementing and scaling up deep end-to-end representation learning on graph-structured data, known as Graph Neural Networks (GNNs).
GNNs capture local graph structure and feature information by following a neural message passing scheme, in which node representations are recursively updated in a trainable and purely local fashion. In this thesis, we demonstrate the generality of message passing through a unified framework suitable for a wide range of operators and learning tasks. Specifically, we analyze the limitations and inherent weaknesses of GNNs and propose efficient solutions to overcome them, both theoretically and in practice, e.g., by conditioning messages via continuous B-spline kernels, by utilizing hierarchical message passing, or by leveraging positional encodings. In addition, we ensure that our proposed methods scale naturally to large input domains. In particular, we propose novel methods to fully eliminate the exponentially increasing dependency of nodes over layers inherent to message passing GNNs. Lastly, we introduce PyTorch Geometric, a deep learning library for implementing and working with graph-based neural network building blocks, built upon PyTorch.
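The message passing scheme described above can be sketched in plain Python. This is a minimal, non-trainable illustration under simplifying assumptions: mean aggregation over neighbors and a fixed mixing weight stand in for the learned message and update functions. It does not use the PyTorch Geometric API, and `message_passing_round` and `self_weight` are hypothetical names.

```python
def message_passing_round(adj, feats, self_weight=0.5):
    """One purely local update: every node mixes its own feature vector
    with the mean of its neighbors' feature vectors.

    adj:   dict node -> list of neighboring nodes
    feats: dict node -> list of floats (same length for all nodes)
    Assumption: a fixed self_weight replaces trainable parameters.
    """
    new_feats = {}
    for v, neighbors in adj.items():
        dim = len(feats[v])
        if neighbors:
            agg = [sum(feats[u][d] for u in neighbors) / len(neighbors) for d in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: nothing to aggregate
        new_feats[v] = [
            self_weight * feats[v][d] + (1 - self_weight) * agg[d] for d in range(dim)
        ]
    return new_feats


# Path graph 0 - 1 - 2 where only node 0 carries a signal initially.
path_adj = {0: [1], 1: [0, 2], 2: [1]}
h0 = {0: [1.0], 1: [0.0], 2: [0.0]}
h1 = message_passing_round(path_adj, h0)
h2 = message_passing_round(path_adj, h1)
```

After one round node 2 has received nothing from node 0. After two rounds the signal arrives, which shows how each additional layer widens the set of nodes a representation depends on.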