62 research outputs found

    Approximating the Graph Edit Distance with Compact Neighborhood Representations

    The graph edit distance is used for comparing graphs in various domains. Due to its high computational complexity, it is primarily approximated. Widely used heuristics search for an optimal assignment of vertices based on the distance between local substructures. While faster ones only consider vertices and their incident edges, leading to poor accuracy, other approaches require computationally intense exact distance computations between subgraphs. Our new method abstracts local substructures to neighborhood trees and compares them using efficient tree matching techniques. This results in a ground distance for mapping vertices that yields high-quality approximations of the graph edit distance. By limiting the maximum tree height, our method supports steering between more accurate results and faster execution. We thoroughly analyze the running time of the tree matching method and propose several techniques to accelerate computation in practice. We use compressed tree representations, recognize redundancies by tree canonization, and exploit them via caching. Experimentally, we show that our method provides a significantly improved trade-off between running time and approximation quality compared to existing state-of-the-art approaches.
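The idea of bounding tree height, canonizing trees, and caching repeated subtrees can be illustrated with a small sketch. It is not the method from the paper: the tree definition used here (a plain unfolding of each vertex's neighborhood, which may walk back to the parent) and the names `make_canonizer`, `labels`, and `adj` are assumptions made for illustration only.

```python
from functools import lru_cache


def make_canonizer(labels, adj):
    """Return canon(v, height): a canonical string for the neighborhood tree
    of vertex v, unfolded up to the given height.

    labels: list of vertex labels (hypothetical input format)
    adj:    list of neighbor lists, adj[v] = vertices adjacent to v
    """

    @lru_cache(maxsize=None)  # caching: each (vertex, height) pair is built once
    def canon(v, height):
        if height == 0:
            return str(labels[v])
        # Sorting the children's canonical strings makes the result
        # independent of the order in which neighbors are listed.
        children = sorted(canon(u, height - 1) for u in adj[v])
        return str(labels[v]) + "(" + ",".join(children) + ")"

    return canon


# Path graph 0 - 1 - 2 with labels A, B, A.
canon = make_canonizer(["A", "B", "A"], [[1], [0, 2], [1]])
same_tree = canon(0, 2) == canon(2, 2)  # both end vertices unfold identically
```

Vertices with equal canonical strings have identical trees, so a distance between their trees only needs to be computed once; a smaller `height` gives cheaper but coarser descriptions.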

    GEDLIB: A C++ Library for Graph Edit Distance Computation

    The graph edit distance (GED) is a flexible graph dissimilarity measure widely used within the structural pattern recognition field. In this paper, we present GEDLIB, a C++ library for exactly or approximately computing GED. Many existing algorithms for GED are already implemented in GEDLIB. Moreover, GEDLIB is designed to be easily extensible: for implementing new edit cost functions and GED algorithms, it suffices to implement abstract classes contained in the library. For implementing these extensions, the user has access to a wide range of utilities, such as deep neural networks, support vector machines, mixed integer linear programming solvers, a blackbox optimizer, and solvers for the linear sum assignment problem with and without error correction.

    A Hungarian Algorithm for Error-Correcting Graph Matching

    Bipartite graph matching algorithms are becoming increasingly popular for solving error-correcting graph matching problems and for approximating the graph edit distance of two graphs. However, the memory requirements and execution times of this method are proportional to (n + m)^2 and (n + m)^3, respectively, where n and m are the orders of the graphs. Subsequent developments reduced these complexities. However, these improvements are valid only under some constraints on the parameters of the graph edit distance. We propose in this paper a new formulation of the bipartite graph matching algorithm designed to efficiently solve the associated graph edit distance problem. The resulting algorithm requires O(nm) memory space and O(min(n, m)^2 max(n, m)) execution time.
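The memory figures above can be made concrete by comparing two cost-matrix layouts. The sketch below builds the classic square (n + m) x (n + m) matrix and a compact (n + 1) x (m + 1) matrix with one extra row and column for insertions and deletions. Whether the paper organizes its data exactly this way is an assumption; the function names and the input format (`sub`, `dele`, `ins`) are hypothetical.

```python
INF = float("inf")


def square_cost_matrix(sub, dele, ins):
    """Classic (n + m) x (n + m) layout.

    sub[i][j]: cost of substituting node i of the first graph by node j of the second
    dele[i]:   cost of deleting node i
    ins[j]:    cost of inserting node j
    """
    n, m = len(dele), len(ins)
    size = n + m
    C = [[0.0] * size for _ in range(size)]  # lower-right block stays 0
    for i in range(n):
        for j in range(m):
            C[i][j] = sub[i][j]  # upper-left block: substitutions
        for k in range(n):
            C[i][m + k] = dele[i] if k == i else INF  # upper-right: deletions on the diagonal
    for l in range(m):
        for j in range(m):
            C[n + l][j] = ins[j] if l == j else INF  # lower-left: insertions on the diagonal
    return C


def compact_cost_matrix(sub, dele, ins):
    """Compact (n + 1) x (m + 1) layout: last column holds deletions, last row insertions."""
    C = [list(row) + [dele[i]] for i, row in enumerate(sub)]
    C.append(list(ins) + [0.0])
    return C


sub = [[1, 2, 3], [4, 5, 6]]  # n = 2, m = 3
dele, ins = [7, 8], [9, 10, 11]
square = square_cost_matrix(sub, dele, ins)    # 5 x 5 = 25 entries
compact = compact_cost_matrix(sub, dele, ins)  # 3 x 4 = 12 entries
```

The square layout stores (n + m)^2 entries, most of them infinite or zero padding, while the compact layout stores (n + 1)(m + 1) entries, which is O(nm).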

    Upper Bounding the Graph Edit Distance Based on Rings and Machine Learning

    The graph edit distance (GED) is a flexible distance measure which is widely used for inexact graph matching. Since its exact computation is NP-hard, heuristics are used in practice. A popular approach is to obtain upper bounds for GED via transformations to the linear sum assignment problem with error-correction (LSAPE). Typically, local structures and distances between them are employed for carrying out this transformation, but recently machine learning techniques have also been used. In this paper, we formally define a unifying framework LSAPE-GED for transformations from GED to LSAPE. We also introduce rings, a new kind of local structure designed for graphs where most information resides in the topology rather than in the node labels. Furthermore, we propose two new ring-based heuristics, RING and RING-ML, which instantiate LSAPE-GED using the traditional and the machine-learning-based approach for transforming GED to LSAPE, respectively. Extensive experiments show that using rings for upper bounding GED significantly improves the state of the art on datasets where most information resides in the graphs' topologies. This closes the gap between fast but rather inaccurate LSAPE-based heuristics and more accurate but significantly slower GED algorithms based on local search.
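The reason an assignment yields an upper bound is that every node map induces a valid edit path, and the cost of any edit path is at least the GED. The sketch below computes that induced cost for a given node map. It assumes unit edit costs, unlabeled undirected edges without self-loops, and an injective node map; the function name and the input format are hypothetical and are not taken from the paper or from any library.

```python
def induced_edit_cost(g_labels, g_edges, h_labels, h_edges, node_map):
    """Cost of the edit path induced by node_map under unit edit costs.

    g_labels, h_labels: lists of node labels of graphs G and H
    g_edges, h_edges:   sets of frozenset({u, v}) (undirected, no self-loops)
    node_map[u]:        node of H that u is mapped to, or None if u is deleted
    """
    cost = 0
    used = set()
    for u, v in enumerate(node_map):
        if v is None:
            cost += 1  # node deletion
        else:
            used.add(v)
            if g_labels[u] != h_labels[v]:
                cost += 1  # node substitution with a label change
    cost += len(h_labels) - len(used)  # nodes of H not hit by the map are inserted

    preserved = set()
    for e in g_edges:
        u1, u2 = tuple(e)
        v1, v2 = node_map[u1], node_map[u2]
        if v1 is None or v2 is None or frozenset((v1, v2)) not in h_edges:
            cost += 1  # edge deletion
        else:
            preserved.add(frozenset((v1, v2)))
    cost += len(h_edges) - len(preserved)  # remaining edges of H are inserted
    return cost


# G is the path a - b - c, H is the single edge a - b.
g_labels, g_edges = ["a", "b", "c"], {frozenset((0, 1)), frozenset((1, 2))}
h_labels, h_edges = ["a", "b"], {frozenset((0, 1))}
good = induced_edit_cost(g_labels, g_edges, h_labels, h_edges, [0, 1, None])
bad = induced_edit_cost(g_labels, g_edges, h_labels, h_edges, [1, 0, None])
```

Both values are upper bounds on the GED of the two example graphs; the first map gives the tighter one, which is why the quality of the assignment determines the quality of the bound.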

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Metric Selection and Metric Learning for Matching Tasks

    A quarter of a century after the World Wide Web was born, we have grown accustomed to having easy access to a wealth of data sets and open-source software. The value of these resources is restricted if they are not properly integrated and maintained. A lot of this work boils down to matching: finding existing records about entities and enriching them with information from a new data source. In the realm of code, this means integrating new code snippets into a code base while avoiding duplication. In this thesis, we address two such matching problems. First, we leverage the diverse and mature set of string similarity measures in an iterative semi-supervised learning approach to string matching. It is designed to query a user to make a sequence of decisions on specific cases of string matching. We show that we can find almost optimal solutions after only a small amount of such input. The low labelling complexity of our algorithm is due to addressing the cold start problem that is inherent to Active Learning: by ranking queries by variance before the arrival of enough supervision information, and by a self-regulating mechanism that counteracts initial biases. Second, we address the matching of code fragments for deduplication. Programming code is not only a tool, but also a resource that itself demands maintenance. Code duplication is a frequent problem arising especially from modern development practice. There are many reasons to detect and address code duplicates, for example to keep a clean and maintainable codebase. For such more complex data structures, string similarity measures are inadequate. In their stead, we study a modern supervised Metric Learning approach to model code similarity with Neural Networks. We find that in such a model, representing the elementary tokens with a pretrained word embedding is the most important ingredient. Our results show both qualitatively (by visualization) that relatedness is modelled well by the embeddings and quantitatively (by ablation) that the encoded information is useful for the downstream matching task. As a non-technical contribution, we unify the common challenges arising in supervised learning approaches to Record Matching, Code Clone Detection and generic Metric Learning tasks. We give a novel account of string similarity measures from a psychological standpoint and point out and document one longstanding naming conflict in string similarity measures. Finally, we point out the overlap of the latest research in Code Clone Detection with the field of Natural Language Processing.

    Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS'09)

    The Symposium on Theoretical Aspects of Computer Science (STACS) is held alternately in France and in Germany. The conference of February 26-28, 2009, held in Freiburg, is the 26th in this series. Previous meetings took place in Paris (1984), Saarbrücken (1985), Orsay (1986), Passau (1987), Bordeaux (1988), Paderborn (1989), Rouen (1990), Hamburg (1991), Cachan (1992), Würzburg (1993), Caen (1994), München (1995), Grenoble (1996), Lübeck (1997), Paris (1998), Trier (1999), Lille (2000), Dresden (2001), Antibes (2002), Berlin (2003), Montpellier (2004), Stuttgart (2005), Marseille (2006), Aachen (2007), and Bordeaux (2008). ..

    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany


    On the power of message passing for learning on graph-structured data

    This thesis proposes novel approaches for machine learning on irregularly structured input data such as graphs, point clouds and manifolds. Specifically, we break with the regularity restriction of conventional deep learning techniques, and propose solutions for designing, implementing and scaling up deep end-to-end representation learning on graph-structured data, known as Graph Neural Networks (GNNs). GNNs capture local graph structure and feature information by following a neural message passing scheme, in which node representations are recursively updated in a trainable and purely local fashion. In this thesis, we demonstrate the generality of message passing through a unified framework suitable for a wide range of operators and learning tasks. Specifically, we analyze the limitations and inherent weaknesses of GNNs and propose efficient solutions to overcome them, both theoretically and in practice, e.g., by conditioning messages via continuous B-spline kernels, by utilizing hierarchical message passing, or by leveraging positional encodings. In addition, we ensure that our proposed methods scale naturally to large input domains. In particular, we propose novel methods to fully eliminate the exponentially increasing dependency of nodes over layers inherent to message passing GNNs. Lastly, we introduce PyTorch Geometric, a deep learning library for implementing and working with graph-based neural network building blocks, built upon PyTorch.
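The "purely local" update at the heart of message passing can be sketched in a few lines. The example below is a minimal illustration and not the PyTorch Geometric API: it uses plain Python, replaces the trainable message and update functions with a fixed mean over a node and its neighbors, and the function name and input format are hypothetical.

```python
def message_passing_step(features, neighbors):
    """One round of message passing with a fixed (untrained) mean aggregation.

    features:  dict node -> list of floats (the node representation)
    neighbors: dict node -> list of adjacent nodes
    Each node receives its neighbors' features as messages and averages
    them together with its own feature.
    """
    updated = {}
    for v, x in features.items():
        messages = [features[u] for u in neighbors.get(v, [])] + [x]
        # zip(*messages) groups the k-th component of every message together.
        updated[v] = [sum(vals) / len(messages) for vals in zip(*messages)]
    return updated


# Path graph 0 - 1 - 2 with one-dimensional features.
features = {0: [1.0], 1: [3.0], 2: [5.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
after_one_round = message_passing_step(features, neighbors)
```

Applying the step k times lets information travel k hops, which is also why the set of nodes that influence a given node can grow quickly with the number of layers.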