36,729 research outputs found

    Graph edit distance or graph edit pseudo-distance?

    Get PDF
    Graph Edit Distance has been intensively used since its appearance in 1983. This distance is very appropriate if we want to compare a pair of attributed graphs from any domain and obtain not only a distance, but also the best correspondence between nodes of the involved graphs. In this paper, we want to analyse if the Graph Edit Distance can be really considered a distance or a pseudo-distance, since some restrictions of the distance function are not fulfilled. Distinguishing between both cases is important because the use of a distance is a restriction in some methods to return exact instead of approximate results. This occurs, for instance, in some graph retrieval techniques. Experimental validation shows that in most of the cases, it is not appropriate to denominate the Graph Edit Distance as a distance, but a pseudo-distance instead, since the triangle inequality is not fulfilled. Therefore, in these cases, the graph retrieval techniques not always return the optimal graph

    A Feature-Based Call Graph Distance Measure for Program Similarity Analysis

    Get PDF
    A measurement for how similar (or distant) two computer programs are has a wide range of possible applications. For example, they can be applied to malware analysis or analysis of university students' programming exercises. However, as programs may be arbitrarily structured, capturing the similarity of two non-trivial programs is a complex task. By extracting call graphs (graphs of caller-callee relationships of the program's functions, where nodes denote functions and directed edges denote function calls) from the programs, the similarity measurement can be changed into a graph problem. Previously, static call graph distance measures have been largely based on graph matching techniques, e.g. graph edit distance or maximum common subgraph, which are known to be costly. We propose a call graph distance measure based on features that preserve some structural information from the call graph without explicitly matching user defined functions together. We define basic properties of the features, several ways to compute the feature values, and give a basic algorithm for generating the features. We evaluate our features using two small datasets: a dataset of malware variants, and a dataset of university students' programming exercises, focusing especially on the former. For our evaluation we use experiments in information retrieval and clustering. We compare our results for both datasets to a baseline, and additionally for the malware dataset to the results obtained with a graph edit distance approximation. In our preliminary results we show that even though the feature generation approach is simpler than the graph edit distance approximation, the generated features can perform on a similar level as the graph edit distance approximation. However, experiments on larger datasets are still required to verify the results

    Convex Graph Invariant Relaxations For Graph Edit Distance

    Get PDF
    The edit distance between two graphs is a widely used measure of similarity that evaluates the smallest number of vertex and edge deletions/insertions required to transform one graph to another. It is NP-hard to compute in general, and a large number of heuristics have been proposed for approximating this quantity. With few exceptions, these methods generally provide upper bounds on the edit distance between two graphs. In this paper, we propose a new family of computationally tractable convex relaxations for obtaining lower bounds on graph edit distance. These relaxations can be tailored to the structural properties of the particular graphs via convex graph invariants. Specific examples that we highlight in this paper include constraints on the graph spectrum as well as (tractable approximations of) the stability number and the maximum-cut values of graphs. We prove under suitable conditions that our relaxations are tight (i.e., exactly compute the graph edit distance) when one of the graphs consists of few eigenvalues. We also validate the utility of our framework on synthetic problems as well as real applications involving molecular structure comparison problems in chemistry.Comment: 27 pages, 7 figure
    • …
    corecore