Faster Algorithms for the Maximum Common Subtree Isomorphism Problem
The maximum common subtree isomorphism problem asks for the largest possible
isomorphism between subtrees of two given input trees. This problem is a
natural restriction of the maximum common subgraph problem, which is NP-hard in general graphs. Confining the problem to trees renders polynomial-time
algorithms possible and is of fundamental importance for approaches on more
general graph classes. Various variants of this problem in trees have been
intensively studied. We consider the general case, where trees are neither
rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on
the mapped vertices and edges. For trees of order n and maximum degree Δ, our algorithm achieves a running time of O(n²Δ) by
exploiting the structure of the matching instances arising as subproblems. Thus
our algorithm outperforms the best previously known approaches. No faster
algorithm is possible for trees of bounded degree, and for trees of unbounded
degree we show that a further reduction of the running time would directly
improve the best known approach to the assignment problem. Combining a
polynomial-delay algorithm for the enumeration of all maximum common subtree
isomorphisms with central ideas of our new algorithm leads to an improvement of
its running time from O(n⁶ + Tn²) to O(n³ + TnΔ),
where n is the order of the larger tree, T is the number of different solutions, and Δ is the minimum of the maximum degrees of the input
trees. Our theoretical results are supplemented by an experimental evaluation
on synthetic and real-world instances.
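To make the role of the matching subproblems concrete, here is a minimal sketch of the classic dynamic program for rooted trees that algorithms of this kind build on (it is not the paper's optimized variant): the best weight for mapping a vertex pair is the pair's weight plus a maximum-weight bipartite matching between their child subtrees. The tree representation, the weight function w, and the clamping of negative child scores are assumptions made for illustration.

```python
# Sketch: maximum weight common subtree isomorphism for ROOTED trees via
# dynamic programming, where each subproblem is a bipartite matching
# between child subtrees. (The paper speeds up exactly these instances.)
import numpy as np
from scipy.optimize import linear_sum_assignment
from functools import lru_cache

def mcs_rooted(children1, children2, w, r1, r2):
    """Best common subtree weight when root r1 is mapped onto root r2.

    children1, children2: dict mapping a vertex to its list of children
    (leaves map to []). w(u, v): assumed weight of mapping u onto v.
    """
    @lru_cache(maxsize=None)
    def best(u, v):
        score = w(u, v)
        c1, c2 = children1[u], children2[v]
        if c1 and c2:
            # Matching instance: which child subtree of u extends to which
            # child subtree of v? Negative entries are clamped to 0, i.e.,
            # such child pairs are simply left out of the common subtree.
            M = np.array([[max(best(a, b), 0.0) for b in c2] for a in c1])
            rows, cols = linear_sum_assignment(M, maximize=True)
            score += M[rows, cols].sum()
        return score

    return best(r1, r2)
```

A straightforward way to handle unrooted trees is to maximize over all root pairs; per the abstract, the paper's speedup comes from exploiting the structure shared by the matching instances arising in this scheme.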
A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as a result. Finally, based on
our experimental results, we derive a practitioner's guide to kernel-based
graph classification.
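The RBF transformation studied in the survey is easy to state: every kernel K induces a metric via d(x, y)² = K(x, x) + K(y, y) − 2K(x, y), and a Gaussian kernel can then be placed on top of this metric. A minimal sketch, where the bandwidth gamma is an assumed hyperparameter:

```python
import numpy as np

def rbf_on_kernel_metric(K, gamma=0.1):
    """Gaussian RBF kernel on the metric induced by a precomputed
    graph kernel matrix K: d(i, j)^2 = K[i, i] + K[j, j] - 2 K[i, j]."""
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    # Clamp tiny negative values caused by floating point rounding.
    return np.exp(-gamma * np.maximum(d2, 0.0))
```

The resulting matrix can be used like any precomputed kernel, e.g., with scikit-learn's SVC(kernel='precomputed').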
Gradual Weisfeiler-Leman: Slow and Steady Wins the Race
The classical Weisfeiler-Leman algorithm, also known as color refinement, is fundamental
for graph learning and central for successful graph kernels and graph neural
networks. Originally developed for graph isomorphism testing, the algorithm
iteratively refines vertex colors. On many datasets, the stable coloring is
reached after a few iterations and the optimal number of iterations for machine
learning tasks is typically even lower. This suggests that the colors diverge
too fast, defining a similarity that is too coarse. We generalize the concept
of color refinement and propose a framework for gradual neighborhood
refinement, which allows a slower convergence to the stable coloring and thus
provides a more fine-grained refinement hierarchy and vertex similarity. We
assign new colors by clustering vertex neighborhoods, replacing the original
injective color assignment function. Our approach is used to derive new
variants of existing graph kernels and to approximate the graph edit distance
via optimal assignments regarding vertex similarity. We show that in both
tasks, our method outperforms the original color refinement with only a moderate increase in running time, advancing the state of the art.
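As a rough illustration of the framework, a single refinement step can swap the injective recoloring of classical Weisfeiler-Leman for a clustering of neighborhood representations. The histogram features and the choice of k-means with a fixed number of clusters k below are illustrative assumptions, not the paper's exact instantiation:

```python
import numpy as np
from sklearn.cluster import KMeans

def gradual_wl_step(adj, colors, k):
    """One gradual refinement step.

    adj: list of neighbor lists; colors: integer numpy array of vertex
    colors. Returns at most k new colors obtained by clustering vertex
    neighborhoods instead of hashing them injectively as in classical
    Weisfeiler-Leman.
    """
    n_colors = colors.max() + 1
    feats = np.zeros((len(adj), 2 * n_colors))
    for v, nbrs in enumerate(adj):
        feats[v, colors[v]] = 1.0                 # own color (one-hot)
        for u in nbrs:
            feats[v, n_colors + colors[u]] += 1   # neighbor color histogram
    return KMeans(n_clusters=k, n_init=10).fit_predict(feats)
```

A small k admits only few new colors per iteration, which slows convergence to the stable coloring and yields the finer-grained refinement hierarchy described above.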
Largest Weight Common Subtree Embeddings with Distance Penalties
The largest common embeddable subtree problem asks for the largest possible tree embeddable into two input trees and generalizes the classical maximum common subtree problem. Several variants of the problem in labeled and unlabeled rooted trees have been studied, e.g., for the comparison of evolutionary trees. We consider a generalization, where the sought embedding is maximal with regard to a weight function on pairs of labels. We support rooted and unrooted trees with vertex and edge labels as well as distance penalties for skipping vertices. This variant is important for many applications such as the comparison of chemical structures and evolutionary trees. Our algorithm computes the solution from a series of bipartite matching instances, which are solved efficiently by exploiting their structural relation and imbalance. Our analysis shows that our approach improves or matches the running time of the formally best algorithms for several problem variants. Specifically, we obtain a running time of O(|T| |T'| Δ) for two rooted or unrooted trees T and T', where Δ = min{Δ(T), Δ(T')} with Δ(X) the maximum degree of X. If the weights are integral and at most C, we obtain a running time of O(|T| |T'| √Δ log(C min{|T|, |T'|})) for rooted trees.
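One ingredient, exploiting imbalance, is easy to illustrate in isolation: when one side of a matching instance is much smaller than the other, solving the rectangular assignment problem directly is far cheaper than padding it to a square instance. A toy example with random weights (the structural relation between consecutive instances, the other ingredient, is not shown here):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
W = rng.random((3, 1000))  # a highly imbalanced matching instance

# Solve on the rectangular matrix directly; the work is governed by the
# smaller side, instead of padding to a 1000 x 1000 square instance.
rows, cols = linear_sum_assignment(W, maximize=True)
print(W[rows, cols].sum())
```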
EmbAssi: Embedding Assignment Costs for Similarity Search in Large Graph Databases
The graph edit distance is an intuitive measure to quantify the dissimilarity
of graphs, but its computation is NP-hard and challenging in practice. We
introduce methods for answering nearest neighbor and range queries regarding
this distance efficiently for large databases with up to millions of graphs. We
build on the filter-verification paradigm, where lower and upper bounds are
used to reduce the number of exact computations of the graph edit distance.
Highly effective bounds for this involve solving a linear assignment problem
for each graph in the database, which is prohibitive in massive datasets.
Index-based approaches typically provide only weak bounds, leading to high computational costs for verification. In this work, we derive novel lower bounds
for efficient filtering from restricted assignment problems, where the cost
function is a tree metric. This special case allows embedding the costs of optimal assignments isometrically into ℓ1 space, rendering efficient
indexing possible. We propose several lower bounds of the graph edit distance
obtained from tree metrics reflecting the edit costs, which are combined for
effective filtering. Our method termed EmbAssi can be integrated into existing
filter-verification pipelines as a fast and effective pre-filtering step.
Empirically we show that for many real-world graphs our lower bounds are
already close to the exact graph edit distance, while our index construction
and search scale to very large databases.
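For intuition, the simplest tree metric, a uniform-cost star over the label alphabet, already gives such a bound: the optimal assignment cost between the vertex label multisets of two graphs reduces to a histogram computation. A sketch under these assumptions (unit edit costs, integer-encoded labels):

```python
import numpy as np

def label_histogram(labels, alphabet_size):
    """Vertex label counts of one graph."""
    h = np.zeros(alphabet_size)
    for l in labels:
        h[l] += 1
    return h

def assignment_lower_bound(h1, h2):
    """Optimal assignment cost between two label multisets under uniform
    unit costs: every vertex beyond the label overlap requires at least
    one edit operation, so this lower-bounds the graph edit distance."""
    overlap = np.minimum(h1, h2).sum()
    return max(h1.sum(), h2.sum()) - overlap
```

For graphs of equal order this bound equals half the ℓ1 distance between the histograms, illustrating the kind of ℓ1 embedding that makes standard metric indexing applicable.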
Approximating the Graph Edit Distance with Compact Neighborhood Representations
The graph edit distance is used for comparing graphs in various domains. Due
to its high computational complexity it is primarily approximated. Widely-used
heuristics search for an optimal assignment of vertices based on the distance
between local substructures. While faster ones only consider vertices and their
incident edges, leading to poor accuracy, other approaches require computationally intensive exact distance computations between subgraphs. Our new
method abstracts local substructures to neighborhood trees and compares them
using efficient tree matching techniques. This results in a ground distance for
mapping vertices that yields high quality approximations of the graph edit
distance. By limiting the maximum tree height, our method supports steering
between more accurate results and faster execution. We thoroughly analyze the
running time of the tree matching method and propose several techniques to
accelerate computation in practice. We use compressed tree representations,
recognize redundancies by tree canonization and exploit them via caching.
Experimentally we show that our method provides a significantly improved
trade-off between running time and approximation quality compared to existing
state-of-the-art approaches.
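A schematic sketch of this assignment-based pipeline follows; the ground distance below is a deliberately crude stand-in that compares depth-limited BFS neighborhoods level-wise by their label multisets, not the paper's tree matching with canonization and caching:

```python
import numpy as np
from collections import Counter
from scipy.optimize import linear_sum_assignment

def neighborhood_levels(adj, labels, root, height):
    """Multiset of labels per BFS level up to `height`: a crude stand-in
    for the neighborhood tree rooted at `root`."""
    levels, seen, frontier = [], {root}, [root]
    for _ in range(height + 1):
        levels.append(Counter(labels[v] for v in frontier))
        frontier = [u for v in frontier for u in adj[v] if u not in seen]
        seen.update(frontier)
    return levels

def ground_distance(lv1, lv2):
    """Hypothetical vertex dissimilarity: per-level symmetric difference
    of the label multisets."""
    return sum(sum((a - b).values()) + sum((b - a).values())
               for a, b in zip(lv1, lv2))

def approx_ged(adj1, lab1, adj2, lab2, height=2):
    """Map vertices optimally w.r.t. the ground distance; the assignment
    cost serves as the heuristic value of the approximation."""
    t1 = [neighborhood_levels(adj1, lab1, v, height) for v in range(len(adj1))]
    t2 = [neighborhood_levels(adj2, lab2, v, height) for v in range(len(adj2))]
    C = np.array([[ground_distance(a, b) for b in t2] for a in t1])
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()
```

As in the abstract, the height parameter steers the trade-off: larger neighborhoods give each vertex more context at the price of more expensive comparisons.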