Experimental Evaluation of Subgraph Isomorphism Solvers
Subgraph Isomorphism (SI) is an NP-complete problem which is at the heart of many structural pattern recognition tasks as it involves finding a copy of a pattern graph into a target graph. In the pattern recognition community, the most well-known SI solvers are VF2, VF3, and RI. SI is also widely studied in the constraint programming community, and many constraint-based SI solvers have been proposed since Ullman, such as LAD and Glasgow, for example. All these SI solvers can solve very quickly some large SI instances, that involve graphs with thousands of nodes. However, McCreesh et al. have recently shown how to randomly generate SI instances the hardness of which can be controlled and predicted, and they have built small instances which are computationally challenging for all solvers. They have also shown that some small instances, which are predicted to be easy and are easily solved by constraint-based solvers, appear to be challenging for VF2 and VF3. In this paper, we widen this study by considering a large test suite coming from eight benchmarks. We show that, as expected for an NP-complete problem, the solving time of an instance does not depend on its size, and that some small instances coming from real applications are not solved by any of the considered solvers. We also show that, if RI and VF3 can solve very quickly a large number of easy instances, for which Glasgow or LAD need more time, they fail at solving some other instances that are quickly solved by Glasgow or LAD, and they are clearly outperformed by Glasgow on hard instances. Finally, we show that we can easily combine solvers to take benefit of their complementarity.
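To make the problem statement above concrete, the following is a minimal backtracking sketch of the (non-induced) subgraph isomorphism test. It is not VF2, VF3, RI, LAD, or Glasgow, and it uses none of their filtering or ordering techniques; the function name `find_embedding`, the adjacency-set representation, and the two toy target graphs are assumptions made for illustration only.

```python
from typing import Dict, Hashable, Optional, Set

# Undirected graph as adjacency sets (an illustrative representation).
Graph = Dict[Hashable, Set[Hashable]]


def find_embedding(pattern: Graph, target: Graph) -> Optional[dict]:
    """Return an injective, edge-preserving map pattern -> target, or None."""
    # Assign high-degree pattern vertices first; this is only a simple heuristic.
    order = sorted(pattern, key=lambda v: -len(pattern[v]))

    def extend(mapping: dict) -> Optional[dict]:
        if len(mapping) == len(order):
            return dict(mapping)
        p = order[len(mapping)]
        used = set(mapping.values())
        for t in target:
            # Skip target vertices already used or with too small a degree.
            if t in used or len(target[t]) < len(pattern[p]):
                continue
            # Every already-mapped neighbour of p must land on a neighbour of t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[p]  # backtrack
        return None

    return extend({})


# Hypothetical toy instances: a triangle pattern and two 4-vertex targets.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
square_with_diagonal = {
    "a": {"b", "c", "d"}, "b": {"a", "c"},
    "c": {"a", "b", "d"}, "d": {"a", "c"},
}
four_cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}

embedding = find_embedding(triangle, square_with_diagonal)
no_embedding = find_embedding(triangle, four_cycle)
```

The search is exponential in the worst case, which is consistent with the abstract's point that instance size alone does not predict solving time.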
Simple Pattern-only Heuristics Lead To Fast Subgraph Matching Strategies on Very Large Networks
A wide range of biomedical applications entails solving the subgraph isomorphism problem, i.e. finding all the possible subgraphs of a target graph that are structurally equivalent to an input pattern graph. Targets may be very large and complex structures compared to patterns. Methods that address this NP-complete problem use heuristics. Their performance in both time and quality depends on a few subtleties of those heuristics. This paper compares the performance of state-of-the-art algorithms for subgraph isomorphism on small, medium and very large graphs. Results show that heuristics based on pattern graphs alone prove to be the most efficient, an unexpected result.
AEDNet: Adaptive Edge-Deleting Network For Subgraph Matching
Subgraph matching is to find all subgraphs in a data graph that are
isomorphic to an existing query graph. Subgraph matching is an NP-hard problem,
yet has found its applications in many areas. Many learning-based methods have
been proposed for graph matching, whereas few have been designed for subgraph
matching. The subgraph matching problem is generally more challenging, mainly
due to the different sizes of the two graphs, resulting in a considerably
large space of solutions. Also the extra edges existing in the data graph
connecting to the matched nodes may lead to two matched nodes of two graphs
having different adjacency structures and often being identified as distinct
objects. Due to the extra edges, the existing learning based methods often fail
to generate sufficiently similar node-level embeddings for matched nodes. This
study proposes a novel Adaptive Edge-Deleting Network (AEDNet) for subgraph
matching. The proposed method is trained in an end-to-end fashion. In AEDNet, a
novel sample-wise adaptive edge-deleting mechanism removes extra edges to
ensure consistency of adjacency structure of matched nodes, while a
unidirectional cross-propagation mechanism ensures consistency of features of
matched nodes. We applied the proposed method on six datasets with graph sizes
varying from 20 to 2300. Our evaluations on six open datasets demonstrate that
the proposed AEDNet outperforms six state-of-the-art methods and is much faster
than the exact methods on large graphs.
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
We propose GraphMineSuite (GMS): the first benchmarking suite for graph
mining that facilitates evaluating and constructing high-performance graph
mining algorithms. First, GMS comes with a benchmark specification based on
extensive literature review, prescribing representative problems, algorithms,
and datasets. Second, GMS offers a carefully designed software platform for
seamless testing of different fine-grained elements of graph mining algorithms,
such as graph representations or algorithm subroutines. The platform includes
parallel implementations of more than 40 considered baselines, and it
facilitates developing complex and fast mining algorithms. High modularity is
possible by harnessing set algebra operations such as set intersection and
difference, which enables breaking complex graph mining algorithms into simple
building blocks that can be separately experimented with. GMS is supported with
a broad concurrency analysis for portability in performance insights, and a
novel performance metric to assess the throughput of graph mining algorithms,
enabling more insightful evaluation. As use cases, we harness GMS to rapidly
redesign and accelerate state-of-the-art baselines of core graph mining
problems: degeneracy reordering (by up to >2x), maximal clique listing (by up
to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x),
also obtaining better theoretical performance bounds
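As a rough illustration of the set-algebra idea described above, the sketch below counts triangles and k-cliques using nothing but set intersection as the building block. It is not the GMS platform or its API; the function names, the adjacency-set representation, and the `K4` example graph are assumptions made for this example.

```python
from typing import Dict, Set


def count_triangles(adj: Dict[int, Set[int]]) -> int:
    """Count triangles u < v < w via one set intersection per edge."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Common neighbours larger than v close a triangle exactly once.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def count_k_cliques(adj: Dict[int, Set[int]], k: int) -> int:
    """Count k-cliques (k >= 1) by repeatedly intersecting candidate sets."""

    def grow(candidates: Set[int], remaining: int) -> int:
        if remaining == 0:
            return 1
        # Picking v keeps only candidates adjacent to v and larger than v,
        # so each clique is enumerated once, in increasing vertex order.
        return sum(
            grow({w for w in candidates & adj[v] if w > v}, remaining - 1)
            for v in candidates
        )

    return grow(set(adj), k)


# Hypothetical example: the complete graph on four vertices.
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
triangles_in_k4 = count_triangles(K4)
three_cliques_in_k4 = count_k_cliques(K4, 3)
four_cliques_in_k4 = count_k_cliques(K4, 4)
```

Because the intersection is isolated in one expression, it could be swapped for a different set representation (sorted arrays, bitsets) without touching the enumeration logic, which is the kind of modularity the abstract attributes to set algebra.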
GSI: GPU-friendly Subgraph Isomorphism
Subgraph isomorphism is a well-known NP-hard problem that is widely used in
many applications, such as social network analysis and query over the knowledge
graph. Due to the inherent hardness, its performance is often a bottleneck in
various real-world applications. Therefore, we address this by designing an
efficient subgraph isomorphism algorithm leveraging features of GPU
architecture, such as massive parallelism and memory hierarchy. Existing
GPU-based solutions adopt a two-step output scheme, performing the same join
process twice in order to write intermediate results concurrently. They also
lack GPU architecture-aware optimizations that allow scaling to large graphs.
In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI.
Different from existing edge join-based GPU solutions, we propose a
Prealloc-Combine strategy based on the vertex-oriented framework, which avoids
joining-twice in existing solutions. Also, a GPU-friendly data structure
(called PCSR) is proposed to represent an edge-labeled graph. Extensive
experiments on both synthetic and real graphs show that GSI outperforms the
state-of-the-art algorithms by up to several orders of magnitude and has good
scalability with graph size scaling to hundreds of millions of edges.
Comment: 15 pages, 17 figures, conferenc
Partitioning algorithms for induced subgraph problems
This dissertation introduces the MCSPLIT family of algorithms for two closely-related NP-hard problems that involve finding a large induced subgraph contained by each of two input graphs: the induced subgraph isomorphism problem and the maximum common induced subgraph problem.
The MCSPLIT algorithms resemble forward-checking constraint programming algorithms, but use problem-specific data structures that allow multiple, identical domains to be stored without duplication. These data structures enable fast, simple constraint propagation algorithms and very fast calculation of upper bounds. Versions of these algorithms for both sparse and dense graphs are described and implemented. The resulting algorithms are over an order of magnitude faster than the best existing algorithm for maximum common induced subgraph on unlabelled graphs, and outperform the state of the art on several classes of induced subgraph isomorphism instances.
A further advantage of the MCSPLIT data structures is that variables and values are treated identically; this allows us to choose to branch on variables representing vertices of either input graph with no overhead. An extensive set of experiments shows that such two-sided branching can be particularly beneficial if the two input graphs have very different orders or densities. Finally, we turn from subgraphs to supergraphs, tackling the problem of finding a small graph that contains every member of a given family of graphs as an induced subgraph. Exact and heuristic techniques are developed for this problem, in each case using a MCSPLIT algorithm as a subroutine. These algorithms allow us to add new terms to two entries of the On-Line Encyclopedia of Integer Sequences
Learning-Based Approaches for Graph Problems: A Survey
Over the years, many graph problems, specifically NP-complete ones, have been
studied by a wide range of researchers. Some famous examples include graph
colouring, travelling salesman problem and subgraph isomorphism. Most of these
problems are typically addressed by exact algorithms, approximate algorithms
and heuristics. There are however some drawbacks for each of these methods.
Recent studies have employed learning-based frameworks such as machine learning
techniques in solving these problems, given that they are useful in discovering
new patterns in structured data that can be represented using graphs. This
research direction has successfully attracted a considerable amount of
attention. In this survey, we provide a systematic review mainly on classic
graph problems in which learning-based approaches have been proposed in
addressing the problems. We discuss the overview of each framework, and provide
analyses based on the design and performance of the framework. Some potential
research questions are also suggested. Ultimately, this survey gives a clearer
insight and can be used as a stepping stone to the research community in
studying problems in this field.
Comment: v1: 41 pages; v2: 40 page
Matched Filters for Noisy Induced Subgraph Detection
The problem of finding the vertex correspondence between two noisy graphs
with different number of vertices where the smaller graph is still large has
many applications in social networks, neuroscience, and computer vision. We
propose a solution to this problem via a graph matching matched filter:
centering and padding the smaller adjacency matrix and applying graph matching
methods to align it to the larger network. The centering and padding schemes
can be incorporated into any algorithm that matches using adjacency matrices.
Under a statistical model for correlated pairs of graphs, which yields a noisy
copy of the small graph within the larger graph, the resulting optimization
problem can be guaranteed to recover the true vertex correspondence between the
networks.
However, there are currently no efficient algorithms for solving this
problem. To illustrate the possibilities and challenges of such problems, we
use an algorithm that can exploit a partially known correspondence and show via
varied simulations and applications to Drosophila and human connectomes
that this approach can achieve good performance.
Comment: 41 pages, 7 figure
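The centering and padding step mentioned in the abstract above can be sketched as follows. The abstract does not spell out the exact schemes, so this assumes one plausible convention: centering maps off-diagonal 0/1 entries to -1/+1, and padding fills the smaller matrix with zeros up to the size of the larger one. The function names, the score function, and the toy graphs are all hypothetical, and candidate assignments are scored directly rather than found by a graph matching algorithm.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def center(adj: Matrix) -> Matrix:
    """Map off-diagonal 0/1 entries to -1/+1 (assumed convention); diagonal stays 0."""
    n = len(adj)
    return [[0 if i == j else 2 * adj[i][j] - 1 for j in range(n)] for i in range(n)]


def pad(mat: Matrix, size: int) -> Matrix:
    """Place mat in the top-left corner of a size x size zero matrix."""
    m = len(mat)
    return [[mat[i][j] if i < m and j < m else 0 for j in range(size)]
            for i in range(size)]


def alignment_score(small: Matrix, large: Matrix, assign: Sequence[int]) -> int:
    """Sum of small[i][j] * large[assign[i]][assign[j]] over all index pairs."""
    n = len(small)
    return sum(small[i][j] * large[assign[i]][assign[j]]
               for i in range(n) for j in range(n))


# Small graph: path 0-1-2. Large graph: triangle {0,1,2} plus the edge 2-3,
# so vertices 1-2-3 induce a path while 0,1,2 induce a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
large = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]

onto_path = (1, 2, 3, 0)      # path sent to the induced path 1-2-3
onto_triangle = (0, 2, 1, 3)  # path sent into the triangle

# Without centering, both assignments only count shared edges and tie.
naive_path = alignment_score(pad(path, 4), large, onto_path)
naive_triangle = alignment_score(pad(path, 4), large, onto_triangle)

# With centering, a non-edge landing on an edge is penalised.
centered_path = alignment_score(pad(center(path), 4), center(large), onto_path)
centered_triangle = alignment_score(pad(center(path), 4), center(large), onto_triangle)
```

In this toy case the uncentered scores cannot distinguish the induced path from the denser triangle, while the centered scores prefer the induced copy. This only illustrates why centering can matter for induced subgraph detection and does not reproduce the paper's statistical model or guarantees.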