Graph Pattern Mining and Learning through User-defined Relations (Extended Version)
In this work we propose R-GPM, a parallel computing framework for graph
pattern mining (GPM) through a user-defined subgraph relation. More
specifically, we enable the computation of statistics of patterns through their
subgraph classes, generalizing traditional GPM methods. R-GPM provides
efficient estimators for these statistics by employing an MCMC sampling
algorithm combined with several optimizations. We provide both theoretical
guarantees and empirical evaluations of our estimators in application scenarios
such as stochastic optimization of deep high-order graph neural network models
and pattern (motif) counting. We also propose and evaluate optimizations that
improve our estimators' accuracy while reducing their computational costs by
up to three orders of magnitude. Finally, we show that R-GPM is scalable,
providing near-linear speedups on 44 cores in all of our tests.
Comment: Extended version of the paper published in the ICDM 201
Neural Subgraph Isomorphism Counting
In this paper, we study a new graph learning problem: learning to count
subgraph isomorphisms. Different from other traditional graph learning problems
such as node classification and link prediction, subgraph isomorphism counting
is NP-complete and requires more global inference to oversee the whole graph.
To make it scalable for large-scale graphs and patterns, we propose a learning
framework which augments different representation learning architectures and
iteratively attends pattern and target data graphs to memorize subgraph
isomorphisms for the global counting. We develop both a small-graph set
(<= 1,024 subgraph isomorphisms in each graph) and a large-graph set (<= 4,096
subgraph isomorphisms in each graph) to evaluate different models. A mutagenic
compound dataset,
MUTAG, is also used to evaluate neural models and demonstrate the success of
transfer learning. While the learning based approach is inexact, we are able to
generalize to count large patterns and data graphs in linear time compared to
the exponential time of the original NP-complete problem. Experimental results
show that learning based subgraph isomorphism counting can speed up the
traditional algorithm, VF2, 10-1,000 times with acceptable errors. Domain
adaptation based on fine-tuning also shows the usefulness of our approach in
real-world applications.
Comment: Accepted by KDD 202
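To make the exponential cost mentioned in the abstract above concrete, here is a minimal brute-force sketch of exact counting. It is not the paper's neural model and not VF2. It assumes undirected, unlabeled graphs and counts injective vertex mappings that preserve every pattern edge (a non-induced convention chosen for this illustration). The function name and argument layout are hypothetical.

```python
from itertools import permutations


def count_subgraph_isomorphisms(pattern_edges, n_pattern, graph_edges, n_graph):
    """Count injective maps from pattern vertices to graph vertices
    under which every pattern edge lands on a graph edge.

    Vertices are assumed to be numbered 0..n-1 (illustrative convention).
    The loop visits n_graph! / (n_graph - n_pattern)! candidate mappings,
    which is why exact counting becomes infeasible as the pattern grows.
    """
    # Store undirected edges as frozensets so (u, v) and (v, u) match.
    graph_edge_set = {frozenset(e) for e in graph_edges}
    count = 0
    for mapping in permutations(range(n_graph), n_pattern):
        if all(
            frozenset((mapping[u], mapping[v])) in graph_edge_set
            for u, v in pattern_edges
        ):
            count += 1
    return count


# Example inputs: a triangle pattern and the complete graph on 4 vertices.
TRIANGLE = [(0, 1), (1, 2), (2, 0)]
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

Calling `count_subgraph_isomorphisms(TRIANGLE, 3, K4, 4)` counts each triangle once per vertex ordering, so the result includes automorphic copies of the same subgraph.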
Sequential Stratified Regeneration: MCMC for Large State Spaces with an Application to Subgraph Count Estimation
This work considers the general task of estimating the sum of a bounded
function over the edges of a graph, given neighborhood query access and where
access to the entire network is prohibitively expensive. To estimate this sum,
prior work proposes Markov chain Monte Carlo (MCMC) methods that use random
walks started at some seed vertex and whose equilibrium distribution is the
uniform distribution over all edges, eliminating the need to iterate over all
edges. Unfortunately, these existing estimators are not scalable to massive
real-world graphs. In this paper, we introduce Ripple, an MCMC-based estimator
that achieves unprecedented scalability by stratifying the Markov chain state
space into ordered strata with a new technique that we denote sequential
stratified regenerations. We show that the Ripple estimator is consistent,
highly parallelizable, and scales well.
We empirically evaluate our method by applying Ripple to the task of
estimating connected, induced subgraph counts given some input graph. Therein,
we demonstrate that Ripple is accurate and can estimate counts of up to
-node subgraphs, which is a task at a scale that has been considered
unreachable, not only by prior MCMC-based methods but also by other sampling
approaches. For instance, in this target application, we present results in
which the Markov chain state space is as large as , for which Ripple
computes estimates in less than hours, on average.
Comment: Markov Chain Monte Carlo, Random Walk, Regenerative Sampling, Motif
Analysis, Subgraph Counting, Graph Minin
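The random-walk idea that the abstract above attributes to prior work can be sketched as follows. This is not the Ripple estimator. It is a minimal illustration assuming a connected, non-bipartite, undirected graph, a symmetric bounded function f, and a known edge count (a simplifying assumption of this sketch). A simple random walk on such a graph traverses edges uniformly at equilibrium, so the average of f over traversed edges, scaled by the number of edges, estimates the sum. All names and parameters are hypothetical.

```python
import random


def estimate_edge_sum(adj, f, num_edges, seed_vertex, steps, burn_in=1000, rng=None):
    """Estimate the sum of f over undirected edges with a simple random walk.

    adj maps each vertex to a list of its neighbors (neighborhood query access).
    f(u, v) is assumed symmetric and bounded.
    num_edges is assumed known here, which is a simplification for this sketch.
    """
    rng = rng or random.Random()
    u = seed_vertex
    total = 0.0
    for t in range(burn_in + steps):
        v = rng.choice(adj[u])  # move to a uniformly random neighbor
        if t >= burn_in:        # discard early steps taken before mixing
            total += f(u, v)
        u = v
    return num_edges * total / steps


# Example graph with 5 edges: (0,1), (1,2), (2,0), (2,3), (3,0).
EXAMPLE_ADJ = {0: [1, 2, 3], 1: [0, 2], 2: [1, 0, 3], 3: [2, 0]}


def touches_vertex_zero(u, v):
    """Example f: 1.0 if the edge is incident to vertex 0, else 0.0."""
    return 1.0 if 0 in (u, v) else 0.0
```

On the example graph three of the five edges touch vertex 0, so the estimate from `estimate_edge_sum(EXAMPLE_ADJ, touches_vertex_zero, 5, 0, 50000)` should approach 3 as the number of steps grows. Consecutive samples are correlated, so the variance shrinks more slowly than it would for independent draws.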