14 research outputs found
Distributed Estimation of Graph 4-Profiles
We present a novel distributed algorithm for counting all four-node induced
subgraphs in a big graph. These counts, called the 4-profile, describe a
graph's connectivity properties and have found several uses ranging from
bioinformatics to spam detection. We also study the more complicated problem of
estimating the local 4-profiles centered at each vertex of the graph. The
local 4-profile embeds every vertex in an 11-dimensional space that
characterizes the local geometry of its neighborhood: vertices that connect
different clusters will have different local 4-profiles compared to those
that are only part of one dense cluster.
Our algorithm is a local, distributed message-passing scheme on the graph and
computes all the local 4-profiles in parallel. We rely on two novel
theoretical contributions: we show that local 4-profiles can be calculated
using compressed two-hop information and also establish novel concentration
results that show that graphs can be substantially sparsified and still retain
good approximation quality for the global 4-profile.
We empirically evaluate our algorithm using a distributed GraphLab
implementation that we scaled up to cores. We show that our algorithm can
compute global and local 4-profiles of graphs with millions of edges in a few
minutes, significantly improving upon the previous state of the art.
Comment: To appear in part at WWW'1
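As a point of reference for what a global 4-profile contains, the following brute-force sketch enumerates every 4-vertex subset of a small simple graph and classifies the induced subgraph by its sorted degree sequence, which distinguishes all 11 isomorphism classes of 4-node graphs. It is not the distributed message-passing algorithm of the abstract above; the function name `four_profile` and the edge-list input format are assumptions made for illustration, and the running time grows as n^4.

```python
from collections import Counter
from itertools import combinations


def four_profile(n, edges):
    """Brute-force global 4-profile of a simple undirected graph.

    Vertices are 0..n-1 and `edges` is a list of (u, v) pairs (assumed
    format). Each 4-subset is keyed by the sorted degree sequence of its
    induced subgraph, e.g. (3, 3, 3, 3) for a 4-clique.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    profile = Counter()
    for quad in combinations(range(n), 4):
        members = set(quad)
        # Degree of each vertex inside the induced subgraph on `quad`.
        degs = tuple(sorted((len(adj[v] & members) for v in quad), reverse=True))
        profile[degs] += 1
    return profile


if __name__ == "__main__":
    cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    print(four_profile(5, cycle5))
```

Removing any one vertex from a 5-cycle leaves a path on four vertices, so every 4-subset in the usage example falls into the class with degree sequence (2, 2, 1, 1).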
A Fast Counting Method for 6-motifs with Low Connectivity
A k-motif (or graphlet) is a subgraph on k nodes in a graph or network.
Counting of motifs in complex networks has been a well-studied problem in
network analysis of various real-world graphs arising from the study of social
networks and bioinformatics. In particular, the triangle counting problem has
received much attention due to its significance in understanding the behavior
of social networks. Similarly, subgraphs with more than 3 nodes have received
much attention recently. While there have been successful methods developed on
this problem, most of the existing algorithms are not scalable to large
networks with millions of nodes and edges.
The main contribution of this paper is a preliminary study that generalizes
the exact counting algorithm provided by Pinar, Seshadhri and Vishal to a
collection of 6-motifs. This method uses the counts of motifs with smaller size
to obtain the counts of 6-motifs with low connectivity, that is, containing a
cut-vertex or a cut-edge. Therefore, it circumvents the combinatorial explosion
that naturally arises when counting subgraphs in large networks.
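The idea of assembling a larger motif count from smaller ones at a cut-vertex can be illustrated on a much smaller pattern than the 6-motifs treated in the paper. The sketch below counts non-induced tailed triangles (a triangle with one pendant edge) by combining per-vertex triangle counts with degrees: a vertex v lying in t(v) triangles and having degree d(v) contributes t(v) * (d(v) - 2) copies. The function name and input format are assumptions for illustration, and this is not the paper's algorithm.

```python
from itertools import combinations


def tailed_triangle_count(n, edges):
    """Count non-induced tailed triangles in a simple undirected graph.

    The pattern has a cut-vertex (where the tail meets the triangle), so
    its count factors into smaller quantities: triangles through v times
    the number of neighbours of v left over to serve as the tail.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    total = 0
    for v in range(n):
        # t(v): pairs of neighbours of v that are themselves adjacent.
        t_v = sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
        # Each triangle through v uses two neighbours; any other neighbour is a tail.
        total += t_v * (len(adj[v]) - 2)
    return total


if __name__ == "__main__":
    paw = [(0, 1), (0, 2), (1, 2), (2, 3)]
    print(tailed_triangle_count(4, paw))
```

No 4-vertex subset is ever enumerated here; only triangles and degrees are needed, which is the sense in which the combinatorial explosion is avoided.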
The Sketching Complexity of Graph and Hypergraph Counting
Subgraph counting is a fundamental primitive in graph processing, with
applications in social network analysis (e.g., estimating the clustering
coefficient of a graph), database processing and other areas. The space
complexity of subgraph counting has been studied extensively in the literature,
but many natural settings are still not well understood. In this paper we
revisit the subgraph (and hypergraph) counting problem in the sketching model,
where the algorithm's state as it processes a stream of updates to the graph is
a linear function of the stream. This model has recently received a lot of
attention in the literature, and has become a standard model for solving
dynamic graph streaming problems.
In this paper we give a tight bound on the sketching complexity of counting
the number of occurrences of a small subgraph in a bounded degree graph
presented as a stream of edge updates. Specifically, we show that the space
complexity of the problem is governed by the fractional vertex cover number of
the subgraph being counted. Our subgraph counting algorithm implements a natural vertex
sampling approach, with sampling probabilities governed by the vertex cover of
that subgraph. Our main technical contribution lies in a new set of Fourier analytic
tools that we develop to analyze multiplayer communication protocols in the
simultaneous communication model, allowing us to prove a tight lower bound. We
believe that our techniques are likely to find applications in other settings.
Besides giving tight bounds for all such subgraphs, both our algorithm and lower
bounds extend to the hypergraph setting, albeit with some loss in space
complexity.
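To make the vertex sampling idea concrete, here is a deliberately simplified, offline sketch for triangles: keep each vertex independently with probability p, count triangles among the kept vertices, and rescale by 1/p^3 so the estimate is unbiased. It uses one uniform probability rather than probabilities derived from a fractional vertex cover, and it is neither a linear sketch nor a streaming algorithm; the function name, parameters, and seeding are assumptions for illustration only.

```python
import random
from itertools import combinations


def estimate_triangles(n, edges, p, seed=0):
    """Unbiased triangle-count estimate via uniform vertex sampling.

    Each vertex survives with probability p; a triangle survives with
    probability p**3, so the surviving count is scaled by 1 / p**3.
    """
    rng = random.Random(seed)
    kept = {v for v in range(n) if rng.random() < p}

    # Induced subgraph on the sampled vertices.
    adj = {v: set() for v in kept}
    for u, v in edges:
        if u in kept and v in kept:
            adj[u].add(v)
            adj[v].add(u)

    count = 0
    for u in kept:
        higher = sorted(x for x in adj[u] if x > u)
        # Count each triangle once, from its smallest vertex.
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                count += 1
    return count / p ** 3


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(estimate_triangles(4, k4, p=1.0))
```

With p = 1.0 every vertex is kept and the function returns the exact count; smaller p trades accuracy for a smaller retained subgraph.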
On Counting (Quantum-)Graph Homomorphisms in Finite Fields of Prime Order
We study the problem of counting the number of homomorphisms from an input
graph to a fixed (quantum) graph in any finite field of prime
order. The subproblem with an ordinary graph was introduced by Faben and
Jerrum [ToC'15] and its complexity is still uncharacterised despite active
research, e.g. the very recent work of Focke, Goldberg, Roth, and
Živný [SODA'21]. Our contribution is threefold. First, we introduce the study
of quantum graphs to the study of modular counting homomorphisms. We show that
the complexity for a quantum graph collapses to the complexity
criteria found at dimension 1: graphs. Second, in order to prove cases of
intractability we establish a further reduction to the study of bipartite
graphs. Lastly, we establish a dichotomy for all bipartite
-free graphs by a thorough structural
study incorporating both local and global arguments. This result subsumes all
results on bipartite graphs known for all prime moduli and extends them
significantly. Even for the subproblem with this establishes new results.
Comment: 84 pages, revised title and mainly the Introduction and the section
on partially surjective homomorphism
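For readers unfamiliar with the underlying counting problem, the following brute-force sketch counts homomorphisms from an input graph to a fixed ordinary graph and reduces the count modulo a prime. It only pins down the quantity being studied; it runs in exponential time, does not handle quantum graphs (formal linear combinations of graphs), and the function name and argument layout are assumptions for illustration.

```python
from itertools import product


def count_homs_mod_p(g_n, g_edges, h_n, h_edges, p):
    """Number of homomorphisms from G to H, reduced modulo p.

    A homomorphism maps each vertex of G (0..g_n-1) to a vertex of H
    (0..h_n-1) so that every edge of G lands on an edge of H.
    """
    # Store H's edges in both directions for O(1) adjacency tests.
    h_adj = set()
    for a, b in h_edges:
        h_adj.add((a, b))
        h_adj.add((b, a))

    count = 0
    for phi in product(range(h_n), repeat=g_n):
        if all((phi[u], phi[v]) in h_adj for u, v in g_edges):
            count += 1
    return count % p


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(count_homs_mod_p(3, triangle, 3, triangle, 5))
```

A triangle has six homomorphisms to itself (its proper 3-colourings), so the usage example reduces 6 modulo 5.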
Parallel Five-Cycle Counting Algorithms
Counting the frequency of subgraphs in large networks is a classic research question that reveals the underlying substructures of these networks for important applications. However, subgraph counting is a challenging problem, even for subgraph sizes as small as five, due to the combinatorial explosion in the number of possible occurrences. This paper focuses on the five-cycle, which is an important special case of five-vertex subgraph counting and one of the most difficult to count efficiently.
We design two new parallel five-cycle counting algorithms and prove that they are work-efficient and achieve polylogarithmic span. Both algorithms are based on computing low out-degree orientations, which enables the efficient computation of directed two-paths and three-paths, and the algorithms differ in the ways in which they use this orientation to eliminate double-counting. We develop fast multicore implementations of the algorithms and propose a work scheduling optimization to improve their performance. Our experiments on a variety of real-world graphs using a 36-core machine with two-way hyper-threading show that our algorithms achieve 10-46x self-relative speed-up, outperform our serial benchmarks by 10-32x, and outperform the previous state-of-the-art serial algorithm by up to 818x.
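The orientation step can be sketched serially as follows. The code directs each edge from its lower-ranked endpoint to its higher-ranked one, ranking vertices by (degree, id), which is one common way to obtain a low out-degree orientation; whether the paper uses this ordering or another one is not stated in the abstract, so treat the ranking, the function names, and the input format as assumptions. The second function counts directed two-paths that follow the orientation. Neither function counts five-cycles or runs in parallel.

```python
def degree_orientation(n, edges):
    """Orient each undirected edge from lower to higher (degree, id) rank.

    Returns out-neighbour lists; high-degree vertices end up with few
    outgoing edges because most of their neighbours rank below them.
    """
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1

    out = [[] for _ in range(n)]
    for u, v in edges:
        if (deg[u], u) < (deg[v], v):
            out[u].append(v)
        else:
            out[v].append(u)
    return out


def directed_two_paths(out):
    """Count paths u -> v -> w that follow the orientation."""
    return sum(len(out[v]) for u in range(len(out)) for v in out[u])


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3), (0, 4)]
    oriented = degree_orientation(5, star)
    print(max(len(nbrs) for nbrs in oriented))
```

On the star in the usage example the centre has degree 4 but out-degree 0, and every leaf has out-degree 1, which shows how the orientation keeps per-vertex work small even around hubs.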
Provably and Efficiently Approximating Near-cliques using the Turán Shadow: PEANUTS
Clique and near-clique counts are important graph properties with
applications in graph generation, graph modeling, graph analytics, and community
detection, among others. They are the archetypal examples of dense subgraphs.
While there are several different definitions of near-cliques, most of them
share the attribute that they are cliques that are missing a small number of
edges. Clique counting is itself considered a challenging problem. Counting
near-cliques is significantly harder still, since the search space for
near-cliques is orders of magnitude larger than that of cliques.
We give a formulation of a near-clique as a clique that is missing a constant
number of edges. We exploit the fact that a near-clique contains a smaller
clique, and use techniques for clique sampling to count near-cliques. This
method allows us to count near-cliques with 1 or 2 missing edges, in graphs
with tens of millions of edges. To the best of our knowledge, there was no
known efficient method for this problem, and we obtain a 10x - 100x speedup
over existing algorithms for counting near-cliques.
Our main technique is a space-efficient adaptation of the Turán Shadow
sampling approach, recently introduced by Jain and Seshadhri (WWW 2017). This
approach constructs a large recursion tree (called the Turán Shadow) that
represents cliques in a graph. We design a novel algorithm that builds an
estimator for near-cliques, using an online, compact construction of the
Turán Shadow.
Comment: The Web Conference, 2020 (WWW
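The definition of a near-clique as a clique missing a constant number of edges can be stated directly as a brute-force counter, shown below for small graphs. This is an exhaustive baseline for checking answers, not the Turán Shadow sampling estimator the abstract describes; the function name and parameters `k` and `missing` are assumptions for illustration.

```python
from itertools import combinations


def count_near_cliques(n, edges, k, missing):
    """Count k-vertex subsets whose induced subgraph lacks exactly
    `missing` of the k*(k-1)/2 possible edges (missing=0 gives cliques)."""
    edge_set = {frozenset(e) for e in edges}
    full = k * (k - 1) // 2

    total = 0
    for subset in combinations(range(n), k):
        present = sum(
            1 for pair in combinations(subset, 2) if frozenset(pair) in edge_set
        )
        if full - present == missing:
            total += 1
    return total


if __name__ == "__main__":
    # A 4-clique with the edge (0, 3) removed.
    diamond = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
    print(count_near_cliques(4, diamond, k=4, missing=1))
```

The loop visits every k-subset, which is exactly the search-space blow-up that motivates sampling-based estimators on graphs with millions of edges.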
Parallelizing Maximal Clique Enumeration on GPUs
We present a GPU solution for exact maximal clique enumeration (MCE) that
performs a search tree traversal following the Bron-Kerbosch algorithm. Prior
works on parallelizing MCE on GPUs perform a breadth-first traversal of the
tree, which has limited scalability because of the explosion in the number of
tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing
depth-first traversal of independent subtrees in parallel. Since MCE suffers
from high load imbalance and memory capacity requirements, we propose a worker
list for dynamic load balancing, as well as partial induced subgraphs and a
compact representation of excluded vertex sets to regulate memory consumption.
Our evaluation shows that our GPU implementation on a single GPU outperforms
the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x
(up to 16.7x), and scales efficiently to multiple GPUs. Our code has been
open-sourced to enable further research on accelerating MCE.
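For context, a serial version of the Bron-Kerbosch search tree traversal looks roughly like the sketch below, which uses the standard pivoting rule and a depth-first recursion. It does not reflect the GPU implementation, the worker list, or the compact excluded-set representation mentioned in the abstract; the adjacency-dictionary input and the function name are assumptions for illustration.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting.

    `adj` maps each vertex to the set of its neighbours (assumed format,
    simple undirected graph with no self-loops).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices.
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex covering the most candidates to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


if __name__ == "__main__":
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(maximal_cliques(graph))
```

Each recursive call corresponds to one node of the search tree, so the subtrees rooted at different first-level vertices are independent, which is the property a parallel depth-first traversal can exploit.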