7,686 research outputs found
Parallel Algorithms for Small Subgraph Counting
Subgraph counting is a fundamental problem in analyzing massive graphs, often
studied in the context of social and complex networks. There is a rich
literature on designing efficient, accurate, and scalable algorithms for this
problem. In this work, we tackle this challenge and design several new
algorithms for subgraph counting in the Massively Parallel Computation (MPC)
model:
Given a graph over vertices, edges and triangles, our first
main result is an algorithm that, with high probability, outputs a
-approximation to , with optimal round and space complexity
provided any space per machine, assuming
.
Our second main result is an -rounds
algorithm for exactly counting the number of triangles, parametrized by the
arboricity of the input graph. The space per machine is
for any constant , and the total space is ,
which matches the time complexity of (combinatorial) triangle counting in the
sequential model. We also prove that this result can be extended to exactly
counting -cliques for any constant , with the same round complexity and
total space . Alternatively, allowing space per
machine, the total space requirement reduces to .
Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri
(ITCS 2020) for exactly counting all subgraphs of size at most , can be
implemented in the MPC model in rounds,
space per machine and total space. Therefore,
this result also exhibits the phenomenon that a time bound in the sequential
model translates to a space bound in the MPC model
Parallel Five-Cycle Counting Algorithms
Counting the frequency of subgraphs in large networks is a classic research question that reveals the underlying substructures of these networks for important applications. However, subgraph counting is a challenging problem, even for subgraph sizes as small as five, due to the combinatorial explosion in the number of possible occurrences. This paper focuses on the five-cycle, which is an important special case of five-vertex subgraph counting and one of the most difficult to count efficiently.
We design two new parallel five-cycle counting algorithms and prove that they are work-efficient and achieve polylogarithmic span. Both algorithms are based on computing low out-degree orientations, which enables the efficient computation of directed two-paths and three-paths, and the algorithms differ in the ways in which they use this orientation to eliminate double-counting. We develop fast multicore implementations of the algorithms and propose a work scheduling optimization to improve their performance. Our experiments on a variety of real-world graphs using a 36-core machine with two-way hyper-threading show that our algorithms achieves 10-46x self-relative speed-up, outperform our serial benchmarks by 10-32x, and outperform the previous state-of-the-art serial algorithm by up to 818x
Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs
We study the problem of approximating the -profile of a large graph.
-profiles are generalizations of triangle counts that specify the number of
times a small graph appears as an induced subgraph of a large graph. Our
algorithm uses the novel concept of -profile sparsifiers: sparse graphs that
can be used to approximate the full -profile counts for a given large graph.
Further, we study the problem of estimating local and ego -profiles, two
graph quantities that characterize the local neighborhood of each vertex of a
graph.
Our algorithm is distributed and operates as a vertex program over the
GraphLab PowerGraph framework. We introduce the concept of edge pivoting which
allows us to collect -hop information without maintaining an explicit
-hop neighborhood list at each vertex. This enables the computation of all
the local -profiles in parallel with minimal communication.
We test out implementation in several experiments scaling up to cores
on Amazon EC2. We find that our algorithm can estimate the -profile of a
graph in approximately the same time as triangle counting. For the harder
problem of ego -profiles, we introduce an algorithm that can estimate
profiles of hundreds of thousands of vertices in parallel, in the timescale of
minutes.Comment: To appear in part at KDD'1
Approximately Counting Embeddings into Random Graphs
Let H be a graph, and let C_H(G) be the number of (subgraph isomorphic)
copies of H contained in a graph G. We investigate the fundamental problem of
estimating C_H(G). Previous results cover only a few specific instances of this
general problem, for example, the case when H has degree at most one
(monomer-dimer problem). In this paper, we present the first general subcase of
the subgraph isomorphism counting problem which is almost always efficiently
approximable. The results rely on a new graph decomposition technique.
Informally, the decomposition is a labeling of the vertices such that every
edge is between vertices with different labels and for every vertex all
neighbors with a higher label have identical labels. The labeling implicitly
generates a sequence of bipartite graphs which permits us to break the problem
of counting embeddings of large subgraphs into that of counting embeddings of
small subgraphs. Using this method, we present a simple randomized algorithm
for the counting problem. For all decomposable graphs H and all graphs G, the
algorithm is an unbiased estimator. Furthermore, for all graphs H having a
decomposition where each of the bipartite graphs generated is small and almost
all graphs G, the algorithm is a fully polynomial randomized approximation
scheme.
We show that the graph classes of H for which we obtain a fully polynomial
randomized approximation scheme for almost all G includes graphs of degree at
most two, bounded-degree forests, bounded-length grid graphs, subdivision of
bounded-degree graphs, and major subclasses of outerplanar graphs,
series-parallel graphs and planar graphs, whereas unbounded-length grid graphs
are excluded.Comment: Earlier version appeared in Random 2008. Fixed an typo in Definition
3.
Shared-memory Graph Truss Decomposition
We present PKT, a new shared-memory parallel algorithm and OpenMP
implementation for the truss decomposition of large sparse graphs. A k-truss is
a dense subgraph definition that can be considered a relaxation of a clique.
Truss decomposition refers to a partitioning of all the edges in the graph
based on their k-truss membership. The truss decomposition of a graph has many
applications. We show that our new approach PKT consistently outperforms other
truss decomposition approaches for a collection of large sparse graphs and on a
24-core shared-memory server. PKT is based on a recently proposed algorithm for
k-core decomposition.Comment: 10 pages, conference submissio
- …