Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the results for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
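The claim that the Hyperlink graph fits in the memory of one server can be checked with rough arithmetic. The sketch below is only a back-of-the-envelope estimate: the uncompressed CSR layout, the 4-byte neighbor IDs, the 8-byte offsets, and the choice to count each of the 128 billion edges once are all assumptions made here, not details taken from the paper, and the function name `csr_bytes` is hypothetical.

```python
# Back-of-the-envelope memory estimate for a graph stored in CSR form.
# Assumptions (not from the paper): uncompressed CSR, 4-byte neighbor IDs,
# 8-byte offsets, and every edge stored exactly once.

def csr_bytes(num_vertices, num_edges, id_bytes=4, offset_bytes=8):
    """Bytes used by the offsets array plus the neighbor array."""
    offsets = (num_vertices + 1) * offset_bytes
    neighbors = num_edges * id_bytes
    return offsets + neighbors

n = 3_500_000_000      # "over 3.5 billion vertices" (rounded down)
m = 128_000_000_000    # "128 billion edges"

# 4-byte IDs are only valid if every vertex ID fits below 2**32.
assert n < 2**32

total = csr_bytes(n, m)
print(f"{total / 1e9:.0f} GB")  # prints "540 GB"
```

Under these assumptions the graph needs roughly 540 GB, which is below a terabyte. Storing both directions of every edge would roughly double the neighbor array, so the estimate is sensitive to the representation chosen.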
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
Locally Estimating Core Numbers
Graphs are a powerful way to model interactions and relationships in data
from a wide variety of application domains. In this setting, entities
represented by vertices at the "center" of the graph are often more important
than those associated with vertices on the "fringes". For example, central
nodes tend to be more critical in the spread of information or disease and play
an important role in clustering/community formation. Identifying such "core"
vertices has recently received additional attention in the context of network
experiments, which analyze the response when a random subset of
vertices is exposed to a treatment (e.g. inoculation, free product samples,
etc.). Specifically, the likelihood of having many central vertices in any
exposure subset can have a significant impact on the experiment.
We focus on using k-cores and core numbers to measure the extent to which a
vertex is central in a graph. Existing algorithms for computing the core number
of a vertex require the entire graph as input, an unrealistic scenario in many
real world applications. Moreover, in the context of network experiments, the
subgraph induced by the treated vertices is only known in a probabilistic
sense. We introduce a new method for estimating the core number based only on
the properties of the graph within a region of a given radius around the
vertex, and prove an asymptotic error bound for our estimator on random graphs.
Further, we empirically validate the accuracy of our estimator for small values
of this radius on a representative corpus of real data sets. Finally, we evaluate
the impact of improved local estimation on an open problem in network
experimentation posed by Ugander et al.
Comment: Main paper body is identical to previous version (ICDM version).
Appendix with additional data sets and enlarged figures has been added to the
en
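To make the notion of a core number and of a "local" computation concrete, here is a minimal sketch. It is not the estimator from the paper: it simply computes the exact core number by peeling, then recomputes it on the subgraph induced by the vertices within a fixed number of hops of a vertex. That local value can never exceed the true core number, so it acts as a lower bound. The function names and the toy graph are hypothetical.

```python
from collections import deque

def core_numbers(adj):
    """Core number of every vertex, by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # linear scan; fine for a sketch
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def ball(adj, source, radius):
    """Vertices within `radius` hops of `source`, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand past the radius
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)

def local_core_lower_bound(adj, source, radius):
    """Core number of `source` inside the subgraph induced by its radius ball."""
    region = ball(adj, source, radius)
    induced = {v: [u for u in adj[v] if u in region] for v in region}
    return core_numbers(induced)[source]

# Toy graph (hypothetical): a 4-clique on {0, 1, 2, 3} with a path 3-4-5 attached.
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3],
    3: [0, 1, 2, 4], 4: [3, 5], 5: [4],
}
print(core_numbers(adj))                   # clique vertices get 3, path vertices get 1
print(local_core_lower_bound(adj, 0, 1))   # 3: the radius-1 ball already holds the clique
```

On this toy graph a radius of 1 already recovers the exact core number of vertex 0, while a radius of 0 gives the trivial bound 0.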
Parallel Algorithms for Geometric Graph Problems
We give algorithms for geometric graph problems in the modern parallel models
inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem
over a set of points in the two-dimensional space, our algorithm computes an
approximate MST. Our algorithms work in a constant number of
rounds of communication, while using total space and communication proportional
to the size of the data (linear space and near-linear time algorithms). In
contrast, for general graphs, achieving the same result for MST (or even
connectivity) remains a challenging open problem, despite drawing significant
attention in recent years.
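For reference, the quantity being approximated can be computed exactly on small inputs with a sequential baseline. The sketch below uses Prim's algorithm on the complete Euclidean graph in quadratic time; it is not the constant-round parallel algorithm of the paper, only an illustration of the geometric MST problem itself. The function name `euclidean_mst_weight` is hypothetical.

```python
import math

def euclidean_mst_weight(points):
    """Exact MST weight of the complete Euclidean graph on `points` (Prim, O(n^2))."""
    n = len(points)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each vertex
    best[0] = 0.0           # start the tree at the first point
    total = 0.0
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        v = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[v] = True
        total += best[v]
        # Relax the attachment cost of every vertex still outside the tree.
        for u in range(n):
            if not in_tree[u]:
                d = math.dist(points[v], points[u])
                if d < best[u]:
                    best[u] = d
    return total

# Corners of the unit square: the MST uses three unit-length sides.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(euclidean_mst_weight(square))  # prints 3.0
```

A baseline like this is useful mainly as a correctness check for an approximate algorithm on small point sets, since its quadratic cost rules it out at scale.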
We develop a general algorithmic framework that, besides MST, also applies to
Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic
framework has implications beyond the MapReduce model. For example, it yields a
new algorithm for computing EMD cost in the plane in near-linear time.
We note that while recently Sharathkumar and Agarwal
developed a near-linear time algorithm for approximating EMD,
our algorithm is fundamentally different, and, for example, also solves the
transportation (cost) problem, raised as an open question in their work.
Furthermore, our algorithm immediately gives an approximation
algorithm in the streaming-with-sorting model.
As such, it is tempting to conjecture that the
parallel models may also constitute a concrete playground in the quest for
efficient algorithms for EMD (and other similar problems) in the vanilla
streaming model, a well-known open problem.
- …