61,289 research outputs found
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature report results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
Fast Computation of Small Cuts via Cycle Space Sampling
We describe a new sampling-based method to determine cuts in an undirected
graph. For a graph (V, E), its cycle space is the family of all subsets of E
that have even degree at each vertex. We prove that with high probability,
sampling the cycle space identifies the cuts of a graph. This leads to simple
new linear-time sequential algorithms for finding all cut edges and cut pairs
(a set of 2 edges that form a cut) of a graph.
In the model of distributed computing in a graph G=(V, E) with O(log V)-bit
messages, our approach yields faster algorithms for several problems. The
diameter of G is denoted by Diam, and the maximum degree by Delta. We obtain
simple O(Diam)-time distributed algorithms to find all cut edges,
2-edge-connected components, and cut pairs, matching or improving upon previous
time bounds. Under natural conditions these new algorithms are universally
optimal --- i.e. a Omega(Diam)-time lower bound holds on every graph. We obtain
a O(Diam+Delta/log V)-time distributed algorithm for finding cut vertices; this
is faster than the best previous algorithm when Delta, Diam = O(sqrt(V)). A
simple extension of our work yields the first distributed algorithm with
sub-linear time for 3-edge-connected components. The basic distributed
algorithms are Monte Carlo, but they can be made Las Vegas without increasing
the asymptotic complexity.
In the model of parallel computing on the EREW PRAM our approach yields a
simple algorithm with optimal time complexity O(log V) for finding cut pairs
and 3-edge-connected components.Comment: Previous version appeared in Proc. 35th ICALP, pages 145--160, 200
Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs
Depth first search (DFS) tree is a fundamental data structure for solving
graph problems. The classical algorithm [SiComp74] for building a DFS tree
requires time for a given graph having vertices and edges.
Recently, Baswana et al. [SODA16] presented a simple algorithm for updating DFS
tree of an undirected graph after an edge/vertex update in time.
However, their algorithm is strictly sequential. We present an algorithm
achieving similar bounds, that can be adopted easily to the parallel
environment.
In the parallel model, a DFS tree can be computed from scratch using
processors in expected time [SiComp90] on an EREW PRAM, whereas
the best deterministic algorithm takes time
[SiComp90,JAlg93] on a CRCW PRAM. Our algorithm can be used to develop optimal
(upto polylog n factors deterministic algorithms for maintaining fully dynamic
DFS and fault tolerant DFS, of an undirected graph.
1- Parallel Fully Dynamic DFS:
Given an arbitrary online sequence of vertex/edge updates, we can maintain a
DFS tree of an undirected graph in time per update using
processors on an EREW PRAM.
2- Parallel Fault tolerant DFS:
An undirected graph can be preprocessed to build a data structure of size
O(m) such that for a set of updates (where is constant) in the graph,
the updated DFS tree can be computed in time using
processors on an EREW PRAM.
Moreover, our fully dynamic DFS algorithm provides, in a seamless manner,
nearly optimal (upto polylog n factors) algorithms for maintaining a DFS tree
in semi-streaming model and a restricted distributed model. These are the first
parallel, semi-streaming and distributed algorithms for maintaining a DFS tree
in the dynamic setting.Comment: Accepted to appear in SPAA'17, 32 Pages, 5 Figure
Counting Triangles in Large Graphs on GPU
The clustering coefficient and the transitivity ratio are concepts often used
in network analysis, which creates a need for fast practical algorithms for
counting triangles in large graphs. Previous research in this area focused on
sequential algorithms, MapReduce parallelization, and fast approximations.
In this paper we propose a parallel triangle counting algorithm for CUDA GPU.
We describe the implementation details necessary to achieve high performance
and present the experimental evaluation of our approach. Our algorithm achieves
8 to 15 times speedup over the CPU implementation and is capable of finding 3.8
billion triangles in an 89 million edges graph in less than 10 seconds on the
Nvidia Tesla C2050 GPU.Comment: 2016 IEEE International Parallel and Distributed Processing Symposium
Workshops (IPDPSW
Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage
We propose a fast, parallel maximum clique algorithm for large sparse graphs
that is designed to exploit characteristics of social and information networks.
The method exhibits a roughly linear runtime scaling over real-world networks
ranging from 1000 to 100 million nodes. In a test on a social network with 1.8
billion edges, the algorithm finds the largest clique in about 20 minutes. Our
method employs a branch and bound strategy with novel and aggressive pruning
techniques. For instance, we use the core number of a vertex in combination
with a good heuristic clique finder to efficiently remove the vast majority of
the search space. In addition, we parallelize the exploration of the search
tree. During the search, processes immediately communicate changes to upper and
lower bounds on the size of maximum clique, which occasionally results in a
super-linear speedup because vertices with large search spaces can be pruned by
other processes. We apply the algorithm to two problems: to compute temporal
strong components and to compress graphs.Comment: 11 page
- …