138,086 research outputs found
Fast Computation of Small Cuts via Cycle Space Sampling
We describe a new sampling-based method to determine cuts in an undirected
graph. For a graph (V, E), its cycle space is the family of all subsets of E
that have even degree at each vertex. We prove that with high probability,
sampling the cycle space identifies the cuts of a graph. This leads to simple
new linear-time sequential algorithms for finding all cut edges and cut pairs
(a set of 2 edges that form a cut) of a graph.
In the model of distributed computing in a graph G=(V, E) with O(log V)-bit
messages, our approach yields faster algorithms for several problems. The
diameter of G is denoted by Diam, and the maximum degree by Delta. We obtain
simple O(Diam)-time distributed algorithms to find all cut edges,
2-edge-connected components, and cut pairs, matching or improving upon previous
time bounds. Under natural conditions these new algorithms are universally
optimal --- i.e. a Omega(Diam)-time lower bound holds on every graph. We obtain
a O(Diam+Delta/log V)-time distributed algorithm for finding cut vertices; this
is faster than the best previous algorithm when Delta, Diam = O(sqrt(V)). A
simple extension of our work yields the first distributed algorithm with
sub-linear time for 3-edge-connected components. The basic distributed
algorithms are Monte Carlo, but they can be made Las Vegas without increasing
the asymptotic complexity.
In the model of parallel computing on the EREW PRAM our approach yields a
simple algorithm with optimal time complexity O(log V) for finding cut pairs
and 3-edge-connected components.Comment: Previous version appeared in Proc. 35th ICALP, pages 145--160, 200
Some Optimally Adaptive Parallel Graph Algorithms on EREW PRAM Model
The study of graph algorithms is an important area of research in computer science, since graphs offer useful tools to model many real-world situations. The commercial availability of parallel computers have led to the development of efficient parallel graph algorithms.
Using an exclusive-read and exclusive-write (EREW) parallel random access machine (PRAM) as the computation model with a fixed number of processors, we design and analyze parallel algorithms for seven undirected graph problems, such as, connected components, spanning forest, fundamental cycle set, bridges, bipartiteness, assignment problems, and approximate vertex coloring. For all but the last two problems, the input data structure is an unordered list of edges, and divide-and-conquer is the paradigm for designing algorithms. One of the algorithms to solve the assignment problem makes use of an appropriate variant of dynamic programming strategy. An elegant data structure, called the adjacency list matrix, used in a vertex-coloring algorithm avoids the sequential nature of linked adjacency lists.
Each of the proposed algorithms achieves optimal speedup, choosing an optimal granularity (thus exploiting maximum parallelism) which depends on the density or the number of vertices of the given graph. The processor-(time)2 product has been identified as a useful parameter to measure the cost-effectiveness of a parallel algorithm. We derive a lower bound on this measure for each of our algorithms
Recommended from our members
Parallel algorithms for finding connected components using linear algebra
Finding connected components is one of the most widely used operations on a graph. Optimal serial algorithms for the problem have been known for half a century, and many competing parallel algorithms have been proposed over the last several decades under various different models of parallel computation. This paper presents a class of parallel connected-component algorithms designed using linear-algebraic primitives. These algorithms are based on a PRAM algorithm by Shiloach and Vishkin and can be designed using standard GraphBLAS operations. We demonstrate two algorithms of this class, one named LACC for Linear Algebraic Connected Components, and the other named FastSV which can be regarded as LACC's simplification. With the support of the highly-scalable Combinatorial BLAS library, LACC and FastSV outperform the previous state-of-the-art algorithm by a factor of up to 12x for small to medium scale graphs. For large graphs with more than 50B edges, LACC and FastSV scale to 4K nodes (262K cores) of a Cray XC40 supercomputer and outperform previous algorithms by a significant margin. This remarkable performance is accomplished by (1) exploiting sparsity that was not present in the original PRAM algorithm formulation, (2) using high-performance primitives of Combinatorial BLAS, and (3) identifying hot spots and optimizing them away by exploiting algorithmic insights
ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms
Connected components is a fundamental kernel in graph applications due to its
usefulness in measuring how well-connected a graph is, as well as its use as
subroutines in many other graph algorithms. The fastest existing parallel
multicore algorithms for connectivity are based on some form of edge sampling
and/or linking and compressing trees. However, many combinations of these
design choices have been left unexplored. In this paper, we design the
ConnectIt framework, which provides different sampling strategies as well as
various tree linking and compression schemes. ConnectIt enables us to obtain
several hundred new variants of connectivity algorithms, most of which extend
to computing spanning forest. In addition to static graphs, we also extend
ConnectIt to support mixes of insertions and connectivity queries in the
concurrent setting.
We present an experimental evaluation of ConnectIt on a 72-core machine,
which we believe is the most comprehensive evaluation of parallel connectivity
algorithms to date. Compared to a collection of state-of-the-art static
multicore algorithms, we obtain an average speedup of 37.4x (2.36x average
speedup over the fastest existing implementation for each graph). Using
ConnectIt, we are able to compute connectivity on the largest
publicly-available graph (with over 3.5 billion vertices and 128 billion edges)
in under 10 seconds using a 72-core machine, providing a 3.1x speedup over the
fastest existing connectivity result for this graph, in any computational
setting. For our incremental algorithms, we show that our algorithms can ingest
graph updates at up to several billion edges per second. Finally, to guide the
user in selecting the best variants in ConnectIt for different situations, we
provide a detailed analysis of the different strategies in terms of their work
and locality
Optimal Deterministic Massively Parallel Connectivity on Forests
We show fast deterministic algorithms for fundamental problems on forests in
the challenging low-space regime of the well-known Massive Parallel Computation
(MPC) model. A recent breakthrough result by Coy and Czumaj [STOC'22] shows
that, in this setting, it is possible to deterministically identify connected
components on graphs in rounds, where is the
diameter of the graph and the number of nodes. The authors left open a
major question: is it possible to get rid of the additive factor
and deterministically identify connected components in a runtime that is
completely independent of ?
We answer the above question in the affirmative in the case of forests. We
give an algorithm that identifies connected components in
deterministic rounds. The total memory required is words, where is
the number of edges in the input graph, which is optimal as it is only enough
to store the input graph. We complement our upper bound results by showing that
time is necessary even for component-unstable algorithms,
conditioned on the widely believed 1 vs. 2 cycles conjecture. Our techniques
also yield a deterministic forest-rooting algorithm with the same runtime and
memory bounds.
Furthermore, we consider Locally Checkable Labeling problems (LCLs), whose
solution can be verified by checking the -radius neighborhood of each
node. We show that any LCL problem on forests can be solved in
rounds with a canonical deterministic algorithm, improving over the
runtime of Brandt, Latypov and Uitto [DISC'21]. We also show that there is no
algorithm that solves all LCL problems on trees asymptotically faster.Comment: ACM-SIAM Symposium on Discrete Algorithms (SODA) 202
Meerkat: A framework for Dynamic Graph Algorithms on GPUs
Graph algorithms are challenging to implement due to their varying topology
and irregular access patterns. Real-world graphs are dynamic in nature and
routinely undergo edge and vertex additions, as well as, deletions. Typical
examples of dynamic graphs are social networks, collaboration networks, and
road networks. Applying static algorithms repeatedly on dynamic graphs is
inefficient. Unfortunately, we know little about how to efficiently process
dynamic graphs on massively parallel architectures such as GPUs. Existing
approaches to represent and process dynamic graphs are either not general or
inefficient. In this work, we propose a library-based framework for dynamic
graph algorithms that proposes a GPU-tailored graph representation and exploits
the warp-cooperative execution model. The library, named Meerkat, builds upon a
recently proposed dynamic graph representation on GPUs. This representation
exploits a hashtable-based mechanism to store a vertex's neighborhood. Meerkat
also enables fast iteration through a group of vertices, such as the whole set
of vertices or the neighbors of a vertex. Based on the efficient iterative
patterns encoded in Meerkat, we implement dynamic versions of the popular graph
algorithms such as breadth-first search, single-source shortest paths, triangle
counting, weakly connected components, and PageRank. Compared to the
state-of-the-art dynamic graph analytics framework Hornet, Meerkat is
, , and faster, for query, insert, and
delete operations, respectively. Using a variety of real-world graphs, we
observe that Meerkat significantly improves the efficiency of the underlying
dynamic graph algorithm. Meerkat performs for BFS,
for SSSP, for PageRank, and for WCC, better than
Hornet on average
Recommended from our members
New Primitives for Tackling Graph Problems and Their Applications in Parallel Computing
We study fundamental graph problems under parallel computing models. In particular, we consider two parallel computing models: Parallel Random Access Machine (PRAM) and Massively Parallel Computation (MPC). The PRAM model is a classic model of parallel computation. The efficiency of a PRAM algorithm is measured by its parallel time and the number of processors needed to achieve the parallel time. The MPC model is an abstraction of modern massive parallel computing systems such as MapReduce, Hadoop and Spark. The MPC model captures well coarse-grained computation on large data --- data is distributed to processors, each of which has a sublinear (in the input data) amount of local memory and we alternate between rounds of computation and rounds of communication, where each machine can communicate an amount of data as large as the size of its memory. We usually desire fully scalable MPC algorithms, i.e., algorithms that can work for any local memory size. The efficiency of a fully scalable MPC algorithm is measured by its parallel time and the total space usage (the local memory size times the number of machines).
Consider an -vertex -edge undirected graph (either weighted or unweighted) with diameter (the largest diameter of its connected components). Let =+ denote the size of . We present a series of efficient (randomized) parallel graph algorithms with theoretical guarantees. Several results are listed as follows:
1) Fully scalable MPC algorithms for graph connectivity and spanning forest using () total space and (log loglog_{/} ) parallel time.
2) Fully scalable MPC algorithms for 2-edge and 2-vertex connectivity using () total space where 2-edge connectivity algorithm needs (log loglog_{/} ) parallel time, and 2-vertex connectivity algorithm needs (log ⸱log²log_{/} n+\log D'⸱loglog_{/} ) parallel time. Here ' denotes the bi-diameter of .
3) PRAM algorithms for graph connectivity and spanning forest using () processors and (log loglog_{/} ) parallel time.
4) PRAM algorithms for (1 + )-approximate shortest path and (1 + )-approximate uncapacitated minimum cost flow using () processors and poly(log ) parallel time.
These algorithms are built on a series of new graph algorithmic primitives which may be of independent interests
- …