3 research outputs found
Concurrent Disjoint Set Union
We develop and analyze concurrent algorithms for the disjoint set union
(union-find) problem in the shared memory, asynchronous multiprocessor model of
computation, with CAS (compare and swap) or DCAS (double compare and swap) as
the synchronization primitive. We give a deterministic bounded wait-free
algorithm that uses DCAS and has a total work bound of for a problem with elements and operations
solved by processes, where is a functional inverse of Ackermann's
function. We give two randomized algorithms that use only CAS and have the same
work bound in expectation. The analysis of the second randomized algorithm is
valid even if the scheduler is adversarial. Our DCAS and randomized algorithms
take steps per operation, worst-case for the DCAS algorithm,
high-probability for the randomized algorithms. Our work and step bounds grow
only logarithmically with , making our algorithms truly scalable. We prove
that for a class of symmetric algorithms that includes ours, no better step or
work bound is possible.Comment: 40 pages, combines ideas in two previous PODC paper
Exploring the Design Space of Static and Incremental Graph Connectivity Algorithms on GPUs
Connected components and spanning forest are fundamental graph algorithms due
to their use in many important applications, such as graph clustering and image
segmentation. GPUs are an ideal platform for graph algorithms due to their high
peak performance and memory bandwidth. While there exist several GPU
connectivity algorithms in the literature, many design choices have not yet
been explored. In this paper, we explore various design choices in GPU
connectivity algorithms, including sampling, linking, and tree compression, for
both the static as well as the incremental setting. Our various design choices
lead to over 300 new GPU implementations of connectivity, many of which
outperform state-of-the-art. We present an experimental evaluation, and show
that we achieve an average speedup of 2.47x speedup over existing static
algorithms. In the incremental setting, we achieve a throughput of up to 48.23
billion edges per second. Compared to state-of-the-art CPU implementations on a
72-core machine, we achieve a speedup of 8.26--14.51x for static connectivity
and 1.85--13.36x for incremental connectivity using a Tesla V100 GPU
A Randomized Concurrent Algorithm for Disjoint Set Union
The disjoint set union problem is a basic problem in data structures with a wide variety of applications. We extend a known efficient sequential algorithm for this problem to obtain a simple and efficient concurrent wait-free algorithm running on an asynchronous parallel random access machine (APRAM). Crucial to our result is the use of randomization. Under a certain independence assumption, for a problem instance in which there are n elements, m operations, and p processes, our algorithm does Theta(m (alpha(n, m/(np)) + log(np/m + 1))) expected work, where the expectation is over the random choices made by the algorithm and alpha is a functional inverse of Ackermann's function. In addition, each operation takes O(log n) steps with high probability. Our algorithm is significantly simpler and more efficient than previous algorithms proposed by Anderson and Woll. Under our independence assumption, our algorithm achieves almost-linear speed-up for applications in which all or most of the processes can be kept busy