LIPIcs
Union-Find (or Disjoint-Set Union) is one of the fundamental problems in computer science; it has been well studied from both theoretical and practical perspectives in the sequential case. Recently, there has been mounting interest in analyzing this problem in the concurrent scenario, and several asymptotically efficient algorithms have been proposed. Yet, to date, very little is known about the practical performance of concurrent Union-Find. This work addresses this gap. We evaluate and analyze the performance of several concurrent Union-Find algorithms and optimization strategies across a wide range of platforms (Intel, AMD, and ARM) and workloads (social, random, and road networks, as well as integrations into more complex algorithms). We first observe that, due to the limited computational cost, the number of induced cache misses is the critical determining factor for the performance of existing algorithms. We introduce new techniques to reduce this cost by storing node priorities implicitly and by using plain reads and writes in a way that does not affect the correctness of the algorithms. Finally, we show that Union-Find implementations are an interesting application for Transactional Memory (TM): one of the fastest algorithm variants we discovered is a sequential one that uses coarse-grained locking with the lock elision optimization to reduce synchronization cost and increase scalability.
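As a rough illustration of the last point, the following is a minimal sketch of a sequential Union-Find wrapped in a single coarse-grained lock. It is not the paper's implementation: the class name `LockedUnionFind`, the use of path halving, and the rule of linking by node index (a stand-in for an implicitly stored priority) are all assumptions made for this example. Lock elision is a hardware and runtime optimization and is not modeled here.

```python
import threading


class LockedUnionFind:
    """Sequential union-find protected by one coarse-grained lock (illustrative)."""

    def __init__(self, n):
        self.parent = list(range(n))   # each node starts as its own root
        self.lock = threading.Lock()   # single lock guarding the whole structure

    def _find(self, x):
        # Path halving: point every visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return True if they were distinct."""
        with self.lock:
            ra, rb = self._find(a), self._find(b)
            if ra == rb:
                return False
            # Assumption: the node index doubles as the priority, so no
            # separate rank array is stored. The larger root goes under
            # the smaller one.
            if ra < rb:
                ra, rb = rb, ra
            self.parent[ra] = rb
            return True

    def same_set(self, a, b):
        with self.lock:
            return self._find(a) == self._find(b)
```

Several threads can call `union` and `same_set` on one shared instance; the lock serializes them, which is the baseline that lock elision would try to make cheaper.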
Provably-Efficient and Internally-Deterministic Parallel Union-Find
Determining the degree of inherent parallelism in classical sequential
algorithms and leveraging it for fast parallel execution is a key topic in
parallel computing, and detailed analyses are known for a wide range of
classical algorithms. In this paper, we perform the first such analysis for the
fundamental Union-Find problem, in which we are given a graph as a sequence of
edges, and must maintain its connectivity structure under edge additions. We
prove that classic sequential algorithms for this problem are
well-parallelizable under reasonable assumptions, addressing a conjecture by
[Blelloch, 2017]. More precisely, we show via a new potential argument that,
under uniform random edge ordering, parallel union-find operations are unlikely
to interfere: $t$ concurrent threads processing the graph in parallel will
encounter memory contention $O(t^2 \cdot \log |V| \cdot \log |E|)$ times in
expectation, where $|E|$ and $|V|$ are the number of edges and nodes in the
graph, respectively. We leverage this result to design a new parallel
Union-Find algorithm that is internally deterministic, i.e., its results
are guaranteed to match those of a sequential execution, and also
work-efficient and scalable, as long as the number of threads $t$ is
$O(|E|^{1/3 - \varepsilon})$, for an arbitrarily small constant
$\varepsilon > 0$, which holds for most large real-world graphs. We present
lower bounds which show that our analysis is close to optimal, and experimental
results suggesting that the performance cost of internal determinism is
limited.
ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms
Connected components is a fundamental kernel in graph applications due to its
usefulness in measuring how well-connected a graph is, as well as its use as
a subroutine in many other graph algorithms. The fastest existing parallel
multicore algorithms for connectivity are based on some form of edge sampling
and/or linking and compressing trees. However, many combinations of these
design choices have been left unexplored. In this paper, we design the
ConnectIt framework, which provides different sampling strategies as well as
various tree linking and compression schemes. ConnectIt enables us to obtain
several hundred new variants of connectivity algorithms, most of which extend
to computing spanning forest. In addition to static graphs, we also extend
ConnectIt to support mixes of insertions and connectivity queries in the
concurrent setting.
We present an experimental evaluation of ConnectIt on a 72-core machine,
which we believe is the most comprehensive evaluation of parallel connectivity
algorithms to date. Compared to a collection of state-of-the-art static
multicore algorithms, we obtain an average speedup of 37.4x (2.36x average
speedup over the fastest existing implementation for each graph). Using
ConnectIt, we are able to compute connectivity on the largest
publicly-available graph (with over 3.5 billion vertices and 128 billion edges)
in under 10 seconds using a 72-core machine, providing a 3.1x speedup over the
fastest existing connectivity result for this graph, in any computational
setting. For our incremental algorithms, we show that our algorithms can ingest
graph updates at up to several billion edges per second. Finally, to guide the
user in selecting the best variants in ConnectIt for different situations, we
provide a detailed analysis of the different strategies in terms of their work
and locality.
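The idea of linking and compressing trees can be sketched with a small sequential example. This is not ConnectIt's code or API: the function name `connected_components`, the min-root linking rule, and the use of full path compression are assumptions chosen for brevity, and no sampling or parallelism is shown. The function also returns the edges that caused a link, which together form a spanning forest.

```python
def connected_components(n, edges):
    """Label components of an n-vertex graph given as an edge list (illustrative)."""
    parent = list(range(n))

    def find(x):
        # First pass: locate the root of x's tree.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Second pass: full path compression, pointing the path at the root.
        while parent[x] != root:
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Assumed linking rule: the larger root is attached to the smaller.
            parent[max(ru, rv)] = min(ru, rv)
            forest.append((u, v))   # this edge joined two components

    labels = [find(v) for v in range(n)]
    return labels, forest
```

With min-root linking, each vertex ends up labeled by the smallest vertex id in its component, so two vertices are connected exactly when their labels are equal.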