303,424 research outputs found
Work‐Efficient Parallel Union‐Find
The incremental graph connectivity (IGC) problem is to maintain a data structure that can quickly answer whether two given vertices in a graph are connected, while allowing more edges to be added to the graph. IGC is a fundamental problem and can be solved efficiently in the sequential setting using a solution to the classical union‐find problem. However, sequential solutions are not sufficient to handle modern‐day large, rapidly‐changing graphs where edge updates arrive at a very high rate. We present the first shared‐memory parallel data structure for union‐find (equivalently, IGC) that is both provably work‐efficient (ie, performs no more work than the best sequential counterpart) and has polylogarithmic parallel depth. We also present a simpler algorithm with slightly worse theoretical properties, but which is easier to implement and has good practical performance. Our experiments on large graph streams with various degree distributions show that it has good practical performance, capable of processing hundreds of millions of edges per second using a 20‐core machine
Provably-Efficient and Internally-Deterministic Parallel Union-Find
Determining the degree of inherent parallelism in classical sequential
algorithms and leveraging it for fast parallel execution is a key topic in
parallel computing, and detailed analyses are known for a wide range of
classical algorithms. In this paper, we perform the first such analysis for the
fundamental Union-Find problem, in which we are given a graph as a sequence of
edges, and must maintain its connectivity structure under edge additions. We
prove that classic sequential algorithms for this problem are
well-parallelizable under reasonable assumptions, addressing a conjecture by
[Blelloch, 2017]. More precisely, we show via a new potential argument that,
under uniform random edge ordering, parallel union-find operations are unlikely
to interfere: concurrent threads processing the graph in parallel will
encounter memory contention times in
expectation, where and are the number of edges and nodes in the
graph, respectively. We leverage this result to design a new parallel
Union-Find algorithm that is both internally deterministic, i.e., its results
are guaranteed to match those of a sequential execution, but also
work-efficient and scalable, as long as the number of threads is
, for an arbitrarily small constant
, which holds for most large real-world graphs. We present
lower bounds which show that our analysis is close to optimal, and experimental
results suggesting that the performance cost of internal determinism is
limited
Task-based Augmented Contour Trees with Fibonacci Heaps
This paper presents a new algorithm for the fast, shared memory, multi-core
computation of augmented contour trees on triangulations. In contrast to most
existing parallel algorithms our technique computes augmented trees, enabling
the full extent of contour tree based applications including data segmentation.
Our approach completely revisits the traditional, sequential contour tree
algorithm to re-formulate all the steps of the computation as a set of
independent local tasks. This includes a new computation procedure based on
Fibonacci heaps for the join and split trees, two intermediate data structures
used to compute the contour tree, whose constructions are efficiently carried
out concurrently thanks to the dynamic scheduling of task parallelism. We also
introduce a new parallel algorithm for the combination of these two trees into
the output global contour tree. Overall, this results in superior time
performance in practice, both in sequential and in parallel thanks to the
OpenMP task runtime. We report performance numbers that compare our approach to
reference sequential and multi-threaded implementations for the computation of
augmented merge and contour trees. These experiments demonstrate the run-time
efficiency of our approach and its scalability on common workstations. We
demonstrate the utility of our approach in data segmentation applications
A parallel edge orientation algorithm for quadrilateral meshes
One approach to achieving correct finite element assembly is to ensure that
the local orientation of facets relative to each cell in the mesh is consistent
with the global orientation of that facet. Rognes et al. have shown how to
achieve this for any mesh composed of simplex elements, and deal.II contains a
serial algorithm to construct a consistent orientation of any quadrilateral
mesh of an orientable manifold.
The core contribution of this paper is the extension of this algorithm for
distributed memory parallel computers, which facilitates its seamless
application as part of a parallel simulation system.
Furthermore, our analysis establishes a link between the well-known
Union-Find algorithm and the construction of a consistent orientation of a
quadrilateral mesh. As a result, existing work on the parallelisation of the
Union-Find algorithm can be easily adapted to construct further parallel
algorithms for mesh orientations.Comment: Second revision: minor change
Connected component identification and cluster update on GPU
Cluster identification tasks occur in a multitude of contexts in physics and
engineering such as, for instance, cluster algorithms for simulating spin
models, percolation simulations, segmentation problems in image processing, or
network analysis. While it has been shown that graphics processing units (GPUs)
can result in speedups of two to three orders of magnitude as compared to
serial codes on CPUs for the case of local and thus naturally parallelized
problems such as single-spin flip update simulations of spin models, the
situation is considerably more complicated for the non-local problem of cluster
or connected component identification. I discuss the suitability of different
approaches of parallelization of cluster labeling and cluster update algorithms
for calculations on GPU and compare to the performance of serial
implementations.Comment: 15 pages, 14 figures, one table, submitted to PR
- …