Brief Announcement: Parallel Dynamic Tree Contraction via Self-Adjusting Computation
Dynamic algorithms compute a property of some data while the data undergo changes over time. Many dynamic algorithms have been proposed, but nearly all are sequential. In this paper, we present our ongoing work on designing a parallel algorithm for the dynamic trees problem, which requires computing a property of a forest as the forest undergoes changes. Our algorithm allows insertion and/or deletion of both vertices and edges anywhere in the input and performs updates in parallel. We obtain our algorithm by applying a dynamization technique called self-adjusting computation to the classic tree-contraction algorithm of Miller and Reif.
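For context on the base algorithm being dynamized, here is a minimal sequential sketch of Miller and Reif's rake-and-compress contraction. The round structure and the coin-flip compress rule follow the classic algorithm, but the code itself (the `contract` name, the dict-based tree representation) is illustrative and is not the parallel, self-adjusting implementation the announcement describes.

```python
import random

def contract(parent):
    """One possible round-based rake-and-compress on a rooted tree.
    parent[v] is v's parent; the root r has parent[r] == r.
    Each round rakes (deletes) all leaves, then compresses: a node
    with exactly one remaining child is spliced out if it flips
    heads and its parent flips tails, so no two adjacent nodes are
    spliced in the same round -- the independence that lets the
    real algorithm apply all splices in parallel. Returns the
    number of rounds until only the root remains."""
    root = next(v for v in parent if parent[v] == v)
    alive = set(parent)
    rounds = 0
    while len(alive) > 1:
        children = {v: set() for v in alive}
        for v in alive:
            if v != root:
                children[parent[v]].add(v)
        # Rake: delete every leaf (non-root node with no children).
        leaves = {v for v in alive if v != root and not children[v]}
        alive -= leaves
        # Compress: pick an independent set of chain nodes by coin flips.
        heads = {v: random.random() < 0.5 for v in alive}
        splice = [v for v in alive if v != root
                  and len(children[v] - leaves) == 1
                  and heads[v] and not heads[parent[v]]]
        for v in splice:
            (c,) = children[v] - leaves
            parent[c] = parent[v]   # bypass v on its chain
            alive.discard(v)
        rounds += 1
    return rounds

# A path 7 -> 6 -> ... -> 0 rooted at 0 contracts in O(log n)
# rounds in expectation.
print(contract({i: max(i - 1, 0) for i in range(8)}))
```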
Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms
There has been significant progress in understanding the parallelism inherent
to iterative sequential algorithms: for many classic algorithms, the depth of
the dependence structure is now well understood, and scheduling techniques have
been developed to exploit this shallow dependence structure for efficient
parallel implementations. A related, applied research strand has studied
methods by which certain iterative task-based algorithms can be efficiently
parallelized via relaxed concurrent priority schedulers. These allow for high
concurrency when inserting and removing tasks, at the cost of executing
superfluous work due to the relaxed semantics of the scheduler.
In this work, we take a step towards unifying these two research directions,
by showing that there exists a family of relaxed priority schedulers that can
efficiently and deterministically execute classic iterative algorithms such as
greedy maximal independent set (MIS) and matching. Our primary result shows
that, given a randomized scheduler with an expected relaxation factor of $k$ in
terms of the maximum allowed priority inversions on a task, and any graph on $n$
vertices, the scheduler is able to execute greedy MIS with only an additive
factor of $\mathrm{poly}(k)$ expected additional iterations compared to an exact (but
not scalable) scheduler. This counter-intuitive result demonstrates that the
overhead of relaxation when computing MIS is not dependent on the input size or
structure of the input graph. Experimental results show that this overhead can
be clearly offset by the gain in performance due to the highly scalable
scheduler. In sum, we present an efficient method to deterministically
parallelize iterative sequential algorithms, with provable runtime guarantees
in terms of the number of executed tasks to completion.
Comment: PODC 2018, pages 377-386 in proceedings.
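To make the setup concrete, below is a toy sequential simulation of greedy MIS under a relaxed priority queue. The uniform-choice-among-the-top-$k$ pop is one simple relaxation model, my own stand-in for the concurrent relaxed schedulers the paper analyzes, and the function and variable names are illustrative. It does show the two phenomena the abstract describes: relaxation causes wasted (retried) iterations, but the MIS produced is the same for every $k$.

```python
import random

def greedy_mis_relaxed(adj, k, seed=0):
    """Greedy MIS driven by a simulated k-relaxed priority queue.
    Vertices are ranked by a random permutation; greedy adds v to
    the MIS iff no earlier-ranked neighbor was added. An exact
    scheduler (k = 1) always pops the earliest-ranked pending
    vertex; the relaxed pop returns a uniform choice among the top
    k, so a vertex may surface before some earlier-ranked neighbor
    is decided and must be retried -- the superfluous work caused
    by relaxation. Returns (mis, wasted_iterations); the mis is
    identical for every k."""
    rng = random.Random(seed)
    n = len(adj)
    prio = list(range(n))
    rng.shuffle(prio)                    # random ranking of vertices
    pending = sorted(range(n), key=lambda v: prio[v])  # best first
    decided = [False] * n
    in_mis = [False] * n
    wasted = 0
    while pending:
        i = rng.randrange(min(k, len(pending)))        # relaxed pop
        v = pending.pop(i)
        if any(prio[u] < prio[v] and not decided[u] for u in adj[v]):
            wasted += 1            # popped too early: retry later
            pending.insert(i, v)   # restore its priority position
            continue
        decided[v] = True
        in_mis[v] = not any(in_mis[u] for u in adj[v])
    return [v for v in range(n) if in_mis[v]], wasted

# 4-cycle 0-1-2-3-0: same MIS for k = 1 and k = 4, different waste.
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(greedy_mis_relaxed(adj, k=1))
print(greedy_mis_relaxed(adj, k=4))
```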
Provably-Efficient and Internally-Deterministic Parallel Union-Find
Determining the degree of inherent parallelism in classical sequential
algorithms and leveraging it for fast parallel execution is a key topic in
parallel computing, and detailed analyses are known for a wide range of
classical algorithms. In this paper, we perform the first such analysis for the
fundamental Union-Find problem, in which we are given a graph as a sequence of
edges, and must maintain its connectivity structure under edge additions. We
prove that classic sequential algorithms for this problem are
well-parallelizable under reasonable assumptions, addressing a conjecture by
[Blelloch, 2017]. More precisely, we show via a new potential argument that,
under uniform random edge ordering, parallel union-find operations are unlikely
to interfere: $T$ concurrent threads processing the graph in parallel will
encounter memory contention $O(T^2 \cdot \log n)$ times in
expectation, where $m$ and $n$ are the number of edges and nodes in the
graph, respectively. We leverage this result to design a new parallel
Union-Find algorithm that is not only internally deterministic, i.e., its results
are guaranteed to match those of a sequential execution, but also
work-efficient and scalable, as long as the number of threads $T$ is
$O(n^{1/3 - \varepsilon})$, for an arbitrarily small constant
$\varepsilon > 0$, which holds for most large real-world graphs. We present
lower bounds which show that our analysis is close to optimal, and experimental
results suggesting that the performance cost of internal determinism is
limited.
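For readers less familiar with the problem, here is a sequential sketch of the data structure under the random-edge-order setting the analysis assumes. The paper's algorithm is concurrent and installs links with compare-and-swap; this sketch only illustrates the shape of the structure plus one simple deterministic linking rule (larger root id wins), both of which are my own simplifications rather than the authors' design.

```python
import random

class UnionFind:
    """Union-Find with path halving and a deterministic linking
    rule: the root with the larger id becomes the parent. With
    this rule, find() always returns the largest vertex id in a
    component regardless of the order in which edges are processed
    -- a toy, sequential stand-in for the internal determinism the
    paper obtains for its concurrent algorithm."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[min(ra, rb)] = max(ra, rb)  # larger id wins
        return True

def count_components(n, edges, seed=0):
    """Process the edges in uniformly random order -- the setting
    under which the paper bounds expected contention."""
    uf = UnionFind(n)
    order = list(edges)
    random.Random(seed).shuffle(order)
    for a, b in order:
        uf.union(a, b)
    return len({uf.find(v) for v in range(n)})

print(count_components(5, [(0, 1), (1, 2), (3, 4)]))  # -> 2
```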
Parallel Write-Efficient Algorithms and Data Structures for Computational Geometry
In this paper, we design parallel write-efficient geometric algorithms that
perform asymptotically fewer writes than standard algorithms for the same
problem. This is motivated by emerging non-volatile memory technologies with
read performance being close to that of random access memory but writes being
significantly more expensive in terms of energy and latency. We design
algorithms for planar Delaunay triangulation, $k$-d trees, and static and
dynamic augmented trees. Our algorithms are designed in the recently introduced
Asymmetric Nested-Parallel Model, which captures the parallel setting in which
there is a small symmetric memory where reads and writes are unit cost as well
as a large asymmetric memory where writes are $\omega$ times more expensive
than reads. In designing these algorithms, we introduce several techniques for
obtaining write-efficiency, including DAG tracing, prefix doubling,
reconstruction-based rebalancing and $\alpha$-labeling, which we believe will
be useful for designing other parallel write-efficient algorithms.
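The cost accounting at the heart of this model is easy to render in miniature. The sketch below is my own illustration, not an algorithm from the paper: the `AsymmetricMemory` class and the `compact` filter are hypothetical names, and the model is reduced to just the read/write asymmetry (the real Asymmetric Nested-Parallel Model also has a small symmetric memory and measures work and span).

```python
class AsymmetricMemory:
    """Toy cost accounting for the large asymmetric memory:
    a read costs 1, a write costs omega."""

    def __init__(self, cells, omega):
        self.cells, self.omega, self.cost = list(cells), omega, 0

    def read(self, i):
        self.cost += 1
        return self.cells[i]

    def write(self, i, value):
        self.cost += self.omega
        self.cells[i] = value

def compact(mem, n, pred, out):
    """Write-efficient filter: n reads but only one write per
    surviving element, so the cost is n + omega*k for k survivors
    rather than the n + omega*n of writing a flag per input."""
    k = 0
    for i in range(n):
        v = mem.read(i)
        if pred(v):
            mem.write(out + k, v)
            k += 1
    return k

# 10 inputs, 4 survivors, omega = 8: cost = 10 + 8*4 = 42.
mem = AsymmetricMemory(list(range(10)) + [0] * 10, omega=8)
k = compact(mem, 10, lambda v: v % 3 == 0, out=10)
print(k, mem.cost)
```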
Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model
In this paper we develop optimal algorithms in the binary-forking model for a
variety of fundamental problems, including sorting, semisorting, list ranking,
tree contraction, range minima, and ordered set union, intersection and
difference. In the binary-forking model, tasks can only fork into two child
tasks, but can do so recursively and asynchronously. The tasks share memory,
supporting reads, writes and test-and-sets. Costs are measured in terms of work
(total number of instructions), and span (longest dependence chain).
The binary-forking model is meant to capture both algorithm performance and
algorithm-design considerations on many existing multithreaded languages, which
are also asynchronous and rely on binary forks either explicitly or under the
covers. In contrast to the widely studied PRAM model, it does not assume
arbitrary-way forks nor synchronous operations, both of which are hard to
implement in modern hardware. While optimal PRAM algorithms are known for the
problems studied herein, it turns out that arbitrary-way forking and strict
synchronization are powerful, if unrealistic, capabilities. Natural simulations
of these PRAM algorithms in the binary-forking model (i.e., implementations in
existing parallel languages) incur an $\Omega(\log n)$ overhead in span. This
paper explores techniques for designing optimal algorithms when limited to
binary forking and assuming asynchrony. All algorithms described in this paper
are the first algorithms with optimal work and span in the binary-forking
model. Most of the algorithms are simple. Many are randomized.
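To illustrate the model's primitive itself (not one of the paper's optimal algorithms), here is a sketch of binary fork-join in the style the model assumes: each task forks exactly one child and continues as the other, recursively and asynchronously, then joins. Summation is deliberately trivial; the names and the thread-based Python rendering are mine.

```python
import threading

def forked_sum(a, lo, hi, out, slot, cutoff=1 << 14):
    """Sum a[lo:hi] using only binary forks. Work is O(n) and the
    fork tree is O(log n) deep, matching the binary-forking
    model's cost measures. The cutoff merely keeps the thread
    count reasonable in this toy version."""
    if hi - lo <= cutoff:
        out[slot] = sum(a[lo:hi])
        return
    mid = (lo + hi) // 2
    part = [0, 0]
    child = threading.Thread(target=forked_sum,
                             args=(a, lo, mid, part, 0, cutoff))
    child.start()                            # fork: left half runs async
    forked_sum(a, mid, hi, part, 1, cutoff)  # this task takes the right half
    child.join()                             # synchronize with the fork
    out[slot] = part[0] + part[1]

data = list(range(100_000))
res = [0]
forked_sum(data, 0, len(data), res, 0)
print(res[0] == sum(data))  # True
```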