14,749 research outputs found
Towards Work-Efficient Parallel Parameterized Algorithms
Parallel parameterized complexity theory studies how fixed-parameter
tractable (fpt) problems can be solved in parallel. Previous theoretical work
focused on parallel algorithms that are very fast in principle, but did not
take into account that when we only have a small number of processors (between
2 and, say, 1024), it is more important that the parallel algorithms are
work-efficient. In the present paper we investigate how work-efficient fpt
algorithms can be designed. We review standard methods from fpt theory, like
kernelization, search trees, and interleaving, and prove trade-offs for them
between work efficiency and runtime improvements. This results in a toolbox for
developing work-efficient parallel fpt algorithms.Comment: Prior full version of the paper that will appear in Proceedings of
the 13th International Conference and Workshops on Algorithms and Computation
(WALCOM 2019), February 27 - March 02, 2019, Guwahati, India. The final
authenticated version is available online at
https://doi.org/10.1007/978-3-030-10564-8_2
Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms
There has been significant progress in understanding the parallelism inherent
to iterative sequential algorithms: for many classic algorithms, the depth of
the dependence structure is now well understood, and scheduling techniques have
been developed to exploit this shallow dependence structure for efficient
parallel implementations. A related, applied research strand has studied
methods by which certain iterative task-based algorithms can be efficiently
parallelized via relaxed concurrent priority schedulers. These allow for high
concurrency when inserting and removing tasks, at the cost of executing
superfluous work due to the relaxed semantics of the scheduler.
In this work, we take a step towards unifying these two research directions,
by showing that there exists a family of relaxed priority schedulers that can
efficiently and deterministically execute classic iterative algorithms such as
greedy maximal independent set (MIS) and matching. Our primary result shows
that, given a randomized scheduler with an expected relaxation factor of in
terms of the maximum allowed priority inversions on a task, and any graph on
vertices, the scheduler is able to execute greedy MIS with only an additive
factor of poly() expected additional iterations compared to an exact (but
not scalable) scheduler. This counter-intuitive result demonstrates that the
overhead of relaxation when computing MIS is not dependent on the input size or
structure of the input graph. Experimental results show that this overhead can
be clearly offset by the gain in performance due to the highly scalable
scheduler. In sum, we present an efficient method to deterministically
parallelize iterative sequential algorithms, with provable runtime guarantees
in terms of the number of executed tasks to completion.Comment: PODC 2018, pages 377-386 in proceeding
Parallel Algorithm and Dynamic Exponent for Diffusion-limited Aggregation
A parallel algorithm for ``diffusion-limited aggregation'' (DLA) is described
and analyzed from the perspective of computational complexity. The dynamic
exponent z of the algorithm is defined with respect to the probabilistic
parallel random-access machine (PRAM) model of parallel computation according
to , where L is the cluster size, T is the running time, and the
algorithm uses a number of processors polynomial in L\@. It is argued that
z=D-D_2/2, where D is the fractal dimension and D_2 is the second generalized
dimension. Simulations of DLA are carried out to measure D_2 and to test
scaling assumptions employed in the complexity analysis of the parallel
algorithm. It is plausible that the parallel algorithm attains the minimum
possible value of the dynamic exponent in which case z characterizes the
intrinsic history dependence of DLA.Comment: 24 pages Revtex and 2 figures. A major improvement to the algorithm
and smaller dynamic exponent in this versio
Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed
Taming Numbers and Durations in the Model Checking Integrated Planning System
The Model Checking Integrated Planning System (MIPS) is a temporal least
commitment heuristic search planner based on a flexible object-oriented
workbench architecture. Its design clearly separates explicit and symbolic
directed exploration algorithms from the set of on-line and off-line computed
estimates and associated data structures. MIPS has shown distinguished
performance in the last two international planning competitions. In the last
event the description language was extended from pure propositional planning to
include numerical state variables, action durations, and plan quality objective
functions. Plans were no longer sequences of actions but time-stamped
schedules. As a participant of the fully automated track of the competition,
MIPS has proven to be a general system; in each track and every benchmark
domain it efficiently computed plans of remarkable quality. This article
introduces and analyzes the most important algorithmic novelties that were
necessary to tackle the new layers of expressiveness in the benchmark problems
and to achieve a high level of performance. The extensions include critical
path analysis of sequentially generated plans to generate corresponding optimal
parallel plans. The linear time algorithm to compute the parallel plan bypasses
known NP hardness results for partial ordering by scheduling plans with respect
to the set of actions and the imposed precedence relations. The efficiency of
this algorithm also allows us to improve the exploration guidance: for each
encountered planning state the corresponding approximate sequential plan is
scheduled. One major strength of MIPS is its static analysis phase that grounds
and simplifies parameterized predicates, functions and operators, that infers
knowledge to minimize the state description length, and that detects domain
object symmetries. The latter aspect is analyzed in detail. MIPS has been
developed to serve as a complete and optimal state space planner, with
admissible estimates, exploration engines and branching cuts. In the
competition version, however, certain performance compromises had to be made,
including floating point arithmetic, weighted heuristic search exploration
according to an inadmissible estimate and parameterized optimization
The Lock-free -LSM Relaxed Priority Queue
Priority queues are data structures which store keys in an ordered fashion to
allow efficient access to the minimal (maximal) key. Priority queues are
essential for many applications, e.g., Dijkstra's single-source shortest path
algorithm, branch-and-bound algorithms, and prioritized schedulers.
Efficient multiprocessor computing requires implementations of basic data
structures that can be used concurrently and scale to large numbers of threads
and cores. Lock-free data structures promise superior scalability by avoiding
blocking synchronization primitives, but the \emph{delete-min} operation is an
inherent scalability bottleneck in concurrent priority queues. Recent work has
focused on alleviating this obstacle either by batching operations, or by
relaxing the requirements to the \emph{delete-min} operation.
We present a new, lock-free priority queue that relaxes the \emph{delete-min}
operation so that it is allowed to delete \emph{any} of the smallest
keys, where is a runtime configurable parameter. Additionally, the
behavior is identical to a non-relaxed priority queue for items added and
removed by the same thread. The priority queue is built from a logarithmic
number of sorted arrays in a way similar to log-structured merge-trees. We
experimentally compare our priority queue to recent state-of-the-art lock-free
priority queues, both with relaxed and non-relaxed semantics, showing high
performance and good scalability of our approach.Comment: Short version as ACM PPoPP'15 poste
- …