Search CORE

14,749 research outputs found

Towards Work-Efficient Parallel Parameterized Algorithms

Author: J Cheetham
J Chen
L Cai
M Cesati
M Elberfeld
R Downey
R Niedermeier
RM Karp
Y Han
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/02/2019
Field of study

Parallel parameterized complexity theory studies how fixed-parameter tractable (fpt) problems can be solved in parallel. Previous theoretical work focused on parallel algorithms that are very fast in principle, but did not take into account that when we only have a small number of processors (between 2 and, say, 1024), it is more important that the parallel algorithms are work-efficient. In the present paper we investigate how work-efficient fpt algorithms can be designed. We review standard methods from fpt theory, like kernelization, search trees, and interleaving, and prove trade-offs for them between work efficiency and runtime improvements. This results in a toolbox for developing work-efficient parallel fpt algorithms.Comment: Prior full version of the paper that will appear in Proceedings of the 13th International Conference and Workshops on Algorithms and Computation (WALCOM 2019), February 27 - March 02, 2019, Guwahati, India. The final authenticated version is available online at https://doi.org/10.1007/978-3-030-10564-8_2

arXiv.org e-Print Archive

Crossref

Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms

Author: Alistarh Dan
Brown Trevor
Kopinsky Justin
Nadiradze Giorgi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

There has been significant progress in understanding the parallelism inherent to iterative sequential algorithms: for many classic algorithms, the depth of the dependence structure is now well understood, and scheduling techniques have been developed to exploit this shallow dependence structure for efficient parallel implementations. A related, applied research strand has studied methods by which certain iterative task-based algorithms can be efficiently parallelized via relaxed concurrent priority schedulers. These allow for high concurrency when inserting and removing tasks, at the cost of executing superfluous work due to the relaxed semantics of the scheduler. In this work, we take a step towards unifying these two research directions, by showing that there exists a family of relaxed priority schedulers that can efficiently and deterministically execute classic iterative algorithms such as greedy maximal independent set (MIS) and matching. Our primary result shows that, given a randomized scheduler with an expected relaxation factor of

k

in terms of the maximum allowed priority inversions on a task, and any graph on

n

vertices, the scheduler is able to execute greedy MIS with only an additive factor of poly(

k

) expected additional iterations compared to an exact (but not scalable) scheduler. This counter-intuitive result demonstrates that the overhead of relaxation when computing MIS is not dependent on the input size or structure of the input graph. Experimental results show that this overhead can be clearly offset by the gain in performance due to the highly scalable scheduler. In sum, we present an efficient method to deterministically parallelize iterative sequential algorithms, with provable runtime guarantees in terms of the number of executed tasks to completion.Comment: PODC 2018, pages 377-386 in proceeding

arXiv.org e-Print Archive

Crossref

IST Austria: PubRep (Institute of Science and Technology)

Parallel Algorithm and Dynamic Exponent for Diffusion-limited Aggregation

Author: A. Gibbons
C. Amitrano
C. Amitrano
C. H. Bennett
C. H. Bennett
C. H. Papadimitriou
H. Kaufman
J. Machta
J. Machta
J. Machta
P. Ossadnik
P. Ossadnik
R. C. Ball
R. F. Voss
R. F. Voss
R. Greenlaw
R. J. Anderson
S. Tolman
T. A. Witten
T. C. Halsey
T. Vicsek
Publication venue: 'American Physical Society (APS)'
Publication date: 17/12/1996
Field of study

A parallel algorithm for ``diffusion-limited aggregation'' (DLA) is described and analyzed from the perspective of computational complexity. The dynamic exponent z of the algorithm is defined with respect to the probabilistic parallel random-access machine (PRAM) model of parallel computation according to

T \sim L^{z}

, where L is the cluster size, T is the running time, and the algorithm uses a number of processors polynomial in L\@. It is argued that z=D-D_2/2, where D is the fractal dimension and D_2 is the second generalized dimension. Simulations of DLA are carried out to measure D_2 and to test scaling assumptions employed in the complexity analysis of the parallel algorithm. It is plausible that the parallel algorithm attains the minimum possible value of the dynamic exponent in which case z characterizes the intrinsic history dependence of DLA.Comment: 24 pages Revtex and 2 figures. A major improvement to the algorithm and smaller dynamic exponent in this versio

arXiv.org e-Print Archive

Crossref

Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

Author: Carroll Chester C.
Saha Aindam
Youngblood John N.
Publication venue
Publication date
Field of study

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed

NASA Technical Reports Server

Taming Numbers and Durations in the Model Checking Integrated Planning System

Author: Edelkamp S.
Publication venue: 'AI Access Foundation'
Publication date: 30/06/2011
Field of study

The Model Checking Integrated Planning System (MIPS) is a temporal least commitment heuristic search planner based on a flexible object-oriented workbench architecture. Its design clearly separates explicit and symbolic directed exploration algorithms from the set of on-line and off-line computed estimates and associated data structures. MIPS has shown distinguished performance in the last two international planning competitions. In the last event the description language was extended from pure propositional planning to include numerical state variables, action durations, and plan quality objective functions. Plans were no longer sequences of actions but time-stamped schedules. As a participant of the fully automated track of the competition, MIPS has proven to be a general system; in each track and every benchmark domain it efficiently computed plans of remarkable quality. This article introduces and analyzes the most important algorithmic novelties that were necessary to tackle the new layers of expressiveness in the benchmark problems and to achieve a high level of performance. The extensions include critical path analysis of sequentially generated plans to generate corresponding optimal parallel plans. The linear time algorithm to compute the parallel plan bypasses known NP hardness results for partial ordering by scheduling plans with respect to the set of actions and the imposed precedence relations. The efficiency of this algorithm also allows us to improve the exploration guidance: for each encountered planning state the corresponding approximate sequential plan is scheduled. One major strength of MIPS is its static analysis phase that grounds and simplifies parameterized predicates, functions and operators, that infers knowledge to minimize the state description length, and that detects domain object symmetries. The latter aspect is analyzed in detail. MIPS has been developed to serve as a complete and optimal state space planner, with admissible estimates, exploration engines and branching cuts. In the competition version, however, certain performance compromises had to be made, including floating point arithmetic, weighted heuristic search exploration according to an inadmissible estimate and parameterized optimization

arXiv.org e-Print Archive

Crossref

The Lock-free $k$ -LSM Relaxed Priority Queue

Author: Gruber Jakob
Träff Jesper Larsson
Tsigas Philippas
Wimmer Martin
Publication venue
Publication date: 01/01/2015
Field of study

Priority queues are data structures which store keys in an ordered fashion to allow efficient access to the minimal (maximal) key. Priority queues are essential for many applications, e.g., Dijkstra's single-source shortest path algorithm, branch-and-bound algorithms, and prioritized schedulers. Efficient multiprocessor computing requires implementations of basic data structures that can be used concurrently and scale to large numbers of threads and cores. Lock-free data structures promise superior scalability by avoiding blocking synchronization primitives, but the \emph{delete-min} operation is an inherent scalability bottleneck in concurrent priority queues. Recent work has focused on alleviating this obstacle either by batching operations, or by relaxing the requirements to the \emph{delete-min} operation. We present a new, lock-free priority queue that relaxes the \emph{delete-min} operation so that it is allowed to delete \emph{any} of the

\rho+1

smallest keys, where

\rho

is a runtime configurable parameter. Additionally, the behavior is identical to a non-relaxed priority queue for items added and removed by the same thread. The priority queue is built from a logarithmic number of sorted arrays in a way similar to log-structured merge-trees. We experimentally compare our priority queue to recent state-of-the-art lock-free priority queues, both with relaxed and non-relaxed semantics, showing high performance and good scalability of our approach.Comment: Short version as ACM PPoPP'15 poste

arXiv.org e-Print Archive

Crossref

Chalmers Research