55,551 research outputs found
Towards a High-Level Implementation of Execution Primitives for Unrestricted, Independent And-Parallelism
Most efficient implementations of parallel logic programming rely on complex low-level machinery which is arguably difficult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source language level for the particular case of and-parallellism. We handle a significant portion of the parallel implementation at the Prolog level with the help of a comparatively small number of concurrency.related primitives which take case of lower-level tasks such as locking, thread management, stack set management, etc. The approach does not eliminate altogether modifications to the abstract machine, but it does greatly simplify them and it also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non fork-join) parallelism. Preliminary esperiments show thay the performance safcrifieced is reasonable, although granularity of unrestricted parallelism contributes to better observed speedups
Boosting Multi-Core Reachability Performance with Shared Hash Tables
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible. In this paper, we present a scaling solution for
shared state storage which is based on a lockless hash table implementation.
The solution is specifically designed for the cache architecture of modern
CPUs. Because model checking algorithms impose loose requirements on the hash
table operations, their design can be streamlined substantially compared to
related work on lockless hash tables. Still, an implementation of the hash
table presented here has dozens of sensitive performance parameters (bucket
size, cache line size, data layout, probing sequence, etc.). We analyzed their
impact and compared the resulting speedups with related tools. Our
implementation outperforms two state-of-the-art multi-core model checkers (SPIN
and DiVinE) by a substantial margin, while placing fewer constraints on the
load balancing and search algorithms.Comment: preliminary repor
An Almost Tight RMR Lower Bound for Abortable Test-And-Set
We prove a lower bound of Omega(log n/log log n) for the remote memory reference (RMR) complexity of abortable test-and-set (leader election) in the cache-coherent (CC) and the distributed shared memory (DSM) model. This separates the complexities of abortable and non-abortable test-and-set, as the latter has constant RMR complexity [Wojciech Golab et al., 2010].
Golab, Hendler, Hadzilacos and Woelfel [Wojciech M. Golab et al., 2012] showed that compare-and-swap can be implemented from registers and test-and-set objects with constant RMR complexity. We observe that a small modification to that implementation is abortable, provided that the used test-and-set objects are atomic (or abortable). As a consequence, using existing efficient randomized wait-free implementations of test-and-set [George Giakkoupis and Philipp Woelfel, 2012], we obtain randomized abortable compare-and-swap objects with almost constant (O(log^* n)) RMR complexity
Fast Deterministic Consensus in a Noisy Environment
It is well known that the consensus problem cannot be solved
deterministically in an asynchronous environment, but that randomized solutions
are possible. We propose a new model, called noisy scheduling, in which an
adversarial schedule is perturbed randomly, and show that in this model
randomness in the environment can substitute for randomness in the algorithm.
In particular, we show that a simplified, deterministic version of Chandra's
wait-free shared-memory consensus algorithm (PODC, 1996, pp. 166-175) solves
consensus in time at most logarithmic in the number of active processes. The
proof of termination is based on showing that a race between independent
delayed renewal processes produces a winner quickly. In addition, we show that
the protocol finishes in constant time using quantum and priority-based
scheduling on a uniprocessor, suggesting that it is robust against the choice
of model over a wide range.Comment: Typographical errors fixe
Lock-free Concurrent Data Structures
Concurrent data structures are the data sharing side of parallel programming.
Data structures give the means to the program to store data, but also provide
operations to the program to access and manipulate these data. These operations
are implemented through algorithms that have to be efficient. In the sequential
setting, data structures are crucially important for the performance of the
respective computation. In the parallel programming setting, their importance
becomes more crucial because of the increased use of data and resource sharing
for utilizing parallelism.
The first and main goal of this chapter is to provide a sufficient background
and intuition to help the interested reader to navigate in the complex research
area of lock-free data structures. The second goal is to offer the programmer
familiarity to the subject that will allow her to use truly concurrent methods.Comment: To appear in "Programming Multi-core and Many-core Computing
Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and
Distributed Computin
Fast and Compact Distributed Verification and Self-Stabilization of a DFS Tree
We present algorithms for distributed verification and silent-stabilization
of a DFS(Depth First Search) spanning tree of a connected network. Computing
and maintaining such a DFS tree is an important task, e.g., for constructing
efficient routing schemes. Our algorithm improves upon previous work in various
ways. Comparable previous work has space and time complexities of bits per node and respectively, where is the highest
degree of a node, is the number of nodes and is the diameter of the
network. In contrast, our algorithm has a space complexity of bits
per node, which is optimal for silent-stabilizing spanning trees and runs in
time. In addition, our solution is modular since it utilizes the
distributed verification algorithm as an independent subtask of the overall
solution. It is possible to use the verification algorithm as a stand alone
task or as a subtask in another algorithm. To demonstrate the simplicity of
constructing efficient DFS algorithms using the modular approach, We also
present a (non-sielnt) self-stabilizing DFS token circulation algorithm for
general networks based on our silent-stabilizing DFS tree. The complexities of
this token circulation algorithm are comparable to the known ones
- …