7,743 research outputs found
Lock-free Concurrent Data Structures
Concurrent data structures are the data sharing side of parallel programming.
Data structures give the means to the program to store data, but also provide
operations to the program to access and manipulate these data. These operations
are implemented through algorithms that have to be efficient. In the sequential
setting, data structures are crucially important for the performance of the
respective computation. In the parallel programming setting, their importance
becomes more crucial because of the increased use of data and resource sharing
for utilizing parallelism.
The first and main goal of this chapter is to provide a sufficient background
and intuition to help the interested reader to navigate in the complex research
area of lock-free data structures. The second goal is to offer the programmer
familiarity to the subject that will allow her to use truly concurrent methods.Comment: To appear in "Programming Multi-core and Many-core Computing
Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and
Distributed Computin
A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue
We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performant queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized and more scalable than CAS (compare-and-set) instruction, on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable.
We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and its performance is often comparable to, or exceeding that of state-of-the-art scalable algorithms
Boosting Multi-Core Reachability Performance with Shared Hash Tables
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible. In this paper, we present a scaling solution for
shared state storage which is based on a lockless hash table implementation.
The solution is specifically designed for the cache architecture of modern
CPUs. Because model checking algorithms impose loose requirements on the hash
table operations, their design can be streamlined substantially compared to
related work on lockless hash tables. Still, an implementation of the hash
table presented here has dozens of sensitive performance parameters (bucket
size, cache line size, data layout, probing sequence, etc.). We analyzed their
impact and compared the resulting speedups with related tools. Our
implementation outperforms two state-of-the-art multi-core model checkers (SPIN
and DiVinE) by a substantial margin, while placing fewer constraints on the
load balancing and search algorithms.Comment: preliminary repor
The Adaptive Priority Queue with Elimination and Combining
Priority queues are fundamental abstract data structures, often used to
manage limited resources in parallel programming. Several proposed parallel
priority queue implementations are based on skiplists, harnessing the potential
for parallelism of the add() operations. In addition, methods such as Flat
Combining have been proposed to reduce contention by batching together multiple
operations to be executed by a single thread. While this technique can decrease
lock-switching overhead and the number of pointer changes required by the
removeMin() operations in the priority queue, it can also create a sequential
bottleneck and limit parallelism, especially for non-conflicting add()
operations.
In this paper, we describe a novel priority queue design, harnessing the
scalability of parallel insertions in conjunction with the efficiency of
batched removals. Moreover, we present a new elimination algorithm suitable for
a priority queue, which further increases concurrency on balanced workloads
with similar numbers of add() and removeMin() operations. We implement and
evaluate our design using a variety of techniques including locking, atomic
operations, hardware transactional memory, as well as employing adaptive
heuristics given the workload.Comment: Accepted at DISC'14 - this is the full version with appendices,
including more algorithm
Lock-free atom garbage collection for multithreaded Prolog
The runtime system of dynamic languages such as Prolog or Lisp and their
derivatives contain a symbol table, in Prolog often called the atom table. A
simple dynamically resizing hash-table used to be an adequate way to implement
this table. As Prolog becomes fashionable for 24x7 server processes we need to
deal with atom garbage collection and concurrent access to the atom table.
Classical lock-based implementations to ensure consistency of the atom table
scale poorly and a stop-the-world approach to implement atom garbage collection
quickly becomes a bottle-neck, making Prolog unsuitable for soft real-time
applications. In this article we describe a novel implementation for the atom
table using lock-free techniques where the atom-table remains accessible even
during atom garbage collection. Relying only on CAS (Compare And Swap) and not
on external libraries, the implementation is straightforward and portable.
Under consideration for acceptance in TPLP.Comment: Paper presented at the 32nd International Conference on Logic
Programming (ICLP 2016), New York City, USA, 16-21 October 2016, 14 pages,
LaTeX, 4 PDF figure
- …