Are Lock-Free Concurrent Algorithms Practically Wait-Free?
Lock-free concurrent algorithms guarantee that some concurrent operation will
always make progress in a finite number of steps. Yet programmers prefer to
treat concurrent code as if it were wait-free, guaranteeing that all operations
always make progress. Unfortunately, designing wait-free algorithms is
generally a very complex task, and the resulting algorithms are not always
efficient. While obtaining efficient wait-free algorithms has been a long-time
goal for the theory community, most non-blocking commercial code is only
lock-free.
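To make the progress-guarantee distinction concrete, the following sketch (illustrative only, not taken from the paper; the function names are assumptions) contrasts a lock-free counter increment built from a compare-and-swap retry loop with a wait-free one built from an atomic fetch-and-add.

```cpp
// Illustrative sketch only: contrasts lock-free and wait-free progress.
// The names lockfree_increment and waitfree_increment are hypothetical.
#include <atomic>

std::atomic<long> counter{0};

// Lock-free: some thread always succeeds, but an individual thread may
// retry indefinitely if its CAS keeps losing to other threads.
void lockfree_increment() {
    long old = counter.load(std::memory_order_relaxed);
    // The CAS fails only because another thread's CAS succeeded,
    // so system-wide progress is guaranteed even if this loop spins.
    while (!counter.compare_exchange_weak(old, old + 1,
                                          std::memory_order_relaxed)) {
        // 'old' has been reloaded with the current value; retry.
    }
}

// Wait-free (on hardware with a native fetch-and-add): every call
// completes in a bounded number of its own steps.
void waitfree_increment() {
    counter.fetch_add(1, std::memory_order_relaxed);
}
```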
This paper suggests a simple solution to this problem. We show that, for a
large class of lock-free algorithms, under scheduling conditions which
approximate those found in commercial hardware architectures, lock-free
algorithms behave as if they are wait-free. In other words, programmers can
keep on designing simple lock-free algorithms instead of complex wait-free
ones, and in practice, they will get wait-free progress.
Our main contribution is a new way of analyzing a general class of lock-free
algorithms under a stochastic scheduler. Our analysis relates the individual
performance of processes with the global performance of the system using Markov
chain lifting between a complex per-process chain and a simpler system progress
chain. We show that lock-free algorithms are not only wait-free with
probability 1, but that in fact a general subset of lock-free algorithms can be
closely bounded in terms of the average number of steps required until an
operation completes.
To the best of our knowledge, this is the first attempt to analyze progress
conditions, typically stated in relation to a worst-case adversary, in a
stochastic model capturing their expected asymptotic behavior.
Comment: 25 pages
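As a rough, purely illustrative companion to this kind of analysis (a toy Monte Carlo, not the paper's Markov-chain-lifting argument), one can estimate the average number of scheduler steps per completed operation for a fixed-length CAS retry loop under a uniformly random scheduler; all parameters and names below are assumptions.

```cpp
// Toy Monte Carlo: fixed-length retry loops under a uniform stochastic
// scheduler. Each retry loop ends in a CAS that succeeds only if no other
// thread's CAS succeeded since the loop started. Illustrative only.
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int threads = 8;            // number of concurrent processes (assumption)
    const int loopLen = 5;            // steps per retry loop, last step is the CAS (assumption)
    const long totalSteps = 1'000'000;

    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick(0, threads - 1);

    std::vector<int> progress(threads, 0);   // steps done in the current retry loop
    std::vector<long> snapshot(threads, 0);  // shared-state version seen at loop start
    long version = 0;                        // counts successful CAS operations
    long completed = 0;

    for (long s = 0; s < totalSteps; ++s) {
        int t = pick(rng);                          // uniformly random scheduler step
        if (progress[t] == 0) snapshot[t] = version; // read the shared state
        ++progress[t];
        if (progress[t] == loopLen) {               // attempt the CAS
            if (snapshot[t] == version) {           // no interference: CAS succeeds
                ++version;
                ++completed;
            }
            progress[t] = 0;                        // start a new operation or retry
        }
    }
    std::printf("avg scheduler steps per completed op: %.2f\n",
                (double)totalSteps / (double)completed);
}
```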
Analyzing the Performance of Lock-Free Data Structures: A Conflict-based Model
This paper considers the modeling and the analysis of the performance of
lock-free concurrent data structures. Lock-free designs employ an optimistic
conflict control mechanism, allowing several processes to access the shared
data object at the same time. They guarantee that at least one concurrent
operation finishes in a finite number of its own steps regardless of the state
of the operations. Our analysis considers lock-free data structures that
can be represented as linear combinations of fixed-size retry loops. Our main
contribution is a new way of modeling and analyzing a general class of
lock-free algorithms, achieving predictions of throughput that are close to
what we observe in practice. We emphasize two kinds of conflicts that shape the
performance: (i) hardware conflicts, due to concurrent calls to atomic
primitives; (ii) logical conflicts, caused by simultaneous operations on the
shared data structure. We show how to deal with these hardware and logical
conflicts separately, and how to combine them, so as to calculate the
throughput of lock-free algorithms. We also propose a common framework that
enables a fair comparison between lock-free implementations by covering the
whole contention domain, together with a better understanding of the
performance impacting factors. This part of our analysis comes with a method
for calculating a good back-off strategy to finely tune the performance of a
lock-free algorithm. Our experimental results, based on a set of widely used
concurrent data structures and on abstract lock-free designs, show that our
analysis closely follows the actual code behavior.
Comment: Short version to appear in DISC'1
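The back-off tuning mentioned above can be illustrated by a generic sketch (not the calculated strategy from the paper): a retry loop whose failed CAS attempts are separated by an exponentially growing pause. The function name and constants are assumptions.

```cpp
// Generic exponential back-off around a CAS retry loop.
// Illustrative only; the paper derives the back-off length analytically.
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<long> shared_value{0};

// Hypothetical retry loop: read, compute a new value, try to publish it.
void update_with_backoff() {
    int backoff_us = 1;                       // initial pause (assumption)
    const int max_backoff_us = 256;           // cap on the pause (assumption)
    long old = shared_value.load();
    for (;;) {
        long desired = old + 1;               // stand-in for the real work
        if (shared_value.compare_exchange_strong(old, desired))
            return;                           // CAS succeeded: operation done
        // CAS failed: another thread won. Back off before retrying to
        // reduce hardware conflicts on the contended cache line.
        std::this_thread::sleep_for(std::chrono::microseconds(backoff_us));
        backoff_us = backoff_us < max_backoff_us ? backoff_us * 2 : max_backoff_us;
        // 'old' was reloaded by the failed CAS, so we retry from fresh state.
    }
}
```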
Lock-free Concurrent Data Structures
Concurrent data structures are the data-sharing side of parallel programming.
Data structures give a program the means to store data, and they also provide
the operations the program uses to access and manipulate these data. These
operations are implemented through algorithms that have to be efficient. In the
sequential setting, data structures are crucially important for the performance
of the respective computation. In the parallel programming setting, their
importance becomes even greater because of the increased use of data and
resource sharing for exploiting parallelism.
The first and main goal of this chapter is to provide a sufficient background
and intuition to help the interested reader navigate the complex research
area of lock-free data structures. The second goal is to offer the programmer
enough familiarity with the subject to allow her to use truly concurrent methods.
Comment: To appear in "Programming Multi-core and Many-core Computing
Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and
Distributed Computing
Efficient Lock-free Binary Search Trees
In this paper we present a novel algorithm for concurrent lock-free internal
binary search trees (BST) and implement a Set abstract data type (ADT) based on
that. We show that in the presented lock-free BST algorithm the amortized step
complexity of each set operation - Add, Remove, and Contains - is
$O(H(n) + c)$, where $H(n)$ is the height of the BST with $n$ nodes and $c$ is
the contention during the execution. Our algorithm adapts to
contention measures according to the read-write load. If the situation is
read-heavy, operations avoid helping pending concurrent Remove operations
during traversal and adapt to interval contention. However, in write-heavy
situations we let an operation help a pending Remove even though it is not
obstructed, and so adapt to the tighter point contention. It uses
single-word compare-and-swap (\texttt{CAS}) operations. We show that our
algorithm has improved disjoint-access-parallelism compared to similar existing
algorithms. We prove that the presented algorithm is linearizable. To the best
of our knowledge, this is the first algorithm for any concurrent tree data
structure in which the modify operations are performed with an additive term of
the contention measure.
Comment: 15 pages, 3 figures, submitted to PODC
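A heavily simplified sketch of the single-word-CAS style of update (insert only, with no Remove and none of the helping or contention adaptivity described above, so it is not the paper's algorithm) might look as follows; the node layout and names are assumptions.

```cpp
// Simplified insert-only binary search tree using a single-word CAS on
// child pointers. Illustrative sketch; the paper's algorithm additionally
// supports Remove, helping, and contention adaptivity.
#include <atomic>

struct Node {
    long key;
    std::atomic<Node*> left{nullptr};
    std::atomic<Node*> right{nullptr};
    explicit Node(long k) : key(k) {}
};

// Returns false if the key is already present. 'root' must be non-null.
bool insert(Node* root, long key) {
    Node* node = new Node(key);
    Node* cur = root;
    for (;;) {
        if (key == cur->key) { delete node; return false; }
        std::atomic<Node*>& child = (key < cur->key) ? cur->left : cur->right;
        Node* next = child.load();
        if (next != nullptr) {
            cur = next;                            // keep traversing
        } else if (child.compare_exchange_strong(next, node)) {
            return true;                           // linked in with a single CAS
        }
        // CAS failed: another thread installed a child here; re-read the
        // pointer and continue from the same parent.
    }
}
```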
Lock-Free and Practical Deques using Single-Word Compare-And-Swap
We present an efficient and practical lock-free implementation of a
concurrent deque that is disjoint-parallel accessible and uses atomic
primitives which are available in modern computer systems. Previously known
lock-free algorithms of deques are either based on non-available atomic
synchronization primitives, only implement a subset of the functionality, or
are not designed for disjoint accesses. Our algorithm is based on a doubly
linked list, and only requires single-word compare-and-swap atomic primitives,
even for dynamic memory sizes. We have performed an empirical study using full
implementations of the most efficient algorithms of lock-free deques known. For
systems with low concurrency, the algorithm by Michael shows the best
performance. However, as our algorithm is designed for disjoint accesses, it
performs significantly better on systems with high concurrency and non-uniform
memory architecture.
Optimising Simulation Data Structures for the Xeon Phi
In this paper, we propose a lock-free architecture
to accelerate logic gate circuit simulation using SIMD multi-core
machines. We evaluate its performance on different test circuits
simulated on the Intel Xeon Phi and 2 other machines. Comparisons
are presented of this software/hardware combination with
reported performances of GPU and other multi-core simulation
platforms. Comparisons are also given between the lock-free
architecture and a leading commercial simulator running on the
same Intel hardware.
The Lock-free k-LSM Relaxed Priority Queue
Priority queues are data structures which store keys in an ordered fashion to
allow efficient access to the minimal (maximal) key. Priority queues are
essential for many applications, e.g., Dijkstra's single-source shortest path
algorithm, branch-and-bound algorithms, and prioritized schedulers.
Efficient multiprocessor computing requires implementations of basic data
structures that can be used concurrently and scale to large numbers of threads
and cores. Lock-free data structures promise superior scalability by avoiding
blocking synchronization primitives, but the \emph{delete-min} operation is an
inherent scalability bottleneck in concurrent priority queues. Recent work has
focused on alleviating this obstacle either by batching operations, or by
relaxing the requirements to the \emph{delete-min} operation.
We present a new, lock-free priority queue that relaxes the \emph{delete-min}
operation so that it is allowed to delete \emph{any} of the $k$ smallest
keys, where $k$ is a runtime configurable parameter. Additionally, the
behavior is identical to a non-relaxed priority queue for items added and
removed by the same thread. The priority queue is built from a logarithmic
number of sorted arrays in a way similar to log-structured merge-trees. We
experimentally compare our priority queue to recent state-of-the-art lock-free
priority queues, both with relaxed and non-relaxed semantics, showing high
performance and good scalability of our approach.
Comment: Short version as ACM PPoPP'15 poster
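The log-structured-merge layout that the priority queue builds on can be illustrated with a purely sequential sketch (no lock-freedom and no relaxed semantics, so this is only the underlying array layout, not the k-LSM itself): inserts cascade through a logarithmic number of sorted arrays, and delete-min scans the smallest element of each array. All names are assumptions.

```cpp
// Sequential sketch of a log-structured merge layout for a priority queue.
// Illustrative only: no concurrency and no relaxation, just the idea of
// keeping a logarithmic number of sorted arrays.
#include <algorithm>
#include <vector>

struct LsmPq {
    // blocks[i] is a sorted array; inserts cascade upward, so block sizes
    // roughly double from level to level.
    std::vector<std::vector<long>> blocks;

    void insert(long key) {
        std::vector<long> carry{key};
        for (std::size_t i = 0; ; ++i) {
            if (i == blocks.size()) blocks.emplace_back();
            if (blocks[i].empty()) { blocks[i] = std::move(carry); return; }
            // Level already occupied: merge it into the carry and go up.
            std::vector<long> merged(blocks[i].size() + carry.size());
            std::merge(blocks[i].begin(), blocks[i].end(),
                       carry.begin(), carry.end(), merged.begin());
            blocks[i].clear();
            carry = std::move(merged);
        }
    }

    // Exact delete-min: scan the smallest (front) element of every block.
    bool delete_min(long& out) {
        std::size_t best = blocks.size();
        for (std::size_t i = 0; i < blocks.size(); ++i)
            if (!blocks[i].empty() &&
                (best == blocks.size() || blocks[i].front() < blocks[best].front()))
                best = i;
        if (best == blocks.size()) return false;     // queue is empty
        out = blocks[best].front();
        blocks[best].erase(blocks[best].begin());    // O(size) for the sketch
        return true;
    }
};
```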