57,153 research outputs found
Lock-free Concurrent Data Structures
Concurrent data structures are the data sharing side of parallel programming.
Data structures give the means to the program to store data, but also provide
operations to the program to access and manipulate these data. These operations
are implemented through algorithms that have to be efficient. In the sequential
setting, data structures are crucially important for the performance of the
respective computation. In the parallel programming setting, their importance
becomes more crucial because of the increased use of data and resource sharing
for utilizing parallelism.
The first and main goal of this chapter is to provide a sufficient background
and intuition to help the interested reader to navigate in the complex research
area of lock-free data structures. The second goal is to offer the programmer
familiarity to the subject that will allow her to use truly concurrent methods.Comment: To appear in "Programming Multi-core and Many-core Computing
Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and
Distributed Computin
A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines
We present a multi-word atomic (1,N) register for multi-core machines
exploiting Read-Modify-Write (RMW) instructions to coordinate the writer and
the readers in a wait-free manner. Our proposal, called Anonymous Readers
Counting (ARC), enables large-scale data sharing by admitting up to
concurrent readers on off-the-shelf 64-bits machines, as opposed to the most
advanced RMW-based approach which is limited to 58 readers. Further, ARC avoids
multiple copies of the register content when accessing it---this affects
classical register's algorithms based on atomic read/write operations on single
words. Thus it allows for higher scalability with respect to the register size.
Moreover, ARC explicitly reduces improves performance via a proper limitation
of RMW instructions in case of read operations, and by supporting constant time
for read operations and amortized constant time for write operations. A proof
of correctness of our register algorithm is also provided, together with
experimental data for a comparison with literature proposals. Beyond assessing
ARC on physical platforms, we carry out as well an experimentation on
virtualized infrastructures, which shows the resilience of wait-free
synchronization as provided by ARC with respect to CPU-steal times, proper of
more modern paradigms such as cloud computing.Comment: non
Lock-Free and Practical Deques using Single-Word Compare-And-Swap
We present an efficient and practical lock-free implementation of a
concurrent deque that is disjoint-parallel accessible and uses atomic
primitives which are available in modern computer systems. Previously known
lock-free algorithms of deques are either based on non-available atomic
synchronization primitives, only implement a subset of the functionality, or
are not designed for disjoint accesses. Our algorithm is based on a doubly
linked list, and only requires single-word compare-and-swap atomic primitives,
even for dynamic memory sizes. We have performed an empirical study using full
implementations of the most efficient algorithms of lock-free deques known. For
systems with low concurrency, the algorithm by Michael shows the best
performance. However, as our algorithm is designed for disjoint accesses, it
performs significantly better on systems with high concurrency and non-uniform
memory architecture
System Description for a Scalable, Fault-Tolerant, Distributed Garbage Collector
We describe an efficient and fault-tolerant algorithm for distributed cyclic
garbage collection. The algorithm imposes few requirements on the local
machines and allows for flexibility in the choice of local collector and
distributed acyclic garbage collector to use with it. We have emphasized
reducing the number and size of network messages without sacrificing the
promptness of collection throughout the algorithm. Our proposed collector is a
variant of back tracing to avoid extensive synchronization between machines. We
have added an explicit forward tracing stage to the standard back tracing stage
and designed a tuned heuristic to reduce the total amount of work done by the
collector. Of particular note is the development of fault-tolerant cooperation
between traces and a heuristic that aggressively reduces the set of suspect
objects.Comment: 47 pages, LaTe
- …