9,561 research outputs found
Lock-free Concurrent Data Structures
Concurrent data structures are the data sharing side of parallel programming.
Data structures give the means to the program to store data, but also provide
operations to the program to access and manipulate these data. These operations
are implemented through algorithms that have to be efficient. In the sequential
setting, data structures are crucially important for the performance of the
respective computation. In the parallel programming setting, their importance
becomes more crucial because of the increased use of data and resource sharing
for utilizing parallelism.
The first and main goal of this chapter is to provide a sufficient background
and intuition to help the interested reader to navigate in the complex research
area of lock-free data structures. The second goal is to offer the programmer
familiarity to the subject that will allow her to use truly concurrent methods.Comment: To appear in "Programming Multi-core and Many-core Computing
Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and
Distributed Computin
Formal Derivation of Concurrent Garbage Collectors
Concurrent garbage collectors are notoriously difficult to implement
correctly. Previous approaches to the issue of producing correct collectors
have mainly been based on posit-and-prove verification or on the application of
domain-specific templates and transformations. We show how to derive the upper
reaches of a family of concurrent garbage collectors by refinement from a
formal specification, emphasizing the application of domain-independent design
theories and transformations. A key contribution is an extension to the
classical lattice-theoretic fixpoint theorems to account for the dynamics of
concurrent mutation and collection.Comment: 38 pages, 21 figures. The short version of this paper appeared in the
Proceedings of MPC 201
Relating goal scheduling, precedence, and memory management in and-parallel execution of logic programs
The interactions among three important issues involved in the implementation of logic programs in parallel (goal scheduling, precedence, and memory management) are discussed. A simplified, parallel memory management model and an efficient, load-balancing goal scheduling strategy are presented. It is shown how, for systems which support "don't know" non-determinism, special care has to be taken during goal scheduling if the space recovery characteristics
of sequential systems are to be preserved. A solution based on selecting only "newer" goals for execution is described, and an algorithm is proposed for efficiently maintaining and determining precedence relationships and variable ages across parallel goals. It is argued that the proposed schemes and algorithms make it possible to extend the storage performance of sequential systems to parallel execution without the considerable overhead previously associated with it. The results are applicable to a wide class of parallel and coroutining systems, and they represent an efficient alternative to "all heap" or "spaghetti stack" allocation models
Lock-free atom garbage collection for multithreaded Prolog
The runtime system of dynamic languages such as Prolog or Lisp and their
derivatives contain a symbol table, in Prolog often called the atom table. A
simple dynamically resizing hash-table used to be an adequate way to implement
this table. As Prolog becomes fashionable for 24x7 server processes we need to
deal with atom garbage collection and concurrent access to the atom table.
Classical lock-based implementations to ensure consistency of the atom table
scale poorly and a stop-the-world approach to implement atom garbage collection
quickly becomes a bottle-neck, making Prolog unsuitable for soft real-time
applications. In this article we describe a novel implementation for the atom
table using lock-free techniques where the atom-table remains accessible even
during atom garbage collection. Relying only on CAS (Compare And Swap) and not
on external libraries, the implementation is straightforward and portable.
Under consideration for acceptance in TPLP.Comment: Paper presented at the 32nd International Conference on Logic
Programming (ICLP 2016), New York City, USA, 16-21 October 2016, 14 pages,
LaTeX, 4 PDF figure
A semantics and implementation of a causal logic programming language
The increasingly widespread availability of multicore and manycore computers demands new programming languages that make parallel programming dramatically easier and less error prone. This paper describes a semantics for a new class of declarative programming languages that support massive amounts of implicit parallelism
Automated Verification of Practical Garbage Collectors
Garbage collectors are notoriously hard to verify, due to their low-level
interaction with the underlying system and the general difficulty in reasoning
about reachability in graphs. Several papers have presented verified
collectors, but either the proofs were hand-written or the collectors were too
simplistic to use on practical applications. In this work, we present two
mechanically verified garbage collectors, both practical enough to use for
real-world C# benchmarks. The collectors and their associated allocators
consist of x86 assembly language instructions and macro instructions, annotated
with preconditions, postconditions, invariants, and assertions. We used the
Boogie verification generator and the Z3 automated theorem prover to verify
this assembly language code mechanically. We provide measurements comparing the
performance of the verified collector with that of the standard Bartok
collectors on off-the-shelf C# benchmarks, demonstrating their competitiveness
The Lock-free -LSM Relaxed Priority Queue
Priority queues are data structures which store keys in an ordered fashion to
allow efficient access to the minimal (maximal) key. Priority queues are
essential for many applications, e.g., Dijkstra's single-source shortest path
algorithm, branch-and-bound algorithms, and prioritized schedulers.
Efficient multiprocessor computing requires implementations of basic data
structures that can be used concurrently and scale to large numbers of threads
and cores. Lock-free data structures promise superior scalability by avoiding
blocking synchronization primitives, but the \emph{delete-min} operation is an
inherent scalability bottleneck in concurrent priority queues. Recent work has
focused on alleviating this obstacle either by batching operations, or by
relaxing the requirements to the \emph{delete-min} operation.
We present a new, lock-free priority queue that relaxes the \emph{delete-min}
operation so that it is allowed to delete \emph{any} of the smallest
keys, where is a runtime configurable parameter. Additionally, the
behavior is identical to a non-relaxed priority queue for items added and
removed by the same thread. The priority queue is built from a logarithmic
number of sorted arrays in a way similar to log-structured merge-trees. We
experimentally compare our priority queue to recent state-of-the-art lock-free
priority queues, both with relaxed and non-relaxed semantics, showing high
performance and good scalability of our approach.Comment: Short version as ACM PPoPP'15 poste
Threads and Or-Parallelism Unified
One of the main advantages of Logic Programming (LP) is that it provides an
excellent framework for the parallel execution of programs. In this work we
investigate novel techniques to efficiently exploit parallelism from real-world
applications in low cost multi-core architectures. To achieve these goals, we
revive and redesign the YapOr system to exploit or-parallelism based on a
multi-threaded implementation. Our new approach takes full advantage of the
state-of-the-art fast and optimized YAP Prolog engine and shares the underlying
execution environment, scheduler and most of the data structures used to
support YapOr's model. Initial experiments with our new approach consistently
achieve almost linear speedups for most of the applications, proving itself as
a good alternative for exploiting implicit parallelism in the currently
available low cost multi-core architectures.Comment: 17 pages, 21 figures, International Conference on Logic Programming
(ICLP 2010
- …