Quantitative relaxation of concurrent data structures
There is a trade-off between performance and correctness in implementing concurrent data structures. Better performance may be achieved at the expense of relaxing correctness, by redefining the semantics of data structures. We address such a redefinition of data structure semantics and present a systematic and formal framework for obtaining new data structures by quantitatively relaxing existing ones. We view a data structure as a sequential specification S containing all "legal" sequences over an alphabet of method calls. Relaxing the data structure corresponds to defining a distance from any sequence over the alphabet to the sequential specification: the k-relaxed sequential specification contains all sequences over the alphabet within distance k from the original specification. In contrast to other existing work, our relaxations are semantic (distance in terms of data structure states). As an instantiation of our framework, we present two simple yet generic relaxation schemes, called out-of-order and stuttering relaxation, along with several ways of computing distances. We show that the out-of-order relaxation, when further instantiated to stacks, queues, and priority queues, amounts to tolerating bounded out-of-order behavior, which cannot be captured by a purely syntactic relaxation (distance in terms of sequence manipulation, e.g., edit distance). We give concurrent implementations of relaxed data structures and demonstrate that bounded relaxations provide the means for trading correctness for performance in a controlled way. The relaxations are monotonic, which further highlights the trade-off: increasing k increases the number of permitted sequences, which, as we demonstrate, can lead to better performance. Finally, since a relaxed stack or queue also implements a pool, we actually have new concurrent pool implementations that outperform the state-of-the-art ones.
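The bounded out-of-order behavior described in this abstract can be illustrated with a small sequential sketch. It assumes one simple reading of the relaxation for a queue: a dequeue may return any of the k+1 oldest elements. The class name `KRelaxedQueue`, its methods, and the uniform random choice inside the window are hypothetical and chosen for illustration; this is not the paper's concurrent implementation and does not reproduce its distance definitions.

```python
import random
from collections import deque


class KRelaxedQueue:
    """Sequential sketch of a k-out-of-order queue (illustrative assumption).

    dequeue() may return any of the k+1 oldest elements; k = 0 gives
    ordinary FIFO behavior.
    """

    def __init__(self, k, seed=None):
        if k < 0:
            raise ValueError("k must be non-negative")
        self.k = k
        self._items = deque()
        self._rng = random.Random(seed)

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        """Remove and return (item, distance from the oldest element)."""
        if not self._items:
            raise IndexError("dequeue from empty queue")
        # Only the k+1 oldest elements are eligible.
        window = min(self.k + 1, len(self._items))
        index = self._rng.randrange(window)
        item = self._items[index]
        del self._items[index]
        return item, index


if __name__ == "__main__":
    q = KRelaxedQueue(k=2, seed=42)
    for x in range(10):
        q.enqueue(x)
    print([q.dequeue() for _ in range(10)])
```

Increasing `k` in this sketch enlarges the set of permitted dequeue results, which mirrors the monotonicity noted in the abstract: every sequence allowed for a given k is also allowed for any larger k.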
The Lock-free k-LSM Relaxed Priority Queue
Priority queues are data structures which store keys in an ordered fashion to
allow efficient access to the minimal (maximal) key. Priority queues are
essential for many applications, e.g., Dijkstra's single-source shortest path
algorithm, branch-and-bound algorithms, and prioritized schedulers.
Efficient multiprocessor computing requires implementations of basic data
structures that can be used concurrently and scale to large numbers of threads
and cores. Lock-free data structures promise superior scalability by avoiding
blocking synchronization primitives, but the \emph{delete-min} operation is an
inherent scalability bottleneck in concurrent priority queues. Recent work has
focused on alleviating this obstacle either by batching operations, or by
relaxing the requirements to the \emph{delete-min} operation.
We present a new, lock-free priority queue that relaxes the \emph{delete-min}
operation so that it is allowed to delete \emph{any} of the $\rho+1$ smallest
keys, where $\rho$ is a runtime configurable parameter. Additionally, the
behavior is identical to a non-relaxed priority queue for items added and
removed by the same thread. The priority queue is built from a logarithmic
number of sorted arrays in a way similar to log-structured merge-trees. We
experimentally compare our priority queue to recent state-of-the-art lock-free
priority queues, both with relaxed and non-relaxed semantics, showing high
performance and good scalability of our approach. Comment: Short version as ACM PPoPP'15 poster
The Power of Choice in Priority Scheduling
Consider the following random process: we are given $n$ queues, into which
elements of increasing labels are inserted uniformly at random. To remove an
element, we pick two queues at random, and remove the element of lower label
(higher priority) among the two. The cost of a removal is the rank of the label
removed, among labels still present in any of the queues, that is, the distance
from the optimal choice at each step. Variants of this strategy are prevalent
in state-of-the-art concurrent priority queue implementations. Nonetheless, it
is not known whether such implementations provide any rank guarantees, even in
a sequential model.
We answer this question, showing that this strategy provides surprisingly
strong guarantees: Although the single-choice process, where we always insert
and remove from a single randomly chosen queue, has degrading cost, going to
infinity as we increase the number of steps, in the two choice process, the
expected rank of a removed element is $O(n)$ while the expected worst-case
cost is $O(n \log n)$. These bounds are tight, and hold irrespective of the
number of steps for which we run the process.
The argument is based on a new technical connection between "heavily loaded"
balls-into-bins processes and priority scheduling.
Our analytic results inspire a new concurrent priority queue implementation,
which improves upon the state of the art in terms of practical performance.
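The random process in this abstract is easy to simulate. The sketch below is a hedged illustration rather than the authors' experiment: the function name `mean_removal_rank`, the prefill size, and the strict alternation of one insertion and one removal per step are assumptions made here. Queues are sampled with replacement, and a removal is retried when every sampled queue is empty.

```python
import bisect
import random


def mean_removal_rank(n_queues, steps, choices=2, prefill_per_queue=10, seed=0):
    """Average rank of removed labels in the d-choice removal process.

    Labels increase over time and each is appended to a uniformly random
    queue, so every queue stays sorted. A removal samples `choices` queues
    and pops the smallest head label among them. Rank 1 means the globally
    smallest remaining label was removed.
    """
    rng = random.Random(seed)
    queues = [[] for _ in range(n_queues)]
    heads = [0] * n_queues  # index of the first live element in each queue
    next_label = 0

    def insert():
        nonlocal next_label
        queues[rng.randrange(n_queues)].append(next_label)
        next_label += 1

    for _ in range(prefill_per_queue * n_queues):
        insert()

    total_rank = 0
    for _ in range(steps):
        insert()
        while True:
            picked = [rng.randrange(n_queues) for _ in range(choices)]
            live = [i for i in picked if heads[i] < len(queues[i])]
            if live:
                break
        best = min(live, key=lambda i: queues[i][heads[i]])
        label = queues[best][heads[best]]
        # Count live labels smaller than the removed one, across all queues.
        smaller = sum(
            bisect.bisect_left(q, label, heads[i]) - heads[i]
            for i, q in enumerate(queues)
        )
        total_rank += 1 + smaller
        heads[best] += 1
    return total_rank / steps


if __name__ == "__main__":
    for d in (1, 2):
        print(d, mean_removal_rank(n_queues=8, steps=5000, choices=d, seed=1))
```

Running it with `choices=1` and `choices=2` on the same seed gives a rough feel for the gap between the single-choice and two-choice processes; it says nothing about the asymptotic bounds, which are the subject of the paper's analysis.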
Hoare-style Specifications as Correctness Conditions for Non-linearizable Concurrent Objects
Designing scalable concurrent objects, which can be efficiently used on
multicore processors, often requires one to abandon standard specification
techniques, such as linearizability, in favor of more relaxed consistency
requirements. However, the variety of alternative correctness conditions makes
it difficult to choose which one to employ in a particular case, and to compose
them when using objects whose behaviors are specified via different criteria.
The lack of syntactic verification methods for most of these criteria poses
challenges in their systematic adoption and application.
In this paper, we argue for using Hoare-style program logics as an
alternative and uniform approach for specification and compositional formal
verification of safety properties for concurrent objects and their client
programs. Through a series of case studies, we demonstrate how an existing
program logic for concurrency can be employed off-the-shelf to capture
important state and history invariants, allowing one to explicitly quantify
over interference of environment threads and provide intuitive and expressive
Hoare-style specifications for several non-linearizable concurrent objects that
were previously specified only via dedicated correctness criteria. We
illustrate the adequacy of our specifications by verifying a number of
concurrent client scenarios, that make use of the previously specified
concurrent objects, capturing the essence of such correctness conditions as
concurrency-aware linearizability, quiescent, and quantitative quiescent
consistency. All examples described in this paper are verified mechanically in
Coq. Comment: 18 pages
Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms
There has been significant progress in understanding the parallelism inherent
to iterative sequential algorithms: for many classic algorithms, the depth of
the dependence structure is now well understood, and scheduling techniques have
been developed to exploit this shallow dependence structure for efficient
parallel implementations. A related, applied research strand has studied
methods by which certain iterative task-based algorithms can be efficiently
parallelized via relaxed concurrent priority schedulers. These allow for high
concurrency when inserting and removing tasks, at the cost of executing
superfluous work due to the relaxed semantics of the scheduler.
In this work, we take a step towards unifying these two research directions,
by showing that there exists a family of relaxed priority schedulers that can
efficiently and deterministically execute classic iterative algorithms such as
greedy maximal independent set (MIS) and matching. Our primary result shows
that, given a randomized scheduler with an expected relaxation factor of $k$ in
terms of the maximum allowed priority inversions on a task, and any graph on $n$
vertices, the scheduler is able to execute greedy MIS with only an additive
factor of poly($k$) expected additional iterations compared to an exact (but
not scalable) scheduler. This counter-intuitive result demonstrates that the
overhead of relaxation when computing MIS is not dependent on the input size or
structure of the input graph. Experimental results show that this overhead can
be clearly offset by the gain in performance due to the highly scalable
scheduler. In sum, we present an efficient method to deterministically
parallelize iterative sequential algorithms, with provable runtime guarantees
in terms of the number of executed tasks to completion. Comment: PODC 2018, pages 377-386 in proceedings
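A minimal sequential sketch can show how a relaxed scheduler still yields a deterministic greedy MIS. It assumes a toy scheduler that hands out a uniformly random task among the `k` highest-priority pending ones, which is only one possible relaxation model and not necessarily the one analyzed in the paper. The name `greedy_mis_relaxed` and its parameters are hypothetical. A vertex whose higher-priority neighbors are still undecided is left in the scheduler, and that attempt counts as a wasted iteration.

```python
import random


def greedy_mis_relaxed(adj, order, k, seed=0):
    """Greedy MIS driven by a toy k-relaxed scheduler (illustrative only).

    adj   : dict mapping each vertex to the set of its neighbors
    order : vertices listed from highest to lowest priority
    k     : the scheduler returns one of the k best pending tasks (k=1 is exact)
    Returns (independent set, number of scheduler iterations).
    """
    rng = random.Random(seed)
    prio = {v: i for i, v in enumerate(order)}
    pending = list(order)  # stays sorted by priority
    in_mis = set()
    dead = set()  # vertices excluded because a neighbor joined the MIS
    iterations = 0
    while pending:
        iterations += 1
        i = rng.randrange(min(k, len(pending)))
        v = pending[i]
        if v in dead:
            pending.pop(i)  # already decided: not in the MIS
            continue
        # v may be decided only once all higher-priority neighbors are decided.
        blocked = any(
            prio[u] < prio[v] and u not in dead and u not in in_mis
            for u in adj[v]
        )
        if blocked:
            continue  # wasted iteration; v stays in the scheduler
        pending.pop(i)
        in_mis.add(v)
        dead.update(adj[v])
    return in_mis, iterations


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(greedy_mis_relaxed(path, [0, 1, 2, 3, 4], k=1))
    print(greedy_mis_relaxed(path, [0, 1, 2, 3, 4], k=3, seed=7))
```

Because a vertex is only decided after all of its higher-priority neighbors, the returned set is the same as that of the exact sequential greedy algorithm for any `k` and any seed; only the iteration count varies.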
Ordering and multiple phase transitions in ultra-thin nickelate superlattices
We interpret via advanced ab initio calculations the multiple phase
transitions observed recently in ultra-thin LaNiO$_3$/LaAlO$_3$
superlattices. The ground state is insulating, charge-ordered, and
antiferromagnetic due to concurrent structural distortion and weak valency
disproportionation. We infer distinct transitions at 40 K and 150 K,
respectively, from antiferromagnetic order to moment disorder, and from
structurally-dimerized insulator to an undistorted metallic Pauli paramagnet
(exhibiting a cuprate-like Fermi surface). The results are in satisfactory
agreement with experiment. Comment: ~4 pages, 4 figures; further improvement during review