Impact of the Consistency Model on Checkpointing of Distributed Shared Memory
In this report, we consider the impact of the consistency model on
checkpointing and rollback algorithms for distributed shared memory. In
particular, we consider specific implementations of four consistency models for
distributed shared memory, namely, linearizability, sequential consistency,
causal consistency and eventual consistency, and develop checkpointing and
rollback algorithms that can be integrated into the implementations of the
consistency models. Our results empirically demonstrate that the mechanisms
used to implement stronger consistency models lead to simpler or more efficient
checkpointing algorithms.
Memory Consistency Models
Abstract: The memory consistency model for a shared-memory multiprocessor specifies the behaviour of memory with respect to read and write operations from multiple processors. Relaxed models that impose fewer memory ordering constraints offer the potential for higher performance by allowing hardware and software to overlap and reorder memory operations. Many of the previously proposed models either fail to provide reasonable programming semantics or are biased toward programming ease at the cost of sacrificing performance. The optimizations enabled by relaxed models are extremely effective in hiding virtually the full latency of writes in architectures with blocking reads. We evaluate the consistency models, compare the weak consistency model with the release consistency model, and assess the performance benefits of exploiting relaxed models, based on detailed simulations of realistic parallel applications. We believe that the combined benefits in hardware and software will make relaxed models universal in future multiprocessors, as is already evidenced by their adoption in several commercial systems.
A Framework for Consistency Algorithms
We present a framework that provides deterministic consistency algorithms for given memory models. Such an algorithm checks whether the executions of a shared-memory concurrent program are consistent under the axioms defined by a model. For memory models like SC and TSO, checking consistency is NP-complete. Our framework shows that, despite the hardness, fast deterministic consistency algorithms can be obtained by employing tools from fine-grained complexity.
The framework is based on a universal consistency problem which can be instantiated by different memory models. We construct an algorithm for the problem running in time O^*(2^k), where k is the number of write accesses in the execution that is checked for consistency. Each instance of the framework then admits an O^*(2^k)-time consistency algorithm. By applying the framework, we obtain corresponding consistency algorithms for SC, TSO, PSO, and RMO. Moreover, we show that the obtained algorithms for SC, TSO, and PSO are optimal in the fine-grained sense: there is no consistency algorithm for these running in time 2^{o(k)} unless the exponential time hypothesis fails.
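To make the consistency-checking problem concrete, the following is a minimal sketch of a brute-force checker for sequential consistency (SC). It is not the O^*(2^k) algorithm from the abstract: it simply searches over interleavings with memoization and can take exponential time. The function name `sc_consistent` and the tuple encoding of operations are assumptions made for this illustration.

```python
from functools import lru_cache


def sc_consistent(threads, initial=0):
    """Return True if some interleaving of `threads` is sequentially consistent.

    Each thread is a sequence of operations ("w", var, value) or
    ("r", var, value), listed in program order. A read is allowed only if it
    returns the most recent value written to its variable in the interleaving
    (or `initial` if the variable has not been written yet).
    """
    threads = tuple(tuple(t) for t in threads)

    @lru_cache(maxsize=None)
    def search(positions, memory):
        # positions[i] is the index of the next operation of thread i;
        # memory is a sorted tuple of (var, value) pairs so it can be cached.
        if all(p == len(t) for p, t in zip(positions, threads)):
            return True
        mem = dict(memory)
        for i, (p, t) in enumerate(zip(positions, threads)):
            if p == len(t):
                continue  # thread i has finished
            kind, var, value = t[p]
            if kind == "r" and mem.get(var, initial) != value:
                continue  # this read cannot be scheduled next
            new_mem = dict(mem)
            if kind == "w":
                new_mem[var] = value
            nxt = positions[:i] + (p + 1,) + positions[i + 1:]
            if search(nxt, tuple(sorted(new_mem.items()))):
                return True
        return False

    return search(tuple(0 for _ in threads), ())


# Store-buffering execution: both threads read 0 after writing 1.
store_buffering = [
    [("w", "x", 1), ("r", "y", 0)],
    [("w", "y", 1), ("r", "x", 0)],
]
# Message-passing execution: the reader sees the flag and then the data.
message_passing = [
    [("w", "x", 1), ("w", "flag", 1)],
    [("r", "flag", 1), ("r", "x", 1)],
]
```

Under these assumptions, `sc_consistent(store_buffering)` is False because no interleaving lets both reads return 0, while `sc_consistent(message_passing)` is True.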
Strong Memory Consistency For Parallel Programming
Correctly synchronizing multithreaded programs is challenging, and errors can lead to program failures (e.g., atomicity violations). Existing memory consistency models rule out some possible failures, but are limited by depending on subtle programmer-defined locking code and by providing unintuitive semantics for incorrectly synchronized code. Stronger memory consistency models assist programmers by providing them with easier-to-understand semantics with regard to memory access interleavings in parallel code. This dissertation proposes a new strong memory consistency model based on ordering-free regions (OFRs), which are spans of dynamic instructions between consecutive ordering constructs (e.g., barriers). Atomicity over ordering-free regions provides stronger atomicity than existing strong memory consistency models with competitive performance. Ordering-free regions also simplify programmer reasoning by limiting the potential for atomicity violations to fewer points in the program’s execution. This dissertation explores both software-only and hardware-supported systems that provide OFR serializability.
Property-Driven Fence Insertion using Reorder Bounded Model Checking
Modern architectures provide weaker memory consistency guarantees than
sequential consistency. These weaker guarantees allow programs to exhibit
behaviours where the program statements appear to have executed out of program
order. Fortunately, modern architectures provide memory barriers (fences) to
enforce the program order between a pair of statements if needed. Due to the
intricate semantics of weak memory models, the placement of fences is
challenging even for experienced programmers. Too few fences lead to bugs
whereas overuse of fences results in performance degradation. This motivates
automated placement of fences. Tools that restore sequential consistency in the
program may insert more fences than necessary for the program to be correct.
Therefore, we propose a property-driven technique that introduces
"reorder-bounded exploration" to identify the smallest number of program
locations for fence placement. We implemented our technique on top of CBMC;
however, in principle, our technique is generic enough to be used with any
model checker. Our experimental results show that our technique is faster and
solves more instances of relevant benchmarks as compared to earlier approaches.
Comment: 18 pages, 3 figures, 4 algorithms. Version change reason: new set of
results and publication ready version of FM 201
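As an illustration of why fence placement matters, here is a minimal sketch that enumerates the outcomes of a tiny two-thread program under a toy store-buffer model. This is neither CBMC nor the reorder-bounded exploration described in the abstract. The TSO-like buffering model, the instruction encoding, and the function name `outcomes` are all assumptions made for this example.

```python
def outcomes(programs):
    """Enumerate final register values under a toy TSO-like store-buffer model.

    Instructions: ("store", var, value), ("load", var, reg), ("fence",).
    Stores go into a per-thread FIFO buffer and reach memory only when
    flushed; a fence may execute only when its thread's buffer is empty.
    Unwritten variables read as 0.
    """
    results = set()
    seen = set()
    start = (
        tuple(0 for _ in programs),   # program counters
        tuple(() for _ in programs),  # per-thread store buffers
        (),                           # memory as sorted (var, value) pairs
        (),                           # registers as sorted (reg, value) pairs
    )
    stack = [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        done = all(pc == len(p) for pc, p in zip(pcs, programs))
        if done and not any(bufs):
            results.add(regs)
            continue
        for i, prog in enumerate(programs):
            if bufs[i]:
                # Flush the oldest buffered store of thread i to memory.
                var, val = bufs[i][0]
                new_mem = dict(mem)
                new_mem[var] = val
                new_bufs = bufs[:i] + (bufs[i][1:],) + bufs[i + 1:]
                stack.append((pcs, new_bufs, tuple(sorted(new_mem.items())), regs))
            if pcs[i] == len(prog):
                continue
            ins = prog[pcs[i]]
            new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            if ins[0] == "store":
                new_buf = bufs[i] + ((ins[1], ins[2]),)
                new_bufs = bufs[:i] + (new_buf,) + bufs[i + 1:]
                stack.append((new_pcs, new_bufs, mem, regs))
            elif ins[0] == "load":
                # Read memory, then let the newest buffered store override it.
                val = dict(mem).get(ins[1], 0)
                for var, buffered in bufs[i]:
                    if var == ins[1]:
                        val = buffered
                new_regs = dict(regs)
                new_regs[ins[2]] = val
                stack.append((new_pcs, bufs, mem, tuple(sorted(new_regs.items()))))
            elif ins[0] == "fence" and not bufs[i]:
                stack.append((new_pcs, bufs, mem, regs))
    return results


unfenced = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]
fenced = [
    [("store", "x", 1), ("fence",), ("load", "y", "r0")],
    [("store", "y", 1), ("fence",), ("load", "x", "r1")],
]
both_zero = (("r0", 0), ("r1", 0))
```

In this toy model, `both_zero` appears in `outcomes(unfenced)` because each load can run while the other thread's store is still buffered, and it disappears from `outcomes(fenced)` once a fence separates each store from the following load.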
High-Performance Distributed ML at Scale through Parameter Server Consistency Models
As Machine Learning (ML) applications increase in data size and model
complexity, practitioners turn to distributed clusters to satisfy the increased
computational and memory demands. Unfortunately, effective use of clusters for
ML requires considerable expertise in writing distributed code, while
highly-abstracted frameworks like Hadoop have not, in practice, approached the
performance seen in specialized ML implementations. The recent Parameter Server
(PS) paradigm is a middle ground between these extremes, allowing easy
conversion of single-machine parallel ML applications into distributed ones,
while maintaining high throughput through relaxed "consistency models" that
allow inconsistent parameter reads. However, due to insufficient theoretical
study, it is not clear which of these consistency models can really ensure
correct ML algorithm output; at the same time, there remain many
theoretically-motivated but undiscovered opportunities to maximize
computational throughput. Motivated by this challenge, we study both the
theoretical guarantees and empirical behavior of iterative-convergent ML
algorithms in existing PS consistency models. We then use the gleaned insights
to improve a consistency model using an "eager" PS communication mechanism, and
implement it as a new PS system that enables ML algorithms to reach their
solution more quickly.
Comment: 19 pages, 2 figures