85,447 research outputs found
Distributed Transactions: Dissecting the Nightmare
Many distributed storage systems are transactional and a lot of work has been
devoted to optimizing their performance, especially the performance of
read-only transactions that are considered the most frequent in practice. Yet,
the results obtained so far are rather disappointing, and some of the design
decisions seem contrived. This paper contributes to explaining this state of
affairs by proving intrinsic limitations of transactional storage systems, even
those that need not ensure strong consistency but only causality.
We first consider general storage systems where some transactions are
read-only and some also involve write operations. We show that even read-only
transactions cannot be "fast": their operations cannot be executed within one
round-trip message exchange between a client seeking an object and the server
storing it. We then consider systems (as sometimes implemented today) where all
transactions are read-only, i.e., updates are performed as individual
operations outside transactions. In this case, read-only transactions can
indeed be "fast", but we prove that they need to be "visible". They induce
inherent updates on the servers, which in turn impact their overall
performance
On Verifying Causal Consistency
Causal consistency is one of the most adopted consistency criteria for
distributed implementations of data structures. It ensures that operations are
executed at all sites according to their causal precedence. We address the
issue of verifying automatically whether the executions of an implementation of
a data structure are causally consistent. We consider two problems: (1)
checking whether one single execution is causally consistent, which is relevant
for developing testing and bug finding algorithms, and (2) verifying whether
all the executions of an implementation are causally consistent.
We show that the first problem is NP-complete. This holds even for the
read-write memory abstraction, which is a building block of many modern
distributed systems. Indeed, such systems often store data in key-value stores,
which are instances of the read-write memory abstraction. Moreover, we prove
that, surprisingly, the second problem is undecidable, and again this holds
even for the read-write memory abstraction. However, we show that for the
read-write memory abstraction, these negative results can be circumvented if
the implementations are data independent, i.e., their behaviors do not depend
on the data values that are written or read at each moment, which is a
realistic assumption.Comment: extended version of POPL 201
Causal Consistency: Beyond Memory
In distributed systems where strong consistency is costly when not
impossible, causal consistency provides a valuable abstraction to represent
program executions as partial orders. In addition to the sequential program
order of each computing entity, causal order also contains the semantic links
between the events that affect the shared objects -- messages emission and
reception in a communication channel , reads and writes on a shared register.
Usual approaches based on semantic links are very difficult to adapt to other
data types such as queues or counters because they require a specific analysis
of causal dependencies for each data type. This paper presents a new approach
to define causal consistency for any abstract data type based on sequential
specifications. It explores, formalizes and studies the differences between
three variations of causal consistency and highlights them in the light of
PRAM, eventual consistency and sequential consistency: weak causal consistency,
that captures the notion of causality preservation when focusing on convergence
; causal convergence that mixes weak causal consistency and convergence; and
causal consistency, that coincides with causal memory when applied to shared
memory.Comment: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming, Mar 2016, Barcelone, Spai
PaRiS: Causally Consistent Transactions with Non-blocking Reads and Partial Replication
Geo-replicated data platforms are at the backbone of several large-scale
online services. Transactional Causal Consistency (TCC) is an attractive
consistency level for building such platforms. TCC avoids many anomalies of
eventual consistency, eschews the synchronization costs of strong consistency,
and supports interactive read-write transactions. Partial replication is
another attractive design choice for building geo-replicated platforms, as it
increases the storage capacity and reduces update propagation costs. This paper
presents PaRiS, the first TCC system that supports partial replication and
implements non-blocking parallel read operations, whose latency is paramount
for the performance of read-intensive applications. PaRiS relies on a novel
protocol to track dependencies, called Universal Stable Time (UST). By means of
a lightweight background gossip process, UST identifies a snapshot of the data
that has been installed by every DC in the system. Hence, transactions can
consistently read from such a snapshot on any server in any replication site
without having to block. Moreover, PaRiS requires only one timestamp to track
dependencies and define transactional snapshots, thereby achieving resource
efficiency and scalability. We evaluate PaRiS on a large-scale AWS deployment
composed of up to 10 replication sites. We show that PaRiS scales well with the
number of DCs and partitions, while being able to handle larger data-sets than
existing solutions that assume full replication. We also demonstrate a
performance gain of non-blocking reads vs. a blocking alternative (up to 1.47x
higher throughput with 5.91x lower latency for read-dominated workloads and up
to 1.46x higher throughput with 20.56x lower latency for write-heavy
workloads)
Fisheye Consistency: Keeping Data in Synch in a Georeplicated World
Over the last thirty years, numerous consistency conditions for replicated
data have been proposed and implemented. Popular examples of such conditions
include linearizability (or atomicity), sequential consistency, causal
consistency, and eventual consistency. These consistency conditions are usually
defined independently from the computing entities (nodes) that manipulate the
replicated data; i.e., they do not take into account how computing entities
might be linked to one another, or geographically distributed. To address this
lack, as a first contribution, this paper introduces the notion of proximity
graph between computing nodes. If two nodes are connected in this graph, their
operations must satisfy a strong consistency condition, while the operations
invoked by other nodes are allowed to satisfy a weaker condition. The second
contribution is the use of such a graph to provide a generic approach to the
hybridization of data consistency conditions into the same system. We
illustrate this approach on sequential consistency and causal consistency, and
present a model in which all data operations are causally consistent, while
operations by neighboring processes in the proximity graph are sequentially
consistent. The third contribution of the paper is the design and the proof of
a distributed algorithm based on this proximity graph, which combines
sequential consistency and causal consistency (the resulting condition is
called fisheye consistency). In doing so the paper not only extends the domain
of consistency conditions, but provides a generic provably correct solution of
direct relevance to modern georeplicated systems
Okapi: Causally Consistent Geo-Replication Made Faster, Cheaper and More Available
Okapi is a new causally consistent geo-replicated key- value store. Okapi
leverages two key design choices to achieve high performance. First, it relies
on hybrid logical/physical clocks to achieve low latency even in the presence
of clock skew. Second, Okapi achieves higher resource efficiency and better
availability, at the expense of a slight increase in update visibility latency.
To this end, Okapi implements a new stabilization protocol that uses a
combination of vector and scalar clocks and makes a remote update visible when
its delivery has been acknowledged by every data center. We evaluate Okapi with
different workloads on Amazon AWS, using three geographically distributed
regions and 96 nodes. We compare Okapi with two recent approaches to causal
consistency, Cure and GentleRain. We show that Okapi delivers up to two orders
of magnitude better performance than GentleRain and that Okapi achieves up to
3.5x lower latency and a 60% reduction of the meta-data overhead with respect
to Cure
- âŠ