3,117 research outputs found
Wait-Free and Obstruction-Free Snapshot
The snapshot problem was first proposed over a decade ago and has since been well-studied in the distributed algorithms community. The challenge is to design a data structure consisting of components, shared by upto concurrent processes, that supports two operations. The first, , atomically writes to the th component. The second, , returns an atomic snapshot of all components. We consider two termination properties: wait-freedom, which requires a process to always terminate in a bounded number of its own steps, and the weaker obstruction-freedom, which requires such termination only for processes that eventually execute uninterrupted. First, we present a simple, time and space optimal, obstruction-free solution to the single-writer, multi-scanner version of the snapshot problem (wherein concurrent Updates never occur on the same component). Second, we assume hardware support for compare&swap (CAS) to give a time-optimal, wait-free solution to the multi-writer, single-scanner snapshot problem (wherein concurrent Scans never occur). This algorithm uses only space and has optimal CAS, write and remote-reference complexities. Additionally, it can be augmented to implement a general snapshot object with the same time and space bounds, thus improving the space complexity of of the only previously known time-optimal solution
Anonymous Obstruction-free -Set Agreement with Atomic Read/Write Registers
The -set agreement problem is a generalization of the consensus problem.
Namely, assuming each process proposes a value, each non-faulty process has to
decide a value such that each decided value was proposed, and no more than
different values are decided. This is a hard problem in the sense that it
cannot be solved in asynchronous systems as soon as or more processes may
crash. One way to circumvent this impossibility consists in weakening its
termination property, requiring that a process terminates (decides) only if it
executes alone during a long enough period. This is the well-known
obstruction-freedom progress condition. Considering a system of {\it
anonymous asynchronous} processes, which communicate through atomic {\it
read/write registers only}, and where {\it any number of processes may crash},
this paper addresses and solves the challenging open problem of designing an
obstruction-free -set agreement algorithm with atomic registers
only. From a shared memory cost point of view, this algorithm is the best
algorithm known so far, thereby establishing a new upper bound on the number of
registers needed to solve the problem (its gain is with respect to the
previous upper bound). The algorithm is then extended to address the repeated
version of -set agreement. As it is optimal in the number of atomic
read/write registers, this algorithm closes the gap on previously established
lower/upper bounds for both the anonymous and non-anonymous versions of the
repeated -set agreement problem. Finally, for 1 \leq x\leq k
\textless{} n, a generalization suited to -obstruction-freedom is also
described, which requires atomic registers only
Compositional competitiveness for distributed algorithms
We define a measure of competitive performance for distributed algorithms
based on throughput, the number of tasks that an algorithm can carry out in a
fixed amount of work. This new measure complements the latency measure of Ajtai
et al., which measures how quickly an algorithm can finish tasks that start at
specified times. The novel feature of the throughput measure, which
distinguishes it from the latency measure, is that it is compositional: it
supports a notion of algorithms that are competitive relative to a class of
subroutines, with the property that an algorithm that is k-competitive relative
to a class of subroutines, combined with an l-competitive member of that class,
gives a combined algorithm that is kl-competitive.
In particular, we prove the throughput-competitiveness of a class of
algorithms for collect operations, in which each of a group of n processes
obtains all values stored in an array of n registers. Collects are a
fundamental building block of a wide variety of shared-memory distributed
algorithms, and we show that several such algorithms are competitive relative
to collects. Inserting a competitive collect in these algorithms gives the
first examples of competitive distributed algorithms obtained by composition
using a general construction.Comment: 33 pages, 2 figures; full version of STOC 96 paper titled "Modular
competitiveness for distributed algorithms.
Byzantine-Tolerant Set-Constrained Delivery Broadcast
Set-Constrained Delivery Broadcast (SCD-broadcast), recently introduced at ICDCN 2018, is a high-level communication abstraction that captures ordering properties not between individual messages but between sets of messages. More precisely, it allows processes to broadcast messages and deliver sets of messages, under the constraint that if a process delivers a set containing a message m before a set containing a message m\u27, then no other process delivers first a set containing m\u27 and later a set containing m. It has been shown that SCD-broadcast and read/write registers are computationally equivalent, and an algorithm implementing SCD-broadcast is known in the context of asynchronous message passing systems prone to crash failures.
This paper introduces a Byzantine-tolerant SCD-broadcast algorithm, which we call BSCD-broadcast. Our proposed algorithm assumes an underlying basic Byzantine-tolerant reliable broadcast abstraction. We first introduce an intermediary communication primitive, Byzantine FIFO broadcast (BFIFO-broadcast), which we then use as a primitive in our final BSCD-broadcast algorithm. Unlike the original SCD-broadcast algorithm that is tolerant to up to t<n/2 crashing processes, and unlike the underlying Byzantine reliable broadcast primitive that is tolerant to up to t<n/3 Byzantine processes, our BSCD-broadcast algorithm is tolerant to up to t<n/4 Byzantine processes. As an illustration of the high abstraction power provided by the BSCD-broadcast primitive, we show that it can be used to implement a Byzantine-tolerant read/write snapshot object in an extremely simple way
Concurrent Data Structures Linked in Time
Arguments about correctness of a concurrent data structure are typically
carried out by using the notion of linearizability and specifying the
linearization points of the data structure's procedures. Such arguments are
often cumbersome as the linearization points' position in time can be dynamic
(depend on the interference, run-time values and events from the past, or even
future), non-local (appear in procedures other than the one considered), and
whose position in the execution trace may only be determined after the
considered procedure has already terminated.
In this paper we propose a new method, based on a separation-style logic, for
reasoning about concurrent objects with such linearization points. We embrace
the dynamic nature of linearization points, and encode it as part of the data
structure's auxiliary state, so that it can be dynamically modified in place by
auxiliary code, as needed when some appropriate run-time event occurs. We name
the idea linking-in-time, because it reduces temporal reasoning to spatial
reasoning. For example, modifying a temporal position of a linearization point
can be modeled similarly to a pointer update in separation logic. Furthermore,
the auxiliary state provides a convenient way to concisely express the
properties essential for reasoning about clients of such concurrent objects. We
illustrate the method by verifying (mechanically in Coq) an intricate optimal
snapshot algorithm due to Jayanti, as well as some clients
PaRiS: Causally Consistent Transactions with Non-blocking Reads and Partial Replication
Geo-replicated data platforms are at the backbone of several large-scale
online services. Transactional Causal Consistency (TCC) is an attractive
consistency level for building such platforms. TCC avoids many anomalies of
eventual consistency, eschews the synchronization costs of strong consistency,
and supports interactive read-write transactions. Partial replication is
another attractive design choice for building geo-replicated platforms, as it
increases the storage capacity and reduces update propagation costs. This paper
presents PaRiS, the first TCC system that supports partial replication and
implements non-blocking parallel read operations, whose latency is paramount
for the performance of read-intensive applications. PaRiS relies on a novel
protocol to track dependencies, called Universal Stable Time (UST). By means of
a lightweight background gossip process, UST identifies a snapshot of the data
that has been installed by every DC in the system. Hence, transactions can
consistently read from such a snapshot on any server in any replication site
without having to block. Moreover, PaRiS requires only one timestamp to track
dependencies and define transactional snapshots, thereby achieving resource
efficiency and scalability. We evaluate PaRiS on a large-scale AWS deployment
composed of up to 10 replication sites. We show that PaRiS scales well with the
number of DCs and partitions, while being able to handle larger data-sets than
existing solutions that assume full replication. We also demonstrate a
performance gain of non-blocking reads vs. a blocking alternative (up to 1.47x
higher throughput with 5.91x lower latency for read-dominated workloads and up
to 1.46x higher throughput with 20.56x lower latency for write-heavy
workloads)
Progress-Space Tradeoffs in Single-Writer Memory Implementations
Many algorithms designed for shared-memory distributed systems assume the single-writer multi- reader (SWMR) setting where each process is provided with a unique register that can only be written by the process and read by all. In a system where computation is performed by a bounded number n of processes coming from a large (possibly unbounded) set of potential participants, the assumption of an SWMR memory is no longer reasonable. If only a bounded number of multi- writer multi-reader (MWMR) registers are provided, we cannot rely on an a priori assignment of processes to registers. In this setting, implementing an SWMR memory, or equivalently, ensuring stable writes (i.e., every written value persists in the memory), is desirable.
In this paper, we propose an SWMR implementation that adapts the number of MWMR registers used to the desired progress condition. For any given k from 1 to n, we present an algorithm that uses n + k ? 1 registers to implement a k-lock-free SWMR memory. In the special case of 2-lock-freedom, we also give a matching lower bound of n + 1 registers, which supports our conjecture that the algorithm is space-optimal. Our lower bound holds for the strictly weaker progress condition of 2-obstruction-freedom, which suggests that the space complexity for k-obstruction-free and k-lock-free SWMR implementations might coincide
Randomized protocols for asynchronous consensus
The famous Fischer, Lynch, and Paterson impossibility proof shows that it is
impossible to solve the consensus problem in a natural model of an asynchronous
distributed system if even a single process can fail. Since its publication,
two decades of work on fault-tolerant asynchronous consensus algorithms have
evaded this impossibility result by using extended models that provide (a)
randomization, (b) additional timing assumptions, (c) failure detectors, or (d)
stronger synchronization mechanisms than are available in the basic model.
Concentrating on the first of these approaches, we illustrate the history and
structure of randomized asynchronous consensus protocols by giving detailed
descriptions of several such protocols.Comment: 29 pages; survey paper written for PODC 20th anniversary issue of
Distributed Computin
- …