
    Wait-Free and Obstruction-Free Snapshot

    Get PDF
    The snapshot problem was first proposed over a decade ago and has since been well-studied in the distributed algorithms community. The challenge is to design a data structure consisting of m components, shared by up to n concurrent processes, that supports two operations. The first, Update(i,v), atomically writes v to the i-th component. The second, Scan(), returns an atomic snapshot of all m components. We consider two termination properties: wait-freedom, which requires a process to always terminate in a bounded number of its own steps, and the weaker obstruction-freedom, which requires such termination only for processes that eventually execute uninterrupted. First, we present a simple, time- and space-optimal, obstruction-free solution to the single-writer, multi-scanner version of the snapshot problem (wherein concurrent Updates never occur on the same component). Second, we assume hardware support for compare&swap (CAS) to give a time-optimal, wait-free solution to the multi-writer, single-scanner snapshot problem (wherein concurrent Scans never occur). This algorithm uses only O(mn) space and has optimal CAS, write, and remote-reference complexities. Additionally, it can be augmented to implement a general snapshot object with the same time and space bounds, thus improving on the O(mn^2) space complexity of the only previously known time-optimal solution.
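
    As a point of reference for the Update/Scan interface described above, here is a minimal sequential specification of an m-component snapshot object, written in Python with a single coarse-grained lock standing in for atomicity. It is only an illustration of the object's semantics (names are ours), not the wait-free or obstruction-free constructions of the paper, which achieve the same behavior without locks.

        import threading

        class SnapshotObject:
            """Sequential specification of an m-component snapshot object."""

            def __init__(self, m, initial=None):
                self._components = [initial] * m
                self._lock = threading.Lock()  # stands in for atomicity

            def update(self, i, v):
                # Update(i, v): atomically write v to the i-th component.
                with self._lock:
                    self._components[i] = v

            def scan(self):
                # Scan(): atomically return a consistent view of all m components.
                with self._lock:
                    return list(self._components)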

    Anonymous Obstruction-free (n,k)-Set Agreement with n-k+1 Atomic Read/Write Registers

    Get PDF
    The k-set agreement problem is a generalization of the consensus problem. Namely, assuming each process proposes a value, each non-faulty process has to decide a value such that each decided value was proposed, and no more than k different values are decided. This is a hard problem in the sense that it cannot be solved in asynchronous systems as soon as k or more processes may crash. One way to circumvent this impossibility consists in weakening its termination property, requiring that a process terminates (decides) only if it executes alone during a long enough period. This is the well-known obstruction-freedom progress condition. Considering a system of n anonymous asynchronous processes, which communicate through atomic read/write registers only, and where any number of processes may crash, this paper addresses and solves the challenging open problem of designing an obstruction-free k-set agreement algorithm with (n-k+1) atomic registers only. From a shared memory cost point of view, this algorithm is the best algorithm known so far, thereby establishing a new upper bound on the number of registers needed to solve the problem (its gain is (n-k) with respect to the previous upper bound). The algorithm is then extended to address the repeated version of (n,k)-set agreement. As it is optimal in the number of atomic read/write registers, this algorithm closes the gap on previously established lower/upper bounds for both the anonymous and non-anonymous versions of the repeated (n,k)-set agreement problem. Finally, for 1 ≤ x ≤ k < n, a generalization suited to x-obstruction-freedom is also described, which requires (n-k+x) atomic registers only.
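
    To make the problem statement concrete, the two safety properties of k-set agreement (validity and k-agreement) can be checked on a finished execution as sketched below; the checker is our own illustration, and termination (here, obstruction-freedom) is a liveness property that cannot be verified from a single finite trace.

        def check_k_set_agreement(proposals, decisions, k):
            """proposals: process id -> proposed value; decisions: process id -> decided value."""
            proposed = set(proposals.values())
            decided = set(decisions.values())
            validity = decided <= proposed   # every decided value was proposed
            agreement = len(decided) <= k    # at most k distinct values decided
            return validity and agreement

        # Example with n = 4 processes and k = 2 (process 3 crashed before deciding).
        assert check_k_set_agreement(
            proposals={1: "a", 2: "b", 3: "c", 4: "a"},
            decisions={1: "a", 2: "b", 4: "a"},
            k=2,
        )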

    Compositional competitiveness for distributed algorithms

    Full text link
    We define a measure of competitive performance for distributed algorithms based on throughput, the number of tasks that an algorithm can carry out in a fixed amount of work. This new measure complements the latency measure of Ajtai et al., which measures how quickly an algorithm can finish tasks that start at specified times. The novel feature of the throughput measure, which distinguishes it from the latency measure, is that it is compositional: it supports a notion of algorithms that are competitive relative to a class of subroutines, with the property that an algorithm that is k-competitive relative to a class of subroutines, combined with an l-competitive member of that class, gives a combined algorithm that is kl-competitive. In particular, we prove the throughput-competitiveness of a class of algorithms for collect operations, in which each of a group of n processes obtains all values stored in an array of n registers. Collects are a fundamental building block of a wide variety of shared-memory distributed algorithms, and we show that several such algorithms are competitive relative to collects. Inserting a competitive collect in these algorithms gives the first examples of competitive distributed algorithms obtained by composition using a general construction. Comment: 33 pages, 2 figures; full version of STOC 96 paper titled "Modular competitiveness for distributed algorithms".
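
    Schematically, in our own notation and omitting the additive terms and the precise adversary/schedule quantification of the paper's definitions, the composition property has the following shape: if A is k-competitive relative to the class of collects, B is an l-competitive collect, and T_X(sigma) denotes the throughput of algorithm X on schedule sigma, then the combined algorithm A[B] roughly satisfies

        \[
        T_{A[B]}(\sigma) \;\ge\; \frac{1}{k\,l}\; T_{\mathrm{OPT}}(\sigma) .
        \]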

    Byzantine-Tolerant Set-Constrained Delivery Broadcast

    Get PDF
    Set-Constrained Delivery Broadcast (SCD-broadcast), recently introduced at ICDCN 2018, is a high-level communication abstraction that captures ordering properties not between individual messages but between sets of messages. More precisely, it allows processes to broadcast messages and deliver sets of messages, under the constraint that if a process delivers a set containing a message m before a set containing a message m', then no other process delivers first a set containing m' and later a set containing m. It has been shown that SCD-broadcast and read/write registers are computationally equivalent, and an algorithm implementing SCD-broadcast is known in the context of asynchronous message passing systems prone to crash failures. This paper introduces a Byzantine-tolerant SCD-broadcast algorithm, which we call BSCD-broadcast. Our proposed algorithm assumes an underlying basic Byzantine-tolerant reliable broadcast abstraction. We first introduce an intermediary communication primitive, Byzantine FIFO broadcast (BFIFO-broadcast), which we then use as a primitive in our final BSCD-broadcast algorithm. Unlike the original SCD-broadcast algorithm that is tolerant to up to t<n/2 crashing processes, and unlike the underlying Byzantine reliable broadcast primitive that is tolerant to up to t<n/3 Byzantine processes, our BSCD-broadcast algorithm is tolerant to up to t<n/4 Byzantine processes. As an illustration of the high abstraction power provided by the BSCD-broadcast primitive, we show that it can be used to implement a Byzantine-tolerant read/write snapshot object in an extremely simple way.
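
    The set-ordering constraint stated above can be expressed as a small check over each process's sequence of delivered message sets; this is our own illustration of the SCD-broadcast safety condition, not code from the paper.

        from itertools import combinations

        def scd_order_ok(deliveries):
            """deliveries: process id -> list of delivered message sets, in delivery order."""
            def first_pos(sets):
                # Map each message to the index of the first delivered set containing it.
                pos = {}
                for idx, s in enumerate(sets):
                    for msg in s:
                        pos.setdefault(msg, idx)
                return pos

            positions = {p: first_pos(sets) for p, sets in deliveries.items()}
            for p, q in combinations(positions, 2):
                common = positions[p].keys() & positions[q].keys()
                for m1, m2 in combinations(common, 2):
                    d_p = positions[p][m1] - positions[p][m2]
                    d_q = positions[q][m1] - positions[q][m2]
                    if d_p * d_q < 0:  # the two processes saw m1 and m2 in opposite orders
                        return False
            return True

        # p1 delivers {a} then {b, c}; p2 delivers {a, b} then {c}: consistent.
        assert scd_order_ok({"p1": [{"a"}, {"b", "c"}], "p2": [{"a", "b"}, {"c"}]})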

    Concurrent Data Structures Linked in Time

    Get PDF
    Arguments about the correctness of a concurrent data structure are typically carried out by using the notion of linearizability and specifying the linearization points of the data structure's procedures. Such arguments are often cumbersome, as the linearization points' position in time can be dynamic (depending on interference, run-time values, and events from the past or even the future), non-local (appearing in procedures other than the one considered), and determinable only after the considered procedure has already terminated. In this paper we propose a new method, based on a separation-style logic, for reasoning about concurrent objects with such linearization points. We embrace the dynamic nature of linearization points and encode it as part of the data structure's auxiliary state, so that it can be dynamically modified in place by auxiliary code, as needed when some appropriate run-time event occurs. We name the idea linking-in-time, because it reduces temporal reasoning to spatial reasoning. For example, modifying the temporal position of a linearization point can be modeled similarly to a pointer update in separation logic. Furthermore, the auxiliary state provides a convenient way to concisely express the properties essential for reasoning about clients of such concurrent objects. We illustrate the method by verifying (mechanically in Coq) an intricate optimal snapshot algorithm due to Jayanti, as well as some clients.

    PaRiS: Causally Consistent Transactions with Non-blocking Reads and Partial Replication

    Get PDF
    Geo-replicated data platforms are at the backbone of several large-scale online services. Transactional Causal Consistency (TCC) is an attractive consistency level for building such platforms. TCC avoids many anomalies of eventual consistency, eschews the synchronization costs of strong consistency, and supports interactive read-write transactions. Partial replication is another attractive design choice for building geo-replicated platforms, as it increases the storage capacity and reduces update propagation costs. This paper presents PaRiS, the first TCC system that supports partial replication and implements non-blocking parallel read operations, whose latency is paramount for the performance of read-intensive applications. PaRiS relies on a novel protocol to track dependencies, called Universal Stable Time (UST). By means of a lightweight background gossip process, UST identifies a snapshot of the data that has been installed by every data center (DC) in the system. Hence, transactions can consistently read from such a snapshot on any server in any replication site without having to block. Moreover, PaRiS requires only one timestamp to track dependencies and define transactional snapshots, thereby achieving resource efficiency and scalability. We evaluate PaRiS on a large-scale AWS deployment composed of up to 10 replication sites. We show that PaRiS scales well with the number of DCs and partitions, while being able to handle larger data sets than existing solutions that assume full replication. We also demonstrate a performance gain of non-blocking reads vs. a blocking alternative (up to 1.47x higher throughput with 5.91x lower latency for read-dominated workloads, and up to 1.46x higher throughput with 20.56x lower latency for write-heavy workloads).
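
    The essence of UST can be sketched as follows (a deliberate simplification with hypothetical names; the actual protocol aggregates per-partition information within and across DCs by gossip): each server tracks the highest timestamp up to which it has installed all updates, and the system-wide minimum of these values identifies a snapshot that every DC has already installed, so reads at or below it never block.

        def universal_stable_time(installed_up_to):
            """installed_up_to: (dc, partition) -> highest timestamp t such that
            every update with timestamp <= t is installed at that replica."""
            return min(installed_up_to.values())

        # Hypothetical example: 2 DCs x 2 partitions.
        ust = universal_stable_time({
            ("dc0", 0): 17, ("dc0", 1): 19,
            ("dc1", 0): 15, ("dc1", 1): 22,
        })
        assert ust == 15  # transactions reading at snapshot 15 never block, on any server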

    Progress-Space Tradeoffs in Single-Writer Memory Implementations

    Get PDF
    Many algorithms designed for shared-memory distributed systems assume the single-writer multi-reader (SWMR) setting, where each process is provided with a unique register that can only be written by that process and read by all. In a system where computation is performed by a bounded number n of processes coming from a large (possibly unbounded) set of potential participants, the assumption of an SWMR memory is no longer reasonable. If only a bounded number of multi-writer multi-reader (MWMR) registers are provided, we cannot rely on an a priori assignment of processes to registers. In this setting, implementing an SWMR memory, or equivalently, ensuring stable writes (i.e., every written value persists in the memory), is desirable. In this paper, we propose an SWMR implementation that adapts the number of MWMR registers used to the desired progress condition. For any given k from 1 to n, we present an algorithm that uses n + k - 1 registers to implement a k-lock-free SWMR memory. In the special case of 2-lock-freedom, we also give a matching lower bound of n + 1 registers, which supports our conjecture that the algorithm is space-optimal. Our lower bound holds for the strictly weaker progress condition of 2-obstruction-freedom, which suggests that the space complexity for k-obstruction-free and k-lock-free SWMR implementations might coincide.
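
    As a quick illustration of the progress-space tradeoff stated above (our own snippet, using only the bounds given in the abstract):

        def registers_used(n, k):
            """Registers used by the k-lock-free SWMR implementation, for 1 <= k <= n."""
            assert 1 <= k <= n
            return n + k - 1

        # With n = 10 processes: 10 registers for 1-lock-freedom,
        # 11 for 2-lock-freedom (matching the n + 1 lower bound), 19 for k = n.
        assert registers_used(10, 2) == 11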

    Randomized protocols for asynchronous consensus

    Full text link
    The famous Fischer, Lynch, and Paterson impossibility proof shows that it is impossible to solve the consensus problem in a natural model of an asynchronous distributed system if even a single process can fail. Since its publication, two decades of work on fault-tolerant asynchronous consensus algorithms have evaded this impossibility result by using extended models that provide (a) randomization, (b) additional timing assumptions, (c) failure detectors, or (d) stronger synchronization mechanisms than are available in the basic model. Concentrating on the first of these approaches, we illustrate the history and structure of randomized asynchronous consensus protocols by giving detailed descriptions of several such protocols. Comment: 29 pages; survey paper written for PODC 20th anniversary issue of Distributed Computing.