2,192 research outputs found
Randomized protocols for asynchronous consensus
The famous Fischer, Lynch, and Paterson impossibility proof shows that it is
impossible to solve the consensus problem in a natural model of an asynchronous
distributed system if even a single process can fail. Since its publication,
two decades of work on fault-tolerant asynchronous consensus algorithms have
evaded this impossibility result by using extended models that provide (a)
randomization, (b) additional timing assumptions, (c) failure detectors, or (d)
stronger synchronization mechanisms than are available in the basic model.
Concentrating on the first of these approaches, we illustrate the history and
structure of randomized asynchronous consensus protocols by giving detailed
descriptions of several such protocols.Comment: 29 pages; survey paper written for PODC 20th anniversary issue of
Distributed Computin
The Impact of RDMA on Agreement
Remote Direct Memory Access (RDMA) is becoming widely available in data
centers. This technology allows a process to directly read and write the memory
of a remote host, with a mechanism to control access permissions. In this
paper, we study the fundamental power of these capabilities. We consider the
well-known problem of achieving consensus despite failures, and find that RDMA
can improve the inherent trade-off in distributed computing between failure
resilience and performance. Specifically, we show that RDMA allows algorithms
that simultaneously achieve high resilience and high performance, while
traditional algorithms had to choose one or another. With Byzantine failures,
we give an algorithm that only requires processes (where
is the maximum number of faulty processes) and decides in two (network)
delays in common executions. With crash failures, we give an algorithm that
only requires processes and also decides in two delays. Both
algorithms tolerate a minority of memory failures inherent to RDMA, and they
provide safety in asynchronous systems and liveness with standard additional
assumptions.Comment: Full version of PODC'19 paper, strengthened broadcast algorith
The Failure Detector Abstraction
This paper surveys the failure detector concept through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. More specifically, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlights some limitations of the failure detector abstraction along each of the dimensions
Distributed eventual leader election in the crash-recovery and general omission failure models.
102 p.Distributed applications are present in many aspects of everyday life. Banking, healthcare or transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate among them to achieve a common goal. When building such systems, designers have to cope with several issues, such as different synchrony assumptions and failure occurrence. Distributed systems must ensure that the delivered service is trustworthy.Agreement problems compose a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most of the agreement problems can be considered as a particular instance of the Consensus problem. Hence, they can be solved by reduction to consensus. However, a fundamental impossibility result, namely (FLP), states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. A way to circumvent this obstacle is by using unreliable failure detectors. A failure detector allows to encapsulate synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega lies on providing an eventual leader election mechanism
Implementing the weakest failure detector for solving consensus
The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve several agreement problems, such as the consensus problem. In this paper, algorithms that implement failure detectors in partially synchronous systems are presented. First two simple algorithms of the weakest class to solve the consensus problem, namely the Eventually Strong class (⋄S), are presented. While the first algorithm is wait-free, the second algorithm is f-resilient, where f is a known upper bound on the number of faulty processes. Both algorithms guarantee that, eventually, all the correct processes agree permanently on a common correct process, i.e. they also implement a failure detector of the class Omega (Ω). They are also shown to be optimal in terms of the number of communication links used forever. Additionally, a wait-free algorithm that implements a failure detector of the Eventually Perfect class (⋄P) is presented. This algorithm is shown to be optimal in terms of the number of bidirectional links used forever
Consensus in Byzantine asynchronous systems
AbstractThis paper presents a consensus protocol resilient to Byzantine failures. It uses signed and certified messages and is based on two underlying failure detection modules. The first is a muteness failure detection module of the class ♢M. The second is a reliable Byzantine behaviour detection module. More precisely, the first module detects processes that stop sending messages, while processes experiencing other non-correct behaviours (i.e., Byzantine) are detected by the second module. The protocol is resilient to F faulty processes, F⩽min(⌊(n−1)/2⌋,C) (where C is the maximum number of faulty processes that can be tolerated by the underlying certification service).The approach used to design the protocol is new. While usual Byzantine consensus protocols are based on failure detectors to detect processes that stop communicating, none of them use a module to detect their Byzantine behaviour (this detection is not isolated from the protocol and makes it difficult to understand and prove correct). In addition to this modular approach and to a consensus protocol for Byzantine systems, the paper presents a finite state automaton-based implementation of the Byzantine behaviour detection module. Finally, the modular approach followed in this paper can be used to solve other problems in asynchronous systems experiencing Byzantine failures
The power and limit of adding synchronization messages for synchronous agreement
2006-2007 > Academic research: refereed > Refereed conference paperVersion of RecordPublishe
- …