Search CORE

2,192 research outputs found

Randomized protocols for asynchronous consensus

Author: Aspnes James
Publication venue
Publication date: 06/09/2002
Field of study

The famous Fischer, Lynch, and Paterson impossibility proof shows that it is impossible to solve the consensus problem in a natural model of an asynchronous distributed system if even a single process can fail. Since its publication, two decades of work on fault-tolerant asynchronous consensus algorithms have evaded this impossibility result by using extended models that provide (a) randomization, (b) additional timing assumptions, (c) failure detectors, or (d) stronger synchronization mechanisms than are available in the basic model. Concentrating on the first of these approaches, we illustrate the history and structure of randomized asynchronous consensus protocols by giving detailed descriptions of several such protocols.Comment: 29 pages; survey paper written for PODC 20th anniversary issue of Distributed Computin

arXiv.org e-Print Archive

CiteSeerX

The Impact of RDMA on Agreement

Author: Aguilera Marcos K.
Ben-David Naama
Guerraoui Rachid
Marathe Virendra
Zablotchi Igor
Publication venue
Publication date: 03/03/2020
Field of study

Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires

n \geq 2f_P + 1

processes (where

f_P

is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires

n \geq f_P + 1

processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.Comment: Full version of PODC'19 paper, strengthened broadcast algorith

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

The Failure Detector Abstraction

Author: Freiling Felix
Guerraoui Rachid
Kouznetsov Petr
Publication venue
Publication date: 01/01/2006
Field of study

This paper surveys the failure detector concept through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. More specifically, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlights some limitations of the failure detector abstraction along each of the dimensions

MAnnheim DOCument Server

Distributed eventual leader election in the crash-recovery and general omission failure models.

Author: Fernández Campusano Christian
Publication venue
Publication date: 24/01/2020
Field of study

102 p.Distributed applications are present in many aspects of everyday life. Banking, healthcare or transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate among them to achieve a common goal. When building such systems, designers have to cope with several issues, such as different synchrony assumptions and failure occurrence. Distributed systems must ensure that the delivered service is trustworthy.Agreement problems compose a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most of the agreement problems can be considered as a particular instance of the Consensus problem. Hence, they can be solved by reduction to consensus. However, a fundamental impossibility result, namely (FLP), states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. A way to circumvent this obstacle is by using unreliable failure detectors. A failure detector allows to encapsulate synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega lies on providing an eventual leader election mechanism

Archivo Digital para la Docencia y la Investigación

Implementing the weakest failure detector for solving consensus

Author: Aguilera M.
Aguilera M.
Aguilera M.
Antonio Fernández Anta
Bertier M.
Fernández A.
Hadzilacos V.
Hayashibara N.
Larrea M.
Mikel Larrea
Mostéfaoui A.
Mostéfaoui A.
Sergio Arévalo
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2013
Field of study

The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve several agreement problems, such as the consensus problem. In this paper, algorithms that implement failure detectors in partially synchronous systems are presented. First two simple algorithms of the weakest class to solve the consensus problem, namely the Eventually Strong class (⋄S), are presented. While the first algorithm is wait-free, the second algorithm is f-resilient, where f is a known upper bound on the number of faulty processes. Both algorithms guarantee that, eventually, all the correct processes agree permanently on a common correct process, i.e. they also implement a failure detector of the class Omega (Ω). They are also shown to be optimal in terms of the number of communication links used forever. Additionally, a wait-free algorithm that implements a failure detector of the Eventually Perfect class (⋄P) is presented. This algorithm is shown to be optimal in terms of the number of bidirectional links used forever

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Consensus in Byzantine asynchronous systems

Author: Baldoni Roberto
Hélary Jean-Michel
Raynal Michel
Tangui Lenaik
Publication venue: Elsevier B.V.
Publication date: 30/04/2003
Field of study

AbstractThis paper presents a consensus protocol resilient to Byzantine failures. It uses signed and certified messages and is based on two underlying failure detection modules. The first is a muteness failure detection module of the class ♢M. The second is a reliable Byzantine behaviour detection module. More precisely, the first module detects processes that stop sending messages, while processes experiencing other non-correct behaviours (i.e., Byzantine) are detected by the second module. The protocol is resilient to F faulty processes, F⩽min(⌊(n−1)/2⌋,C) (where C is the maximum number of faulty processes that can be tolerated by the underlying certification service).The approach used to design the protocol is new. While usual Byzantine consensus protocols are based on failure detectors to detect processes that stop communicating, none of them use a module to detect their Byzantine behaviour (this detection is not isolated from the protocol and makes it difficult to understand and prove correct). In addition to this modular approach and to a consensus protocol for Byzantine systems, the paper presents a finite state automaton-based implementation of the Byzantine behaviour detection module. Finally, the modular approach followed in this paper can be used to solve other problems in asynchronous systems experiencing Byzantine failures

Elsevier - Publisher Connector

The power and limit of adding synchronization messages for synchronous agreement

Author: Cao J
Raynal M
Wang X
Wu W
Publication venue: IEEE Computer Society
Publication date: 11/12/2014
Field of study

2006-2007 > Academic research: refereed > Refereed conference paperVersion of RecordPublishe

CiteSeerX

PolyU Institutional Repository

Efficient and robust adaptive consensus services based on oracles

Author
Publication venue: Springer
Publication date: 01/10/2004
Field of study

Springer - Publisher Connector