601 research outputs found

    Distributed eventual leader election in the crash-recovery and general omission failure models.

    102 p. Distributed applications are present in many aspects of everyday life. Banking, healthcare and transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate to achieve a common goal. When building such systems, designers have to cope with several issues, such as different synchrony assumptions and the occurrence of failures. Distributed systems must ensure that the delivered service is trustworthy. Agreement problems compose a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most agreement problems can be considered particular instances of the consensus problem. Hence, they can be solved by reduction to consensus. However, a fundamental impossibility result, known as FLP, states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. A way to circumvent this obstacle is to use unreliable failure detectors. A failure detector encapsulates the synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega provides an eventual leader election mechanism

    Agreement in wider environments with weaker assumptions.

    The set agreement problem states that from n proposed values at most n-1 can be decided. Traditionally, this problem is solved using a failure detector in asynchronous systems where processes may crash but do not recover, where processes have different identities, and where all processes initially know the membership. In this paper we study the set agreement problem and the weakest failure detector L used to solve it in asynchronous message passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities) and without a complete initial knowledge of the membership

    Set agreement and the loneliness failure detector in crash-recovery systems

    The set agreement problem states that from n proposed values at most n-1 can be decided. Traditionally, this problem is solved using a failure detector in asynchronous systems where processes may crash but not recover, where processes have different identities, and where all processes initially know the membership. In this paper we study the set agreement problem and the weakest failure detector L used to solve it in asynchronous message passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities) and without a complete initial knowledge of the membership

    Contributions on agreement in dynamic distributed systems

    139 p. This Ph.D. thesis studies the agreement problem in dynamic distributed systems by integrating both the classical fault-tolerance perspective and the more recent formalism based on evolving graphs. First, we developed a common framework that allows us to analyze and compare models of dynamic distributed systems for eventual leader election. The framework extends a previous proposal by Baldoni et al. by including new dimensions and levels of dynamicity. Also, we extend the Time-Varying Graph (TVG) formalism by introducing the necessary timeliness assumptions and the minimal conditions to solve agreement problems. We provide a hierarchy of time-bounded, TVG-based connectivity classes with increasingly stronger assumptions and specify an implementation of Terminating Reliable Broadcast for each class. Then we define an Omega failure detector, W, for eventual leader election in dynamic distributed systems, together with a system model, M, which is compatible with the time-bounded TVG classes. We implement an algorithm that satisfies the properties of W in M. According to our common framework, M turns out to be weaker than the previously proposed dynamic distributed system models for eventual leader election. Additionally, we use simulations to illustrate this fact and show that our leader election algorithm tolerates more general (i.e., dynamic) behaviors, and hence is applicable to a wider range of practical scenarios at the cost of a moderate overhead on stabilization times

    The Alpha of Indulgent Consensus

    This paper presents a simple framework unifying a family of consensus algorithms that can tolerate process crash failures and asynchronous periods of the network, also called indulgent consensus algorithms. Key to the framework is a new abstraction we introduce here, called Alpha, which precisely captures consensus safety. Implementations of Alpha in shared memory, storage area network, message passing and active disk systems are presented, leading to directly derived consensus algorithms suited to these communication media. The paper also considers the case where the number of processes is unknown and can be arbitrarily large

    The Failure Detector Abstraction

    This paper surveys the failure detector concept through two dimensions. First, we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. More specifically, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions

    Perspectives on the CAP Theorem

    Almost twelve years ago, in 2000, Eric Brewer introduced the idea that there is a fundamental trade-off between consistency, availability, and partition tolerance. This trade-off, which has become known as the CAP Theorem, has been widely discussed ever since. In this paper, we review the CAP Theorem and situate it within the broader context of distributed computing theory. We then discuss the practical implications of the CAP Theorem, and explore some general techniques for coping with the inherent trade-offs that it implies

    Communication Predicates: A high-level abstraction for coping with transient and dynamic faults

    Consensus is one of the key problems in fault-tolerant distributed computing. A very popular model for solving consensus is the failure detector model defined by Chandra and Toueg. However, the failure detector model has limitations. The paper points out these limitations, and suggests instead a model based on communication predicates, called the HO model. The advantage of the HO model over failure detectors is shown, and the implementation of the HO model is discussed in the context of a system that alternates between good periods and bad periods. Two definitions of a good period are considered. For both definitions, the HO model allows us to compute the duration of a good period for solving consensus. Specifically, the model allows us to quantify the difference between the required length of an initial good period and the length of a non-initial good period