1,877 research outputs found

    Distributed eventual leader election in the crash-recovery and general omission failure models.

    Get PDF
    102 p.Distributed applications are present in many aspects of everyday life. Banking, healthcare or transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate among them to achieve a common goal. When building such systems, designers have to cope with several issues, such as different synchrony assumptions and failure occurrence. Distributed systems must ensure that the delivered service is trustworthy.Agreement problems compose a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most of the agreement problems can be considered as a particular instance of the Consensus problem. Hence, they can be solved by reduction to consensus. However, a fundamental impossibility result, namely (FLP), states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. A way to circumvent this obstacle is by using unreliable failure detectors. A failure detector allows to encapsulate synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega lies on providing an eventual leader election mechanism

    A Timing Assumption and two tt-Resilient Protocols for Implementing an Eventual Leader Service in Asynchronous Shared Memory Systems

    Get PDF
    This paper considers the problem of electing an eventual leader in an asynchronous shared memory system. While this problem has received a lot of attention in message-passing systems, very few solutions have been proposed for shared memory systems. As an eventual leader cannot be elected in a pure asynchronous system prone to process crashes, the paper first proposes to enrich the asynchronous system model with an additional assumption. That assumption (denoted AWB\mathit{AWB}) is particularly weak. It is made up of two complementary parts. More precisely, it requires that, after some time, (1) there is a process whose write accesses to some shared variables be timely, and (2) the timers of (t−f)(t-f) other processes be asymptotically well-behaved (tt denotes the maximal number of processes that may crash, and ff the actual number of process crashes in a run). The {\it asymptotically well-behaved} timer notion is a new notion that generalizes and weakens the traditional notion of timers whose durations are required to monotonically increase when the values they are set to increase (a timer works incorrectly when it expires at arbitrary times, i.e., independently of the value it has been set to). The paper then focuses on the design of tt-resilient AWB\mathit{AWB}-based eventual leader protocols. ``tt-resilient'' means that each protocol can cope with up to tt process crashes (taking t=n−1t=n-1 provides wait-free protocols, i.e., protocols that can cope with any number of process failures). Two protocols are presented. The first enjoys the following noteworthy properties: after some time only the elected leader has to write the shared memory, and all but one shared variables have a bounded domain, be the execution finite or infinite. This protocol is consequently optimal with respect to the number of processes that have to write the shared memory. The second protocol guarantees that all the shared variables have a bounded domain. This is obtained at the following additional price: all the processes are required to forever write the shared memory. A theorem is proved which states that this price has to be paid by any protocol that elects an eventual leader in a bounded shared memory model. This second protocol is consequently optimal with respect to the number of processes that have to write in such a constrained memory model. In a very interesting way, these protocols show an inherent tradeoff relating the number of processes that have to write the shared memory and the bounded/unbounded attribute of that memory

    Implementing the weakest failure detector for solving consensus

    Get PDF
    The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve several agreement problems, such as the consensus problem. In this paper, algorithms that implement failure detectors in partially synchronous systems are presented. First two simple algorithms of the weakest class to solve the consensus problem, namely the Eventually Strong class (⋄S), are presented. While the first algorithm is wait-free, the second algorithm is f-resilient, where f is a known upper bound on the number of faulty processes. Both algorithms guarantee that, eventually, all the correct processes agree permanently on a common correct process, i.e. they also implement a failure detector of the class Omega (Ω). They are also shown to be optimal in terms of the number of communication links used forever. Additionally, a wait-free algorithm that implements a failure detector of the Eventually Perfect class (⋄P) is presented. This algorithm is shown to be optimal in terms of the number of bidirectional links used forever

    Contributions on agreement in dynamic distributed systems

    Get PDF
    139 p.This Ph.D. thesis studies the agreement problem in dynamic distributed systems by integrating both the classical fault-tolerance perspective and the more recent formalism based on evolving graphs. First, we developed a common framework that allows to analyze and compare models of dynamic distributed systems for eventual leader election. The framework extends a previous proposal by Baldoni et al. by including new dimensions and levels of dynamicity. Also, we extend the Time-Varying Graph (TVG) formalism by introducing the necessary timeliness assumptions and the minimal conditions to solve agreement problems. We provide a hierarchy of time-bounded, TVG-based, connectivity classes with increasingly stronger assumptions and specify an implementation of Terminating Reliable Broadcast for each class. Then we define an Omega failure detector, W, for the eventual leader election in dynamic distributed systems, together with a system model, , which is compatible with the timebounded TVG classes. We implement an algorithm that satisfy the properties of W in M. According to our common framework, M results to be weaker than the previous proposed dynamic distributed system models for eventual leader election. Additionally we use simulations to illustrate this fact and show that our leader election algorithm tolerates more general (i.e., dynamic) behaviors, and hence it is of application in a wider range of practical scenarios at the cost of a moderate overhead on stabilization times

    The Alpha of Indulgent Consensus

    Get PDF
    This paper presents a simple framework unifying a family of consensus algorithms that can tolerate process crash failures and asynchronous periods of the network, also called indulgent consensus algorithms. Key to the framework is a new abstraction we introduce here, called Alpha, and which precisely captures consensus safety. Implementations of Alpha in shared memory, storage area network, message passing and active disk systems are presented, leading to directly derived consensus algorithms suited to these communication media. The paper also considers the case where the number of processes is unknown and can be arbitrarily larg

    Asynchronous Implementation of Failure Detectors with partial connectivity and unknown participants

    Get PDF
    The distributed computing scenario is rapidly evolving for integrating selforganizing and dynamic wireless networks. Unreliable failure detectors are classical mechanisms which provide information about process failures and can help systems to cope with the high dynamism of these networks. A number of failure detection algorithms has been proposed so far. Nonetheless, most of them assume a global knowledge about the membership as well as a fully communication connectivity; additionally, they are timer-based, requiring that eventually some bound on the message transmission will permanently hold. These assumptions are no longer appropriate to the new scenario. This paper presents a new failure detector protocol which implements a new class of detectors, namely S(M), which adapts the properties of the S class to a dynamic network with an unknown membership. It has the interesting feature to be time-free, so that it does not rely on timers to detect failures; moreover, it tolerates mobility of nodes and message losses.L'informatique répartie intègre de plus en plus des réseaux sans fil dynamiques et auto-organisant. Les détecteurs de fautes non fiables sont un mécanisme classique fournissant des informations sur les processus défaillants. Ils peuvent être particulièrement utiles pour gérer le dynamisme important de ces réseaux. De nombreux algorithmes de détection de fautes ont déjà été proposés. Cependant, la plupart d'entre eux considèrent un ensemble connu de processus interconnectés par un réseau complètement maillé. De plus, ces détecteurs reposent sur des temporisateurs et supposent à terme des bornes sur les délais de transmission des messages. Des telles hypothèses ne sont pas réalistes dans les environnements dynamiques. Cet article présente un nouveau protocole pour détecter les fautes qui implémente une nouvelle classe de détecteurs, appelé S(M), qui adapte les propriétés de la classe S aux réseaux dynamiques avec l'absence de la connaissance des participants. Notre détecteur ne repose sur aucun temporisateur ; de plus, il tolère la mobilité des noeuds et la perte de messages

    Time-Free Authenticated Byzantine Consensus

    Get PDF
    International audienceThis paper presents a simple protocol that solves the authenticated ByzantineConsensus problem in asynchronous distributed systems. To circumvent the FLP impossibility result in a deterministic way, synchrony assumptions should be added. In the context of Byzantine failures for systems where at most t processes may exhibit a Byzantine behavior and where not all the system is assumed eventually synchronous, Moumen et al. provide the main result. They assume at least one correct process, called 2t-bisource, connected with 2t privileged neighbors with eventually timely outgoing and incoming links. The present paper shows that a deterministic solution for the authenticated byzantine consensus problem is possible if the system model satisfies an additional assumption that does not rely on physical time but on the pattern of messages that are exchanged. The basic message exchange between processes is the query-response mechanism. To solve the Consensus problem, we assume a correct process p, called eventual 2t-winning process, and a set Q of 2t processes such that, eventually, for each query issued by p, any process q of Q receives a response from p among the (n-t) first responses to that query. The processes in the set Q can exhibit a Byzantine behavior and this set may change over time. Whereas many time-free solutions have been designed for the consensus problem in the crash model, this is, to our knowledge, the first time-free deterministic solution to the Byzantine consensus problem.Cet article présente un protocole de consensus dans un système asynchrone où un certain nombre de processus (au plus t) peuvent exhiber un comportement byzantin. Il est connu que ce problème n'a pas de solution dans un tel contexte c'est pour cela qu'il faut renforcer le système asynchrone de base avec des contraintes temporelles ou structurelles qui puisse rendre le problème décidable. La démarche choisie dans cet article puisque c'est la première fois que la contrainte rajoutée n'est pas temorelles mais repose sur la pattern de message échangé par les processus

    A Prescription for Partial Synchrony

    Get PDF
    Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals. Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively, or through algorithmic constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior. As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well. Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors
    • …
    corecore