
    Asynchronous Implementation of Failure Detectors with partial connectivity and unknown participants

    The distributed computing landscape is rapidly evolving to integrate self-organizing and dynamic wireless networks. Unreliable failure detectors are classical mechanisms that provide information about process failures and can help systems cope with the high dynamism of these networks. A number of failure detection algorithms have been proposed so far. Nonetheless, most of them assume global knowledge of the membership as well as full communication connectivity; additionally, they are timer-based, requiring that some bound on message transmission delays eventually holds permanently. These assumptions are no longer appropriate in the new scenario. This paper presents a new failure detector protocol that implements a new class of detectors, namely S(M), which adapts the properties of the S class to a dynamic network with unknown membership. It has the interesting feature of being time-free, so it does not rely on timers to detect failures; moreover, it tolerates node mobility and message losses.
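The abstract does not spell out the protocol. Time-free detectors are commonly built on a query-response pattern: a process queries its neighbors, waits for the first `alpha` responses rather than for a timeout, and suspects the known processes that were not among them. The sketch below illustrates only that general pattern under these assumptions; it is not the S(M) protocol, and the class name, `alpha`, and `close_round` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QueryResponseDetector:
    """Hypothetical sketch of one process's time-free, query-response suspicion logic."""
    pid: str
    alpha: int                                   # assumed: responses awaited per query
    known: set = field(default_factory=set)      # participants learned from messages
    suspected: set = field(default_factory=set)

    def close_round(self, arrivals):
        """`arrivals` lists responders in arrival order (including late ones).
        The round ends after `alpha` distinct responses; no timer is involved."""
        winners = []
        for p in arrivals:
            if p != self.pid and p not in winners:
                winners.append(p)
                if len(winners) == self.alpha:
                    break
        if len(winners) < self.alpha:
            return None  # round still open: a time-free detector keeps waiting
        # Membership is not given a priori: every responder, even a late one, is learned.
        self.known |= set(arrivals) - {self.pid}
        # Known processes outside the first `alpha` responders are suspected.
        self.suspected = self.known - set(winners)
        return set(winners)
```

For example, with `alpha=2` and responses arriving as `["p1", "p2", "p3"]`, process `p3` answered too late and becomes suspected; if it is among the first two responders in a later round, the suspicion is withdrawn. Suspicion is driven by relative response order, not by elapsed time.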

    Fault-Tolerant Distributed Services in Message-Passing Systems

    Distributed systems, ranging from small local area networks to large wide area networks like the Internet and composed of static and/or mobile users, have become increasingly popular. A desirable property for any distributed service is fault tolerance, meaning the service remains uninterrupted even if some components in the network fail. This dissertation considers weak distributed models, either to find algorithms that solve certain problems or to prove that a problem is unsolvable. The main contributions of this dissertation are:
    • Failure detectors are used as a service to solve consensus (agreement among nodes), which is otherwise impossible in failure-prone asynchronous systems. We give an algorithm for crash-failure detection that uses bounded-size messages in an arbitrary, partitionable network composed of badly-behaved channels that can lose and reorder messages.
    • Registers are a fundamental building block for shared-memory emulations on top of message-passing systems. The problem has been extensively studied in static systems. However, register emulation in dynamic systems with faulty nodes is still quite hard, and there are impossibility proofs pointing out scenarios where changes in the system composition due to nodes entering and leaving (also called churn) make the problem unsolvable. We propose the first emulation of a crash-fault-tolerant register in a system with continuous churn where consensus is unsolvable, the size of the system can grow without bound, and at most a constant fraction of the nodes in the system can fail by crashing. We prove a lower bound stating that fault tolerance in dynamic systems with churn is inherently lower than in static systems.
    • We then extend the crash-fault-tolerant results to a dynamic system with continuous churn and nodes that can be Byzantine faulty. It is the first emulation of an atomic register in a system that can withstand nodes continually entering and leaving, imposes no upper bound on the system size, and can tolerate Byzantine nodes. However, the number of Byzantine faulty nodes that can be tolerated is bounded above by a constant. Although the algorithm requires a constant, known upper bound on the number of Byzantine nodes, this restriction is unavoidable: we show that it is impossible to emulate an atomic register if the system size and the maximum number of servers that can be Byzantine are unknown.

    An Efficient Failure Detector for Sparsely Connected Networks

    We present an implementation of an eventually perfect failure detector for sparsely connected, partitionable networks, where each process has only a bounded number of neighbors. Processes and links may fail by crashing. Regarding synchrony, our algorithm only needs to know an upper bound on the jitter ε of the communication between direct neighbors. No a priori knowledge about the number of processes in the system is required. The algorithm uses heartbeats to determine whether a process is in the same partition. By reducing the forwarding frequency with distance, information about nearer processes is more accurate than information about farther ones, and the message size becomes constant. Since this property can be guaranteed independently of the number of processes in the system, our failure detector is very efficient in terms of communication complexity.
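The abstract describes the mechanism only at a high level. Below is a minimal synchronous-round sketch of the idea of heartbeats whose forwarding frequency drops with distance, under stated assumptions: a static topology, a hypothetical forwarding period of `BASE ** d` rounds for information about a process at distance `d`, and a fixed `SLACK` standing in for the jitter bound ε. The names `forward_set`, `simulate`, `BASE`, and `SLACK` are illustrative and not taken from the paper, and the sketch does not reproduce the paper's constant-message-size argument.

```python
BASE = 2   # hypothetical: entries about distance-d processes are forwarded every BASE**d rounds
SLACK = 2  # hypothetical extra rounds tolerated, standing in for the jitter bound eps


def forward_set(round_no, table):
    """Entries (process -> (heartbeat, distance)) included in this round's message."""
    return {q: v for q, v in table.items() if round_no % (BASE ** v[1]) == 0}


def simulate(graph, rounds, crashed_at=None):
    """Synchronous rounds on an adjacency dict; a process crashed at round c neither
    beats nor sends from round c on. Returns final tables and suspect sets."""
    crashed_at = crashed_at or {}

    def alive(p, r):
        return r < crashed_at.get(p, rounds + 1)

    table = {p: {p: (0, 0)} for p in graph}  # q -> (latest heartbeat, distance to q)
    fresh = {p: {p: 0} for p in graph}       # q -> round a newer heartbeat was last seen
    suspects = {p: set() for p in graph}
    for r in range(1, rounds + 1):
        for p in graph:
            if alive(p, r):
                table[p][p] = (table[p][p][0] + 1, 0)  # increment own heartbeat
                fresh[p][p] = r
        # Snapshot outgoing messages before delivery, so information moves one hop per round.
        outbox = {p: forward_set(r, table[p]) for p in graph if alive(p, r)}
        for p, msg in outbox.items():
            for n in graph[p]:
                if not alive(n, r):
                    continue
                for q, (hb, d) in msg.items():
                    if q == n:
                        continue
                    old = table[n].get(q)
                    if old is None or hb > old[0]:
                        dist = d + 1 if old is None else min(old[1], d + 1)
                        table[n][q] = (hb, dist)
                        fresh[n][q] = r
        for p in graph:
            if alive(p, r):
                # Timeout grows with distance: farther entries are refreshed less often.
                suspects[p] = {q for q, (_, d) in table[p].items()
                               if q != p and r - fresh[p][q] > BASE ** d + SLACK}
    return table, suspects
```

On a line `p0 - p1 - p2 - p3`, calling `simulate(line, 40, crashed_at={"p3": 20})` leaves `p0` suspecting only `p3`: news about `p3` reaches `p0` every 4 rounds, so its stale heartbeat is detected only after the longer distance-3 timeout, whereas `p3`'s direct neighbor `p2` notices the crash sooner. This mirrors the trade-off in the abstract: nearer processes are tracked more accurately than farther ones.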