18 research outputs found
Mutual Exclusion in Asynchronous Systems with Failure Detectors
This paper defines the fault-tolerant mutual exclusion problem in a message-passing asynchronous system and determines the weakest failure detector to solve the problem. This failure detector, which we call the trusting failure detector, and which we denote by T, is strictly weaker than the perfect failure detector P but strictly stronger than the eventually perfect failure detector P. The paper shows that a majority of correct processes is necessary to solve the problem with T. Moreover, T is also the weakest failure detector to solve the fault-tolerant group mutual exclusion problem
Algorithms For Extracting Timeliness Graphs
We consider asynchronous message-passing systems in which some links are
timely and processes may crash. Each run defines a timeliness graph among
correct processes: (p; q) is an edge of the timeliness graph if the link from p
to q is timely (that is, there is bound on communication delays from p to q).
The main goal of this paper is to approximate this timeliness graph by graphs
having some properties (such as being trees, rings, ...). Given a family S of
graphs, for runs such that the timeliness graph contains at least one graph in
S then using an extraction algorithm, each correct process has to converge to
the same graph in S that is, in a precise sense, an approximation of the
timeliness graph of the run. For example, if the timeliness graph contains a
ring, then using an extraction algorithm, all correct processes eventually
converge to the same ring and in this ring all nodes will be correct processes
and all links will be timely. We first present a general extraction algorithm
and then a more specific extraction algorithm that is communication efficient
(i.e., eventually all the messages of the extraction algorithm use only links
of the extracted graph)
The weakest failure detector for wait-free dining under eventual weak exclusion
Dining philosophers is a classic scheduling problem for local mutual exclusion on arbitrary conflict graphs. We establish necessary conditions to solve wait-free dining under eventual weak exclusion in message-passing systems with crash faults. Wait-free dining ensures that every correct hungry process eventually eats. Eventual weak exclusion permits finitely many scheduling mistakes, but eventually no live neighbors eat simultaneously; this exclusion criterion models scenarios where scheduling mistakes are recoverable or only affect per-formance. Previous work showed that the eventually perfect failure detector (3P) is sufficient to solve wait-free dining under eventual weak exclusion; we prove that 3P is also necessary, and thus 3P is the weakest oracle to solve this problem. Our reduction also establishes that any such din-ing solution can be made eventually fair. Finally, the reduc-tion itself may be of more general interest; when applied to wait-free perpetual weak exclusion, our reduction produces an alternative proof that the more powerful trusting oracle (T) is necessary (but not sufficient) to solve the problem o
Section critique à entrées multiples tolérante aux fautes et utilisant des détecteurs de défaillances
Nous présentons dans cet article un nouvel algorithme tolérant aux fautes de K-exclusion mutuelle. Cet algorithme à permission est une extension de l'algorithme de Raymond [Ray89]. Il tolère n − 1 fautes et reste efficace malgré les défaillances. L'algorithme repose sur un détecteur de fautes non fiable. Une évaluation de performances montre l'efficacité de notre approche en présence de fautes
The Weakest Failure Detector to Solve Mutual Exclusion
Mutual exclusion is not solvable in an asynchronous message-passing system where processes are subject to crash failures. Delporte-Gallet et. al. determined the weakest failure detector to solve this problem when a majority of processes are correct. Here we identify the weakest failure detector to solve mutual exclusion in any environment, i.e., regardless of the number of faulty processes. We also show a relation between mutual exclusion and consensus, arguably the two most fundamental problems in distributed computing. Specifically, we show that a failure detector that solves mutual exclusion is sufficient to solve non-uniform consensus but not necessarily uniform consensus
The weakest failure detector for wait-free dining under eventual weak exclusion
ABSTRACT Dining philosophers is a classic scheduling problem for local mutual exclusion on arbitrary conflict graphs. We establish necessary conditions to solve wait-free dining under eventual weak exclusion in message-passing systems with crash faults. Wait-free dining ensures that every correct hungry process eventually eats. Eventual weak exclusion permits finitely many scheduling mistakes, but eventually no live neighbors eat simultaneously; this exclusion criterion models scenarios where scheduling mistakes are recoverable or only affect performance. Previous work showed that the eventually perfect failure detector (3P) is sufficient to solve wait-free dining under eventual weak exclusion; we prove that 3P is also necessary, and thus 3P is the weakest oracle to solve this problem. Our reduction also establishes that any such dining solution can be made eventually fair. Finally, the reduction itself may be of more general interest; when applied to wait-free perpetual weak exclusion, our reduction produces an alternative proof that the more powerful trusting oracle (T ) is necessary (but not sufficient) to solve the problem of Fault-Tolerant Mutual Exclusion (FTME)
Impact FD: An Unreliable Failure Detector Based on Process Relevance and Confidence in the System
International audienceThis paper presents a new unreliable failure detector, called the Impact failure detector (FD) that, contrarily to the majority of traditional FDs, outputs a trust level value which expresses the degree of confidence in the system. An impact factor is assigned to each process and the trust level is equal to the sum of the impact factors of the processes not suspected of failure. Moreover, a threshold parameter defines a lower bound value for the trust level, over which the confidence in the system is ensured. In particular, we defined a f l exi bi l i t y property that denotes the capacity of the Impact FD to tolerate a certain margin of failures or false suspicions, i.e., its capacity of considering different sets of responses that lead the system to trusted states. The Impact FD is suitable for systems that present node redundancy, heterogeneity of nodes, clustering feature, and allow a margin of failures which does not degrade the confidence in the system. The paper also includes a timer-based distributed algorithm which implements an Impact FD, as well as its proof of correctness, for systems whose links are lossy asynchronous or for those whose all (or some) links are eventually timely. Performance evaluation results, based on PlanetLab [1] traces, confirm the degree of flexible applicability of our failure detector and that, due to the accepted margin of failure, both failures and false suspicions are more tolerated when compared to traditional unreliable failure detectors
The Failure Detector Abstraction
A failure detector is a fundamental abstraction in distributed computing. This paper surveys this abstraction through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. In particular, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions