189 research outputs found

    Wait-Freedom with Advice

    Full text link
    We motivate and propose a new way of thinking about failure detectors which allows us to define, quite surprisingly, what it means to solve a distributed task \emph{wait-free} \emph{using a failure detector}. In our model, the system is composed of \emph{computation} processes that obtain inputs and are supposed to output in a finite number of steps and \emph{synchronization} processes that are subject to failures and can query a failure detector. We assume that, under the condition that \emph{correct} synchronization processes take sufficiently many steps, they provide the computation processes with enough \emph{advice} to solve the given task wait-free: every computation process outputs in a finite number of its own steps, regardless of the behavior of other computation processes. Every task can thus be characterized by the \emph{weakest} failure detector that allows for solving it, and we show that every such failure detector captures a form of set agreement. We then obtain a complete classification of tasks, including ones that evaded comprehensible characterization so far, such as renaming or weak symmetry breaking

    Fairness Properties of the Trusting Failure Detector

    Get PDF
    In 1985 it was shown by Fischer et al. that consensus, a fundamental problem in distributed computing, was impossible in asynchronous distributed systems in the presence of even just one process failure. This result prompted a search for alternative system models that were capable of solving such problems and culminated in the development of two helpful constructs: partially synchronous system models and failure detectors. Partially synchronous system models seek to solve the problem of identifying process crashes by constraining the real-time behavior of the underlying system. In the resulting models, crashed processes can be detected indirectly through the use of timeouts. Failure detectors, on the other hand, address process crashes by directly providing (potentially inaccurate) information on failures. As a result, failure detectors were viewed as abstractions of real-time information. Pike et al. proposed a different perspective on failure detectors; as abstracting fairness properties. Fairness in a system imposes bounds on the relative frequencies of communication and execution between processes in a system, and it was shown that four frequently-used failure detectors from the Chandra-Toueg hierarchy (P, ♱P, S, ♱S) encapsulate these fairness properties. This discovery suggests that failure detectors may be better understood as abstractions of fairness rather than real-time properties as well as demonstrates the possibility to communicate results between systems augmented with failure detectors and partially synchronous system models. In this thesis, we will be discussing an extension of the Pike et al. result to the trusting failure detector. The trusting failure detector is the weakest failure detector to implement the problem of fault-tolerant mutual exclusion: a fundamental primitive for distributed computing

    A Prescription for Partial Synchrony

    Get PDF
    Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals. Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively, or through algorithmic constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior. As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well. Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors

    Consensus is Easier Than Reliable Broadcast

    Get PDF
    RapportWe consider asynchronous distributed systems with message losses and process crashes. We study the impact of ïŹnite process memory on the solution to consensus, repeated consensus and reliable broadcast. With ïŹnite process memory, we show that in some sense consensus is easier to solve than reliable broadcast, and that reliable broadcast is as diïŹƒcult to solve as repeated consensus: More precisely, with ïŹnite memory, consensus can be solved with failure detector S , and P − (a variant of the perfect failure detector which is stronger than S ) is necessary and suïŹƒcient to solve reliable broadcast and repeated consensus

    A Look at Basics of Distributed Computing *

    Get PDF
    International audienceThis paper presents concepts and basics of distributed computing which are important (at least from the author's point of view), and should be known and mastered by Master students and engineers. Those include: (a) a characterization of distributed computing (which is too much often confused with parallel computing); (b) the notion of a synchronous system and its associated notions of a local algorithm and message adversaries; (c) the notion of an asynchronous shared memory system and its associated notions of universality and progress conditions; and (d) the notion of an asynchronous message-passing system with its associated broadcast and agreement abstractions, its impossibility results, and approaches to circumvent them. Hence, the paper can be seen as a guided tour to key elements that constitute basics of distributed computing

    The weakest failure detectors to boost obstruction-freedom

    Get PDF
    It is considered good practice in concurrent computing to devise shared object implementations that ensure a minimal obstruction-free progress property and delegate the task of boosting liveness to independent generic oracles called contention managers. This paper determines necessary and sufficient conditions to implement wait-free and non-blocking contention managers, i.e., contention managers that ensure wait-freedom (resp. non-blockingness) of any associated obstruction-free object implementation. The necessary conditions hold even when universal objects (like compare-and-swap) or random oracles are available in the implementation of the contention manager. On the other hand, the sufficient conditions assume only basic read/write objects, i.e., registers. We show that failure detector \lozenge{\fancyscript{P}} is the weakest to convert any obstruction-free algorithm into a wait-free one, and Ω *, a new failure detector which we introduce in this paper, and which is strictly weaker than \lozenge\fancyscript{P} but strictly stronger than Ω, is the weakest to convert any obstruction-free algorithm into a non-blocking one. We also address the issue of minimizing the overhead imposed by contention management in low contention scenarios. We propose two intermittent failure detectors IΩ∗I_{\Omega^*} and I_{\lozenge\fancyscript{P}} that are in a precise sense equivalent to, respectively, Ω * and \lozenge\fancyscript{P} , but allow for reducing the cost of failure detection in eventually synchronous systems when there is little contention. We present two contention managers: a non-blocking one and a wait-free one, that use, respectively, IΩ∗I_{\Omega^*} and I_{\lozenge\fancyscript{P}} . When there is no contention, the first induces very little overhead whereas the second induces some non-trivial overhead. We show that wait-free contention managers, unlike their non-blocking counterparts, impose an inherent non-trivial overhead even in contention-free execution

    The weakest failure detector for wait-free dining under eventual weak exclusion

    Full text link
    Dining philosophers is a classic scheduling problem for local mutual exclusion on arbitrary conflict graphs. We establish necessary conditions to solve wait-free dining under eventual weak exclusion in message-passing systems with crash faults. Wait-free dining ensures that every correct hungry process eventually eats. Eventual weak exclusion permits finitely many scheduling mistakes, but eventually no live neighbors eat simultaneously; this exclusion criterion models scenarios where scheduling mistakes are recoverable or only affect per-formance. Previous work showed that the eventually perfect failure detector (3P) is sufficient to solve wait-free dining under eventual weak exclusion; we prove that 3P is also necessary, and thus 3P is the weakest oracle to solve this problem. Our reduction also establishes that any such din-ing solution can be made eventually fair. Finally, the reduc-tion itself may be of more general interest; when applied to wait-free perpetual weak exclusion, our reduction produces an alternative proof that the more powerful trusting oracle (T) is necessary (but not sufficient) to solve the problem o

    A Prescription for Partial Synchrony

    Get PDF
    Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals. Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively, or through algorithmic constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior. As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well. Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors
    • 

    corecore