111 research outputs found

    The weakest failure detector for wait-free dining under eventual weak exclusion

    Full text link
    Dining philosophers is a classic scheduling problem for local mutual exclusion on arbitrary conflict graphs. We establish necessary conditions to solve wait-free dining under eventual weak exclusion in message-passing systems with crash faults. Wait-free dining ensures that every correct hungry process eventually eats. Eventual weak exclusion permits finitely many scheduling mistakes, but eventually no live neighbors eat simultaneously; this exclusion criterion models scenarios where scheduling mistakes are recoverable or only affect per-formance. Previous work showed that the eventually perfect failure detector (3P) is sufficient to solve wait-free dining under eventual weak exclusion; we prove that 3P is also necessary, and thus 3P is the weakest oracle to solve this problem. Our reduction also establishes that any such din-ing solution can be made eventually fair. Finally, the reduc-tion itself may be of more general interest; when applied to wait-free perpetual weak exclusion, our reduction produces an alternative proof that the more powerful trusting oracle (T) is necessary (but not sufficient) to solve the problem o

    The weakest failure detector for wait-free dining under eventual weak exclusion

    Get PDF
    ABSTRACT Dining philosophers is a classic scheduling problem for local mutual exclusion on arbitrary conflict graphs. We establish necessary conditions to solve wait-free dining under eventual weak exclusion in message-passing systems with crash faults. Wait-free dining ensures that every correct hungry process eventually eats. Eventual weak exclusion permits finitely many scheduling mistakes, but eventually no live neighbors eat simultaneously; this exclusion criterion models scenarios where scheduling mistakes are recoverable or only affect performance. Previous work showed that the eventually perfect failure detector (3P) is sufficient to solve wait-free dining under eventual weak exclusion; we prove that 3P is also necessary, and thus 3P is the weakest oracle to solve this problem. Our reduction also establishes that any such dining solution can be made eventually fair. Finally, the reduction itself may be of more general interest; when applied to wait-free perpetual weak exclusion, our reduction produces an alternative proof that the more powerful trusting oracle (T ) is necessary (but not sufficient) to solve the problem of Fault-Tolerant Mutual Exclusion (FTME)

    The Weakest Failure Detector for Solving Wait-Free, Eventually Bounded-Fair Dining Philosophers

    Get PDF
    This dissertation explores the necessary and sufficient conditions to solve a variant of the dining philosophers problem. This dining variant is defined by three properties: wait-freedom, eventual weak exclusion, and eventual bounded fairness. Wait-freedom guarantees that every correct hungry process eventually enters its critical section, regardless of process crashes. Eventual weak exclusion guarantees that every execution has an infinite suffix during which no two live neighbors execute overlapping critical sections. Eventual bounded fairness guarantees that there exists a fairness bound k such that every execution has an infinite suffix during which no correct hungry process is overtaken more than k times by any neighbor. This dining variant (WF-EBF dining for short) is important for synchronization tasks where eventual safety (i.e., eventual weak exclusion) is sufficient for correctness (e.g., duty-cycle scheduling, self-stabilizing daemons, and contention managers). Unfortunately, it is known that wait-free dining is unsolvable in asynchronous message-passing systems subject to crash faults. To circumvent this impossibility result, it is necessary to assume the existence of bounds on timing properties, such as relative process speeds and message delivery time. As such, it is of interest to characterize the necessary and sufficient timing assumptions to solve WF-EBF dining. We focus on implicit timing assumptions, which can be encapsulated by failure detectors. Failure detectors can be viewed as distributed oracles that can be queried for potentially unreliable information about crash faults. The weakest detector D for WF-EBF dining means that D is both necessary and sufficient. Necessity means that every failure detector that solves WF-EBF dining is at least as strong as D. Sufficiency means that there exists at least one algorithm that solves WF-EBF dining using D. As such, our research goal is to characterize the weakest failure detector to solve WF-EBF dining. We prove that the eventually perfect failure detector 3P is the weakest failure detector for solving WF-EBF dining. 3P eventually suspects crashed processes permanently, but may make mistakes by wrongfully suspecting correct processes finitely many times during any execution. As such, 3P eventually stops suspecting correct processes

    A Prescription for Partial Synchrony

    Get PDF
    Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals. Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively, or through algorithmic constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior. As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well. Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors

    The Weakest Failure Detector to Solve Mutual Exclusion

    Get PDF
    Mutual exclusion is not solvable in an asynchronous message-passing system where processes are subject to crash failures. Delporte-Gallet et. al. determined the weakest failure detector to solve this problem when a majority of processes are correct. Here we identify the weakest failure detector to solve mutual exclusion in any environment, i.e., regardless of the number of faulty processes. We also show a relation between mutual exclusion and consensus, arguably the two most fundamental problems in distributed computing. Specifically, we show that a failure detector that solves mutual exclusion is sufficient to solve non-uniform consensus but not necessarily uniform consensus

    Failure detectors encapsulate fairness

    Get PDF
    Failure detectors have long been viewed as abstractions for the synchronism present in distributed system models. However, investigations into the exact amount of synchronism encapsulated by a given failure detector have met with limited success. The reason for this is that traditionally, models of partial synchrony are specified with respect to real time, but failure detectors do not encapsulate real time. Instead, we argue that failure detectors encapsulate the fairness in computation and communication. Fairness is a measure of the number of steps executed by one process relative either to the number of steps taken by another process or relative to the duration for which a message is in transit. We argue that failure detectors are substitutable for the fairness properties (rather than real-time properties) of partially synchronous systems. We propose four fairness-based models of partial synchrony and demonstrate that they are, in fact, the ‘weakest system models’ to implement the canonical failure detectors from the Chandra-Toueg hierarchy. We also propose a set of fairness-based models which encapsulate the G[subscript c] parametric failure detectors which eventually and permanently suspect crashed processes, and eventually and permanently trust some fixed set of c correct processes.National Science Foundation (U.S.) (Grant CCF-0964696)National Science Foundation (U.S.) (Grant CCF-0937274)Texas Higher Education Coordinating Board (grant NHARP 000512-0130-2007)National Science Foundation (U.S.) (NSF Science and Technology Center, grant agreement CCF-0939370

    Dining philosophers with masking tolerance to crash faults

    Get PDF
    We examine the tolerance of dining philosopher algorithms subject to process crash faults in arbitrary conflict graphs. This classic problem is unsolvable in asynchronous message-passing systems subject to even a single crash fault. By contrast, dining can be solved in synchronous systems capable of implementing the perfect failure detector P (from the Chandra-Toueg hierarchy). We show that dining is also solvable in weaker timing models using a combination of the trusting detector T and the strong detector S; Our approach extends and composes two currents of previous research. First, we define a parametric generalization of Lynch’s classic algorithm for hierarchical resource allocation. Our construction converts any mutual exclusion algorithm into a valid dining algorithm. Second, we consider the fault-tolerant mutual exclusion algorithm (FTME) of Delporte-Gallet, et al., which uses T and the strong detector S to mask crash faults in any environment. We instantiate our dining construction with FTME, and prove that the resulting dining algorithm guarantees masking tolerance to crash faults. Our contribution (1) defines a new construction for transforming mutual exclusion algorithms into dining algorithms, and (2) demonstrates a better upper-bound on the fault-detection capabilities necessary to mask crash faults in dining philosophers

    A Prescription for Partial Synchrony

    Get PDF
    Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals. Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively, or through algorithmic constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior. As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well. Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors

    Information Infrastructures in Distributed Environments: Algorithms for Mobile Networks and Resource Allocation

    Get PDF
    A distributed system is a collection of computing entities that communicate with each other to solve some problem. Distributed systems impact almost every aspect of daily life (e.g., cellular networks and the Internet); however, it is hard to develop services on top of distributed systems due to the unreliable nature of computing entities and communication. As handheld devices with wireless communication capabilities become increasingly popular, the task of providing services becomes even more challenging since dynamics, such as mobility, may cause the network topology to change frequently. One way to ease this task is to develop collections of information infrastructures which can serve as building blocks to design more complicated services and can be analyzed independently. The first part of the dissertation considers the dining philosophers problem (a generalization of the mutual exclusion problem) in static networks. A solution to the dining philosophers problem can be utilized when there is a need to prevent multiple nodes from accessing some shared resource simultaneously. We present two algorithms that solve the dining philosophers problem. The first algorithm considers an asynchronous message-passing model while the second one considers an asynchronous shared-memory model. Both algorithms are crash fault-tolerant in the sense that a node crash only affects its local neighborhood in the network. We utilize failure detectors (system services that provide some information about crash failures in the system) to achieve such crash fault-tolerance. In addition to crash fault-tolerance, the first algorithm provides fairness in accessing shared resources and the second algorithm tolerates transient failures (unexpected corruptions to the system state). Considering the message-passing model, we also provide a reduction such that given a crash fault-tolerant solution to our dining philosophers problem, we implement the failure detector that we have utilized to solve our dining philosophers problem. This reduction serves as the first step towards identifying the minimum information regarding crash failures that is required to solve the dining philosophers problem at hand. In the second part of this dissertation, we present information infrastructures for mobile ad hoc networks. In particular, we present solutions to the following problems in mobile ad hoc environments: (1) maintaining neighbor knowledge, (2) neighbor detection, and (3) leader election. The solutions to (1) and (3) consider a system with perfectly synchronized clocks while the solution to (2) considers a system with bounded clock drift. Services such as neighbor detection and maintaining neighbor knowledge can serve as a building block for applications that require point-to-point communication. A solution to the leader election problem can be used whenever there is a need for a unique coordinator in the system to perform a special task
    • …
    corecore