5 research outputs found

    The Failure Detector Abstraction

    Full text link
    This paper surveys the failure detector concept through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. More specifically, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlights some limitations of the failure detector abstraction along each of the dimensions

    The Failure Detector Abstraction

    Get PDF
    A failure detector is a fundamental abstraction in distributed computing. This paper surveys this abstraction through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. In particular, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions

    The Weakest Failure Detector for Solving Wait-Free, Eventually Bounded-Fair Dining Philosophers

    Get PDF
    This dissertation explores the necessary and sufficient conditions to solve a variant of the dining philosophers problem. This dining variant is defined by three properties: wait-freedom, eventual weak exclusion, and eventual bounded fairness. Wait-freedom guarantees that every correct hungry process eventually enters its critical section, regardless of process crashes. Eventual weak exclusion guarantees that every execution has an infinite suffix during which no two live neighbors execute overlapping critical sections. Eventual bounded fairness guarantees that there exists a fairness bound k such that every execution has an infinite suffix during which no correct hungry process is overtaken more than k times by any neighbor. This dining variant (WF-EBF dining for short) is important for synchronization tasks where eventual safety (i.e., eventual weak exclusion) is sufficient for correctness (e.g., duty-cycle scheduling, self-stabilizing daemons, and contention managers). Unfortunately, it is known that wait-free dining is unsolvable in asynchronous message-passing systems subject to crash faults. To circumvent this impossibility result, it is necessary to assume the existence of bounds on timing properties, such as relative process speeds and message delivery time. As such, it is of interest to characterize the necessary and sufficient timing assumptions to solve WF-EBF dining. We focus on implicit timing assumptions, which can be encapsulated by failure detectors. Failure detectors can be viewed as distributed oracles that can be queried for potentially unreliable information about crash faults. The weakest detector D for WF-EBF dining means that D is both necessary and sufficient. Necessity means that every failure detector that solves WF-EBF dining is at least as strong as D. Sufficiency means that there exists at least one algorithm that solves WF-EBF dining using D. As such, our research goal is to characterize the weakest failure detector to solve WF-EBF dining. We prove that the eventually perfect failure detector 3P is the weakest failure detector for solving WF-EBF dining. 3P eventually suspects crashed processes permanently, but may make mistakes by wrongfully suspecting correct processes finitely many times during any execution. As such, 3P eventually stops suspecting correct processes

    Information Infrastructures in Distributed Environments: Algorithms for Mobile Networks and Resource Allocation

    Get PDF
    A distributed system is a collection of computing entities that communicate with each other to solve some problem. Distributed systems impact almost every aspect of daily life (e.g., cellular networks and the Internet); however, it is hard to develop services on top of distributed systems due to the unreliable nature of computing entities and communication. As handheld devices with wireless communication capabilities become increasingly popular, the task of providing services becomes even more challenging since dynamics, such as mobility, may cause the network topology to change frequently. One way to ease this task is to develop collections of information infrastructures which can serve as building blocks to design more complicated services and can be analyzed independently. The first part of the dissertation considers the dining philosophers problem (a generalization of the mutual exclusion problem) in static networks. A solution to the dining philosophers problem can be utilized when there is a need to prevent multiple nodes from accessing some shared resource simultaneously. We present two algorithms that solve the dining philosophers problem. The first algorithm considers an asynchronous message-passing model while the second one considers an asynchronous shared-memory model. Both algorithms are crash fault-tolerant in the sense that a node crash only affects its local neighborhood in the network. We utilize failure detectors (system services that provide some information about crash failures in the system) to achieve such crash fault-tolerance. In addition to crash fault-tolerance, the first algorithm provides fairness in accessing shared resources and the second algorithm tolerates transient failures (unexpected corruptions to the system state). Considering the message-passing model, we also provide a reduction such that given a crash fault-tolerant solution to our dining philosophers problem, we implement the failure detector that we have utilized to solve our dining philosophers problem. This reduction serves as the first step towards identifying the minimum information regarding crash failures that is required to solve the dining philosophers problem at hand. In the second part of this dissertation, we present information infrastructures for mobile ad hoc networks. In particular, we present solutions to the following problems in mobile ad hoc environments: (1) maintaining neighbor knowledge, (2) neighbor detection, and (3) leader election. The solutions to (1) and (3) consider a system with perfectly synchronized clocks while the solution to (2) considers a system with bounded clock drift. Services such as neighbor detection and maintaining neighbor knowledge can serve as a building block for applications that require point-to-point communication. A solution to the leader election problem can be used whenever there is a need for a unique coordinator in the system to perform a special task

    State Machine Replication:from Analytical Evaluation to High-Performance Paxos

    Get PDF
    Since their invention more than half a century ago, computers have gone from being just an handful of expensive machines each filling an entire room, to being an integral part of almost every aspect of modern life. Nowadays computers are everywhere: in our planes, in our cars, on our desks, in our home appliances, and even in our pockets. This widespread adoption had a profound impact in our world and in our lives, so much that now we rely on them for many important aspects of everyday life, including work, communication, travel, entertainment, and even managing our money. Given our increased reliance on computers, their continuous and correct operation has become essential for modern society. However, individual computers can fail due to a variety of causes and, if nothing is done about it, these failures can easily lead to a disruption of the service provided by computer system. The field of fault tolerance studies this problem, more precisely, it studies how to enable a computer system to continue operation in spite of the failure of individual components. One of the most popular techniques of achieving fault tolerance is software replication, where a service is replicated on an ensemble of machines (replicas) such that if some of these machines fail, the others will continue providing the service. Software replication is widely used because of its generality (can be applied to most services) and its low cost (can use off-the-shelf hardware). This thesis studies a form of software replication, namely, state machine replication, where the service is modeled as a deterministic state machine whose state transitions consist of the execution of client requests. Although state machine replication was first proposed almost 30 years ago, the proliferation of online services during the last years has led to a renewed interest. Online services must be highly available and for that they frequently rely on state machine replication as part of their fault tolerance mechanisms. However, the unprecedented scale of these services, which frequently have hundreds of thousands or even millions of users, leads to a new set performance requirements on state machine replication. This thesis is organized in two parts. The goal of the first part is to study from a theoretical perspective the performance characteristics of the algorithms behind state machine replication and to propose improved variants of such algorithms. The second part looks at the problem from a practical perspective, proposing new techniques to achieve high-throughput and scalability. In the first part, we start with an analytical analysis of the performance of two consensus algorithms, one leader-free (an adaptation of the fast round of Fast Paxos) and another leader-based (an adaptation of classical Paxos). We express these algorithms in the Heard-Of round model and show that using this model it is fairly easy to determine analytically several interesting performance metrics. We then study the performance of round models in general. Round models are perceived as inefficient because in their typical implementation, the real-time duration of rounds is proportional to the (pessimistic) timeouts used on the underlying system. This contrasts with the failure detector or the partial synchronous system models, where algorithms usually progress at the speed of message reception. We show that there is no inherent gap in performance between the models, by proposing a round implementation that during stable periods advances at the speed of message reception. We conclude the first part by presenting a new leader election algorithm that chooses as leader a well-connected process, that is, a process whose time needed to perform a one-to-majority communication round is among the lowest in the system. This is useful mainly in systems where the latency between processes is not homogeneous, because the performance of leader-based algorithms is particularly sensitive to the performance and connectivity of the process acting as a leader. The second part of the thesis studies different approaches to achieve high-throughput with state machine replication. To support the experimental work done in this part, we have developed JPaxos, a fully-featured implementation of Paxos in Java. We start by looking at how to tune the batching and pipelining optimizations of Paxos; using an analytical model of the performance of Paxos we show how to derive good values for the bounds on the batch size and number of parallel instances. We then propose an architecture for implementing replicated state machines that is capable of leveraging multi-core CPUs to achieve very high-levels of performance. The final contribution of this thesis is based on the observation that most implementations of state machine replication have an unbalanced division of work among threads, with one replica, the leader, having a significantly higher workload than the other replicas. Naturally, the leader becomes the bottleneck of the system, while other replicas are only lightly loaded. We propose and evaluate S-Paxos, which evenly balances the workload among all replicas, and thus overcomes the leader bottleneck. The benefits are two-fold: S-Paxos achieves a higher throughput for a given number of replicas and its performance increases with the number of replicas (up to a reasonable number)
    corecore