166 research outputs found

    Consensus in Byzantine asynchronous systems

    Get PDF
    AbstractThis paper presents a consensus protocol resilient to Byzantine failures. It uses signed and certified messages and is based on two underlying failure detection modules. The first is a muteness failure detection module of the class ♢M. The second is a reliable Byzantine behaviour detection module. More precisely, the first module detects processes that stop sending messages, while processes experiencing other non-correct behaviours (i.e., Byzantine) are detected by the second module. The protocol is resilient to F faulty processes, F⩽min(⌊(n−1)/2⌋,C) (where C is the maximum number of faulty processes that can be tolerated by the underlying certification service).The approach used to design the protocol is new. While usual Byzantine consensus protocols are based on failure detectors to detect processes that stop communicating, none of them use a module to detect their Byzantine behaviour (this detection is not isolated from the protocol and makes it difficult to understand and prove correct). In addition to this modular approach and to a consensus protocol for Byzantine systems, the paper presents a finite state automaton-based implementation of the Byzantine behaviour detection module. Finally, the modular approach followed in this paper can be used to solve other problems in asynchronous systems experiencing Byzantine failures

    The Failure Detector Abstraction

    Full text link
    This paper surveys the failure detector concept through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. More specifically, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlights some limitations of the failure detector abstraction along each of the dimensions

    Using Oracle to Solve ZooKeeper on Two-Replica Problems

    Get PDF
    The project introduces an Oracle, a failure detector, in Apache ZooKeeper and makes it fault-tolerant in a two-node system. The project demonstrates the Oracle authorizes the primary process to maintain the liveness when the majority’s rule becomes an obstacle to continue Apache ZooKeeper service. In addition to the property of accuracy and completeness from Chandra et al.’s research, the project proposes the property of see to avoid losing transactions and the property of mutual exclusion to avoid split-brain issues. The hybrid properties render not only more sounder flexibility in the implementation but also stronger guarantees on safety. Thus, the Oracle complements Apache ZooKeeper’s availability

    Transforming Asynchronous Systems with Crash-Stop Failures and Failure Detectors to the General Omission Model

    Get PDF
    This paper studies the impact of omission failures on asynchronous distributed s ystems with crash-stop failures. For the large group of problem specifications that are restricted to correct processes, we show how to transform a crash-stop related problem specification into an equivalent omission one. For that, we provide transformations for algorithms and failure detectors, such that if and only if an algorithm using a failure detector satisfies a problem specification, then the transformed algorithm using the transformed failure detector satisfies the transformed problem specification. Our transformed problem specification is ensured to be non-trivial, and moreover, the transformation reveals itself to be in a reasonable sense weakest failure detector preserving. Our results help to use the power of the well-understood crash-stop model to aut omatically derive solutions for the general omission model, which has recently raised interest for being noticeably applicable for security problems in distributed environments equipped with security modules such as smartcards

    Dependability enhancing mechanisms for integrated clinical environments

    Get PDF
    In this article, we present a set of lightweight mechanisms to enhance the dependability of a safety-critical real-time distributed system referred to as an integrated clinical environment (ICE). In an ICE, medical devices are interconnected and work together with the help of a supervisory computer system to enhance patient safety during clinical operations. Inevitably, there are strong dependability requirements on the ICE. We introduce a set of mechanisms that essentially make the supervisor component a trusted computing base, which can withstand common hardware failures and malicious attacks. The mechanisms rely on the replication of the supervisor component and employ only one input-exchange phase into the critical path of the operation of the ICE. Our analysis shows that the runtime latency overhead is much lower than that of traditional approaches
    • …
    corecore