    Byzantine Fault Tolerance for Nondeterministic Applications

    All practical applications contain some degree of nondeterminism. When such applications are replicated to achieve Byzantine fault tolerance (BFT), their nondeterministic operations must be controlled to ensure replica consistency. To the best of our knowledge, only the simplest types of replica nondeterminism have been dealt with so far. Furthermore, a systematic approach to handling common types of nondeterminism is lacking. In this paper, we propose a classification of common types of replica nondeterminism with respect to the requirement of achieving Byzantine fault tolerance, and describe the design and implementation of the core mechanisms necessary to handle such nondeterminism within a Byzantine fault tolerance framework. Comment: To appear in the proceedings of the 3rd IEEE International Symposium on Dependable, Autonomic and Secure Computing, 200
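The abstract does not spell out its mechanisms, so the following is only a minimal sketch of one widely used way to control a simple kind of replica nondeterminism (reading the clock): the primary proposes the value, each backup checks that it is plausible, and every replica then executes with the agreed value instead of its own clock. The names `Proposal`, `backup_accepts`, `execute`, and the `max_skew` tolerance are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """A request ordered by the primary (hypothetical structure)."""
    seq: int          # sequence number assigned by the primary
    timestamp: float  # nondeterministic value chosen by the primary


def backup_accepts(proposal, local_clock, last_accepted_ts, max_skew=0.5):
    """Return True if a backup would accept the primary's proposed timestamp.

    A faulty primary could propose an arbitrary value, so the backup checks
    that the timestamp moves forward and is close to its own clock.
    """
    monotonic = proposal.timestamp > last_accepted_ts
    plausible = abs(proposal.timestamp - local_clock) <= max_skew
    return monotonic and plausible


def execute(proposal, state):
    """Apply the request using the agreed timestamp, never the local clock."""
    return state + [(proposal.seq, proposal.timestamp)]


# Two replicas with slightly different clocks reach the same state.
p = Proposal(seq=1, timestamp=100.2)
replica_a = execute(p, []) if backup_accepts(p, 100.0, 99.0) else []
replica_b = execute(p, []) if backup_accepts(p, 100.4, 99.0) else []
```

Here both replicas accept the proposal and end up with identical state, while a timestamp far from the local clock or earlier than the last accepted one would be rejected.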

    Replica determinism and flexible scheduling in hard real-time dependable systems

    Fault-tolerant real-time systems are typically based on active replication, where replicated entities are required to deliver their outputs in an identical order within a given time interval. Distributed scheduling of replicated tasks, however, violates this requirement if on-line scheduling, preemptive scheduling, or scheduling of dissimilar replicated task sets is employed. This problem of inconsistent task outputs has previously been solved by coordinating the decisions of the local schedulers such that replicated tasks are executed in an identical order. Global coordination results either in an extremely high communication effort to agree on each scheduling decision or in an overly restrictive execution model where on-line scheduling, arbitrary preemptions, and nonidentically replicated task sets are not allowed. To overcome these restrictions, a new method, called timed messages, is introduced. Timed messages guarantee deterministic operation by presenting consistent message versions to the replicated tasks. This approach is based on simulated common knowledge and a sparse time base. Timed messages are very effective since they neither require communication between the local schedulers nor restrict the use of on-line flexible scheduling, preemptions, and nonidentically replicated task sets.
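The idea of "presenting consistent message versions" can be illustrated with a small sketch. It assumes that every message version carries a validity time on the shared time base and that a receiving task selects the version by its own agreed timestamp rather than by whatever happens to be newest locally. The class `TimedMessage` and its methods are hypothetical and only approximate what the abstract describes.

```python
import bisect


class TimedMessage:
    """Keeps several versions of one message, each with a validity time."""

    def __init__(self):
        self._valid_from = []  # validity times, kept sorted
        self._values = []      # version payloads, same order

    def send(self, valid_from, value):
        """Store a new version that becomes valid at `valid_from`."""
        i = bisect.bisect_right(self._valid_from, valid_from)
        self._valid_from.insert(i, valid_from)
        self._values.insert(i, value)

    def receive(self, task_timestamp):
        """Return the newest version valid at the task's timestamp."""
        i = bisect.bisect_right(self._valid_from, task_timestamp)
        if i == 0:
            raise LookupError("no version is valid at this time")
        return self._values[i - 1]


# On node A the sender was scheduled early and has already written the
# version for t=20; on node B it has not. The receiver runs with the same
# agreed timestamp (15) on both nodes and reads the same version.
node_a, node_b = TimedMessage(), TimedMessage()
node_a.send(10, "v1")
node_a.send(20, "v2")
node_b.send(10, "v1")
seen_a = node_a.receive(15)
seen_b = node_b.receive(15)
```

Both replicas read `"v1"` even though their local schedulers interleaved sender and receiver differently, and no messages were exchanged between the schedulers to achieve this.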

    Byzantine Fault Tolerance for Nondeterministic Applications

    The growing reliance on online services accessible on the Internet demands highly reliable systems that are not interrupted when they encounter faults. A number of Byzantine fault tolerance (BFT) algorithms have been developed to mask the most complicated type of faults: Byzantine faults, such as software bugs, operator mistakes, and malicious attacks, which are usually the major cause of service interruptions. However, it is often difficult to apply these algorithms to practical applications because such applications often exhibit sophisticated nondeterministic behaviors that the existing BFT algorithms cannot cope with. In this thesis, we propose a classification of common types of replica nondeterminism with respect to the requirement of achieving Byzantine fault tolerance, and describe the design and implementation of the core mechanisms necessary to handle such replica nondeterminism within a Byzantine fault tolerance framework. In addition, we evaluate the performance of our BFT library, referred to as ND-BFT, using both a micro-benchmark application and a more realistic online poker game application. The performance results show that the replicated online poker game performs approximately 13 slower than its nonreplicated counterpart in the presence of a small number of players.

    Application Agreement and Integration Services

    Application agreement and integration services are required by distributed, fault-tolerant, safety-critical systems to assure required performance. An analysis of distributed and hierarchical agreement strategies is developed against the backdrop of observed agreement failures in fielded systems. The documented work was performed under NASA Task Order NNL10AB32T, Validation And Verification of Safety-Critical Integrated Distributed Systems Area 2. This document is intended to satisfy the requirements for deliverable 5.2.11 under Task 4.2.2.3. This report discusses the challenges of maintaining application agreement and integration services. A literature search is presented that documents previous work in the area of replica determinism. Sources of nondeterministic behavior are identified, and examples are presented where system-level agreement failed to be achieved. We then explore how TTEthernet services can be extended to supply some interesting application agreement frameworks. This document assumes that the reader is familiar with the TTEthernet protocol. The reader is advised to read the TTEthernet protocol standard [1] before reading this document. This document does not reiterate the content of the standard.

    Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

    This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has been designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units running on multicore processors, clusters of workstations, or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units. Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731
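How a receiver might "identify and discard corrupted messages" from replicated copies can be sketched with a simple majority vote. This is an illustration under stated assumptions, not FT-GAIA's actual code: it assumes each message arrives as 2f+1 copies, of which at most f are corrupted, and the function name `vote` is hypothetical.

```python
from collections import Counter


def vote(copies, f):
    """Return the payload backed by at least f+1 copies, or None.

    Assumes 2f+1 copies were sent and at most f of them are corrupted,
    so any payload seen f+1 times must come from a correct replica.
    """
    if len(copies) < 2 * f + 1:
        raise ValueError("need at least 2f+1 copies to tolerate f faults")
    value, count = Counter(copies).most_common(1)[0]
    return value if count >= f + 1 else None


# One of three copies is corrupted; the two matching copies win.
accepted = vote(["move(3,4)", "move(3,4)", "m0ve(9,9)"], f=1)
# More faults than assumed: no payload reaches f+1 copies, nothing is accepted.
rejected = vote(["a", "b", "c"], f=1)
```

Tolerating crash failures alone needs fewer copies, since a crashed sender delivers nothing rather than a wrong payload; the extra copies pay for the ability to outvote a corrupted one.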

    Optimistic Parallel State-Machine Replication

    State-machine replication, a fundamental approach to fault tolerance, requires replicas to execute commands deterministically, which usually results in sequential execution of commands. Sequential execution limits performance and underuses servers, which are increasingly parallel (i.e., multicore). To narrow the gap between state-machine replication requirements and the characteristics of modern servers, researchers have recently come up with alternative execution models. This paper surveys existing approaches to parallel state-machine replication and proposes a novel optimistic protocol that inherits the scalable features of previous techniques. Using a replicated B+-tree service, we demonstrate that our protocol outperforms the most efficient techniques by a factor of 2.4.
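To make the tension between determinism and parallelism concrete, here is a minimal sketch of one generic way to relax sequential execution: commands that touch disjoint keys are grouped into batches that can run in parallel, while conflicting commands keep their agreed order. This is not the optimistic protocol proposed in the paper; the function `schedule` and the representation of a command as a `(name, keys)` pair are assumptions made for illustration.

```python
def schedule(commands):
    """Group an ordered command list into conflict-free batches.

    `commands` is a list of (name, keys) pairs in the agreed total order.
    Two commands conflict if they share a key. Commands inside one batch
    never conflict, and batches run one after another, so every replica
    that applies this function to the same input gets the same result.
    """
    batches = []
    last_batch_for_key = {}  # key -> index of the latest batch touching it
    for name, keys in commands:
        # Place the command right after the latest batch it conflicts with.
        level = 1 + max((last_batch_for_key.get(k, -1) for k in keys),
                        default=-1)
        if level == len(batches):
            batches.append([])
        batches[level].append(name)
        for k in keys:
            last_batch_for_key[k] = level
    return batches


ordered = [("c1", {"x"}), ("c2", {"y"}), ("c3", {"x", "y"}), ("c4", {"z"})]
plan = schedule(ordered)
```

In this example `c1`, `c2`, and `c4` land in the first batch and `c3`, which conflicts with both `c1` and `c2`, runs in a second batch afterwards.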