3 research outputs found

    Prime: Byzantine Replication under Attack

    An Efficient Fault-Tolerant method for Distributed Computation Systems

    Fault tolerance is one of the most important features required by many distributed systems. We consider the efficiency issues of constructing distributed computing systems that can tolerate Byzantine faults. The well-recognized technique is to introduce replicated computation and derive the correct results through a voting mechanism. While this technique is applied to each computation request individually, we believe that by considering multiple requests at the same time in a distributed environment, we can greatly improve its efficiency. This is based on the observations that computation requests may be ordered differently for computation at different nodes, and that the verdict on the correct result for one request may imply the correct result for another request. We propose a solution that improves the efficiency of the existing technique by avoiding unnecessary computation and unnecessary message exchanges among distributed processes.
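
    The baseline technique this abstract refers to, replicated computation followed by a vote, can be sketched as below. This is only an illustration of per-request voting, not the paper's proposed multi-request method. The function name `vote` and the parameter `f` (the assumed maximum number of Byzantine replicas) are hypothetical. The sketch assumes the computation is deterministic, so all correct replicas return the same value.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(replies: Sequence[Hashable], f: int) -> Optional[Hashable]:
    """Return the value reported by at least f + 1 replicas, or None.

    Assumption: at most f replicas are faulty. Any value with f + 1
    matching replies was then reported by at least one correct replica,
    so it is the correct result of a deterministic computation.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    if not replies:
        return None
    # Pick the most frequently reported value and its reply count.
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None


# Hypothetical usage: three replicas (2f + 1 with f = 1), one of which lies.
result = vote([42, 42, 7], f=1)
```

    With 2f + 1 replicas the f + 1 correct ones always supply enough matching replies, which is why every request costs 2f + 1 computations under this baseline.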

    Customizable fault tolerance for wide-area replication

    Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present a new, scalable replication architecture, built upon logical machines specifically designed to perform well in wide-area systems spanning multiple sites. The physical machines in each site implement a logical machine by running a local state machine replication protocol, and a wide-area replication protocol runs among the logical machines. Implementing logical machines via the state machine approach affords free substitution of the fault tolerance method used in each site and in the wide-area replication protocol, allowing one to balance performance and fault tolerance based on perceived risk. We present a new Byzantine fault-tolerant protocol that establishes a reliable virtual communication link between logical machines. Our communication protocol is efficient (a necessity in wide-area environments), avoiding the need for redundant message sending during normal-case operation and allowing a logical machine to consume approximately the same wide-area bandwidth as a single physical machine. This dramatically improves the wide-area performance of our system compared to existing logical-machine-based approaches. We implemented a prototype system and compared its performance and fault tolerance to existing solutions.