6,961 research outputs found

    Total order broadcast for fault tolerant exascale systems

    Full text link
    In the process of designing a new fault tolerant run-time for future exascale systems, we discovered that a total order broadcast would be necessary. That is, nodes of a supercomputer should be able to broadcast messages to other nodes even in the face of failures. All messages should be seen in the same order at all nodes. While this is a well studied problem in distributed systems, few researchers have looked at how to perform total order broadcasts at large scales for data availability. Our experience implementing a published total order broadcast algorithm showed poor scalability at tens of nodes. In this paper we present a novel algorithm for total order broadcast which scales logarithmically in the number of processes and is not delayed by most process failures. While we are motivated by the needs of our run-time we believe this primitive is of general applicability. Total order broadcasts are used often in datacenter environments and as HPC developers begins to address fault tolerance at the application level we believe they will need similar primitives

    Efficient Synchronous Byzantine Consensus

    Get PDF
    We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates ff Byzantine faults in an asynchronous setting using 3f+13f+1 replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to n=2f+1n=2f+1 by utilizing a relaxed synchrony assumption. We present a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case. The key challenge is to ensure quorum intersection at one honest replica. Our solution is to rely on the synchrony assumption to form a post-commit quorum of size 2f+12f+1, which intersects at f+1f+1 replicas with any pre-commit quorums of size f+1f+1. Our protocol also solves synchronous authenticated Byzantine agreement in expected 8 rounds. The best previous solution (Katz and Koo, 2006) requires expected 24 rounds. Our protocols may be applied to build Byzantine fault tolerant systems or improve cryptographic protocols such as cryptocurrencies when synchrony can be assumed

    Multi-party Quantum Computation

    Get PDF
    We investigate definitions of and protocols for multi-party quantum computing in the scenario where the secret data are quantum systems. We work in the quantum information-theoretic model, where no assumptions are made on the computational power of the adversary. For the slightly weaker task of verifiable quantum secret sharing, we give a protocol which tolerates any t < n/4 cheating parties (out of n). This is shown to be optimal. We use this new tool to establish that any multi-party quantum computation can be securely performed as long as the number of dishonest players is less than n/6.Comment: Masters Thesis. Based on Joint work with Claude Crepeau and Daniel Gottesman. Full version is in preparatio

    Low cost management of replicated data in fault-tolerant distributed systems

    Get PDF
    Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. A technique is described that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. How this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism is also discussed
    corecore