Total order broadcast for fault tolerant exascale systems
In the process of designing a new fault tolerant run-time for future exascale systems, we discovered that a total order broadcast would be necessary. That is, nodes of a supercomputer should be able to broadcast messages to other nodes even in the face of failures. All messages should be seen in the same order at all nodes.
While this is a well studied problem in distributed systems, few researchers have looked at how to perform total order broadcasts at large scales for data availability. Our experience implementing a published total order broadcast algorithm showed poor scalability at tens of nodes. In this paper we present a novel algorithm for total order broadcast which scales logarithmically in the number of processes and is not delayed by most process failures.
While we are motivated by the needs of our run-time, we believe this primitive is of general applicability. Total order broadcasts are used often in datacenter environments, and as HPC developers begin to address fault tolerance at the application level, we believe they will need similar primitives.
Efficient Synchronous Byzantine Consensus
We present new protocols for Byzantine state machine replication and
Byzantine agreement in the synchronous and authenticated setting. The
celebrated PBFT state machine replication protocol tolerates f Byzantine
faults in an asynchronous setting using 3f+1 replicas, and has since been
studied or deployed by numerous works. In this work, we improve the Byzantine
fault tolerance threshold to n = 2f+1 by utilizing a relaxed synchrony
assumption. We present a synchronous state machine replication protocol that
commits a decision every 3 rounds in the common case. The key challenge is to
ensure quorum intersection at one honest replica. Our solution is to rely on
the synchrony assumption to form a post-commit quorum of size 2f+1, which
intersects at f+1 replicas with any pre-commit quorums of size f+1. Our
protocol also solves synchronous authenticated Byzantine agreement in expected
8 rounds. The best previous solution (Katz and Koo, 2006) requires expected 24
rounds. Our protocols may be applied to build Byzantine fault tolerant systems
or improve cryptographic protocols such as cryptocurrencies when synchrony can
be assumed.
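The quorum-intersection argument in this abstract can be checked with simple counting. The sketch below is illustrative only and is not taken from the paper: it assumes n = 2f+1 replicas, a post-commit quorum of 2f+1, and a pre-commit quorum of f+1, and the function names `min_overlap` and `honest_in_overlap` are hypothetical.

```python
def min_overlap(n: int, q1: int, q2: int) -> int:
    """Smallest possible intersection of two quorums of sizes q1 and q2
    drawn from n replicas (pigeonhole bound)."""
    return max(0, q1 + q2 - n)


def honest_in_overlap(f: int) -> int:
    """Guaranteed number of honest replicas in the overlap, assuming
    n = 2f+1, a post-commit quorum of 2f+1 and a pre-commit quorum of f+1."""
    n = 2 * f + 1
    overlap = min_overlap(n, 2 * f + 1, f + 1)
    # Worst case: every one of the f faulty replicas sits in the overlap.
    return max(0, overlap - f)


if __name__ == "__main__":
    for f in range(1, 6):
        print(f, min_overlap(2 * f + 1, 2 * f + 1, f + 1), honest_in_overlap(f))
```

Under these assumed sizes the overlap is always f+1 replicas, so at least one of them is honest regardless of f.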
Multi-party Quantum Computation
We investigate definitions of and protocols for multi-party quantum computing
in the scenario where the secret data are quantum systems. We work in the
quantum information-theoretic model, where no assumptions are made on the
computational power of the adversary. For the slightly weaker task of
verifiable quantum secret sharing, we give a protocol which tolerates any t <
n/4 cheating parties (out of n). This is shown to be optimal. We use this new
tool to establish that any multi-party quantum computation can be securely
performed as long as the number of dishonest players is less than n/6.
Comment: Masters Thesis. Based on joint work with Claude Crepeau and Daniel
Gottesman. Full version is in preparation.
Low cost management of replicated data in fault-tolerant distributed systems
Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. A technique is described that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. How this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism is also discussed.