24,920 research outputs found

    The Impact of RDMA on Agreement

    Full text link
    Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires n2fP+1n \geq 2f_P + 1 processes (where fPf_P is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires nfP+1n \geq f_P + 1 processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.Comment: Full version of PODC'19 paper, strengthened broadcast algorith

    FastPay: High-Performance Byzantine Fault Tolerant Settlement

    Get PDF
    FastPay allows a set of distributed authorities, some of which are Byzantine, to maintain a high-integrity and availability settlement system for pre-funded payments. It can be used to settle payments in a native unit of value (crypto-currency), or as a financial side-infrastructure to support retail payments in fiat currencies. FastPay is based on Byzantine Consistent Broadcast as its core primitive, foregoing the expenses of full atomic commit channels (consensus). The resulting system has low-latency for both confirmation and payment finality. Remarkably, each authority can be sharded across many machines to allow unbounded horizontal scalability. Our experiments demonstrate intra-continental confirmation latency of less than 100ms, making FastPay applicable to point of sale payments. In laboratory environments, we achieve over 80,000 transactions per second with 20 authorities---surpassing the requirements of current retail card payment networks, while significantly increasing their robustness

    Generalized Paxos Made Byzantine (and Less Complex)

    Full text link
    One of the most recent members of the Paxos family of protocols is Generalized Paxos. This variant of Paxos has the characteristic that it departs from the original specification of consensus, allowing for a weaker safety condition where different processes can have a different views on a sequence being agreed upon. However, much like the original Paxos counterpart, Generalized Paxos does not have a simple implementation. Furthermore, with the recent practical adoption of Byzantine fault tolerant protocols, it is timely and important to understand how Generalized Paxos can be implemented in the Byzantine model. In this paper, we make two main contributions. First, we provide a description of Generalized Paxos that is easier to understand, based on a simpler specification and the pseudocode for a solution that can be readily implemented. Second, we extend the protocol to the Byzantine fault model

    Self-Healing Computation

    Full text link
    In the problem of reliable multiparty computation (RC), there are nn parties, each with an individual input, and the parties want to jointly compute a function ff over nn inputs. The problem is complicated by the fact that an omniscient adversary controls a hidden fraction of the parties. We describe a self-healing algorithm for this problem. In particular, for a fixed function ff, with nn parties and mm gates, we describe how to perform RC repeatedly as the inputs to ff change. Our algorithm maintains the following properties, even when an adversary controls up to t(14ϵ)nt \leq (\frac{1}{4} - \epsilon) n parties, for any constant ϵ>0\epsilon >0. First, our algorithm performs each reliable computation with the following amortized resource costs: O(m+nlogn)O(m + n \log n) messages, O(m+nlogn)O(m + n \log n) computational operations, and O()O(\ell) latency, where \ell is the depth of the circuit that computes ff. Second, the expected total number of corruptions is O(t(logm)2)O(t (\log^{*} m)^2), after which the adversarially controlled parties are effectively quarantined so that they cause no more corruptions.Comment: 17 pages and 1 figure. It is submitted to SSS'1
    corecore