14 research outputs found

    Supporting fault-tolerant communication in networks

    Get PDF
    We address two problems dealing with fault-tolerant communication in networks. The first one is designing a distributed storage protocol tolerant to Byzantine failure of servers. The protocol implements a multi-writer multi-reader register which satisfies a weaker consistency condition called MWReg. Most of the earlier work gives multiwriter implementations by simulating m copies of a single-writer protocol where m is the number of writers. Our solution gives a direct multi-writer implementation and thus has bounded message and time complexity independent of the number of writers. We have simulated the complete protocol to test its performance and also proved its correctness theoretically. The second problem we address is of providing a reliable communication link between two nodes in a network. We present a capacity reservation algorithm in the case for upper bounds on edge capacities and costs associated with using per unit capacity on any edge. We give a flow based approximation algorithm with cost at most four times optimal. To conclude, we design a distributed storage protocol and a capacity reservation algorithm which are tolerant to network failures

    Stabilizing Byzantine-Fault Tolerant Storage

    Get PDF
    Distributed storage service is one of the main abstractions provided to developers of distributed applications due to its ability to hide the complexity generated by the various messages exchanged between processes. Many protocols have been proposed to build Byzantine-fault-tolerant (BFT) storage services on top of a message-passing system but none of them considers the possibility that well-behaving processes (i.e. correct processes) may experience transient failures due to, say, isolated errors during computation or bit alteration during message transfer. This paper proposes a stabilizing Byzantine-tolerant algorithm for emulating a multi-writer multi-reader regular register abstraction on top of a message passing system with n > 5f servers, which we prove to be the minimal possible number of servers for stabilizing and tolerating f Byzantine servers. That is, each read operation returns the value written by the most recent write and write operations are totally ordered with respect to the happened before relation. Our algorithm is particularly appealing for cloud computing architectures where both processors and memory contents (including stale messages in transit) are prone to errors, faults and malicious behaviors. The proposed implementation extends previous BFT implementations in two ways. First, the algorithm works even when the local memory of processors and the content of the communication channels are initially corrupted in an arbitrary manner. Second, unlike previous solutions, our algorithm uses bounded logical timestamps, a feature difficult to achieve in the presence of transient errors

    Fast Access to Distributed Atomic Memory

    Get PDF
    We study efficient and robust implementations of an atomic read-write data structure over an asynchronous distributed message-passing system made of reader and writer processes, as well as a number of servers implementing the data structure. We determine the exact conditions under which every read and write involves one round of communication with the servers. These conditions relate the number of readers to the tolerated number of faulty servers and the nature of these failures

    Fault-Tolerant Distributed Services in Message-Passing Systems

    Get PDF
    Distributed systems ranging from small local area networks to large wide area networks like the Internet composed of static and/or mobile users have become increasingly popular. A desirable property for any distributed service is fault-tolerance, which means the service remains uninterrupted even if some components in the network fail. This dissertation considers weak distributed models to find either algorithms to solve certain problems or impossibility proofs to show that a problem is unsolvable. These are the main contributions of this dissertation: • Failure detectors are used as a service to solve consensus (agreement among nodes) which is otherwise impossible in failure-prone asynchronous systems. We find an algorithm for crash-failure detection that uses bounded size messages in an arbitrary, partitionable network composed of badly- behaved channels that can lose and reorder messages. • Registers are a fundamental building block for shared memory emulations on top of message passing systems. The problem has been extensively studied in static systems. However, register emulation in dynamic systems with faulty nodes is still quite hard and there are impossibility proofs that point out scenarios where change in the system composition due to nodes entering and leaving (also called churn) makes the problem unsolvable. We propose the first emulation of a crash-fault tolerant register in a system with continuous churn where consensus is unsolvable, the size of the system can grow without bound and at most a constant fraction of the number of nodes in the system can fail by crashing. We prove a lower bound that states that fault-tolerance for dynamic systems with churn is inherently lower than in static systems. • We then extend the results in the crash-fault tolerant case to a dynamic system with continuous churn and nodes that can be Byzantine faulty. It is the first emulation of an atomic register in a system that can withstand nodes continually entering and leaving, imposes no upper bound on the system size and can tolerate Byzantine nodes. However, the number of Byzantine faulty nodes that can be tolerated is upper bounded by a constant number. Although the algorithm requires that there be a constant known upper bound on the number of Byzantine nodes, this restriction is unavoidable, as we show that it is impossible to emulate an atomic register if the system size and maximum number of servers that can be Byzantine in the system is unknown

    Fault-Tolerant Distributed Services in Message-Passing Systems

    Get PDF
    Distributed systems ranging from small local area networks to large wide area networks like the Internet composed of static and/or mobile users have become increasingly popular. A desirable property for any distributed service is fault-tolerance, which means the service remains uninterrupted even if some components in the network fail. This dissertation considers weak distributed models to find either algorithms to solve certain problems or impossibility proofs to show that a problem is unsolvable. These are the main contributions of this dissertation: • Failure detectors are used as a service to solve consensus (agreement among nodes) which is otherwise impossible in failure-prone asynchronous systems. We find an algorithm for crash-failure detection that uses bounded size messages in an arbitrary, partitionable network composed of badly- behaved channels that can lose and reorder messages. • Registers are a fundamental building block for shared memory emulations on top of message passing systems. The problem has been extensively studied in static systems. However, register emulation in dynamic systems with faulty nodes is still quite hard and there are impossibility proofs that point out scenarios where change in the system composition due to nodes entering and leaving (also called churn) makes the problem unsolvable. We propose the first emulation of a crash-fault tolerant register in a system with continuous churn where consensus is unsolvable, the size of the system can grow without bound and at most a constant fraction of the number of nodes in the system can fail by crashing. We prove a lower bound that states that fault-tolerance for dynamic systems with churn is inherently lower than in static systems. • We then extend the results in the crash-fault tolerant case to a dynamic system with continuous churn and nodes that can be Byzantine faulty. It is the first emulation of an atomic register in a system that can withstand nodes continually entering and leaving, imposes no upper bound on the system size and can tolerate Byzantine nodes. However, the number of Byzantine faulty nodes that can be tolerated is upper bounded by a constant number. Although the algorithm requires that there be a constant known upper bound on the number of Byzantine nodes, this restriction is unavoidable, as we show that it is impossible to emulate an atomic register if the system size and maximum number of servers that can be Byzantine in the system is unknown

    Bounded wait-free implementation of optimally resilient byzantine storage without (unproven) cryptographic assumptions

    No full text
    corecore