102 research outputs found

    Supporting fault-tolerant communication in networks

    Get PDF
    We address two problems dealing with fault-tolerant communication in networks. The first one is designing a distributed storage protocol tolerant to Byzantine failure of servers. The protocol implements a multi-writer multi-reader register which satisfies a weaker consistency condition called MWReg. Most of the earlier work gives multiwriter implementations by simulating m copies of a single-writer protocol where m is the number of writers. Our solution gives a direct multi-writer implementation and thus has bounded message and time complexity independent of the number of writers. We have simulated the complete protocol to test its performance and also proved its correctness theoretically. The second problem we address is of providing a reliable communication link between two nodes in a network. We present a capacity reservation algorithm in the case for upper bounds on edge capacities and costs associated with using per unit capacity on any edge. We give a flow based approximation algorithm with cost at most four times optimal. To conclude, we design a distributed storage protocol and a capacity reservation algorithm which are tolerant to network failures

    06371 Abstracts Collection -- From Security to Dependability

    Get PDF
    From 10.09.06 to 15.09.06, the Dagstuhl Seminar 06371 ``From Security to Dependability\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

    Self-Stabilizing and Private Distributed Shared Atomic Memory in Seldomly Fair Message Passing Networks

    Get PDF
    We study the problem of privately emulating shared memory in message-passing networks. The system includes clients that store and retrieve replicated information on N servers, out of which e are data-corrupting malicious. When a client accesses a data-corrupting malicious server, the data field of that server response might be different from the value it originally stored. However, all other control variables in the server reply and protocol actions are according to the server algorithm. For the coded atomic storage algorithms by Cadambe et al., we present an enhancement that ensures no information leakage and data-corrupting malicious fault-tolerance. We also consider recovery after the occurrence of transient faults that violate the assumptions according to which the system was designed to operate. After their last occurrence, transient faults leave the system in an arbitrary state (while the program code stays intact). We present a self-stabilizing algorithm, which recovers after the occurrence of transient faults. This addition to Cadambe et al. considers asynchronous settings as long as no transient faults occur. The recovery from transient faults that bring the system counters (close) to their maximal values may include the use of a global reset procedure, which requires the system run to be controlled by a fair scheduler. After the recovery period, the safety properties are provided for asynchronous system runs that are not necessarily controlled by fair schedulers. Since the recovery period is bounded and the occurrence of transient faults is extremely rare, we call this design criteria self-stabilization in the presence of seldom fairness. Our self-stabilizing algorithm uses a bounded amount of storage during asynchronous executions (that are not necessarily controlled by fair schedulers). To the best of our knowledge, we are the first to address privacy, data-corrupting malicious behavior, and self-stabilization in the context of emulating atomic shared memory in message-passing systems

    Stabilizing Byzantine-Fault Tolerant Storage

    Get PDF
    Distributed storage service is one of the main abstractions provided to developers of distributed applications due to its ability to hide the complexity generated by the various messages exchanged between processes. Many protocols have been proposed to build Byzantine-fault-tolerant (BFT) storage services on top of a message-passing system but none of them considers the possibility that well-behaving processes (i.e. correct processes) may experience transient failures due to, say, isolated errors during computation or bit alteration during message transfer. This paper proposes a stabilizing Byzantine-tolerant algorithm for emulating a multi-writer multi-reader regular register abstraction on top of a message passing system with n > 5f servers, which we prove to be the minimal possible number of servers for stabilizing and tolerating f Byzantine servers. That is, each read operation returns the value written by the most recent write and write operations are totally ordered with respect to the happened before relation. Our algorithm is particularly appealing for cloud computing architectures where both processors and memory contents (including stale messages in transit) are prone to errors, faults and malicious behaviors. The proposed implementation extends previous BFT implementations in two ways. First, the algorithm works even when the local memory of processors and the content of the communication channels are initially corrupted in an arbitrary manner. Second, unlike previous solutions, our algorithm uses bounded logical timestamps, a feature difficult to achieve in the presence of transient errors

    Optimal self-stabilizing mobile byzantine-tolerant regular register with bounded timestamps

    Get PDF
    This paper proposes the first implementation of a self-stabilizing regular register emulated by n servers that is tolerant to both Mobile Byzantine Agents and transient failures in a round-free synchronous model. Differently from existing Mobile Byzantine Tolerant register implementations, this paper considers a weaker model where: (i) the computation of the servers is decoupled from the movements of the Byzantine agents, i.e., movements may happen before, concurrently, or after the generation or the delivery of a message, and (ii) servers are not aware of their failure state i.e., they do not know if and when they have been corrupted by a Mobile Byzantine agent. The proposed protocol tolerates (i) any finite number of transient failures, and (ii) up to f Mobile Byzantine agents. In addition, our implementation uses bounded timestamps from the Z13 domain and it is optimal with respect to the number of servers needed to tolerate f Mobile Byzantine agents in the given model (i.e., n>6f when Δ=2δ, and n>8f when Δ=δ, where Δ represents the period at which the Byzantine agents move and δ is the upper bound on the communication latency)

    Fault-Tolerant Distributed Services in Message-Passing Systems

    Get PDF
    Distributed systems ranging from small local area networks to large wide area networks like the Internet composed of static and/or mobile users have become increasingly popular. A desirable property for any distributed service is fault-tolerance, which means the service remains uninterrupted even if some components in the network fail. This dissertation considers weak distributed models to find either algorithms to solve certain problems or impossibility proofs to show that a problem is unsolvable. These are the main contributions of this dissertation: • Failure detectors are used as a service to solve consensus (agreement among nodes) which is otherwise impossible in failure-prone asynchronous systems. We find an algorithm for crash-failure detection that uses bounded size messages in an arbitrary, partitionable network composed of badly- behaved channels that can lose and reorder messages. • Registers are a fundamental building block for shared memory emulations on top of message passing systems. The problem has been extensively studied in static systems. However, register emulation in dynamic systems with faulty nodes is still quite hard and there are impossibility proofs that point out scenarios where change in the system composition due to nodes entering and leaving (also called churn) makes the problem unsolvable. We propose the first emulation of a crash-fault tolerant register in a system with continuous churn where consensus is unsolvable, the size of the system can grow without bound and at most a constant fraction of the number of nodes in the system can fail by crashing. We prove a lower bound that states that fault-tolerance for dynamic systems with churn is inherently lower than in static systems. • We then extend the results in the crash-fault tolerant case to a dynamic system with continuous churn and nodes that can be Byzantine faulty. It is the first emulation of an atomic register in a system that can withstand nodes continually entering and leaving, imposes no upper bound on the system size and can tolerate Byzantine nodes. However, the number of Byzantine faulty nodes that can be tolerated is upper bounded by a constant number. Although the algorithm requires that there be a constant known upper bound on the number of Byzantine nodes, this restriction is unavoidable, as we show that it is impossible to emulate an atomic register if the system size and maximum number of servers that can be Byzantine in the system is unknown

    Abstractions for asynchronous distributed computing with malicious players

    Get PDF
    In modern distributed systems, failures are the norm rather than the exception. In many cases, these failures are not benign. Settings such as the Internet might incur malicious (also called Byzantine or arbitrary) behavior and asynchrony. As a result, and perhaps not surprisingly, research on asynchronous Byzantine fault-tolerant (BFT) distributed systems is flourishing. Tolerating arbitrary behavior and asynchrony calls for very sophisticated algorithms. This is in particular the case with BFT solutions that aim to provide properties such as: (a) optimal resilience, i.e., tolerating as many Byzantine failures as possible and (b) optimal performance with respect to some relevant complexity metric. Most BFT algorithms are built from scratch or by modifying existing solutions in a non-modular manner, which often renders these algorithms difficult to understand and, consequently, impedes their wider adoption. We attribute this complexity to the lack of sufficient number of adequate abstractions for asynchronous BFT distributed computing. The motivation of this thesis is to propose reusable abstractions for devising asynchronous BFT distributed algorithms that are optimally resilient and/or have optimal complexity, with strong focus on one of the most important complexity metrics — time complexity (or latency). The abstractions proposed in this thesis are devised with three fundamental distributed applications in mind: (a) read/write storage (also called register), (b) consensus and (c) state machine replication (SMR). We demonstrate how to use our abstractions in these applications to devise asynchronous BFT algorithms that feature the best complexity among all algorithms we know of, in addition to optimal resilience. First, we introduce the notion of a refined quorum system (RQS) of some set S as a set of three classes of subsets (quorums) of S: first class quorums are also second class quorums, themselves being also third class quorums. First class quorums have large intersections with all other quorums, second class quorums typically have smaller intersections with those of the third class, the latter simply correspond to traditional quorums. The refined quorum system abstraction helps design algorithms that tolerate contention (process concurrency), arbitrarily long periods of asynchrony and the largest possible number of failures, but perform fast if few failures occur, the system is synchronous and there is no contention, i.e., under conditions that are assumed to be frequent in practice. In other words, RQS helps combine optimal resilience and optimal best-case time complexity. Intuitively, under uncontended and synchronous conditions, a distributed object implementation would expedite an operation if a quorum of the first class is accessed, then degrade gracefully depending on whether a quorum of the second or the third class is accessed. Our notion of RQS is devised assuming a general adversary structure, and this basically allows algorithms relying on RQS to relax the assumption of independent process failures. We illustrate the power of refined quorums by introducing two new optimal BFT atomic object implementations: an atomic storage and consensus algorithm. Our second abstraction is a novel timestamping mechanism called high resolution timestamps (HRts), which can be seen as a variation of a matrix clocks. Roughly speaking, a high resolution timestamp contains a matrix of local timestamps of (a subset of) processes as seen by (a subset of) other processes. Complementary to RQS, HRts simplify the design of BFT distributed algorithms that combine optimal resilience and worst-case time complexity. We apply high-resolution timestamps to design read/write storage algorithms in which HRts are used to detect and filter out Byzantine processes, which paves the path to the first BFT storage algorithms that combine optimal resilience with optimal worst-case time complexity. Finally, we introduce ABsTRACT (Abortable Byzantine faulT-toleRant stAte maChine replicaTion), a generic abstraction that simplifies the notoriously difficult task of developing BFT state machine replication algorithms. ABsTRACT resembles BFT-SMR and it can be used to make any shared service Byzantine fault-tolerant, with one exception: it may sometimes abort a client request. The non-triviality condition under which ABsTRACT cannot abort is a generic parameter. We view a BFT-SMR algorithm as a composition of instances of ABsTRACT, each instance developed and analyzed independently. To illustrate our approach, we describe two new optimally resilient BFT algorithms. The first, that makes use of our refined quorums, has the lowest time complexity among all BFT-SMR algorithms we know of, in synchronous periods that are free from contention and failures. The second algorithm has the highest peak throughput in failure-free and synchronous periods; this algorithm argues for general applicability of ABsTRACT in developing BFT shared services that feature optimal complexity, beyond the time complexity metric
    • …
    corecore