6,691 research outputs found

    Stabilizing Server-Based Storage in Byzantine Asynchronous Message-Passing Systems

    A stabilizing Byzantine single-writer single-reader (SWSR) regular register, which stabilizes after the first invoked write operation, is first presented. Then, new/old ordering inversions are eliminated by the use of a (bounded) sequence number for writes, obtaining a practically stabilizing SWSR atomic register. A practically stabilizing Byzantine single-writer multi-reader (SWMR) atomic register is then obtained by using several copies of SWSR atomic registers. Finally, bounded time-stamps, with a time-stamp per writer, together with SWMR atomic registers, are used to construct a practically stabilizing Byzantine multi-writer multi-reader (MWMR) atomic register. In a system of n servers implementing an atomic register, and in addition to transient failures, the constructions tolerate t<n/8 Byzantine servers if communication is asynchronous, and t<n/3 Byzantine servers if it is synchronous. The noteworthy feature of the proposed algorithms is that (to our knowledge) they are the first to build atomic read/write storage on top of asynchronous servers prone to transient failures, where up to t of them can be Byzantine.
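    As a rough illustration of the resilience bounds quoted in this abstract, the sketch below computes the largest number of Byzantine servers t that satisfies t<n/8 (asynchronous) or t<n/3 (synchronous) for a given n. The function name `max_byzantine` is hypothetical and does not come from the paper; it only evaluates the inequalities, not the register constructions themselves.

```python
def max_byzantine(n: int, synchronous: bool) -> int:
    """Largest integer t with t < n/3 (synchronous) or t < n/8 (asynchronous).

    Hypothetical helper: it only evaluates the bounds stated in the abstract.
    """
    if n < 1:
        raise ValueError("n must be a positive number of servers")
    divisor = 3 if synchronous else 8
    # For integers, t < n/d is equivalent to t <= (n - 1) // d.
    return (n - 1) // divisor


if __name__ == "__main__":
    for n in (8, 9, 17, 25):
        print(n, max_byzantine(n, synchronous=False), max_byzantine(n, synchronous=True))
```

    Under these bounds, tolerating even one Byzantine server in the asynchronous case requires at least 9 servers, whereas 4 suffice in the synchronous case.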

    Tight Mobile Byzantine Tolerant Atomic Storage

    This paper proposes the first implementation of an atomic storage tolerant to mobile Byzantine agents. Our implementation is designed for the round-based synchronous model where the set of Byzantine nodes changes from round to round. In this model we explore the feasibility of a multi-writer multi-reader atomic register prone to various mobile Byzantine behaviors. We prove upper and lower bounds for solving the atomic storage problem in all the explored models. Our results, significantly different from the static case, advocate for a deeper study of the main building blocks of distributed computing when the system is prone to mobile Byzantine failures.

    Self-stabilizing virtual synchrony

    Virtual synchrony (VS) is an important abstraction that is proven to be extremely useful when implemented over asynchronous, typically large, message-passing distributed systems. Fault-tolerant design is critical for the success of such implementations since large distributed systems can be highly available as long as they do not depend on the full operational status of every system participant. Self-stabilizing systems can tolerate transient faults that drive the system to an arbitrary unpredictable configuration. Such systems automatically regain consistency from any such configuration, and then produce the desired system behavior, ensuring it for a practically infinite number of successive steps, e.g., 2^64 steps. We present a new multi-purpose self-stabilizing counter algorithm establishing an efficient practically unbounded counter that can directly yield a self-stabilizing Multiple-Writer Multiple-Reader (MWMR) register emulation. We use our counter algorithm, together with a self-stabilizing group membership and a self-stabilizing multicast service, to devise the first practically stabilizing VS algorithm and a self-stabilizing VS-based emulation of state machine replication (SMR). As we base the SMR implementation on VS, rather than consensus, the system progresses in more extreme asynchronous settings in relation to consensus-based SMR.
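    To give a sense of why 2^64 steps counts as "practically infinite", the short calculation below estimates how long a counter would take to exhaust 2^64 values. The step rate of one billion steps per second is an assumption chosen for illustration, and `years_to_exhaust` is a hypothetical name; neither appears in the abstract.

```python
STEPS = 2 ** 64                      # the bound mentioned in the abstract
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(steps_per_second: float) -> float:
    """Years needed to take 2^64 steps at a constant (assumed) rate."""
    if steps_per_second <= 0:
        raise ValueError("steps_per_second must be positive")
    return STEPS / steps_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumed rate: 1e9 steps per second (illustrative only).
    print(f"{years_to_exhaust(1e9):.1f} years")
```

    Even at this assumed rate, exhausting the counter would take several centuries, which is the intuition behind treating such a bound as unreachable in practice.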

    Self-Stabilizing and Private Distributed Shared Atomic Memory in Seldomly Fair Message Passing Networks

    We study the problem of privately emulating shared memory in message-passing networks. The system includes clients that store and retrieve replicated information on N servers, out of which e are data-corrupting malicious. When a client accesses a data-corrupting malicious server, the data field of that server response might be different from the value it originally stored. However, all other control variables in the server reply and protocol actions are according to the server algorithm. For the coded atomic storage algorithms by Cadambe et al., we present an enhancement that ensures no information leakage and data-corrupting malicious fault-tolerance. We also consider recovery after the occurrence of transient faults that violate the assumptions according to which the system was designed to operate. After their last occurrence, transient faults leave the system in an arbitrary state (while the program code stays intact). We present a self-stabilizing algorithm, which recovers after the occurrence of transient faults. This addition to Cadambe et al. considers asynchronous settings as long as no transient faults occur. The recovery from transient faults that bring the system counters (close) to their maximal values may include the use of a global reset procedure, which requires the system run to be controlled by a fair scheduler. After the recovery period, the safety properties are provided for asynchronous system runs that are not necessarily controlled by fair schedulers. Since the recovery period is bounded and the occurrence of transient faults is extremely rare, we call this design criterion self-stabilization in the presence of seldom fairness. Our self-stabilizing algorithm uses a bounded amount of storage during asynchronous executions (that are not necessarily controlled by fair schedulers). To the best of our knowledge, we are the first to address privacy, data-corrupting malicious behavior, and self-stabilization in the context of emulating atomic shared memory in message-passing systems.

    Self-stabilization Overhead: an Experimental Case Study on Coded Atomic Storage

    Shared memory emulation can be used as a fault-tolerant and highly available distributed storage solution or as a low-level synchronization primitive. Attiya, Bar-Noy, and Dolev were the first to propose a single-writer, multi-reader linearizable register emulation where the register is replicated to all servers. Recently, Cadambe et al. proposed the Coded Atomic Storage (CAS) algorithm, which uses erasure coding to achieve data redundancy with much lower communication cost than previous algorithmic solutions. Although CAS can tolerate server crashes, it was not designed to recover from unexpected, transient faults without external (human) intervention. In this respect, Dolev, Petig, and Schiller have recently developed a self-stabilizing version of CAS, which we call CASSS. As one would expect, self-stabilization does not come as a free lunch; it introduces, mainly, communication overhead for detecting inconsistencies and stale information. So, one would wonder whether the overhead introduced by self-stabilization would nullify the gain of erasure coding. To answer this question, we have implemented and experimentally evaluated the CASSS algorithm on PlanetLab, a planetary-scale distributed infrastructure. The evaluation shows that our implementation of CASSS scales very well in terms of the number of servers, the number of concurrent clients, and the size of the replicated object. More importantly, it shows that (a) CASSS has only a constant overhead compared to the traditional CAS algorithm (which we also implement) and (b) the recovery period (after the last occurrence of a transient fault) is as short as a few client (read/write) operations. Our results suggest that CASSS does not significantly impact efficiency while providing automatic recovery from transient faults and a bounded size of needed resources.