
    Self-stabilization Overhead: an Experimental Case Study on Coded Atomic Storage

    Shared memory emulation can be used as a fault-tolerant and highly available distributed storage solution or as a low-level synchronization primitive. Attiya, Bar-Noy, and Dolev were the first to propose a single-writer, multi-reader linearizable register emulation in which the register is replicated to all servers. More recently, Cadambe et al. proposed the Coded Atomic Storage (CAS) algorithm, which uses erasure coding to achieve data redundancy at a much lower communication cost than earlier algorithmic solutions. Although CAS can tolerate server crashes, it was not designed to recover from unexpected, transient faults without external (human) intervention. To that end, Dolev, Petig, and Schiller recently developed a self-stabilizing version of CAS, which we call CASSS. As one would expect, self-stabilization is not a free lunch: it mainly introduces communication overhead for detecting inconsistencies and stale information. One may therefore wonder whether this overhead nullifies the gain of erasure coding. To answer this question, we have implemented and experimentally evaluated the CASSS algorithm on PlanetLab, a planetary-scale distributed infrastructure. The evaluation shows that our implementation of CASSS scales very well with the number of servers, the number of concurrent clients, and the size of the replicated object. More importantly, it shows that (a) CASSS incurs only a constant overhead compared to the traditional CAS algorithm (which we also implemented) and (b) its recovery period, after the last occurrence of a transient fault, is as short as a few client (read/write) operations. Our results suggest that CASSS provides automatic recovery from transient faults, using a bounded amount of resources, without significantly impacting efficiency.
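    To make the structure concrete, below is a minimal, in-process Python sketch of the three-phase write pattern that CAS-style algorithms follow (query the highest finalized tag, pre-write under a new tag, then finalize). Every name in it (Server, cas_write, QUORUM) is hypothetical, the quorum size is only illustrative, and the per-server coded fragment is stood in for by the full value; a real deployment would store an erasure-coded element per server and add the self-stabilization machinery that CASSS layers on top.

```python
# Illustrative sketch only; not the published CAS/CASSS implementation.
N, F = 5, 1
QUORUM = N - F  # illustrative quorum size: enough replies despite F crashes

class Server:
    def __init__(self):
        self.store = {}  # tag -> (fragment, phase)

    def query(self):
        finalized = [t for t, (_, ph) in self.store.items() if ph == "fin"]
        return max(finalized, default=(0, "w0"))

    def pre_write(self, tag, fragment):
        self.store[tag] = (fragment, "pre")

    def finalize(self, tag):
        fragment, _ = self.store.get(tag, (None, None))
        self.store[tag] = (fragment, "fin")

servers = [Server() for _ in range(N)]

def cas_write(value: bytes, writer_id: str):
    # Phase 1 (query): ask a quorum for the highest finalized tag.
    replies = [s.query() for s in servers[:QUORUM]]
    z, _ = max(replies)
    tag = (z + 1, writer_id)
    # Phase 2 (pre-write): store the (coded) value under the new tag at a quorum.
    for s in servers[:QUORUM]:
        s.pre_write(tag, value)
    # Phase 3 (finalize): mark the tag finalized so readers may return it.
    for s in servers[:QUORUM]:
        s.finalize(tag)
    return tag

print(cas_write(b"hello", "w1"))  # e.g. (1, 'w1')
```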

    RamboNodes for the Metropolitan Ad Hoc Network

    We present an algorithm to store data robustly in a large, geographically distributed network by means of localized regions of data storage that move in response to changing conditions. For example, data might migrate away from failures or toward regions of high demand. The PersistentNode algorithm provides this service robustly, but with limited safety guarantees. We use the RAMBO framework to transform PersistentNode into RamboNode, an algorithm that guarantees atomic consistency in exchange for increased cost and decreased liveness. In addition, a half-life analysis of RamboNode shows that it is robust against continuous low-rate failures. Finally, we provide experimental simulations of the algorithm on 2000 nodes, demonstrating how it services requests and examining how it responds to failures.

    Space Complexity of Fault-Tolerant Register Emulations

    Driven by the rising popularity of cloud storage, the costs associated with implementing reliable storage services from a collection of fault-prone servers have recently become an actively studied question. The well-known ABD result shows that an f-tolerant register can be emulated using a collection of 2f + 1 fault-prone servers, each storing a single read-modify-write object type, which is known to be optimal. In this paper we generalize this bound: we investigate the inherent space complexity of emulating reliable multi-writer registers as a function of the type of the base objects exposed by the underlying servers, the number of writers to the emulated register, the number of available servers, and the failure threshold. We establish a sharp separation between registers on the one hand and both max-registers (the base object types assumed by ABD) and CAS on the other in terms of the resources (i.e., the number of base objects of the respective types) required to support the emulation; we show that no such separation exists between max-registers and CAS. Our main technical contribution is lower and upper bounds on the resources required when the underlying base objects are fault-prone read/write registers. We show that the number of required registers is directly proportional to the number of writers and inversely proportional to the number of servers. Comment: Conference version appears in Proceedings of PODC '1
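    As a quick illustration of the 2f + 1 arithmetic mentioned above, the toy helper below (a hypothetical name, not from the paper) checks the two properties that make majority-quorum emulations work: any two quorums intersect, and a quorum of replies can still be gathered when f servers crash. The paper's finer-grained space bounds for read/write base objects are not reproduced here.

```python
def abd_parameters(f: int):
    """Hypothetical helper: server count and majority quorum size for f crashes."""
    n = 2 * f + 1                # minimum number of fault-prone servers (ABD bound)
    quorum = f + 1               # majority quorum size
    assert 2 * quorum - n >= 1   # safety: any two quorums intersect
    assert n - f >= quorum       # liveness: a quorum survives f crashes
    return n, quorum

for f in (1, 2, 3):
    print(f, abd_parameters(f))  # (3, 2), (5, 3), (7, 4)
```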

    SAM: Self* Atomic Memory for P2P Systems

    We propose an implementation of self-adjusting and self-healing atomic memory in highly dynamic systems that exploits peer-to-peer (p2p) techniques. Our approach, named SAM, brings together new and old research areas such as p2p overlays, dynamic quorums, and replica control. In SAM, nodes form a connected overlay. To emulate the behavior of an atomic memory we use intersecting sets of nodes, namely quorums, where each node hosts a replica of an object. In our approach, a quorum set is obtained by performing a deterministic traversal of the overlay. The SAM overlay features self-* capabilities: the overlay self-heals on the fly when nodes hosting replicas leave the system, and the number of active replicas in the overlay dynamically self-adjusts as a function of the object load. In particular, SAM pushes requests from loaded replicas to less loaded ones. If no such replicas exist, the replica overlay self-adjusts to absorb the extra load without breaking atomicity. We propose a distributed implementation of SAM in which, for the sake of scalability, nodes exploit only a restricted local view of the system. We provide a complete specification of our system and prove that it implements object atomicity.
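    The sketch below illustrates the flavour of the deterministic-traversal idea: starting from a fixed entry node, walk the replica overlay in a deterministic (sorted breadth-first) order and take the first q nodes as a quorum. The overlay, quorum size, and traversal rule are illustrative assumptions, not SAM's actual construction, and this sketch alone does not establish the quorum intersection property that SAM proves.

```python
from collections import deque

def quorum_by_traversal(overlay: dict, start, q: int):
    """overlay: node -> iterable of neighbour nodes (connected graph).
    Returns the first q nodes reached by a deterministic BFS from `start`."""
    visited, order = {start}, [start]
    frontier = deque([start])
    while frontier and len(order) < q:
        node = frontier.popleft()
        for nxt in sorted(overlay[node]):   # sorted neighbours => deterministic walk
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                frontier.append(nxt)
                if len(order) == q:
                    break
    return order

# Toy overlay and quorum size, for illustration only.
overlay = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(quorum_by_traversal(overlay, start=1, q=3))  # [1, 2, 3]
```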

    On the robustness of (semi) fast quorum-based implementations of atomic shared memory


    On the Efficiency of Atomic Multi-reader, Multi-writer Distributed Memory

    This paper considers quorum-replicated, multi-writer, multi-reader (MWMR) implementations of survivable atomic registers in a distributed message-passing system with processors prone to failures. Previous implementations in such settings invariably required two rounds of communication between readers/writers and replica owners. Hence the question arises whether it is possible to have single-round read and/or write operations in this setting. As a first step, we present an algorithm, called CWFR, that allows the classic two-round write operations while supporting single-round read operations. Since multiple write operations may be concurrent with a read operation, this algorithm involves an iterative (local) discovery of the latest completed write operation. This algorithm precipitates the question of whether fast (single-round) writes may coexist with fast reads. We thus devise a second algorithm, called SFW, that exploits a new technique called server side ordering (SSO), which, unlike previous approaches, places partial responsibility for the ordering of write operations on the replica owners (the servers). With SSO, fast write operations are introduced for the very first time in the MWMR setting. While this is possible, we show that under certain conditions the MWMR model imposes inherent limitations on any quorum-based fast write implementation of a safe read/write register and potentially …
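    The snippet below is an illustrative, deliberately simplified take on the single-round read idea, not CWFR's actual predicate: the reader collects one round of (tag, value) replies and scans from the largest tag downward, returning the first tag reported by enough servers to be treated as a completed write. The threshold and all names are hypothetical.

```python
def single_round_read(replies, completeness_threshold: int):
    """replies: list of (tag, value) pairs, one per responding server."""
    by_tag = {}
    for tag, value in replies:
        entry = by_tag.setdefault(tag, {"value": value, "count": 0})
        entry["count"] += 1
    # Iterate from the latest tag downward and stop at the first tag that
    # enough servers report, i.e. the latest write considered complete.
    for tag in sorted(by_tag, reverse=True):
        if by_tag[tag]["count"] >= completeness_threshold:
            return tag, by_tag[tag]["value"]
    raise RuntimeError("no write observed at the required threshold")

replies = [(3, "c"), (3, "c"), (2, "b"), (3, "c"), (2, "b")]
print(single_round_read(replies, completeness_threshold=3))  # (3, 'c')
```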

    Fast Access to Distributed Atomic Memory

    We study efficient and robust implementations of an atomic read-write data structure over an asynchronous distributed message-passing system made up of reader and writer processes, as well as a number of servers implementing the data structure. We determine the exact conditions under which every read and write operation involves one round of communication with the servers. These conditions relate the number of readers to the tolerated number of faulty servers and the nature of these failures.
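    As a hedged sketch of how such a condition might be checked when planning a deployment, the helper below uses the classic fast-reads style inequality (number of readers < S/t - 2, for S servers and at most t crash-faulty servers) as an assumed stand-in; the exact conditions established in the paper depend on the failure model and should be taken from the paper itself.

```python
def fast_ops_possible(num_readers: int, num_servers: int, max_faulty: int) -> bool:
    """Assumed, illustrative check: can every operation finish in one round?"""
    if max_faulty == 0:
        return True  # nothing to mask; a single round trivially suffices
    return num_readers < num_servers / max_faulty - 2

print(fast_ops_possible(num_readers=2, num_servers=10, max_faulty=2))  # True
print(fast_ops_possible(num_readers=5, num_servers=10, max_faulty=2))  # False
```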

    Robust services in dynamic systems

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 191-202).
    Our growing reliance on online services accessible on the Internet demands highly available systems that work correctly without interruption. This thesis extends previous work on Byzantine-fault-tolerant replication to meet the new requirements of current Internet services: scalability and the ability to reconfigure the service automatically in the presence of a changing system membership. Our solution addresses two important problems that appear in dynamic replicated services. First, we present a membership service that provides servers and clients in the system with a sequence of consistent views of the system membership (i.e., the set of currently available servers). The membership service is designed to be scalable and to handle membership changes mostly automatically. Furthermore, the membership service is itself reconfigurable, and it tolerates arbitrary faults of a subset of the servers that implement it at any instant. The second part of our solution is a generic methodology for transforming replicated services that assume a fixed membership into services that support a dynamic system membership. The methodology uses the output from the membership service to decide when to reconfigure. We built two example services using this methodology: a dynamic Byzantine quorum system that supports read and write operations, and a dynamic Byzantine state machine replication system that supports any deterministic service. The final contribution of this thesis is an analytic study that points out an obstacle to the deployment of replicated services based on a dynamic membership. The basic problem is that maintaining redundancy levels for the service state as servers join and leave the system is costly in terms of network bandwidth. To evaluate how dynamic the system membership can be, we developed a model for the cost of state maintenance in dynamic replicated services, and we use measured values from real-world traces to determine possible values for the parameters of the model. We conclude that certain deployments (such as a volunteer-based system) are incompatible with the goals of large-scale reliable services. We implemented the membership service and the two example services. Our performance results show that the membership service is scalable, and our replicated services perform well, even during reconfigurations.
    by Rodrigo Seromenho Miragaia Rodrigues. Ph.D.
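    To give a feel for the kind of back-of-envelope reasoning such a cost model enables (this is an assumption-laden sketch, not the thesis's model), the snippet below estimates per-node maintenance bandwidth as roughly the data held per node, scaled by the redundancy factor, divided by the average membership lifetime. All parameter values are made up for illustration.

```python
# Hypothetical back-of-envelope estimate, NOT the thesis's model: when a member
# leaves, its share of the replicated state must be copied to a replacement, so
# long-run maintenance bandwidth per node is roughly
#     redundancy * data_per_node / average_membership_lifetime.

def maintenance_bandwidth_bytes_per_sec(data_per_node_bytes: float,
                                        avg_lifetime_secs: float,
                                        redundancy: int) -> float:
    return redundancy * data_per_node_bytes / avg_lifetime_secs

# Illustrative numbers only: 10 GB per node, 1-day average membership, 3 replicas.
print(maintenance_bandwidth_bytes_per_sec(10e9, 86_400, 3) / 1e6, "MB/s")  # ~0.35 MB/s
```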