3,685 research outputs found

    Reconfigurable Lattice Agreement and Applications

    Get PDF
    Reconfiguration is one of the central mechanisms in distributed systems. Due to failures and connectivity disruptions, the very set of service replicas (or servers) and their roles in the computation may have to be reconfigured over time. To provide the desired level of consistency and availability to applications running on top of these servers, the clients of the service should be able to reach some form of agreement on the system configuration. We observe that this agreement is naturally captured via a lattice partial order on the system states. We propose an asynchronous implementation of reconfigurable lattice agreement that implies elegant reconfigurable versions of a large class of lattice abstract data types, such as max-registers and conflict detectors, as well as popular distributed programming abstractions, such as atomic snapshot and commit-adopt

    ARES: Adaptive, Reconfigurable, Erasure coded, atomic Storage

    Full text link
    Atomicity or strong consistency is one of the fundamental, most intuitive, and hardest to provide primitives in distributed shared memory emulations. To ensure survivability, scalability, and availability of a storage service in the presence of failures, traditional approaches for atomic memory emulation, in message passing environments, replicate the objects across multiple servers. Compared to replication based algorithms, erasure code-based atomic memory algorithms has much lower storage and communication costs, but usually, they are harder to design. The difficulty of designing atomic memory algorithms further grows, when the set of servers may be changed to ensure survivability of the service over software and hardware upgrades, while avoiding service interruptions. Atomic memory algorithms for performing server reconfiguration, in the replicated systems, are very few, complex, and are still part of an active area of research; reconfigurations of erasure-code based algorithms are non-existent. In this work, we present ARES, an algorithmic framework that allows reconfiguration of the underlying servers, and is particularly suitable for erasure-code based algorithms emulating atomic objects. ARES introduces new configurations while keeping the service available. To use with ARES we also propose a new, and to our knowledge, the first two-round erasure code based algorithm TREAS, for emulating multi-writer, multi-reader (MWMR) atomic objects in asynchronous, message-passing environments, with near-optimal communication and storage costs. Our algorithms can tolerate crash failures of any client and some fraction of servers, and yet, guarantee safety and liveness property. Moreover, by bringing together the advantages of ARES and TREAS, we propose an optimized algorithm where new configurations can be installed without the objects values passing through the reconfiguration clients

    DIVERSE: a Software Toolkit to Integrate Distributed Simulations with Heterogeneous Virtual Environments

    Get PDF
    We present DIVERSE (Device Independent Virtual Environments- Reconfigurable, Scalable, Extensible), which is a modular collection of complimentary software packages that we have developed to facilitate the creation of distributed operator-in-the-loop simulations. In DIVERSE we introduce a novel implementation of remote shared memory (distributed shared memory) that uses Internet Protocol (IP) networks. We also introduce a new method that automatically extends hardware drivers (not in the operating system kernel driver sense) into inter-process and Internet hardware services. Using DIVERSE, a program can display in a CAVEâ„¢, ImmersaDeskâ„¢, head mounted display (HMD), desktop or laptop without modification. We have developed a method of configuring user programs at run-time by loading dynamic shared objects (DSOs), in contrast to the more common practice of creating interpreted configuration languages. We find that by loading DSOs the development time, complexity and size of DIVERSE and DIVERSE user applications is significantly reduced. Configurations to support different I/O devices, device emulators, visual displays, and any component of a user application including interaction techniques, can be changed at run-time by loading different sets of DIVERSE DSOs. In addition, interpreted run-time configuration parsers have been implemented using DIVERSE DSOs; new ones can be created as needed. DIVERSE is free software, licensed under the terms of the GNU General Public License (GPL) and the GNU Lesser General Public License (LGPL) licenses. We describe the DIVERSE architecture and demonstrate how DIVERSE was used in the development of a specific application, an operator-in-the-loop Navy ship-board crane simulator, which runs unmodified on a desktop computer and/or in a CAVE with motion base motion queuing

    Fragmented ARES: Dynamic Storage for Large Objects

    Get PDF
    Data availability is one of the most important features in distributed storage systems, made possible by data replication. Nowadays data are generated rapidly and developing efficient, scalable and reliable storage systems has become one of the major challenges for high performance computing. In this work, we develop and prove correct a dynamic, robust and strongly consistent distributed shared memory suitable for handling large objects (such as files) and utilizing erasure coding. We do so by integrating an Adaptive, Reconfigurable, Atomic memory framework, called Ares, with the CoBFS framework, which relies on a block fragmentation technique to handle large objects. With the addition of Ares, we also enable the use of an erasure-coded algorithm to further split the data and to potentially improve storage efficiency at the replica servers and operation latency. Our development is complemented with an in-depth experimental evaluation on the Emulab and AWS EC2 testbeds, illustrating the benefits of our approach, as well as interesting tradeoffs

    Asynchronous Reconfiguration with Byzantine Failures

    Get PDF
    Replicated services are inherently vulnerable to failures and security breaches. In a long-running system, it is, therefore, indispensable to maintain a reconfiguration mechanism that would replace faulty replicas with correct ones. An important challenge is to enable reconfiguration without affecting the availability and consistency of the replicated data: the clients should be able to get correct service even when the set of service replicas is being updated. In this paper, we address the problem of reconfiguration in the presence of Byzantine failures: faulty replicas or clients may arbitrarily deviate from their expected behavior. We describe a generic technique for building asynchronous and Byzantine fault-tolerant reconfigurable objects: clients can manipulate the object data and issue reconfiguration calls without reaching consensus on the current configuration. With the help of forward-secure digital signatures, our solution makes sure that superseded and possibly compromised configurations are harmless, that slow clients cannot be fooled into reading stale data, and that Byzantine clients cannot cause a denial of service by flooding the system with reconfiguration requests. Our approach is modular and based on dynamic lattice agreement abstraction, and we discuss how to extend it to enable Byzantine fault-tolerant implementations of a large class of reconfigurable replicated services

    Fast Lean Erasure-Coded Atomic Memory Object

    Get PDF
    In this work, we propose FLECKS, an algorithm which implements atomic memory objects in a multi-writer multi-reader (MWMR) setting in asynchronous networks and server failures. FLECKS substantially reduces storage and communication costs over its replication-based counterparts by employing erasure-codes. FLECKS outperforms the previously proposed algorithms in terms of the metrics that to deliver good performance such as storage cost per object, communication cost a high fault-tolerance of clients and servers, guaranteed liveness of operation, and a given number of communication rounds per operation, etc. We provide proofs for liveness and atomicity properties of FLECKS and derive worst-case latency bounds for the operations. We implemented and deployed FLECKS in cloud-based clusters and demonstrate that FLECKS has substantially lower storage and bandwidth costs, and significantly lower latency of operations than the replication-based mechanisms

    CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

    Get PDF
    Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing; key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads

    Kompics: a message-passing component model for building distributed systems

    Get PDF
    The Kompics component model and programming framework was designedto simplify the development of increasingly complex distributed systems. Systems built with Kompics leverage multi-core machines out of the box and they can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic debugging and reproducible performance evaluation of unmodified Kompics distributed systems. We describe the component model and show how to program and compose event-based distributed systems. We present the architectural patterns and abstractions that Kompics facilitates and we highlight a case study of a complex distributed middleware that we have built with Kompics. We show how our approach enables systematic development and evaluation of large-scale and dynamic distributed systems
    • …
    corecore