Search CORE

4,547 research outputs found

Uncovering Bugs in Distributed Storage Systems during Testing (not in Production!)

Author: Chen S
Deligiannis P
Donaldson AF
Erickson J
Huang C
Lal A
McCutchen M
Mudduluru R
Qadeer S
Schulte W
Thomson P
Publication venue: USENIX
Publication date: 07/12/2015
Field of study

Testing distributed systems is challenging due to multiple sources of nondeterminism. Conventional testing techniques, such as unit, integration and stress testing, are ineffective in preventing serious but subtle bugs from reaching production. Formal techniques, such as TLA+, can only verify high-level specifications of systems at the level of logic-based models, and fall short of checking the actual executable code. In this paper, we present a new methodology for testing distributed systems. Our approach applies advanced systematic testing techniques to thoroughly check that the executable code adheres to its high-level specifications, which significantly improves coverage of important system behaviors. Our methodology has been applied to three distributed storage systems in the Microsoft Azure cloud computing platform. In the process, numerous bugs were identified, reproduced, confirmed and fixed. These bugs required a subtle combination of concurrency and failures, making them extremely difficult to find with conventional testing techniques. An important advantage of our approach is that a bug is uncovered in a small setting and witnessed by a full system trace, which dramatically increases the productivity of debugging

Spiral - Imperial College Digital Repository

A framework for proving the self-organization of dynamic systems

Author: Anceaume Emmanuelle
Défago Xavier
Potop-Butucaru Maria
Roy Matthieu
Publication venue
Publication date: 09/11/2010
Field of study

This paper aims at providing a rigorous definition of self- organization, one of the most desired properties for dynamic systems (e.g., peer-to-peer systems, sensor networks, cooperative robotics, or ad-hoc networks). We characterize different classes of self-organization through liveness and safety properties that both capture information re- garding the system entropy. We illustrate these classes through study cases. The first ones are two representative P2P overlays (CAN and Pas- try) and the others are specific implementations of \Omega (the leader oracle) and one-shot query abstractions for dynamic settings. Our study aims at understanding the limits and respective power of existing self-organized protocols and lays the basis of designing robust algorithm for dynamic systems

arXiv.org e-Print Archive

HAL-CentraleSupelec

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

A Protocol for the Atomic Capture of Multiple Molecules at Large Scale

Author: Bertier Marin
Obrovac Marko
Tedeschi Cédric
Publication venue
Publication date: 03/01/2012
Field of study

With the rise of service-oriented computing, applications are more and more based on coordination of autonomous services. Envisioned over largely distributed and highly dynamic platforms, expressing this coordination calls for alternative programming models. The chemical programming paradigm, which models applications as chemical solutions where molecules representing digital entities involved in the computation, react together to produce a result, has been recently shown to provide the needed abstractions for autonomic coordination of services. However, the execution of such programs over large scale platforms raises several problems hindering this paradigm to be actually leveraged. Among them, the atomic capture of molecules participating in concur- rent reactions is one of the most significant. In this paper, we propose a protocol for the atomic capture of these molecules distributed and evolving over a large scale platform. As the density of possible reactions is crucial for the liveness and efficiency of such a capture, the protocol proposed is made up of two sub-protocols, each of them aimed at addressing different levels of densities of potential reactions in the solution. While the decision to choose one or the other is local to each node participating in a program's execution, a global coherent behaviour is obtained. Proof of liveness, as well as intensive simulation results showing the efficiency and limited overhead of the protocol are given.Comment: 13th International Conference on Distributed Computing and Networking (2012

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

ARES: Adaptive, Reconfigurable, Erasure coded, atomic Storage

Author: Cadambe Viveck
Konwar Kishori M.
Lynch Nancy
Medard Muriel
Nicolaou Nicolas
Prakash N.
Publication venue
Publication date: 09/05/2018
Field of study

Atomicity or strong consistency is one of the fundamental, most intuitive, and hardest to provide primitives in distributed shared memory emulations. To ensure survivability, scalability, and availability of a storage service in the presence of failures, traditional approaches for atomic memory emulation, in message passing environments, replicate the objects across multiple servers. Compared to replication based algorithms, erasure code-based atomic memory algorithms has much lower storage and communication costs, but usually, they are harder to design. The difficulty of designing atomic memory algorithms further grows, when the set of servers may be changed to ensure survivability of the service over software and hardware upgrades, while avoiding service interruptions. Atomic memory algorithms for performing server reconfiguration, in the replicated systems, are very few, complex, and are still part of an active area of research; reconfigurations of erasure-code based algorithms are non-existent. In this work, we present ARES, an algorithmic framework that allows reconfiguration of the underlying servers, and is particularly suitable for erasure-code based algorithms emulating atomic objects. ARES introduces new configurations while keeping the service available. To use with ARES we also propose a new, and to our knowledge, the first two-round erasure code based algorithm TREAS, for emulating multi-writer, multi-reader (MWMR) atomic objects in asynchronous, message-passing environments, with near-optimal communication and storage costs. Our algorithms can tolerate crash failures of any client and some fraction of servers, and yet, guarantee safety and liveness property. Moreover, by bringing together the advantages of ARES and TREAS, we propose an optimized algorithm where new configurations can be installed without the objects values passing through the reconfiguration clients

arXiv.org e-Print Archive

DSpace@MIT

S+Net: extending functional coordination with extra-functional semantics

Author: Grelck Clemens
Kirner Raimund
Penczek Frank
Poss Raphael
Shafarenko Alex
Verstraaten Merijn
Publication venue
Publication date: 01/01/2013
Field of study

This technical report introduces S+Net, a compositional coordination language for streaming networks with extra-functional semantics. Compositionality simplifies the specification of complex parallel and distributed applications; extra-functional semantics allow the application designer to reason about and control resource usage, performance and fault handling. The key feature of S+Net is that functional and extra-functional semantics are defined orthogonally from each other. S+Net can be seen as a simultaneous simplification and extension of the existing coordination language S-Net, that gives control of extra-functional behavior to the S-Net programmer. S+Net can also be seen as a transitional research step between S-Net and AstraKahn, another coordination language currently being designed at the University of Hertfordshire. In contrast with AstraKahn which constitutes a re-design from the ground up, S+Net preserves the basic operational semantics of S-Net and thus provides an incremental introduction of extra-functional control in an existing language.Comment: 34 pages, 11 figures, 3 table

arXiv.org e-Print Archive

CiteSeerX

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Reconfigurable Lattice Agreement and Applications

Author: Kuznetsov Petr
Rieutord Thibault
Tucci-Piergiovanni Sara
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Principles of Distributed Systems (OPODIS 2019)
Publication date: 21/10/2019
Field of study

Reconfiguration is one of the central mechanisms in distributed systems. Due to failures and connectivity disruptions, the very set of service replicas (or servers) and their roles in the computation may have to be reconfigured over time. To provide the desired level of consistency and availability to applications running on top of these servers, the clients of the service should be able to reach some form of agreement on the system configuration. We observe that this agreement is naturally captured via a lattice partial order on the system states. We propose an asynchronous implementation of reconfigurable lattice agreement that implies elegant reconfigurable versions of a large class of lattice abstract data types, such as max-registers and conflict detectors, as well as popular distributed programming abstractions, such as atomic snapshot and commit-adopt

arXiv.org e-Print Archive

HAL Descartes

Dagstuhl Research Online Publication Server

HAL-CEA