210 research outputs found

    A suite of definitions for consistency criteria in distributed shared memories

    A shared memory built on top of a distributed system constitutes a distributed shared memory (DSM). Although many protocols implementing DSMs in various contexts have been proposed, no homogeneous set of definitions has been given for the many semantics offered by these implementations. This paper provides a suite of such definitions for atomic, sequential, causal, PRAM and a few other consistency criteria. These definitions are based on a single framework: a parallel computation is defined as a partial order on the set of read and write operations invoked by processes, and a consistency criterion is defined as a constraint on this partial order. Such an approach provides a simple classification of consistency criteria, from the most to the least constrained. This paper can also be considered a survey of consistency criteria for DSMs.
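
To make the idea of "a consistency criterion as a constraint on the operations" concrete, here is a minimal brute-force sketch in Python. It is not taken from the paper: the tuple encoding of operations, the function names, and the assumed initial value of 0 are all illustrative assumptions. It uses the commonly cited textbook readings of sequential consistency (one legal serialization of all operations that respects each process's program order) and PRAM consistency (each process has its own legal serialization of its operations plus everyone's writes).

```python
from itertools import permutations

# Hypothetical encoding: an operation is (process, kind, variable, value),
# where kind is "w" (write) or "r" (read). A history maps each process
# to the list of its operations in program order.

def is_legal(sequence, initial=0):
    """A serial sequence is legal if every read returns the last value written."""
    memory = {}
    for _proc, kind, var, value in sequence:
        if kind == "w":
            memory[var] = value
        elif memory.get(var, initial) != value:
            return False
    return True

def respects_program_order(sequence, history):
    """Each process's operations must appear in the order it issued them."""
    return all(
        [op for op in sequence if op[0] == proc] == ops
        for proc, ops in history.items()
    )

def sequentially_consistent(history):
    """True if some single serialization of all operations is legal."""
    all_ops = [op for ops in history.values() for op in ops]
    return any(
        respects_program_order(candidate, history) and is_legal(candidate)
        for candidate in permutations(all_ops)
    )

def pram_consistent(history):
    """True if every process has a legal serialization of its own view."""
    writes = {q: [op for op in ops if op[1] == "w"] for q, ops in history.items()}
    for p in history:
        # The view of p: all of p's operations plus the writes of the others.
        view = {q: (history[q] if q == p else writes[q]) for q in history}
        if not sequentially_consistent(view):
            return False
    return True

# Two writers, two readers that observe the writes in opposite orders.
history = {
    "p1": [("p1", "w", "x", 1)],
    "p2": [("p2", "w", "x", 2)],
    "p3": [("p3", "r", "x", 1), ("p3", "r", "x", 2)],
    "p4": [("p4", "r", "x", 2), ("p4", "r", "x", 1)],
}
```

In this example history, `p3` and `p4` disagree on the order of the two writes, so no single serialization exists, yet each process's individual view can still be serialized. This illustrates how one criterion can be strictly less constrained than another. The exhaustive search is exponential and is meant only for tiny histories.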

    Dependable Systems

    Improving the dependability of computer systems is a critical and essential task. In this context, the paper surveys techniques for achieving fault tolerance in distributed systems through replication. The main replication techniques are first explained. Then group communication is introduced as the communication infrastructure that allows the implementation of the different replication techniques. Finally, the difficulty of implementing group communication is discussed, and the most important algorithms are presented.

    Replication for send-deterministic MPI HPC applications

    Replication has recently gained attention in the context of fault tolerance for large-scale MPI HPC applications. Existing implementations try to cover all MPI codes and to be independent of the underlying library. In this paper, we evaluate the advantages of adopting a different approach. First, we try to take advantage of a communication property common to many MPI HPC applications, namely send-determinism. Second, we choose to implement replication inside the MPI library. The main advantage of our approach is simplicity. While being only a small patch to the Open MPI library, our solution, called SDR-MPI, supports most of the main features of the MPI standard, including all collectives and group operations. SDR-MPI additionally achieves good performance: experiments run with HPC benchmarks and applications show that its overhead remains below 5%.

    Model Checking of Consensus Algorithms

    We show for the first time that standard model checking allows one to completely verify asynchronous algorithms for solving consensus, a fundamental problem in fault-tolerant distributed computing. Model checking is a powerful verification methodology based on state exploration. However, it has rarely been applied to consensus algorithms, because these algorithms induce huge, often infinite, state spaces. Here we focus on consensus algorithms based on the Heard-Of model (HO model, for short), a new computation model for distributed computing. By making use of the high abstraction level provided by this computation model, we develop a methodology for verifying consensus algorithms in every possible state by model checking. This paper describes the proposed verification methodology and the results of applying it to various consensus algorithms.

    Revisiting Token-based Atomic Broadcast Algorithms

    Many atomic broadcast algorithms have been published in the last twenty years. The two main mechanisms used to tolerate failures (if we exclude synchronous systems and consider only crash failures) are unreliable failure detectors and group membership. Token-based atomic broadcast algorithms represent a large class of atomic broadcast algorithms. Interestingly, all the token-based algorithms rely on group membership. The paper presents a token-based atomic broadcast algorithm that uses a failure detector, namely the new failure detector denoted by R. The failure detector R is compared with P and S. Solving consensus with token-based algorithms using R is also discussed.

    Modeling and validating the performance of atomic broadcast algorithms in high-latency networks

    The performance of consensus and atomic broadcast algorithms using failure detectors is often affected by a trade-off between the number of communication steps and the number of messages needed to reach a decision. In this paper, we model the performance of three consensus and atomic broadcast algorithms using failure detectors in the oft-neglected setting of wide area networks, and validate this model by experimentally evaluating the algorithms in several different setups.