A suite of definitions for consistency criteria in distributed shared memories

Authors: Raynal Michel, Schiper André
Publication date: 18/06/2018

A shared memory built on top of a distributed system constitutes a distributed shared memory (DSM). Although many protocols implementing DSMs in various contexts have been proposed, no homogeneous set of definitions has been given for the many semantics these implementations offer. This paper provides a suite of such definitions for atomic, sequential, causal, PRAM, and a few other consistency criteria. These definitions are based on a single framework: a parallel computation is defined as a partial order on the set of read and write operations invoked by processes, and a consistency criterion is defined as a constraint on this partial order. Such an approach provides a simple classification of consistency criteria, from the most constrained to the least constrained. This paper can also be considered a survey of consistency criteria for DSMs.
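As an illustration of a consistency criterion expressed as a constraint on the order of read and write operations, here is a minimal Python sketch of a brute-force check for sequential consistency. It uses the standard textbook definition (some total order of all operations preserves each process's program order, and every read returns the most recently written value), not the paper's own formalization. The tuple encoding of operations, the function names, and the initial value 0 are assumptions made for this sketch.

```python
# Hypothetical encoding for this sketch: an operation is a tuple
# (process, kind, variable, value), where kind is "w" (write) or "r" (read).
# A history is a list with one list of operations per process, in program order.

def is_legal(sequence, initial=0):
    """True if every read in the sequence returns the latest written value."""
    memory = {}
    for _proc, kind, var, value in sequence:
        if kind == "w":
            memory[var] = value
        elif memory.get(var, initial) != value:
            return False
    return True

def interleavings(history):
    """Yield every total order that preserves each process's program order."""
    if all(len(ops) == 0 for ops in history):
        yield []
        return
    for i, ops in enumerate(history):
        if ops:
            rest = history[:i] + [ops[1:]] + history[i + 1:]
            for tail in interleavings(rest):
                yield [ops[0]] + tail

def sequentially_consistent(history):
    """Brute force: exponential in the history size, fine for tiny examples."""
    return any(is_legal(seq) for seq in interleavings(history))

# p2 sees the old value, then the new one: a legal interleaving exists.
fresh_after_stale = [
    [("p1", "w", "x", 1)],
    [("p2", "r", "x", 0), ("p2", "r", "x", 1)],
]

# Each process writes its own variable, then reads the other as still 0:
# no single interleaving can explain both reads.
both_read_stale = [
    [("p1", "w", "x", 1), ("p1", "r", "y", 0)],
    [("p2", "w", "y", 1), ("p2", "r", "x", 0)],
]
```

Under these assumptions, `sequentially_consistent(fresh_after_stale)` holds while `sequentially_consistent(both_read_stale)` does not. Weaker criteria such as PRAM would relax the requirement that all processes agree on one total order.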

RERO DOC Digital Library

Dependable Systems

Author: Schiper André
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/02/2012

Improving the dependability of computer systems is a critical and essential task. In this context, the paper surveys techniques for achieving fault tolerance in distributed systems through replication. The main replication techniques are first explained. Then group communication is introduced as the communication infrastructure that allows the implementation of the different replication techniques. Finally, the difficulty of implementing group communication is discussed, and the most important algorithms are presented.

Infoscience - École polytechnique fédérale de Lausanne

Replication for send-deterministic MPI HPC applications

Authors: Lefray Arnaud, Ropars Thomas, Schiper André
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Replication has recently gained attention in the context of fault tolerance for large-scale MPI HPC applications. Existing implementations try to cover all MPI codes and to be independent of the underlying library. In this paper, we evaluate the advantages of adopting a different approach. First, we try to take advantage of a communication property common to many MPI HPC applications, namely send-determinism. Second, we choose to implement replication inside the MPI library. The main advantage of our approach is simplicity. While being only a small patch to the Open MPI library, our solution, called SDR-MPI, supports most of the main features of the MPI standard, including all collectives and group operations. SDR-MPI additionally achieves good performance: experiments run with HPC benchmarks and applications show that its overhead remains below 5%.

HAL-ENS-LYON

Infoscience - École polytechnique fédérale de Lausanne

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Group Communication: From Practice to Theory

Author: Schiper André
Publication date: 26/05/2008

Infoscience - École polytechnique fédérale de Lausanne

Model Checking of Consensus Algorithms

Authors: Schiper André, Tsuchiya Tatsuhiro
Publication date: 05/12/2006

We show for the first time that standard model checking allows one to completely verify asynchronous algorithms for solving consensus, a fundamental problem in fault-tolerant distributed computing. Model checking is a powerful verification methodology based on state exploration. However, it has rarely been applied to consensus algorithms, because these algorithms induce huge, often infinite, state spaces. Here we focus on consensus algorithms based on the Heard-Of model (HO model, for short), a new computation model for distributed computing. By making use of the high abstraction level provided by this computation model, we develop a methodology for verifying consensus algorithms in every possible state by model checking. This paper describes the proposed verification methodology and the results of applying it to various consensus algorithms.

Infoscience - École polytechnique fédérale de Lausanne

Revisiting Token-based Atomic Broadcast Algorithms

Authors: Ekwall Richard, Schiper André
Publication date: 13/07/2005

Many atomic broadcast algorithms have been published in the last twenty years. The two main mechanisms used to tolerate failures (if we exclude synchronous systems and consider only crash failures) are unreliable failure detectors and group membership. Token-based atomic broadcast algorithms represent a large class of atomic broadcast algorithms. Interestingly, all the token-based algorithms rely on group membership. The paper presents a token-based atomic broadcast algorithm that uses a failure detector, namely the new failure detector denoted by R. The failure detector R is compared with P and S. Solving consensus with token-based algorithms using R is also discussed.

Infoscience - École polytechnique fédérale de Lausanne

Modeling and validating the performance of atomic broadcast algorithms in high-latency networks

Authors: Ekwall Richard, Schiper André
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

The performance of consensus and atomic broadcast algorithms using failure detectors is often affected by a trade-off between the number of communication steps and the number of messages needed to reach a decision. In this paper, we model the performance of three consensus and atomic broadcast algorithms using failure detectors in the oft-neglected setting of wide area networks, and we validate this model by experimentally evaluating the algorithms in several different setups.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX