Failure Detection and Consensus in the Crash-Recovery Model

Authors: Aguilera, Marcos Kawazoe; Chen, Wei; Toueg, Sam
Publication venue: SAGE Publications
Publication date: 01/01/1998

We study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first propose new failure detectors that are particularly suitable to the crash-recovery model. We next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice, namely those with no failures or failure detector mistakes. In such runs, consensus is achieved within 3d time and with 4n messages, where d is the maximum message delay and n is the number of processes in the system.

CiteSeerX

eCommons@Cornell

Enhanced Failure Detection Mechanism in MapReduce

Authors: Antoniu, Gabriel; Memishi, Bunjamin; Pérez Hernández, María de los Santos
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

The popularity of the MapReduce programming model has increased the research community's interest in improving it. Among other directions, fault tolerance, and failure detection in particular, appears to be crucial but has not yet reached a satisfactory level. Motivated by this, I decided to devote my main research during this period to building a prototype system architecture of the MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work will lead the way for further contributions to failure detection in NoSQL application frameworks and in cloud storage systems in general.

HAL-CentraleSupelec

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

INRIA a CCSD electronic archive server

Hal-Diderot

Archivo Digital UPM

HAL-Rennes 1

Agreement in wider environments with weaker assumptions.

Authors: Arévalo Viñuales, Sergio; Jiménez Merino, José Ernesto; Tang, Jian
Publication venue: E.U. de Informática (UPM)
Publication date: 01/01/2012

The set agreement problem states that from n proposed values at most n-1 can be decided. Traditionally, this problem is solved using a failure detector in asynchronous systems where processes may crash but do not recover, where processes have different identities, and where all processes initially know the membership. In this paper we study the set agreement problem and the weakest failure detector L used to solve it in asynchronous message passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities) and without a complete initial knowledge of the membership.

Archivo Digital UPM

Set agreement and the loneliness failure detector in crash-recovery systems

Authors: Arévalo Viñuales, Sergio; Jiménez Merino, José Ernesto; Tang, Jian
Publication venue: E.U. de Informática (UPM)
Publication date: 01/01/2013

The set agreement problem states that from n proposed values at most n-1 can be decided. Traditionally, this problem is solved using a failure detector in asynchronous systems where processes may crash but not recover, where processes have different identities, and where all processes initially know the membership. In this paper we study the set agreement problem and the weakest failure detector L used to solve it in asynchronous message passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities) and without a complete initial knowledge of the membership.
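
The safety condition stated in this abstract (from n proposed values, at most n-1 may be decided) can be illustrated with a small checker over one finished run. This is only a sketch under assumptions that are not in the abstract: the function name `satisfies_set_agreement` is hypothetical, processes are identified by list position rather than by identifier (since homonyms may share identifiers), `None` marks a process that never decided and is assumed not to be a proposable value, and the usual validity requirement (every decided value was proposed) is checked alongside the n-1 bound. Termination is not checked.

```python
from typing import Hashable, Optional, Sequence


def satisfies_set_agreement(
    proposed: Sequence[Hashable],
    decided: Sequence[Optional[Hashable]],
) -> bool:
    """Check the safety part of (n-1)-set agreement for one run.

    proposed[i] is the value proposed by the i-th process.
    decided[i] is the value it decided, or None if it never decided.
    """
    n = len(proposed)
    if n < 2 or len(decided) != n:
        raise ValueError("need n >= 2 processes and one decision slot each")

    # Distinct values that were actually decided in this run.
    decided_values = {v for v in decided if v is not None}

    # Validity (assumed, standard): only proposed values may be decided.
    validity = decided_values <= set(proposed)

    # Agreement bound from the abstract: at most n-1 distinct decisions.
    agreement = len(decided_values) <= n - 1

    return validity and agreement
```

For example, with three processes proposing "a", "b", and "c", a run in which the decisions are "a", "b", "a" passes the check, while a run in which each process decides its own distinct proposal does not.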

Archivo Digital UPM

Easy Consensus Algorithms for the Crash-Recovery Model

Authors: Freiling, Felix; Lambertz, Christian; Majster-Cederbaum, Mila
Publication date: 01/01/2008

In the crash-recovery failure model of asynchronous distributed systems, processes can temporarily stop executing steps and later restart their computation from a predefined local state. The crash-recovery model is much more realistic than the crash-stop failure model, in which processes are merely allowed to stop executing steps. The additional complexity is reflected in the multitude of assumptions and the technical complexity of the algorithms that have been developed for that model. We focus on the problem of consensus in the crash-recovery model, but instead of developing completely new algorithms from scratch, our approach aims at reusing existing crash-stop consensus algorithms in a modular way using the abstraction of failure detectors. As a result, we present three new and relatively simple consensus algorithms for the crash-recovery model for different types of assumptions.

CiteSeerX

Crossref

MAnnheim DOCument Server

Network Synchronization in the Crash-Recovery Model

Authors: Freiling, Felix; Henkel, Sven; Widder, Josef
Publication date: 01/01/2006

This work investigates the amount of information about failures required to simulate a synchronous distributed system by an asynchronous distributed system prone to crash-recovery failures. A failure detection sequencer SigmaCR for the crash-recovery failure model is defined, which outputs information about crashes and recoveries and about the state of the crashed or recovered processes. Using the simulation technique of a synchronizer, it is shown that in general it is impossible to implement a synchronizer in an asynchronous distributed system with an arbitrary number of concurrent crash-recovery faults. It is shown that a synchronizer is implementable given SigmaCR and an asynchronous distributed system with at least one correct process. Furthermore, it is proven that SigmaCR can be emulated in a synchronous distributed system and hence can be regarded as the weakest failure detection device suitable to implement a synchronizer in the crash-recovery failure model.

MAnnheim DOCument Server

Harmful dogmas in fault tolerant distributed computing

Authors: Charron-Bost, Bernadette; Schiper, André
Publication venue: Association for Computing Machinery (ACM)
Publication date: 27/05/2008

Infoscience - École polytechnique fédérale de Lausanne