2 research outputs found

Distributed checkpoint algorithms to avoid roll-back propagation

Author: Zambonelli F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

Checkpointing is a well-known mechanism for achieving fault tolerance. In distributed applications, a local checkpoint is useful for fault tolerance only if it belongs to at least one consistent global checkpoint, so that execution can be restarted from it without rolling back further into the past. The paper introduces a theoretical framework that facilitates the definition and analysis of distributed checkpoint algorithms that avoid roll-back propagation. On this basis, several algorithms are presented and evaluated on a set of testbed applications.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Analysis and Evaluation of Distributed Checkpoint Algorithms to Avoid Roll-Back Propagation

Author: ZAMBONELLI Franco
Publication date: 01/01/1998

Checkpointing is a well-known mechanism for achieving fault tolerance. In distributed applications where processes can checkpoint independently of each other, a local checkpoint is useful for fault tolerance only if it belongs to at least one consistent global checkpoint. In this case, execution can be restarted from it without rolling back further into the past. The paper exploits a theoretical framework that facilitates the definition and analysis of distributed checkpoint algorithms that avoid roll-back propagation. Several distributed algorithms are presented which avoid roll-back propagation by forcing additional checkpoints in processes. The effectiveness of the algorithms is evaluated in several testbed applications, showing their limited ability to bound the number of additional checkpoints.
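
The notion of a "useful" local checkpoint in the abstract can be illustrated with a small sketch. It is not one of the paper's algorithms: it is a brute-force check of the standard definition, assuming that a global checkpoint is consistent when it contains no orphan message (one recorded as received but not as sent). The names `Message`, `is_consistent`, `is_useful` and the integer local times are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Message:
    sender: int
    send_time: int   # position of the send event in the sender's local history
    receiver: int
    recv_time: int   # position of the receive event in the receiver's local history


def is_consistent(cut, messages):
    """cut[p] is the local time of the checkpoint chosen for process p.

    Assumption: a checkpoint at time t records every local event with time <= t.
    """
    for m in messages:
        sent_after = m.send_time > cut[m.sender]
        received_before = m.recv_time <= cut[m.receiver]
        if sent_after and received_before:
            return False  # orphan message: received in the cut, but never sent
    return True


def is_useful(process, checkpoint_time, checkpoints, messages):
    """True if the given local checkpoint belongs to some consistent global checkpoint.

    checkpoints maps each process to the list of its local checkpoint times.
    Exhaustive search, suitable only for tiny examples.
    """
    others = [p for p in sorted(checkpoints) if p != process]
    for combo in product(*(checkpoints[p] for p in others)):
        cut = dict(zip(others, combo))
        cut[process] = checkpoint_time
        if is_consistent(cut, messages):
            return True
    return False


# Process 1 checkpoints at local time 5. Process 0 sends a message at 7 that
# process 1 receives at 4; process 1 sends a message at 6 that process 0
# receives at 9.
messages = [Message(1, 6, 0, 9), Message(0, 7, 1, 4)]

# With independent checkpoints only, process 1's checkpoint at 5 is useless.
print(is_useful(1, 5, {0: [0, 10], 1: [0, 5]}, messages))     # False

# An additional checkpoint in process 0 at time 8 (after its send, before
# its receive) makes the same checkpoint part of a consistent global one.
print(is_useful(1, 5, {0: [0, 8, 10], 1: [0, 5]}, messages))  # True
```

In this toy run, the extra checkpoint at time 8 plays the role of a forced checkpoint. How a real algorithm decides when to force one is exactly what the paper's algorithms address and is not modeled here.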

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia