A Non-blocking Recovery Algorithm for Causal Message Logging

By J. Roger Mitchell and Vijay K. Garg

Abstract

In the recovery of failed processes in a distributed program, causal logging schemes offer several benefits. These benefits include no rollback of unfailed processes and simple approaches to output commit. Unfortunately, previous approaches to the recovery of multiple simultaneous failures require either that the distributed execution be blocked or that recovering processes coordinate. The latter requires assumptions which are not satisfactory. In this paper we present a solution that has neither of these drawbacks.

Message logging is an important technique for recovering from failures in distributed programs. This technique logs the order in which messages are received. Assuming that receive ordering is the only source of non-determinism, the execution can be recovered from this ordering. Pessimistic message logging [4, 11] forces a process to wait before sending any message while the message log is written to stable storage. Optimistic logging methods [9, 12, 13, 15] (and the similar sender-based logging [8, 14]) assume failures are rare and therefore allow ordering information to be lost in a failure (that is, a message is logged in the background while execution proceeds). Consequently, received messages and any sends that depend on them may not be recoverable. This may then require that unfailed processes roll back their execution as well.

Causal message logging sends message receive ordering information with each message. This information includes receives and their causal history since the last send. The Manetho approach [6] uses this method. In family-based message logging (FBL) [2], causal history information is included for only K processes. This method then tolerates K simultaneous failures rather than the failure of all processes in the system (as with Manetho and the other logging methods). The causal message logging approach offers advantage
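The piggybacking idea behind causal message logging can be illustrated with a minimal Python sketch. This is not the recovery algorithm of the paper, nor Manetho or FBL: it is a toy model under simplifying assumptions. Every name here (Determinant, Message, Process, receive_order, demo) is hypothetical. The sketch piggybacks every determinant a process knows rather than only those since the last send, and it ignores stable storage, the K-failure bound, and the recovery protocol itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    """Records one receive event: who received what, and in which position."""
    receiver: str
    recv_seq: int  # position of this receive in the receiver's receive order
    sender: str
    send_seq: int  # sender's sequence number for the received message


@dataclass
class Message:
    sender: str
    send_seq: int
    payload: str
    piggyback: frozenset  # determinants from the sender's causal past


@dataclass
class Process:
    name: str
    send_seq: int = 0
    recv_seq: int = 0
    known: set = field(default_factory=set)  # determinants this process holds

    def send(self, payload):
        # Assumption: piggyback everything known (a real scheme sends less).
        self.send_seq += 1
        return Message(self.name, self.send_seq, payload, frozenset(self.known))

    def receive(self, msg):
        # Merge the sender's causal history, then log this receive event.
        self.known |= msg.piggyback
        self.recv_seq += 1
        self.known.add(
            Determinant(self.name, self.recv_seq, msg.sender, msg.send_seq)
        )


def receive_order(failed, survivors):
    """Rebuild the failed process's receive order from surviving processes."""
    dets = {d for p in survivors for d in p.known if d.receiver == failed}
    return [(d.sender, d.send_seq) for d in sorted(dets, key=lambda d: d.recv_seq)]


def demo():
    p0, p1, p2 = Process("p0"), Process("p1"), Process("p2")
    m_a = p0.send("a")
    m_b = p2.send("b")
    p1.receive(m_b)           # p1 receives b first
    p1.receive(m_a)           # and a second
    p2.receive(p1.send("c"))  # p1's send carries both determinants to p2
    # Suppose p1 now fails: p2 still holds p1's receive order.
    return receive_order("p1", [p0, p2])
```

In this toy run, p1's volatile state can be lost without losing its receive order, because the message it sent to p2 carried the determinants for both of its earlier receives. Replaying the messages in the order returned by receive_order would, under the assumption that receive order is the only source of non-determinism, reproduce p1's execution up to its last send.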

Year: 2009
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.3247
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://users.ece.utexas.edu/~g... (external link)