DDG Task Recovery for Cluster Computing

G. T. Nguyen; L. Hluchy; M. Kotocova; V. D. Tran

Abstract

This paper presents a solution to the problem of transparently recovering asynchronous distributed computations on clusters of workstations when a node fails. A system with fault-tolerant features can survive the fault and continue its computations. Performance degradation is unavoidable when hardware redundancy is not available. For a long-running application, restarting from a checkpoint instead of restarting the whole computation is a large advantage. This paper presents the fault-tolerant features of the DDG environment, which is oriented to cluster systems without hardware spares.

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.99.72...

Last time updated on 23/10/2014