Flexible and transparent fault tolerance for distributed object-oriented applications

Abstract

This report describes an approach that enables automatic structural reconfiguration of distributed applications, based on configuration management, in order to compensate for node and network failures. The major goal of the approach is to maintain the relevant application functionality automatically after failures. This goal is achieved by a dedicated system model and by a decentralized reconfiguration algorithm based on it. The system model provides support for redundant application object storage and for application-level consistency based on distributed checkpoints. The reconfiguration algorithm detects failures, computes a compensating configuration, and realizes this new configuration. The report emphasizes flexibility in the sense of adaptable levels of fault tolerance, as well as transparency in the sense of fully automatic reaction to failures.
