We propose modeling software package flaws (bugs) as eventual Byzantine behavior of the package. Specifically, the manufacturer has tested the package only on scenarios of limited length, each started from a predefined initial state; beyond the tested scenarios, the behavior may be Byzantine. Restarts (reboots) are useful for recovering such systems. We propose a general yet practical framework and paradigm, based on a theoretical foundation, for monitoring and restarting systems. The framework centers on an autonomic recoverer that monitors and restarts the system, with the following properties: (i) the autonomic recoverer is designed to handle different tasks, given task-specific requirements in the form of predicates and actions; (ii) a consistency monitoring procedure uses a DAG hierarchy of subsystems to achieve graceful recovery; (iii) the existence and correct functioning of the autonomic recoverer are guaranteed by the use of a kernel-resident (anchor) process and by the design of the process to be self-stabilizing; (iv) the autonomic recoverer uses a new scheme for liveness assurance via on-line monitoring, which complements known schemes for ensuring safety on-line.
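To make the idea of predicates, actions, and a DAG subsystem hierarchy concrete, the following is a minimal sketch and not the authors' implementation. The names `Subsystem`, `parts`, and `monitor_once` are hypothetical, and the policy shown (check the smallest subsystems first, so that a containing subsystem is restarted only if its own predicate still fails) is an assumption about what a graceful recovery order might look like.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


@dataclass
class Subsystem:
    """Hypothetical task description: a predicate to monitor and a restart action."""
    name: str
    predicate: Callable[[], bool]   # True while the subsystem looks consistent
    restart: Callable[[], None]     # action taken when the predicate fails
    parts: List[str] = field(default_factory=list)  # DAG edges to contained subsystems


def monitor_once(subsystems: Dict[str, Subsystem]) -> List[str]:
    """One monitoring pass; returns the names of the subsystems that were restarted.

    Assumed policy: visit contained subsystems before the subsystems that
    contain them, restarting each one whose predicate is violated.
    """
    # TopologicalSorter yields a node only after all of its listed predecessors,
    # so listing the parts as predecessors puts the smallest subsystems first.
    graph = {s.name: set(s.parts) for s in subsystems.values()}
    restarted = []
    for name in TopologicalSorter(graph).static_order():
        subsystem = subsystems[name]
        if not subsystem.predicate():
            subsystem.restart()
            restarted.append(name)
    return restarted


# Illustrative use: "app" contains "db"; only "db" is initially faulty.
health = {"db": False, "app": True}
system = {
    "db": Subsystem("db", lambda: health["db"], lambda: health.update(db=True)),
    "app": Subsystem(
        "app",
        lambda: health["app"] and health["db"],
        lambda: health.update(app=True),
        parts=["db"],
    ),
}
print(monitor_once(system))
```

In this sketch, restarting the faulty part restores the predicate of the containing subsystem, so the larger restart is avoided. A real recoverer would run such passes repeatedly, and would itself need to be protected, for example by the kernel-resident anchor process described above.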