Software Probes and a Self-testing System for Failure Detection and Diagnosis
Category: Innovative Idea

Abstract

A key problem in today's complex software systems is the detection and isolation of software failures. Most software failures are only partial; if they are efficiently diagnosed, isolated, and recovered from, a total outage can be averted. A probe detects failed software components in a running software system by requesting service, or a certain level of service, from a set of functions, modules, and/or subsystems (the target) and checking the response to the request. The objective is to localize a failure only to the granularity of a target, while achieving a high degree of efficiency and confidence in the process. Targets can be identified at different levels or layers of the software; the choice depends on the desired granularity of fault detection, taken together with the level at which recovery can be implemented. The probe system is made self-testing against any single failure in its operational components by using the idea of a null probe. The probe system is designed to take advantage of the latency characteristics of errors, which provides a low-overhead mechanism. The ideas can be implemented in either a single-computer or a multiple-computer system.
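The abstract gives no implementation. The following is a minimal Python sketch, under stated assumptions, of how a probe might request service from a target and check the response, and of how a null probe could guard the probe machinery itself. The `Target` fields, the `probe` and `run_probe_cycle` functions, the 0.5 s latency bound, and the identity-function null target are all hypothetical illustrations, not the authors' design.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Target:
    """Hypothetical probe target: a function, module, or subsystem entry point."""
    name: str
    service: Callable[[Any], Any]   # entry point the probe requests service from
    request: Any                    # test request sent by the probe
    expected: Any                   # response that indicates correct service
    max_latency_s: float = 0.5      # assumed "level of service": response-time bound


def probe(target: Target) -> bool:
    """Request service from the target and check the response.

    Returns True only if the target answered correctly within its latency
    bound. The call is synchronous, so a target that hangs blocks the probe.
    """
    start = time.monotonic()
    try:
        response = target.service(target.request)
    except Exception:
        return False  # an exception means the target failed to provide service
    elapsed = time.monotonic() - start
    return response == target.expected and elapsed <= target.max_latency_s


# Hypothetical null probe target: a trivial service that cannot fail, so a
# failure here points at the probe machinery rather than at a real target.
NULL_TARGET = Target(name="null", service=lambda req: req, request=0, expected=0)


def run_probe_cycle(targets: List[Target]) -> Dict[str, bool]:
    """Run the null probe first, then probe each target once.

    Raises RuntimeError if the null probe fails, because the results of the
    other probes could not be trusted.
    """
    if not probe(NULL_TARGET):
        raise RuntimeError("null probe failed: probe system is faulty")
    return {t.name: probe(t) for t in targets}


# Usage: one healthy target and one target that raises an internal error.
def healthy(x):
    return x + 1


def broken(x):
    raise ValueError("internal error")


results = run_probe_cycle([
    Target("adder", healthy, request=1, expected=2),
    Target("parser", broken, request="x", expected="ok"),
])
print(results)  # {'adder': True, 'parser': False}
```

Running the null probe first through the same `probe` code path lets a failed cycle distinguish "the target is down" from "the prober is down". Probing once per cycle rather than on every request is one way to keep overhead low; choosing the cycle interval from the latency between an error occurring and it causing a failure is an assumption here, suggested by the abstract's mention of error latency.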
