3 research outputs found
Implementing atomic actions in Ada 95
Atomic actions are an important dynamic structuring technique that aids the construction of fault-tolerant concurrent systems. Although they were developed some years ago, none of the well-known, commercially available programming languages directly supports their use. This paper summarizes software fault tolerance techniques for concurrent systems, evaluates the Ada 95 programming language from the perspective of its support for software fault tolerance, and shows how Ada 95 can be used to implement software fault tolerance techniques. In particular, it shows how packages, protected objects, requeue, exceptions, asynchronous transfer of control, tagged types, and controlled types can be used as building blocks from which to construct atomic actions with forward and backward error recovery, which are resilient to deserter tasks and task abortion.
The planning coordinator: A design architecture for autonomous error recovery and on-line planning of intelligent tasks
Developing a robust, task-level error recovery and on-line planning architecture is an open research area. There is previously published work on both error recovery and on-line planning; however, none incorporates error recovery and on-line planning into one integrated platform. The integration of these two functionalities requires an architecture that possesses the following characteristics. The architecture must provide for the inclusion of new information without the destruction of existing information. The architecture must provide for the relating of pieces of information, old and new, to one another in a non-trivial rather than trivial manner (e.g., "object one is related to object two under the following constraints" versus "yes, they are related; no, they are not related"). Finally, the architecture must be not only a stand-alone architecture, but also one that can be easily integrated as a supplement to some existing architecture. This thesis proposal addresses architectural development. Its intent is to integrate error recovery and on-line planning onto a single, integrated, multi-processor platform. This intelligent x-autonomous platform, called the Planning Coordinator, will be used initially to supplement existing x-autonomous systems and eventually replace them.
Fault-tolerant software: dependability/performance trade-offs, concurrency and system support
PhD Thesis
As the use of computer systems becomes more and more widespread in applications
that demand high levels of dependability, these applications themselves are growing in
complexity at a rapid rate, especially in the areas that require concurrent and distributed
computing. Such complex systems are very prone to faults and errors. No matter how
rigorously fault avoidance and fault removal techniques are applied, software design
faults often remain in systems when they are delivered to the customers. In fact,
residual software faults are becoming the significant underlying cause of system
failures and the lack of dependability. There is a tremendous need for systematic
techniques for building dependable software, including the fault tolerance techniques
that enable software-based systems to operate dependably even when potential faults
are present. However, although there has been a large amount of research in the area of
fault-tolerant software, existing techniques are not yet sufficiently mature as a practical
engineering discipline for realistic applications. In particular, they are often inadequate
when applied to highly concurrent and distributed software.
This thesis develops new techniques for building fault-tolerant software, addresses the
problem of achieving high levels of dependability in concurrent and distributed object
systems, and studies system-level support for implementing dependable software. Two
schemes are developed - the t/(n-1)-VP approach is aimed at increasing software
reliability and controlling additional complexity, while the SCOP approach presents an
adaptive way of dynamically adjusting software reliability and efficiency aspects. As a
more general framework for constructing dependable concurrent and distributed
software, the Coordinated Atomic (CA) Action scheme is examined thoroughly. Key
properties of CA actions are formalized, a conceptual model and mechanisms for
handling application level exceptions are devised, and object-based diversity
techniques are introduced to cope with potential software faults. These three schemes
are evaluated analytically and validated by controlled experiments. System-level
support is also addressed with a multi-level system architecture. An architectural
pattern for implementing fault-tolerant objects is documented in detail to capture
existing solutions and our previous experience. An industrial safety-critical application,
the Fault-Tolerant Production Cell, is used as a case study to examine most of the
concepts and techniques developed in this research.
ESPRIT