Operating system fault tolerance support for real-time embedded applications
Doctoral thesis in Industrial Electronics (field of Industrial Informatics)
Fault tolerance is a means of achieving high dependability for critical and high-availability
systems. Despite the efforts to prevent and remove faults during the
development of these systems, the application of fault tolerance is usually required
because the hardware may fail during system operation and software faults are very
hard to eliminate completely.
One of the difficulties in implementing fault tolerance techniques is the lack of
support from operating systems and middleware. In most fault-tolerant projects, the
programmer has to develop a fault tolerance implementation for each application.
This strong customization makes the fault-tolerant software costly and difficult to
implement and maintain. In particular, for small-scale embedded systems, the
introduction of fault tolerance techniques may also have an impact on their restricted
resources, such as processing power and memory size.
The purpose of this research is to provide fault tolerance support for real-time
applications in small-scale embedded systems. The main approach of this thesis is to
develop and integrate a customizable and extendable fault tolerance framework into a
real-time operating system, in order to fulfill the needs of a large range of dependable
applications. Special attention is paid to allowing fault tolerance to coexist with
real-time constraints. Using the proposed framework offers several
advantages over ad-hoc implementations, such as simplifying application-level
programming and improving the system configurability and maintainability.
In addition, this thesis investigates the application of aspect-oriented
techniques to the development of real-time embedded fault-tolerant software.
Aspect-Oriented Programming (AOP) is employed to modularize all fault-tolerance source code,
following the principle of separation of concerns, and to integrate the proposed
framework into the operating system.
Two case studies are used to evaluate the proposed implementation in terms of
performance and resource costs. The results show that the overheads related to the
framework application are acceptable and the ones related to the AOP implementation
are negligible.
Fault-tolerant software: dependability/performance trade-offs, concurrency and system support
PhD Thesis
As the use of computer systems becomes more and more widespread in applications
that demand high levels of dependability, these applications themselves are growing in
complexity at a rapid rate, especially in areas that require concurrent and distributed
computing. Such complex systems are very prone to faults and errors. No matter how
rigorously fault avoidance and fault removal techniques are applied, software design
faults often remain in systems when they are delivered to the customers. In fact,
residual software faults are becoming a significant underlying cause of system
failures and lack of dependability. There is a tremendous need for systematic
techniques for building dependable software, including fault tolerance techniques
that ensure that software-based systems operate dependably even when potential faults
are present. However, although there has been a large amount of research in the area of
fault-tolerant software, existing techniques are not yet sufficiently mature as a practical
engineering discipline for realistic applications. In particular, they are often inadequate
when applied to highly concurrent and distributed software.
This thesis develops new techniques for building fault-tolerant software, addresses the
problem of achieving high levels of dependability in concurrent and distributed object
systems, and studies system-level support for implementing dependable software. Two
schemes are developed: the t/(n-1)-VP approach is aimed at increasing software
reliability and controlling additional complexity, while the SCOP approach presents an
adaptive way of dynamically adjusting software reliability and efficiency aspects. As a
more general framework for constructing dependable concurrent and distributed
software, the Coordinated Atomic (CA) Action scheme is examined thoroughly. Key
properties of CA actions are formalized, a conceptual model and mechanisms for
handling application level exceptions are devised, and object-based diversity
techniques are introduced to cope with potential software faults. These three schemes
are evaluated analytically and validated by controlled experiments. System-level
support is also addressed with a multi-level system architecture. An architectural
pattern for implementing fault-tolerant objects is documented in detail to capture
existing solutions and our previous experience. An industrial safety-critical application,
the Fault-Tolerant Production Cell, is used as a case study to examine most of the
concepts and techniques developed in this research.
ESPRIT