5 research outputs found

    Application-level fault tolerance in real-time embedded systems

    Critical real-time embedded systems need fault tolerance techniques to cope with run-time errors in either hardware or software. Fault tolerance is usually achieved through redundancy and diversity. Redundant hardware implies a distributed system that executes a set of fault tolerance strategies in software, and it may also employ some form of diversity by using different variants or versions of the same processing. This work proposes and evaluates a fault tolerance framework that supports the development of dependable applications. The framework is built upon basic operating system services and middleware communications, and it provides flexible and transparent support for application threads. A case study involving radar filtering is described, and the advantages and drawbacks of the framework are discussed. Funding: Fundação para a Ciência e a Tecnologia (FCT).
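The combination of redundancy and diversity mentioned in this abstract can be illustrated with a minimal sketch of majority voting over diverse variants of the same computation. This is not the framework proposed in the paper; the function `majority_vote` and the three toy variants are hypothetical names chosen only for illustration.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(variants: Sequence[Callable[[float], float]], x: float) -> float:
    """Run every variant on the same input and return the majority result.

    Hypothetical helper: a variant that raises contributes no vote.
    """
    results = []
    for variant in variants:
        try:
            results.append(variant(x))
        except Exception:
            # A crashing variant is treated as a detected fault and skipped.
            continue
    if not results:
        raise RuntimeError("all variants failed")
    value, count = Counter(results).most_common(1)[0]
    # Require a strict majority of all variants, not just of the survivors.
    if count <= len(variants) // 2:
        raise RuntimeError("no majority among variants")
    return value


# Three toy "diverse" implementations of doubling; the third is deliberately faulty.
def double_by_multiplication(x: float) -> float:
    return x * 2


def double_by_addition(x: float) -> float:
    return x + x


def faulty_double(x: float) -> float:
    return x * 2 + 1


voted = majority_vote([double_by_multiplication, double_by_addition, faulty_double], 3.0)
```

Exact equality is used for voting here only because the toy variants produce identical floating-point values; a real voter over numerical results would more plausibly compare within a tolerance.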

    ScOSA system software: the reliable and scalable middleware for a heterogeneous and distributed on-board computer architecture

    The design of on-board computers (OBC) for future space missions is determined by the trade-off between reliability and performance. Space applications with higher computational demands are not supported by currently available, state-of-the-art, space-qualified computing hardware, since their requirements exceed the capabilities of these components. Such applications include Earth observation with high-resolution cameras, on-orbit real-time servicing, and autonomous spacecraft and rover missions on distant celestial bodies. An alternative to state-of-the-art space-qualified computing hardware is the use of commercial-off-the-shelf (COTS) components for the OBC. These components are not only cheap and widely available but also achieve high performance. Unfortunately, they are significantly more vulnerable to radiation-induced errors than space-qualified components. The ScOSA (Scalable On-board Computing for Space Avionics) Flight Experiment project aims to develop an OBC architecture that avoids this trade-off by combining space-qualified, radiation-hardened components (the reliable computing nodes, RCNs) with COTS components (the high performance nodes, HPNs) in a single distributed system. To abstract this heterogeneous architecture for application developers, we are developing a middleware for this OBC architecture. Besides providing a monolithic abstraction of the distributed system, the middleware shall also enhance the architecture by providing additional reliability and fault tolerance. In this paper, we present the individual components that make up the middleware, alongside the features it offers. Since the ScOSA Flight Experiment project is a successor of the OBC-NG and ScOSA projects, its middleware is a further development of the existing middleware. We therefore present and discuss our contributions and our plans for enhancing the middleware in the course of the current project. Finally, we present first results on the scalability of the middleware, obtained from software-in-the-loop experiments with scenarios of different sizes.

    Enabling system survival across hypervisor failures

    Master's dissertation in Industrial Electronics and Computer Engineering. The evolution of embedded systems is notable: as their complexity grows, these systems exhibit more general-purpose behaviour in place of their original single-purpose features. Naturally, virtualization has started to have an impact in this area. This technology decreases hardware costs, since it allows several software components to run on the same hardware. Although virtualization began as a pure software layer, many companies started to provide hardware solutions to assist it. Although ARM TrustZone technology is a security extension, many developers realized that it could be used to support the development of hypervisors. With TrustZone, hypervisors can ensure one of the most important features of virtualization: isolation between guests. However, this hardware technology has revealed some vulnerabilities, and since the whole system depends on TrustZone, the virtualization can be compromised. To address this problem, this thesis proposes a hybrid software/hardware mechanism to handle failures of TrustZone-based hypervisors. By using the processor's abort exceptions and hash keys, this project detects system malfunctions caused by imperfect designs or even deliberate attacks. Additionally, it provides a checkpoint-based restoration model that allows the system to recover without major setbacks. The solution was deployed on the TrustZone-based LTZVisor, an open-source, in-house hypervisor developed at ESRG, and the results are promising. With a 6.5% increase in memory footprint and, in the worst case, a 23% increase in context switching time, it is possible to detect secure memory invasions and recover the system. Despite the increase in hypervisor memory footprint and the added latency, the reliability and availability that the mechanism brings to LTZVisor are unquestionable.
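The detect-and-restore idea in this abstract (hash-based detection of corrupted memory plus checkpoint recovery) can be sketched in a few lines. This is only a user-space analogy under stated assumptions: the actual thesis works inside a TrustZone-based hypervisor with processor abort exceptions, none of which is modelled here, and the class `GuardedRegion` and its methods are hypothetical names.

```python
import hashlib


class GuardedRegion:
    """Hypothetical stand-in for a protected memory region with one checkpoint."""

    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        self.checkpoint()

    def checkpoint(self) -> None:
        # Save a copy of the current contents and its SHA-256 digest.
        self._saved = bytes(self.data)
        self._digest = hashlib.sha256(self._saved).hexdigest()

    def is_intact(self) -> bool:
        # Corruption is detected when the current digest differs from the saved one.
        return hashlib.sha256(bytes(self.data)).hexdigest() == self._digest

    def recover(self) -> None:
        # Roll the region back to the last checkpoint.
        self.data = bytearray(self._saved)


region = GuardedRegion(b"secure-state")
region.data[0] ^= 0xFF            # simulate an unauthorized write
detected = not region.is_intact()  # the digest mismatch reveals the corruption
if detected:
    region.recover()
```

Any legitimate update to the region would have to be followed by a new `checkpoint()` call; otherwise the next integrity check would flag it as corruption.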

    Operating system fault tolerance support for real-time embedded applications

    Doctoral thesis in Industrial Electronics (field of Industrial Informatics). Fault tolerance is a means of achieving high dependability for critical and high-availability systems. Despite efforts to prevent and remove faults during the development of these systems, fault tolerance is usually required because the hardware may fail during system operation and software faults are very hard to eliminate completely. One of the difficulties in implementing fault tolerance techniques is the lack of support from operating systems and middleware. In most fault-tolerant projects, the programmer has to develop a fault tolerance implementation for each application. This heavy customization makes fault-tolerant software costly and difficult to implement and maintain. In particular, for small-scale embedded systems, the introduction of fault tolerance techniques may also affect their restricted resources, such as processing power and memory size. The purpose of this research is to provide fault tolerance support for real-time applications in small-scale embedded systems. The main approach of this thesis is to develop a customizable and extendable fault tolerance framework and integrate it into a real-time operating system, in order to fulfill the needs of a wide range of dependable applications. Special attention is paid to allowing fault tolerance to coexist with real-time constraints. The proposed framework offers several advantages over ad-hoc implementations, such as simplifying application-level programming and improving system configurability and maintainability. In addition, this thesis investigates the application of aspect-oriented techniques to the development of real-time embedded fault-tolerant software. Aspect-Oriented Programming (AOP) is employed to modularize all fault-tolerant source code, following the principle of separation of concerns, and to integrate the proposed framework into the operating system. Two case studies are used to evaluate the proposed implementation in terms of performance and resource costs. The results show that the overheads related to the use of the framework are acceptable and those related to the AOP implementation are negligible.