21 research outputs found
Assessing Asymmetric Fault-Tolerant Software
The most popular forms of fault tolerance against design faults use "asymmetric" architectures in which a "primary" part performs the computation and a "secondary" part is in charge of detecting errors and performing some kind of error processing and recovery. In contrast, the most studied forms of software fault tolerance are "symmetric" ones, e.g. N-version programming. The latter are often controversial, the former are not. We discuss how to assess the dependability gains achieved by these methods. Substantial difficulties have been shown to exist for symmetric schemes, but we show that the same difficulties affect asymmetric schemes. Indeed, the latter present somewhat subtler problems. In both cases, to predict the dependability of the fault-tolerant system it is not enough to know the dependability of the individual components. We extend to asymmetric architectures the style of probabilistic modeling that has been useful for describing the dependability of "symmetric" architectures, to highlight factors that complicate the assessment. In the light of these models, we finally discuss fault injection approaches to estimating coverage factors. We highlight the limits of what can be predicted and some useful research directions towards clarifying and extending the range of situations in which estimates of coverage of fault tolerance mechanisms can be trusted
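As an illustration of the fault injection approach to estimating coverage factors mentioned in this abstract, the following is a minimal sketch and not the authors' method. It assumes each injected fault is an independent trial and that the injected faults are representative of the faults met in operation, which is exactly the kind of assumption the abstract warns may not hold. The function name `coverage_estimate` and the choice of a Wilson score interval are assumptions made for the example.

```python
import math


def coverage_estimate(handled, injected, z=1.96):
    """Point estimate and Wilson score interval for a coverage factor.

    handled  -- number of injected faults that the secondary part
                detected and recovered from
    injected -- total number of injected faults
    z        -- normal quantile (1.96 gives roughly a 95% interval)

    The interval reflects sampling error only; it says nothing about
    whether the injected faults resemble real design faults.
    """
    if injected <= 0 or not 0 <= handled <= injected:
        raise ValueError("need 0 <= handled <= injected and injected > 0")
    p = handled / injected
    denom = 1 + z * z / injected
    centre = (p + z * z / (2 * injected)) / denom
    half = z * math.sqrt(p * (1 - p) / injected
                         + z * z / (4 * injected * injected)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    p, low, high = coverage_estimate(950, 1000)
    print(f"coverage ~ {p:.3f}, interval [{low:.3f}, {high:.3f}]")
```

The interval narrows as the number of injections grows, but it cannot correct for an unrepresentative fault sample.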
Sources of Variations in Error Sensitivity of Computer Systems
Technology scaling is reducing the reliability of integrated circuits. This makes it important to provide computers with mechanisms that can detect and correct hardware errors. This thesis deals with the problem of assessing the hardware error sensitivity of computer systems. Error sensitivity, which is the likelihood that a hardware error will escape detection and produce an erroneous output, measures a system's inability to detect hardware errors. This thesis presents the results of a series of fault injection experiments that investigated how error sensitivity varies for different system characteristics, including (i) the inputs processed by a program, (ii) a program's source code implementation, and (iii) the use of compiler optimizations. The study focused on the impact of transient hardware faults that result in bit errors in CPU registers and main memory locations. We investigated how the error sensitivity varies for single-bit errors vs. double-bit errors, and how error sensitivity varies with respect to the machine instructions that were targeted for fault injection. The results show that the input profile and source code implementation of the investigated programs had a major impact on error sensitivity, while using different compiler optimizations caused only minor variations. There was no significant difference in error sensitivity between single-bit and double-bit errors. Finally, the error sensitivity seems to depend more on the type of data processed by an instruction than on the instruction type
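The definition of error sensitivity given above can be sketched in a few lines. This is a toy illustration and not the experimental setup of the thesis: the workload, its range check (standing in for an error detection mechanism), and the names `flip_bit`, `workload` and `error_sensitivity` are all hypothetical. Faults are injected as single-bit flips in the program's data rather than in CPU registers or memory.

```python
import random


def flip_bit(value, bit, width=32):
    """Return value with one bit inverted, masked to width bits."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def workload(data):
    """Toy program under test: a sum guarded by a range check.

    The range check is a hypothetical error detection mechanism.
    """
    total = 0
    for x in data:
        if x < 0 or x > 1000:
            raise ValueError("out-of-range value detected")
        total += x
    return total


def error_sensitivity(data, trials=1000, seed=0, width=32):
    """Fraction of injected single-bit errors that escape detection
    and still change the program output."""
    rng = random.Random(seed)
    golden = workload(data)          # fault-free reference output
    undetected = 0
    for _ in range(trials):
        faulty = list(data)
        idx = rng.randrange(len(faulty))
        faulty[idx] = flip_bit(faulty[idx], rng.randrange(width), width)
        try:
            result = workload(faulty)
        except ValueError:
            continue                 # error was detected
        if result != golden:
            undetected += 1          # wrong output, no detection
    return undetected / trials


if __name__ == "__main__":
    print(error_sensitivity([10, 200, 999]))
    print(error_sensitivity([0, 1, 2]))
```

Running the sketch with different input lists gives a feel for why the input profile can matter: values near the limit of the range check are more likely to be pushed out of range by a flip and so be detected.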
Developing a multi-level fault injection environment
Dependability and fault tolerance are important aspects of modern computer systems. Particle strikes or electromagnetic interference can change the internal state of a system, which can cause errors with non-negligible probability. Such errors are termed "soft errors". Bit flips in the design are a good way to model these soft errors. Bit flips due to soft errors are random and transient, which makes their analysis more difficult than that of simple stuck-at faults. Interestingly, only a few of the flops affected by radiation cause soft errors, because the flops differ in their propagation paths and functional impact. To improve the dependability of a system with reasonable overhead, the flops in a design that are most vulnerable to soft errors need to be protected. Each application case can potentially expose a slightly different set of flip-flops as vulnerable. Hence different tools are required to analyse soft errors with confidence when evaluating fault tolerance. As part of the thesis, I have developed a suite of tools for analyzing soft errors. The multi-level tools are necessary for complete fault tolerance analysis and for identifying the most vulnerable flip-flops in a specific processor. The first part of the thesis describes the FPGA development framework for a specific processor. Simulation-based fault injection techniques are described in the later sections. The final parts cover analysis techniques and applications that can benefit from such systems. Electrical and Computer Engineering
Requirement based Test Case Prioritization for System Testing
System testing encompasses a large number of test cases, not all of which can be executed within the available time, budget and resources. The test cases must therefore be prioritized so that the most critical and most required functionality is tested early. This paper proposes a hierarchical approach to system test case prioritization that maps requirements onto system test cases. The approach analyzes each requirement and assigns it a value based on a comprehensive set of twelve factors, thereby prioritizing the requirements. Each prioritized requirement is then mapped onto the most relevant module and then onto a prioritized set of test cases. To analyze the effectiveness of this approach, a case study of income tax calculator software [1] was carried out. Both the existing and the proposed approach were applied to this software and analyzed. The results show the efficacy of the proposed approach in detecting faults, and severe faults in particular, early
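The mapping from valued requirements to an ordered set of test cases can be sketched as follows. This is a simplified illustration and not the paper's algorithm: the abstract does not list the twelve factors, so the factor names and weights below are hypothetical, the intermediate module level is omitted, and scoring a test by the summed value of the requirements it covers is an assumption.

```python
def requirement_value(scores, weights):
    """Weighted average of a requirement's factor scores."""
    total_weight = sum(weights[f] for f in scores)
    return sum(scores[f] * weights[f] for f in scores) / total_weight


def prioritize_tests(requirements, weights, coverage):
    """Order test cases by the summed value of the requirements they cover.

    requirements -- {requirement id: {factor name: score}}
    weights      -- {factor name: weight}
    coverage     -- {test id: [requirement ids exercised by the test]}
    """
    values = {r: requirement_value(s, weights)
              for r, s in requirements.items()}

    def test_priority(test):
        return sum(values[r] for r in coverage[test])

    return sorted(coverage, key=test_priority, reverse=True)


if __name__ == "__main__":
    # Hypothetical factors; the paper uses a set of twelve.
    weights = {"customer_priority": 2, "complexity": 1}
    requirements = {
        "R1": {"customer_priority": 9, "complexity": 3},
        "R2": {"customer_priority": 2, "complexity": 5},
    }
    coverage = {"T1": ["R2"], "T2": ["R1"], "T3": ["R1", "R2"]}
    print(prioritize_tests(requirements, weights, coverage))
```

Other aggregation rules (for example, taking the maximum requirement value per test) would fit the same structure.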
Real-time fault injection using enhanced on-chip debug infrastructures
The rapid increase in the use of microprocessor-based systems in critical areas, where failures imply risks to human lives, to the environment or to expensive equipment, significantly increased the need for dependable systems, able to detect, tolerate and eventually correct faults. The verification and validation of such systems is frequently performed via fault injection, using various forms and techniques. However, as electronic devices get smaller and more complex, controllability and observability issues, and sometimes real time constraints, make it harder to apply most conventional fault injection techniques. This paper proposes a fault injection environment and a scalable methodology to assist the execution of real-time fault injection campaigns, providing enhanced performance and capabilities. Our proposed solutions are based on the use of common and customized on-chip debug (OCD) mechanisms, present in many modern electronic devices, with the main objective of enabling the insertion of faults in microprocessor memory elements with minimum delay and intrusiveness. Different configurations were implemented starting from basic Components Off-The-Shelf (COTS) microprocessors, equipped with real-time OCD infrastructures, to improved solutions based on modified interfaces, and dedicated OCD circuitry that enhance fault injection capabilities and performance. All methodologies and configurations were evaluated and compared concerning performance gain and silicon overhead
Fault injection for the evaluation of critical systems
Master's dissertation in Informatics Engineering. Critical systems are increasingly present in daily life, which increases the need to assure their safety and reduce the risk of accident or failure. The space and automotive industries are examples of industries that use these systems and need them assured. Measures must therefore be taken to guarantee the safety of a system at both the software and hardware levels. Fault injection is one answer to this problem: its various techniques can be used to evaluate and validate critical systems. Fault injection can be regarded as a testing technique in which faults are injected at the hardware or software level and the results are monitored to evaluate how the system handles such faults. Scan-Chain Implemented Fault Injection is a fault injection technique that provides greater reachability, observability and controllability. With this technique, the hardware and system-integration levels can be validated. csXception® is an automated fault injection environment, developed by Critical Software, S.A., for evaluating and validating critical systems. Its architecture is dynamic and based on fault injection plug-ins. With the increasing presence of ARM® Cortex-M3 microcontrollers in the automotive industry, a new fault injection plug-in for csXception® was needed. The main goal of this master's dissertation is therefore to develop a new fault injection plug-in for csXception® that allows faults to be injected into ARM® Cortex-M3 microcontrollers, to contextualize the new plug-in with respect to the ISO-26262 safety standard, and to use a case study to show some of the results obtained
Characterization of HIRF Susceptibility Threshold for a Prototype Implementation of an Onboard Data Network
An experiment was conducted to characterize the effects of HIRF-induced upsets on a prototype onboard data network. The experiment was conducted at the NASA Langley Research Center's High Intensity Radiation Field Laboratory and used a generic distributed system prototyping platform to realize the data network. This report presents the results of the hardware susceptibility threshold characterization which examined the dependence of measured susceptibility on factors like the frequency and modulation of the radiation, layout of the physical nodes and position of the nodes in the test chamber. The report also includes lessons learned during the development and execution of the experiment
Dependability analysis of web services
Web Services form the basis of web-based eCommerce and eScience applications, so it is vital that robust services are developed. Traditional validation and verification techniques are centred on removing all faults to guarantee correct operation, whereas dependability assesses how dependably a system can deliver the required functionality by assessing attributes, and attempts to improve dependability by eliminating threats via means. Fault injection is a well-proven dependability assessment method. Although much work has been done in the area of fault injection and distributed systems in general, there appears to have been little research on applying it to middleware systems, and to Web Services in particular. There are additional problems associated with applying existing fault injection technologies to Web Services running in a virtual machine environment, since most are either invasive or work at a machine level. The Fault Injection Technology (FIT) method has been devised to address these problems for middleware systems. The Web Service-Fault Injection Technology (WS-FIT) implementation applies the FIT method, based on network-level fault injection, to Web Services to create a non-invasive dependability assessment method. It allows targeted perturbation of Web Service RPC parameters as well as more traditional network-level fault injection operations. The WS-FIT tool includes taxonomies that define a system under test, the fault models to apply and the failure modes to be detected, and uses these taxonomies to generate fault injection campaigns. WS-FIT has been applied to a number of case studies and has successfully demonstrated its effectiveness. It has also been successfully applied to a third-party system to evaluate dependability means. It performed this dependability assessment and also allowed the means to be debugged, uncovering previously unknown faults
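Targeted perturbation of a parameter in a message, as described above, can be illustrated with a small sketch. It is not the WS-FIT tool or its API: the function `perturb_parameter`, the message layout and the element names are hypothetical, and a real injector would sit on the network path and handle SOAP envelopes and namespaces.

```python
import xml.etree.ElementTree as ET


def perturb_parameter(message, param, mutate):
    """Return a copy of an XML message with one parameter perturbed.

    message -- XML text of an intercepted request or response
    param   -- tag name of the parameter to corrupt (no namespace handling)
    mutate  -- function mapping the original text to the faulty text
    """
    root = ET.fromstring(message)
    for elem in root.iter(param):
        elem.text = mutate(elem.text)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical request; the service under test never sees the original.
    request = ("<getQuote><symbol>ABC</symbol>"
               "<quantity>10</quantity></getQuote>")
    faulty = perturb_parameter(request, "quantity",
                               lambda text: str(-int(text)))
    print(faulty)
```

Because the perturbation is applied to the message in transit, neither the client nor the service code has to be modified, which is the sense in which such injection is non-invasive.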
An approach to dependability testing of MapReduce systems based on representative fault cases
Abstract: MapReduce systems make it easier to use a large number of machines to process large amounts of data, and they are used by a variety of applications, ranging from search tools to commercial and financial systems. One of the main features of MapReduce systems is that they abstract away problems related to the distributed environment, such as the distribution of processing and fault tolerance. It is therefore essential to ensure the dependability of MapReduce systems, that is, to ensure that these systems work correctly even in the presence of faults. On the other hand, the non-determinism of a distributed environment and the unreliability of the physical environment can cause errors in MapReduce systems that are hard to find, understand and fix. This thesis presents the first known approach to dependability testing for MapReduce systems. The work presents a definition of dependability testing, a model of the MapReduce fault tolerance mechanism, a process for generating representative fault cases from a model, and a test platform that automates the execution of fault cases in a distributed environment. It also presents a new approach to modelling distributed components using Petri nets, which makes it possible to represent the dynamics of the components and the independence of their actions and states. Experimental results show that the fault cases generated from the model are representative for testing Hadoop, the main open-source implementation of MapReduce. The experiments uncovered several errors in Hadoop, and the results also confirm that the test platform automates the execution of the representative fault cases.
In addition, the platform exhibits the properties required of a test platform: controllability, temporal measurement, non-intrusiveness, repeatability, and effectiveness in identifying faulty systems