Reasoning and Improving on Software Resilience against Unanticipated Exceptions
In software, there are errors anticipated at specification and design
time, those encountered at development and testing time, and those that happen
in production yet were never anticipated. In this paper, we reason about
the ability of software to correctly handle unanticipated exceptions. We
propose an algorithm, called short-circuit testing, which injects exceptions
during test suite execution so as to simulate unanticipated errors. This
algorithm collects data that is used as input for verifying two formal
exception contracts that capture two resilience properties. Our evaluation on 9
test suites, with 78% line coverage on average, analyzes 241 executed catch
blocks and shows that 101 of them expose resilience properties and that 84 can be
transformed to be more resilient
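The idea of injecting an exception while a test runs can be illustrated with a small Python sketch. It is not the authors' algorithm or tooling: the function `parse_port`, the `inject` flag, and the `InjectedError` class are all hypothetical names introduced here, and the sketch only checks whether a toy test still passes when an exception is raised at the very start of a try block.

```python
class InjectedError(Exception):
    """Stands in for an unanticipated exception (hypothetical name)."""


def parse_port(text, inject=False):
    """Return a port number, falling back to 8080 when parsing fails."""
    port = 8080  # default assigned before the try block
    try:
        if inject:
            # Injection point: raise before any statement of the try body runs.
            raise InjectedError("injected at start of try block")
        port = int(text)
    except Exception:
        # The handler does not rely on any state written inside the try body.
        pass
    return port


def run_test(inject):
    """A toy test case; returns True when its assertion holds."""
    return parse_port("8080", inject=inject) == 8080


baseline = run_test(inject=False)         # normal execution
under_injection = run_test(inject=True)   # same test with an injected exception
print(baseline, under_injection)
```

In this toy case the test passes both with and without injection, which is the kind of observation an analysis of catch-block resilience could build on. A real implementation would instrument try blocks automatically rather than rely on a hand-written flag.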
Experimental analysis of computer system dependability
This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance
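Importance sampling, mentioned above as a way to accelerate Monte Carlo simulation, can be sketched on a textbook rare-event problem. The example below is an assumption chosen for illustration and is not taken from the paper: it estimates the probability that a standard normal variable exceeds 4 by sampling from a normal distribution shifted to the threshold and reweighting each sample by the likelihood ratio. The function name and parameters are hypothetical.

```python
import math
import random


def tail_probability(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw from the proposal N(threshold, 1), where the rare event is common.
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio p(x) / q(x) for N(0, 1) against N(threshold, 1).
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


# Closed-form reference value from the complementary error function.
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
estimate = tail_probability(4.0, 20000)
print(exact, estimate)
```

Plain Monte Carlo with the same number of samples would most likely observe no event at all, since the target probability is on the order of 1e-5. Shifting the sampling distribution toward the failure region is what makes the estimate usable.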
A study of the relationship between the performance and dependability of a fault-tolerant computer
This thesis studies the relationship between performance and dependability by creating a tool (FTAPE) that integrates a high-stress workload generator with fault injection and by using the tool to evaluate system performance under error conditions. The workloads are composed of processes that are formed from atomic components representing CPU, memory, and I/O activity. The fault injector is software-implemented and is capable of injecting faults into any memory-addressable location, including special registers and caches. This tool has been used to study a Tandem Integrity S2 computer. Workloads with varying numbers of processes and varying compositions of CPU, memory, and I/O activity are first characterized in terms of performance. Then faults are injected into these workloads. The results show that as the number of concurrent processes increases, the mean fault latency initially increases due to increased contention for the CPU. However, for even higher numbers of processes (more than 3 processes), the mean latency decreases because long-latency faults are paged out before they can be activated
Developing a multi-level fault injection environment
Dependability and fault tolerance are important aspects of modern computer systems. Particle strikes or electromagnetic interference can cause the internal state of the system to change, which might cause errors in the system with non-negligible probability. Such errors are termed "soft errors". Bit flips in the design are a good way to model these soft errors. These bit flips are random and transient in a design, making their analysis more difficult than that of simple stuck-at faults. Interestingly, only a few of the flops affected by radiation cause soft errors, because of the different propagation paths and functional impact of the flops. In order to improve the dependability of a system with reasonable overhead, the flops in a design that are most vulnerable to soft errors need to be protected. Each application case can potentially expose a slightly different set of flip-flops as vulnerable. Hence, different tools are required to confidently analyze soft errors when evaluating fault tolerance. As part of the thesis, I have developed a suite of tools for analyzing soft errors. The multi-level tools are necessary for complete fault tolerance analysis and for identifying the most vulnerable flip-flops in a specific processor. The first part of the thesis describes the FPGA development framework for a specific processor. Simulation-based fault injection techniques are described in the later sections. The final parts cover analysis techniques and applications that can benefit from such systems. Electrical and Computer Engineerin
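The bit-flip model of soft errors, and the observation that only some state bits are vulnerable, can be shown with a toy Python sketch. It is not one of the thesis tools: the 8-bit `toy_output` function is a made-up design in which only the three low-order bits reach the output, and all names are hypothetical.

```python
def toy_output(state):
    """Hypothetical 8-bit design: only the 3 low-order bits reach the output."""
    return (state & 0b111) * 2 + 1


def flip_bit(state, bit):
    """Model a transient single-bit upset by inverting one bit of the state."""
    return state ^ (1 << bit)


def vulnerable_bits(state, width=8):
    """Return the bit positions whose flip changes the observed output."""
    golden = toy_output(state)  # fault-free reference run
    return [b for b in range(width)
            if toy_output(flip_bit(state, b)) != golden]


print(vulnerable_bits(0b10110101))
```

Flips in the five high-order bits are masked in this toy design, so a campaign like this would flag only the low-order bits as candidates for protection. A real campaign would inject into simulated or FPGA-emulated hardware over many workloads rather than a pure function.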
SEU effect analysis in an open-source router via a distributed fault injection environment
The paper presents a detailed error analysis and classification of the behavior of an open-source router when affected by Single Event Upsets (SEUs). The experimental results have been gathered on a real communication network, resorting to an ad-hoc Fault Injection system. The injector has been designed to corrupt the router during its normal service and to analyze the SEU injection effects on the overall distributed system. The performed experiments allowed the authors to identify the most critical memory regions and to cluster the router variables according to their impact on system dependability
Sustainable Fault-handling Of Reconfigurable Logic Using Throughput-driven Assessment
A sustainable Evolvable Hardware (EH) system is developed for SRAM-based reconfigurable Field Programmable Gate Arrays (FPGAs) using outlier detection and group testing-based assessment principles. The fault diagnosis methods presented herein leverage throughput-driven, relative fitness assessment to maintain resource viability autonomously. Group testing-based techniques are developed for adaptive input-driven fault isolation in FPGAs, without the need for exhaustive testing or coding-based evaluation. The techniques maintain the device operational, and when possible generate validated outputs throughout the repair process. Adaptive fault isolation methods based on discrepancy-enabled pair-wise comparisons are developed. By observing the discrepancy characteristics of multiple Concurrent Error Detection (CED) configurations, a method for robust detection of faults is developed based on pairwise parallel evaluation using Discrepancy Mirror logic. The results from the analytical FPGA model are demonstrated via a self-healing, self-organizing evolvable hardware system. Reconfigurability of the SRAM-based FPGA is leveraged to identify logic resource faults which are successively excluded by group testing using alternate device configurations. This simplifies the system architect's role to definition of functionality using a high-level Hardware Description Language (HDL) and system-level performance versus availability operating point. System availability, throughput, and mean time to isolate faults are monitored and maintained using an Observer-Controller model. Results are demonstrated using a Data Encryption Standard (DES) core that occupies approximately 305 FPGA slices on a Xilinx Virtex-II Pro FPGA. With a single simulated stuck-at-fault, the system identifies a completely validated replacement configuration within three to five positive tests.
The approach demonstrates a readily-implemented yet robust organic hardware application framework featuring a high degree of autonomous self-control
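Group testing for fault isolation can be illustrated with a generic adaptive halving scheme. This is a textbook sketch under a single-fault assumption, not the discrepancy-based method of the work above: each "configuration" is modeled as a subset of resource indices, and the hypothetical callback `config_passes` reports whether a configuration using only that subset produces correct output.

```python
def isolate_fault(resources, config_passes):
    """Locate one faulty resource by repeatedly testing half of the suspects.

    Assumes exactly one faulty resource. Returns (faulty_resource, tests_used).
    """
    suspects = list(resources)
    tests = 0
    while len(suspects) > 1:
        mid = len(suspects) // 2
        half = suspects[:mid]
        tests += 1
        if config_passes(half):
            # The tested half is fault-free, so the fault is in the remainder.
            suspects = suspects[mid:]
        else:
            # The tested half contains the fault.
            suspects = half
    return suspects[0], tests


FAULTY = 11  # hypothetical faulty resource index used only for this demo


def config_passes(subset):
    """A configuration passes when it avoids the faulty resource."""
    return FAULTY not in subset


result = isolate_fault(range(16), config_passes)
print(result)
```

With 16 candidate resources the fault is isolated in 4 tests instead of 16 individual checks, which is the basic economy that group testing offers over exhaustive testing.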
Prototype of running clinical trials in an untrustworthy environment using blockchain.
Monitoring and ensuring the integrity of data within the clinical trial process is not always feasible within the current research system. We propose a blockchain-based system to make data collected in the clinical trial process immutable, traceable, and potentially more trustworthy. We use raw data from a real completed clinical trial, simulate the trial on a proof-of-concept web portal service, and test its resilience to data tampering. We also assess its prospects to provide a traceable and useful audit trail of trial data for regulators, and a flexible service for all members within the clinical trials network. We also improve the way adverse events are currently reported. In conclusion, we advocate that this service could offer an improvement in clinical trial data management, and could bolster trust in the clinical research process and the ease with which regulators can oversee trials
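The tamper-evidence property that a blockchain gives to recorded data can be sketched with a minimal hash chain. This is an illustration only, not the system described above: it has no network, consensus, or smart contracts, and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the hash of its predecessor."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link records so that each block commits to everything before it."""
    chain, prev = [], GENESIS
    for i, record in enumerate(records):
        digest = block_hash(i, record, prev)
        chain.append({"index": i, "data": record, "prev": prev, "hash": digest})
        prev = digest
    return chain


def is_valid(chain):
    """Recompute every hash and link; any edited record breaks the check."""
    prev = GENESIS
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev"])
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


chain = build_chain([{"patient": "P01", "event": "enrolled"},
                     {"patient": "P01", "event": "adverse event reported"}])
intact = is_valid(chain)
chain[0]["data"]["event"] = "withdrawn"  # simulate tampering with a stored record
tampered = is_valid(chain)
print(intact, tampered)
```

Editing a stored record changes its hash, so validation fails unless every later block is recomputed as well. In a distributed ledger that recomputation would also have to be accepted by the other participants.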
Measuring fault tolerance with the FTAPE fault injection tool
This paper describes FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The major parts of the tool include a system-wide fault-injector, a workload generator, and a workload activity measurement tool. The workload creates high stress conditions on the machine. Using stress-based injection, the fault injector is able to utilize knowledge of the workload activity to ensure a high level of fault propagation. The errors/fault ratio, performance degradation, and number of system crashes are presented as measures of fault tolerance
Injecting software faults in Python applications
Software fault injection techniques have been widely used as a means of
evaluating the dependability of systems in the presence of certain types of faults. Despite
the wide variety of tools that can emulate the presence of software faults,
there is little practical support for emulating software faults in Python applications,
which are increasingly used to support business-critical cloud services. In this thesis,
we present a tool (named Fit4Python) for injecting software faults into Python code and
then use it to analyze the effectiveness of the OpenStack test suite against these new,
probable software faults. We begin by analyzing the types of faults that affect Nova Compute,
a core component of OpenStack. We use our tool to emulate the presence of new faults
in the Nova Compute API in order to understand how OpenStack's unit, functional,
and integration test suites cover these new but probable situations. The results
show clear limitations in the effectiveness of the OpenStack developers' test suite,
with many injected faults going undetected by all three types of tests. In addition,
we observe that most of the problems analyzed could be detected with trivial changes
or additions to the unit tests
- …
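Injecting a software fault into Python code and checking whether a test suite notices it can be sketched with the standard `ast` module. This is not Fit4Python and does not reflect its fault model: the function `within_quota`, the operator swap from `<` to `<=`, and the two toy tests are all hypothetical and chosen only to show a fault that one test misses and a small added test catches.

```python
import ast

SOURCE = """
def within_quota(used, limit):
    return used < limit
"""


class SwapLtToLtE(ast.NodeTransformer):
    """Inject a boundary fault by turning every '<' into '<='."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node


def load(tree):
    """Compile a (possibly mutated) syntax tree and return the function."""
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<injected>", "exec"), namespace)
    return namespace["within_quota"]


def weak_test(fn):
    """Checks only values far from the boundary."""
    return fn(1, 10) is True and fn(20, 10) is False


def boundary_test(fn):
    """A small additional check exactly at the boundary."""
    return fn(10, 10) is False


original = load(ast.parse(SOURCE))
faulty = load(SwapLtToLtE().visit(ast.parse(SOURCE)))

print(weak_test(original), boundary_test(original))
print(weak_test(faulty), boundary_test(faulty))
```

The injected fault passes the weak test unnoticed and is caught only by the boundary check, a toy version of the situation where a trivial addition to the unit tests exposes an otherwise undetected fault.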