
916 research outputs found

Reasoning and Improving on Software Resilience against Unanticipated Exceptions

Author: Cornu Benoit
Monperrus Martin
Seinturier Lionel
Publication date: 01/01/2013

In software, there are the errors anticipated at specification and design time, those encountered at development and testing time, and those that happen in production mode yet were never anticipated. In this paper, we aim at reasoning on the ability of software to correctly handle unanticipated exceptions. We propose an algorithm, called short-circuit testing, which injects exceptions during test suite execution so as to simulate unanticipated errors. This algorithm collects data that is used as input for verifying two formal exception contracts that capture two resilience properties. Our evaluation on 9 test suites, with 78% line coverage on average, analyzes 241 executed catch blocks and shows that 101 of them expose resilience properties and that 84 can be transformed to be more resilient.
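The injection idea in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the names `InjectedError`, `short_circuit`, and `parse_or_default` are hypothetical, and the check shown (does the catch block still yield a usable result when the try body fails at once?) is only a simplified reading of what a resilience property might look like.

```python
class InjectedError(Exception):
    """Stands in for an error nobody anticipated (hypothetical name)."""


def short_circuit(func):
    """Return a replacement for func that fails as soon as it is called."""
    def wrapper(*args, **kwargs):
        raise InjectedError(f"injected into {func.__name__}")
    return wrapper


def parse_or_default(text, parser=int, default=0):
    """Code under test: a try block guarded by a broad catch block."""
    try:
        return parser(text)
    except Exception:
        return default


def run_test_with_injection():
    """Run the same call normally, then with the try body short-circuited."""
    normal = parse_or_default("42")
    injected = parse_or_default("42", parser=short_circuit(int))
    return normal, injected
```

Comparing the two results shows whether the catch block recovers from an exception type it was never written for.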

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

Experimental analysis of computer system dependability

Author: Iyer Ravishankar K.
Tang Dong

This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
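Importance sampling, mentioned in this abstract as a way to accelerate Monte Carlo simulation, can be sketched on a toy rare-event problem. The example below is an illustration only and is not taken from the paper: it estimates the tail probability P(X > t) of a standard normal variable by sampling from a normal distribution shifted to t and reweighting each sample by the likelihood ratio. The function names are hypothetical.

```python
import math
import random


def naive_estimate(threshold, n, rng):
    """Plain Monte Carlo estimate of P(X > threshold) for X ~ N(0, 1)."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def importance_estimate(threshold, n, rng):
    """Estimate the same probability by sampling from N(threshold, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio of N(0, 1) to N(threshold, 1) evaluated at x.
            total += math.exp(-threshold * x + threshold ** 2 / 2.0)
    return total / n


def exact_tail(threshold):
    """Closed-form tail probability, used only to check the estimates."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

For a threshold such as 4, the naive estimator sees almost no hits at practical sample sizes, while roughly half of the shifted samples land in the region of interest.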

NASA Technical Reports Server

A study of the relationship between the performance and dependability of a fault-tolerant computer

Author: Goswami Kumar K.

This thesis studies the relationship between the performance and dependability of a fault-tolerant computer by creating a tool (FTAPE) that integrates a high-stress workload generator with fault injection and by using the tool to evaluate system performance under error conditions. The workloads are composed of processes formed from atomic components that represent CPU, memory, and I/O activity. The fault injector is software-implemented and is capable of injecting faults into any memory-addressable location, including special registers and caches. This tool has been used to study a Tandem Integrity S2 Computer. Workloads with varying numbers of processes and varying compositions of CPU, memory, and I/O activity are first characterized in terms of performance. Then faults are injected into these workloads. The results show that as the number of concurrent processes increases, the mean fault latency initially increases due to increased contention for the CPU. However, for even higher numbers of processes (more than 3 processes), the mean latency decreases because long-latency faults are paged out before they can be activated.

NASA Technical Reports Server


Developing a multi-level fault injection environment

Author: Samynathan Balavinayaga
Publication date: 22/10/2015

Dependability and fault tolerance are important aspects of modern computer systems. Particle strikes or electromagnetic interference can change the internal state of a system, which can cause errors with non-negligible probability. Such errors are termed "soft errors". Bit flips in the design are a good way to model these soft errors. Bit flips due to soft errors are random and transient, which makes them more difficult to analyze than simple stuck-at faults. Interestingly, only a few of the flops affected by radiation actually cause soft errors, because the flops differ in their propagation paths and functional impact. To improve the dependability of a system with reasonable overhead, the flops in a design that are most vulnerable to soft errors need to be protected. Each application case can potentially expose a slightly different set of flip-flops as vulnerable. Hence different tools are required to analyze soft errors confidently when evaluating fault tolerance. As part of the thesis, I have developed a suite of tools for analyzing soft errors. The multi-level tools are necessary for complete fault tolerance analysis and for identifying the most vulnerable flip-flops in a specific processor. The first part of the thesis describes the FPGA development framework for a specific processor. Simulation-based fault injection techniques are described in the later sections. The final parts cover analysis techniques and applications that can benefit from such systems.

Electrical and Computer Engineering
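Modelling a soft error as a bit flip can be shown with a minimal sketch. This is not the thesis's tool suite: the register file is a plain dictionary, and the names `flip_bit` and `inject_random_flip` are hypothetical.

```python
import random


def flip_bit(value, bit, width=32):
    """Return value with one bit inverted, modelling a single transient upset."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside register width")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def inject_random_flip(registers, rng, width=32):
    """Flip one random bit in one randomly chosen register, in place.

    Returns the (register name, bit index) pair so the run can be replayed.
    """
    name = rng.choice(sorted(registers))
    bit = rng.randrange(width)
    registers[name] = flip_bit(registers[name], bit, width)
    return name, bit
```

Repeating such injections across many runs and recording which flips change the program's output is one simple way to rank state elements by vulnerability.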

Texas ScholarWorks

SEU effect analysis in an open-source router via a distributed fault injection environment

Author: Benso Alfredo
Di Carlo Stefano
Di Natale Giorgio
Prinetto Paolo Ernesto
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

The paper presents a detailed error analysis and classification of the behavior of an open-source router when affected by Single Event Upsets (SEUs). The experimental results have been gathered on a real communication network, resorting to an ad-hoc Fault Injection system. The injector has been designed to corrupt the router during its normal service and to analyze the SEU injection effects on the overall distributed system. The performed experiments allowed the authors to identify the most critical memory regions and to cluster the router variables according to their impact on system dependability

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Sustainable Fault-handling Of Reconfigurable Logic Using Throughput-driven Assessment

Author: Sharma Carthik
Publication venue: Information Bulletin on Variable Stars (IBVS)
Publication date: 01/01/2008

A sustainable Evolvable Hardware (EH) system is developed for SRAM-based reconfigurable Field Programmable Gate Arrays (FPGAs) using outlier detection and group testing-based assessment principles. The fault diagnosis methods presented herein leverage throughput-driven, relative fitness assessment to maintain resource viability autonomously. Group testing-based techniques are developed for adaptive input-driven fault isolation in FPGAs, without the need for exhaustive testing or coding-based evaluation. The techniques maintain the device operational, and when possible generate validated outputs throughout the repair process. Adaptive fault isolation methods based on discrepancy-enabled pair-wise comparisons are developed. By observing the discrepancy characteristics of multiple Concurrent Error Detection (CED) configurations, a method for robust detection of faults is developed based on pairwise parallel evaluation using Discrepancy Mirror logic. The results from the analytical FPGA model are demonstrated via a self-healing, self-organizing evolvable hardware system. Reconfigurability of the SRAM-based FPGA is leveraged to identify logic resource faults which are successively excluded by group testing using alternate device configurations. This simplifies the system architect's role to definition of functionality using a high-level Hardware Description Language (HDL) and system-level performance versus availability operating point. System availability, throughput, and mean time to isolate faults are monitored and maintained using an Observer-Controller model. Results are demonstrated using a Data Encryption Standard (DES) core that occupies approximately 305 FPGA slices on a Xilinx Virtex-II Pro FPGA. With a single simulated stuck-at-fault, the system identifies a completely validated replacement configuration within three to five positive tests. The approach demonstrates a readily-implemented yet robust organic hardware application framework featuring a high degree of autonomous self-control.
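The general idea of group testing for fault isolation can be sketched as adaptive halving. This is a generic illustration, not the dissertation's method: it assumes exactly one faulty resource, and `group_passes` is a hypothetical callback that reports whether a configuration using only the given resources produces correct output.

```python
def isolate_fault(resources, group_passes):
    """Find the single faulty resource by halving the suspect group each round.

    group_passes(group) is assumed to return True when a configuration
    restricted to `group` behaves correctly. Returns the faulty resource
    and the number of group tests that were run.
    """
    suspects = list(resources)
    tests = 0
    while len(suspects) > 1:
        middle = len(suspects) // 2
        half = suspects[:middle]
        tests += 1
        if group_passes(half):
            # The tested half is fault-free, so the fault is in the rest.
            suspects = suspects[middle:]
        else:
            suspects = half
    return suspects[0], tests
```

Under the single-fault assumption, the number of tests grows with the logarithm of the number of resources rather than linearly, which is why group testing avoids exhaustive evaluation.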

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Prototype of running clinical trials in an untrustworthy environment using blockchain.

Author: Bhattacharya Sanchita
Butte Atul J
Wong Daniel R
Publication venue: eScholarship, University of California
Publication date: 01/02/2019

Monitoring and ensuring the integrity of data within the clinical trial process is not always feasible with the current research system. We propose a blockchain-based system to make data collected in the clinical trial process immutable, traceable, and potentially more trustworthy. We use raw data from a real completed clinical trial, simulate the trial on a proof-of-concept web portal service, and test its resilience to data tampering. We also assess its prospects to provide a traceable and useful audit trail of trial data for regulators, and a flexible service for all members within the clinical trials network. We also improve the way adverse events are currently reported. In conclusion, we advocate that this service could offer an improvement in clinical trial data management, and could bolster trust in the clinical research process and the ease with which regulators can oversee trials.
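Why a hash-linked record makes tampering detectable can be shown with a minimal sketch. It is not the authors' system: it omits consensus, networking, and access control, and the block layout and function names are assumptions chosen for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with its predecessor's hash."""
    payload = json.dumps([index, data, prev_hash], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a record whose hash commits to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash; any edited record breaks the chain."""
    prev_hash = GENESIS_HASH
    for position, block in enumerate(chain):
        expected = block_hash(position, block["data"], prev_hash)
        if (block["index"] != position
                or block["prev_hash"] != prev_hash
                or block["hash"] != expected):
            return False
        prev_hash = block["hash"]
    return True
```

Changing any stored record after the fact invalidates its own hash and every later link, which is the property an audit trail relies on.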

Directory of Open Access Journals

eScholarship - University of California

Measuring fault tolerance with the FTAPE fault injection tool

Author: Iyer Ravishankar K.
Tsai Timothy K.

This paper describes FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The major parts of the tool include a system-wide fault-injector, a workload generator, and a workload activity measurement tool. The workload creates high stress conditions on the machine. Using stress-based injection, the fault injector is able to utilize knowledge of the workload activity to ensure a high level of fault propagation. The errors/fault ratio, performance degradation, and number of system crashes are presented as measures of fault tolerance

NASA Technical Reports Server

Injecting software faults in Python applications

Author: Marques Henrique Manuel Domingues
Publication date: 17/11/2021

Software fault injection techniques have been widely used as a means of evaluating the dependability of systems in the presence of certain types of faults. Despite the wide variety of tools that can emulate the presence of software faults, there is little practical support for emulating software faults in Python applications, which are increasingly used to support business-critical cloud services. In this thesis, we present a tool (named Fit4Python) for injecting software faults into Python code, and then use it to analyze the effectiveness of the OpenStack test suite against these new, probable software faults. We begin by analyzing the types of faults that affect Nova Compute, a core component of OpenStack. We use our tool to emulate the presence of new faults in the Nova Compute API in order to understand how the OpenStack suite of unit, functional, and integration tests covers these new but probable situations. The results show clear limitations in the effectiveness of the OpenStack developers' test suite, with many injected faults going undetected by all three types of tests. In addition, we observe that most of the problems analyzed could be detected with trivial changes or additions to the unit tests.
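One common way to inject a software fault into Python code is to rewrite its syntax tree before compiling it. The sketch below uses the standard `ast` module for this; it is an assumption about how such a tool could work, not a description of Fit4Python, and the fault type (relaxing `<` to `<=`) and the function `has_capacity` are hypothetical examples.

```python
import ast


class RelaxComparison(ast.NodeTransformer):
    """Replace every `<` with `<=`, an off-by-one style fault."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node


def inject_fault(source, func_name):
    """Compile a mutated copy of `source` and return the named function."""
    tree = RelaxComparison().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<mutated>", "exec"), namespace)
    return namespace[func_name]


# Hypothetical code under test.
ORIGINAL = """
def has_capacity(used, limit):
    return used < limit
"""
```

Running an existing test suite against the mutated function shows whether any test exercises the boundary case; if every test still passes, the injected fault went undetected.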

Repositório Comum