771 research outputs found
A methodology for the generation of efficient error detection mechanisms
A dependable software system must contain error detection mechanisms and error recovery mechanisms. Software components for the detection of errors are typically designed based on a system specification or the experience of software engineers, with their efficiency typically being measured using fault injection and metrics such as coverage and latency. In this paper, we introduce a methodology for the design of highly efficient error detection mechanisms. The proposed methodology combines fault injection analysis and data mining techniques in order to generate predicates for efficient error detection mechanisms. The results presented demonstrate the viability of the methodology as an approach for the development of efficient error detection mechanisms, as the predicates generated yield a true positive rate of almost 100% and a false positive rate very close to 0% for the detection of failure-inducing states. The main advantage of the proposed methodology over current state-of-the-art approaches is that efficient detectors are obtained by design, rather than by using specification-based detector design or the experience of software engineers
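The true positive and false positive rates quoted in this abstract can be made concrete with a minimal sketch. It assumes each fault injection run yields a program state labelled as failure-inducing or not; the predicate, the variable names (`speed`, `pressure`) and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def detection_rates(
    predicate: Callable[[State], bool],
    samples: List[Tuple[State, bool]],
) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for a detector predicate.

    Each sample pairs a program state with a label that is True when the
    state went on to cause a failure.
    """
    tp = fp = fn = tn = 0
    for state, failure_inducing in samples:
        flagged = predicate(state)
        if failure_inducing:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def example_predicate(state: State) -> bool:
    # Hypothetical detector: flag states outside an assumed safe range.
    return state["speed"] > 100.0 or state["pressure"] < 0.0


# Hypothetical labelled states, standing in for fault injection output.
samples = [
    ({"speed": 120.0, "pressure": 1.0}, True),
    ({"speed": 50.0, "pressure": -1.0}, True),
    ({"speed": 90.0, "pressure": 2.0}, True),
    ({"speed": 60.0, "pressure": 1.0}, False),
    ({"speed": 110.0, "pressure": 1.0}, False),
    ({"speed": 10.0, "pressure": 2.0}, False),
]

tpr, fpr = detection_rates(example_predicate, samples)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

A predicate of the kind the abstract describes would score close to a TPR of 1.0 and an FPR of 0.0 on such data; the hand-written predicate here is deliberately imperfect, to show both kinds of error.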
Beyond the golden run : evaluating the use of reference run models in fault injection analysis
Fault injection (FI) has been shown to be an effective approach to assessing the dependability of software systems. To determine the impact of faults injected during FI, a given oracle is needed. This oracle can take a variety of forms; however, prominent oracles include (i) specifications, (ii) error detection mechanisms and (iii) golden runs. Focusing on golden runs, in this paper we show that there are classes of software which a golden run based approach cannot be used to analyse. Specifically, we demonstrate that a golden run based approach cannot be used when analysing systems which employ a main control loop with an irregular period. Further, we show how a simple model, which has been refined using FI, can be employed as an oracle in the analysis of such a system
Integration of formal fault analysis in ASSERT: Case studies and lessons learnt
The ASSERT European Integrated Project (Automated proof-based System and Software Engineering for Real-Time systems; EC FP6, IST-004033) has investigated, elaborated and experimented with advanced methods based on the AltaRica language and support tool OCAS for architecture and fault approach propagation description analysis, and integrated in the complete ASSERT process. The paper describes lessons learnt from three case studies: safety critical spacecraft, autonomous deep exploration spacecraft, and civil aircraft
Benchmarking tests on recovery oriented computing
Benchmarks have played a very important role in guiding the progress of computer
science systems in various ways. Specifically, they have a major role to play in
autonomous environments. System crashes and software failures are a basic part of a
software system’s life-cycle, and the main purpose of recovery oriented computing is to
overcome them, or rather to make the system as little vulnerable to them as possible.
This is usually done by trying to reduce
the downtime by automatically and efficiently recovering from a broad class of transient
software failures without having to modify applications. There have been various types of
benchmarks for recovering from a failure, but in this paper we intend to create a
benchmark framework called the warning benchmarks to measure and evaluate
recovery oriented systems. It consists of the known and the unknown failures and a few
benchmark techniques, which the warning benchmarks handle with the help of various
other techniques in software fault analysis.
Electrical and Computer Engineerin
An Experimental Evaluation of the REE SIFT Environment for Spaceborne Applications
Coordinated Science Laboratory was formerly known as Control Systems Laborator
An Adaptive Resilience Testing Framework for Microservice Systems
Resilience testing, which measures the ability to minimize service
degradation caused by unexpected failures, is crucial for microservice systems.
The current practice for resilience testing relies on manually defining rules
for different microservice systems. Due to the diverse business logic of
microservices, there are no one-size-fits-all microservice resilience testing
rules. As the quantity and dynamics of microservices and failures increase
substantially, manual configuration exhibits scalability and adaptivity issues.
To overcome the two issues, we empirically compare the impacts of common
failures in the resilient and unresilient deployments of a benchmark
microservice system. Our study demonstrates that the resilient deployment can
block the propagation of degradation from system performance metrics (e.g.,
memory usage) to business metrics (e.g., response latency). In this paper, we
propose AVERT, the first AdaptiVE Resilience Testing framework for microservice
systems. AVERT first injects failures into microservices and collects available
monitoring metrics. Then AVERT ranks all the monitoring metrics according to
their contributions to the overall service degradation caused by the injected
failures. Lastly, AVERT produces a resilience index by how much the degradation
in system performance metrics propagates to the degradation in business
metrics. The higher the degradation propagation, the lower the resilience of
the microservice system. We evaluate AVERT on two open-source benchmark
microservice systems. The experimental results show that AVERT can accurately
and efficiently test the resilience of microservice systems
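The idea that higher degradation propagation means lower resilience can be illustrated with a small sketch. The abstract does not give AVERT's formula, so the ratio-based index below, the function names and the metric values are all assumptions made for illustration only.

```python
def relative_degradation(baseline: float, faulty: float) -> float:
    """Relative change of a metric under an injected failure (assumed measure)."""
    if baseline == 0:
        return 0.0
    return abs(faulty - baseline) / abs(baseline)


def resilience_index(system_deg: float, business_deg: float) -> float:
    """Hypothetical index in [0, 1], not AVERT's actual definition.

    1.0 means no system-level degradation reached the business metric;
    0.0 means the business metric degraded at least as much as the system metric.
    """
    if system_deg <= 0:
        return 1.0
    propagation = min(business_deg / system_deg, 1.0)
    return 1.0 - propagation


# Hypothetical measurements: memory usage doubles under the injected failure.
system_deg = relative_degradation(baseline=400.0, faulty=800.0)

# Resilient deployment: response latency barely moves.
resilient = resilience_index(system_deg, relative_degradation(100.0, 110.0))

# Unresilient deployment: response latency degrades almost as much as memory.
unresilient = resilience_index(system_deg, relative_degradation(100.0, 180.0))

print(f"resilient={resilient:.2f} unresilient={unresilient:.2f}")
```

In this sketch the resilient deployment scores higher because less of the memory degradation shows up in response latency, which matches the direction of the relationship stated in the abstract.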