6 research outputs found
Beyond the golden run : evaluating the use of reference run models in fault injection analysis
Fault injection (FI) has been shown to be an effective approach to assess- ing the dependability of software systems. To determine the impact of faults injected during FI, a given oracle is needed. This oracle can take a variety of forms, however prominent oracles include (i) specifications, (ii) error detection mechanisms and (iii) golden runs. Focusing on golden runs, in this paper we show that there are classes of software which a golden run based approach can not be used to analyse. Specifically we demonstrate that a golden run based approach can not be used when analysing systems which employ a main control loop with an irregular period. Further, we show how a simple model, which has been refined using FI, can be employed as an oracle in the analysis of such a system
Enhancing Failure Propagation Analysis in Cloud Computing Systems
In order to plan for failure recovery, the designers of cloud systems need to
understand how their system can potentially fail. Unfortunately, analyzing the
failure behavior of such systems can be very difficult and time-consuming, due
to the large volume of events, non-determinism, and reuse of third-party
components. To address these issues, we propose a novel approach that joins
fault injection with anomaly detection to identify the symptoms of failures. We
evaluated the proposed approach in the context of the OpenStack cloud computing
platform. We show that our model can significantly improve the accuracy of
failure analysis in terms of false positives and negatives, with a low
computational cost.Comment: 12 pages, The 30th International Symposium on Software Reliability
Engineering (ISSRE 2019
Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems
Cloud computing systems fail in complex and unexpected ways due to unexpected
combinations of events and interactions between hardware and software
components. Fault injection is an effective means to bring out these failures
in a controlled environment. However, fault injection experiments produce
massive amounts of data, and manually analyzing these data is inefficient and
error-prone, as the analyst can miss severe failure modes that are yet unknown.
This paper introduces a new paradigm (fault injection analytics) that applies
unsupervised machine learning on execution traces of the injected system, to
ease the discovery and interpretation of failure modes. We evaluated the
proposed approach in the context of fault injection experiments on the
OpenStack cloud computing platform, where we show that the approach can
accurately identify failure modes with a low computational cost.Comment: IEEE Transactions on Dependable and Secure Computing; 16 pages. arXiv
admin note: text overlap with arXiv:1908.1164
Towards the design of efficient error detection mechanisms
The pervasive nature of modern computer systems has led to an increase in our
reliance on such systems to provide correct and timely services. Moreover, as
the functionality of computer systems is being increasingly defined in software,
it is imperative that software be dependable. It has previously been shown that
a fault intolerant software system can be made fault tolerant through the design
and deployment of software mechanisms implementing abstract artefacts known
as error detection mechanisms (EDMs) and error recovery mechanisms (ERMs),
hence the design of these components is central to the design of dependable
software systems. The EDM design problem, which relates to the construction
of a boolean predicate over a set of program variables, is inherently difficult,
with current approaches relying on system specifications and the experience of
software engineers. As this process necessarily entails the identification and
incorporation of program variables by an error detection predicate, this thesis
seeks to address the EDM design problem from a novel variable-centric perspective,
with the research presented supporting the thesis that, where it exists
under the assumed system model, an efficient EDM consists of a set of critical
variables. In particular, this research proposes (i) a metric suite that can
be used to generate a relative ranking of the program variables in a software with respect to their criticality, (ii) a systematic approach for the generation
of highly-efficient error detection predicates for EDMs, and (iii) an approach
for dependability enhancement based on the protection of critical variables using
software wrappers that implement error detection and correction predicates
that are known to be efficient. This research substantiates the thesis that an
efficient EDM contains a set of critical variables on the basis that (i) the proposed
metric suite is able, through application of an appropriate threshold, to
identify critical variables, (ii) efficient EDMs can be constructed based only on
the critical variables identified by the metric suite, and (iii) the criticality of the
identified variables can be shown to extend across a software module such that
an efficient EDM designed for that software module should seek to determine
the correctness of the identified variables
From Safety Analysis to Experimental Validation by Fault Injection—Case of Automotive Embedded Systems
En raison de la complexité croissante des systèmes automobiles embarqués, la sûreté de fonctionnement est devenue un enjeu majeur de l’industrie automobile. Cet intérêt croissant s’est traduit par la sortie en 2011 de la norme ISO 26262 sur la sécurité fonctionnelle. Les défis auxquelles sont confrontés les acteurs du domaine sont donc les suivants : d’une part, la conception de systèmes sûrs, et d’autre part, la conformité aux exigences de la norme ISO 26262. Notre approche se base sur l’application systématique de l’injection de fautes pour la vérification et la validation des exigences de sécurité, tout au long du cycle de développement, des phases de conception jusqu’à l’implémentation. L’injection de fautes nous permet en particulier de vérifier que les mécanismes de tolérance aux fautes sont efficaces et que les exigences non-fonctionnelles sont respectées. L’injection de faute est une technique de vérification très ancienne. Cependant, son rôle lors de la phase de conception et ses complémentarités avec la validation expérimentale, méritent d’être étudiés. Notre approche s’appuie sur l’application du modèle FARM (Fautes, Activations, Relevés et Mesures) tout au long du processus de développement. Les analyses de sûreté sont le point de départ de notre approche, avec l'identification des mécanismes de tolérance aux fautes et des exigences non-fonctionnelles, et se terminent par la validation de ces mécanismes par les expériences classiques d'injection de fautes. Enfin, nous montrons que notre approche peut être intégrée dans le processus de développement des systèmes embarqués automobiles décrits dans la norme ISO 26262. Les contributions de la thèse sont illustrées sur l’étude de cas d’un système d’éclairage avant d’une automobile. ABSTRACT :
Due to the rising complexity of automotive Electric/Electronic embedded systems, Functional Safety becomes a main issue in the automotive industry. This issue has been formalized by the introduction of the ISO 26262 standard for functional safety in 2011. The challenges are, on the one hand to design safe systems based on a systematic verification and validation approach, and on the other hand, the fulfilment of the requirements of the ISO 26262 standard. Following ISO 26262 recommendations, our approach, based on fault injection, aims at verifying fault tolerance mechanisms and non-functional requirements at all steps of the development cycle, from early design phases down to implementation.
Fault injection is a verification technique that has been investigated for a long time. However, the role of fault injection during design phase and its complementarities with the experimental validation of the target have not been explored. In this work, we investigate a fault injection continuum, from system design validation to experiments on implemented targets. The proposed approach considers the safety analyses as a starting point, with the identification of safety mechanisms and safety requirements, and goes down to the validation of the implementation of safety mechanisms through fault injection experiments. The whole approach is based on a key fault injection framework, called FARM (Fault, Activation, Readouts and Measures). We show that this approach can be integrated in the development process of the automotive embedded systems described in the ISO 26262 standard. Our approach is illustrated on an automotive case study: a Front-Light system
Evaluating the use of reference run models in fault injection analysis
Fault injection (FI) has been shown to be an effective approach to assessing the dependability of software systems. To determine the impact of faults injected during FI, a given oracle is needed. Oracles can take a variety of forms, including (i) specifications, (ii) error detection mechanisms and (iii) golden runs. Focusing on golden runs, in this paper we show that there are classes of software which a golden run based approach can not be used to analyse. Specifically, we demonstrate that a golden run based approach can not be used in the analysis of systems which employ a main control loop with an irregular period. Further, we show how a simple model, which has been refined using FI experiments, can be employed as an oracle in the analysis of such a system