Search CORE

7,799 research outputs found

Experimental analysis of computer system dependability

Author: Iyer Ravishankar, K.
Tang Dong
Publication venue
Publication date
Field of study

This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance

NASA Technical Reports Server

Software dependability modeling using an industry-standard architecture description language

Author: Feiler Peter H.
Kaaniche Mohamed
Kanoun Karama
Rugina Ana-Elena
Publication venue
Publication date: 01/01/2008
Field of study

Performing dependability evaluation along with other analyses at architectural level allows both making architectural tradeoffs and predicting the effects of architectural decisions on the dependability of an application. This paper gives guidelines for building architectural dependability models for software systems using the AADL (Architecture Analysis and Design Language). It presents reusable modeling patterns for fault-tolerant applications and shows how the presented patterns can be used in the context of a subsystem of a real-life application

arXiv.org e-Print Archive

CiteSeerX

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

Recommended from our members

Evaluation of software dependability

Author: Littlewood B.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/1996
Field of study

It has been said that the term software engineering is an aspiration not a description. We would like to be able to claim that we engineer software, in the same sense that we engineer an aero-engine, but most of us would agree that this is not currently an accurate description of our activities. My suspicion is that it never will be. From the point of view of this essay – i.e. dependability evaluation – a major difference between software and other engineering artefacts is that the former is pure design. Its unreliability is always the result of design faults, which in turn arise as a result of human intellectual failures. The unreliability of hardware systems, on the other hand, has tended until recently to be dominated by random physical failures of components – the consequences of the ‘perversity of nature’. Reliability theories have been developed over the years which have successfully allowed systems to be built to high reliability requirements, and the final system reliability to be evaluated accurately. Even for pure hardware systems, without software, however, the very success of these theories has more recently highlighted the importance of design faults in determining the overall reliability of the final product. The conventional hardware reliability theory does not address this problem at all. In the case of software, there is no physical source of failures, and so none of the reliability theory developed for hardware is relevant. We need new theories that will allow us to achieve required dependability levels, and to evaluate the actual dependability that has been achieved, when the sources of the faults that ultimately result in failure are human intellectual failures

City Research Online

Recommended from our members

Assessing Asymmetric Fault-Tolerant Software

Author: Popov P. T.
Strigini L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2010
Field of study

The most popular forms of fault tolerance against design faults use "asymmetric" architectures in which a "primary" part performs the computation and a "secondary" part is in charge of detecting errors and performing some kind of error processing and recovery. In contrast, the most studied forms of software fault tolerance are "symmetric" ones, e.g. N-version programming. The latter are often controversial, the former are not. We discuss how to assess the dependability gains achieved by these methods. Substantial difficulties have been shown to exist for symmetric schemes, but we show that the same difficulties affect asymmetric schemes. Indeed, the latter present somewhat subtler problems. In both cases, to predict the dependability of the fault-tolerant system it is not enough to know the dependability of the individual components. We extend to asymmetric architectures the style of probabilistic modeling that has been useful for describing the dependability of "symmetric" architectures, to highlight factors that complicate the assessment. In the light of these models, we finally discuss fault injection approaches to estimating coverage factors. We highlight the limits of what can be predicted and some useful research directions towards clarifying and extending the range of situations in which estimates of coverage of fault tolerance mechanisms can be trusted

City Research Online

Crossref

Recommended from our members

Fault Tolerance Against Design Faults

Author: Strigini L.
Publication venue: 'Wiley'
Publication date: 01/01/2005
Field of study

City Research Online

The Art of Fault Injection

Author: Benso Alfredo
Di Carlo Stefano
Publication venue: SRAIT
Publication date: 01/01/2011
Field of study

Classical greek philosopher considered the foremost virtues to be temperance, justice, courage, and prudence. In this paper we relate these cardinal virtues to the correct methodological approaches that researchers should follow when setting up a fault injection experiment. With this work we try to understand where the "straightforward pathway" lies, in order to highlight those common methodological errors that deeply influence the coherency and the meaningfulness of fault injection experiments. Fault injection is like an art, where the success of the experiments depends on a very delicate balance between modeling, creativity, statistics, and patience

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A synthesis of logic and bio-inspired techniques in the design of dependable systems

Author: Azevedo Luis
Bottaci Leonardo
Kabir Sohag
Papadopoulos Yiannis
Parker David
Sharvia Septavera
Sorokos Ioannis
Walker Martin
Publication venue: 'Elsevier BV'
Publication date: 04/05/2016
Field of study

Much of the development of model-based design and dependability analysis in the design of dependable systems, including software intensive systems, can be attributed to the application of advances in formal logic and its application to fault forecasting and verification of systems. In parallel, work on bio-inspired technologies has shown potential for the evolutionary design of engineering systems via automated exploration of potentially large design spaces. We have not yet seen the emergence of a design paradigm that effectively combines these two techniques, schematically founded on the two pillars of formal logic and biology, from the early stages of, and throughout, the design lifecycle. Such a design paradigm would apply these techniques synergistically and systematically to enable optimal refinement of new designs which can be driven effectively by dependability requirements. The paper sketches such a model-centric paradigm for the design of dependable systems, presented in the scope of the HiP-HOPS tool and technique, that brings these technologies together to realise their combined potential benefits. The paper begins by identifying current challenges in model-based safety assessment and then overviews the use of meta-heuristics at various stages of the design lifecycle covering topics that span from allocation of dependability requirements, through dependability analysis, to multi-objective optimisation of system architectures and maintenance schedules

Repository@Hull - Worktribe

Aspect-oriented technology for dependable operating systems

Author: Borchert Christoph
Publication venue
Publication date: 01/01/2017
Field of study

Modern computer devices exhibit transient hardware faults that disturb the electrical behavior but do not cause permanent physical damage to the devices. Transient faults are caused by a multitude of sources, such as fluctuation of the supply voltage, electromagnetic interference, and radiation from the natural environment. Therefore, dependable computer systems must incorporate methods of fault tolerance to cope with transient faults. Software-implemented fault tolerance represents a promising approach that does not need expensive hardware redundancy for reducing the probability of failure to an acceptable level. This thesis focuses on software-implemented fault tolerance for operating systems because they are the most critical pieces of software in a computer system: All computer programs depend on the integrity of the operating system. However, the C/C++ source code of common operating systems tends to be already exceedingly complex, so that a manual extension by fault tolerance is no viable solution. Thus, this thesis proposes a generic solution based on Aspect-Oriented Programming (AOP). To evaluate AOP as a means to improve the dependability of operating systems, this thesis presents the design and implementation of a library of aspect-oriented fault-tolerance mechanisms. These mechanisms constitute separate program modules that can be integrated automatically into common off-the-shelf operating systems using a compiler for the AOP language. Thus, the aspect-oriented approach facilitates improving the dependability of large-scale software systems without affecting the maintainability of the source code. The library allows choosing between several error-detection and error-correction schemes, and provides wait-free synchronization for handling asynchronous and multi-threaded operating-system code. This thesis evaluates the aspect-oriented approach to fault tolerance on the basis of two off-the-shelf operating systems. Furthermore, the evaluation also considers one user-level program for protection, as the library of fault-tolerance mechanisms is highly generic and transparent and, thus, not limited to operating systems. Exhaustive fault-injection experiments show an excellent trade-off between runtime overhead and fault tolerance, which can be adjusted and optimized by fine-grained selective placement of the fault-tolerance mechanisms. Finally, this thesis provides evidence for the effectiveness of the approach in detecting and correcting radiation-induced hardware faults: High-energy particle radiation experiments confirm improvements in fault tolerance by almost 80 percent

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Software dependability in the Tandem GUARDIAN system

Author: Iyer Ravishankar K.
Lee Inhwan
Publication venue
Publication date: 01/01/1995
Field of study

Based on extensive field failure data for Tandem's GUARDIAN operating system this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate

NASA Technical Reports Server