On the use of testability measures for dependability assessment
Program “testability” is, informally, the probability that a program will fail under test if it contains at least one fault. When a dependability assessment has to be derived from the observation of a series of failure-free test executions (a common need for software subject to “ultra-high reliability” requirements), measures of testability can, in theory, be used to draw inferences about program correctness. We rigorously investigate the concept of testability and its use in dependability assessment, criticizing, and improving on, previously published results. We give a general descriptive model of program execution and testing, on which the different measures of interest can be defined. We propose a more precise definition of program testability than those given by other authors, and discuss how to increase testing effectiveness without impairing program reliability in operation. We then study the mathematics of using testability to estimate, from test results, the probability of program correctness and the probability of failures. To derive the probability of program correctness, we use a Bayesian inference procedure and argue that this is more useful than deriving a classical “confidence level”. We also show that high testability is not an unconditionally desirable property for a program. In particular, for programs complex enough that they are unlikely to be completely fault-free, increasing testability may produce a program that will be less trustworthy, even after successful testing.
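A minimal sketch of the Bayesian step this abstract describes, assuming the simplest possible prior (the program is faulty with probability p_faulty, and a faulty program fails each test independently with probability theta, the testability); the prior and testability values below are illustrative, not the paper's:

```python
# Posterior probability that a program is still faulty after observing
# n independent failure-free test executions. Assumes a two-point prior
# and a single testability value theta = P(test fails | program faulty).

def posterior_faulty(p_faulty: float, theta: float, n_passes: int) -> float:
    """P(program faulty | n failure-free tests), by Bayes' rule."""
    survives = p_faulty * (1.0 - theta) ** n_passes  # faulty yet passing all tests
    correct = 1.0 - p_faulty                         # a correct program always passes
    return survives / (survives + correct)

# Example: 10% prior chance of a fault, testability 0.01, 1000 passes.
print(posterior_faulty(0.10, 0.01, 1000))  # ~4.8e-6: near-certain correctness
```

The example also hints at the abstract's caveat: with low testability, the same 1000 passes move the posterior much less, but a high-testability program that is unlikely to be fault-free may still fail often in operation.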
Integrated analysis of error detection and recovery
An integrated modeling and analysis of error detection and recovery is presented. When fault latency and/or error latency exist, the system may suffer from multiple faults or error propagation, which seriously degrades its fault-tolerance capability. Several detection models are developed that enable analysis of the effect of detection mechanisms on the subsequent error-handling operations and on overall system reliability. Following detection of the faulty unit and reconfiguration of the system, the contaminated processes or tasks have to be recovered. The error recovery strategies employed depend on the detection mechanisms and the available redundancy. Several recovery methods, including rollback recovery, are considered, and the recovery overhead is evaluated as an index of the capabilities of the detection and reconfiguration mechanisms.
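To make the notion of recovery overhead concrete, here is a sketch using the classic checkpoint/rollback approximation (Young's first-order formula), not the paper's detection models; the rate, checkpoint cost, and rollback cost are invented parameters:

```python
import math

# Approximate fraction of processing time lost to checkpointing plus
# rollback recovery, under assumed parameters:
#   lam - fault manifestation rate per unit time
#   T   - checkpoint interval
#   C   - cost of taking one checkpoint
#   R   - fixed cost of rollback and state restoration

def overhead_fraction(lam: float, T: float, C: float, R: float) -> float:
    rework = T / 2.0  # expected recomputation after a failure mid-interval
    return C / T + lam * (rework + R)

def optimal_interval(lam: float, C: float) -> float:
    """Young's first-order optimum: T* = sqrt(2C / lambda)."""
    return math.sqrt(2.0 * C / lam)

T_star = optimal_interval(lam=1e-4, C=2.0)   # -> 200 time units
print(T_star, overhead_fraction(1e-4, T_star, C=2.0, R=5.0))  # ~0.02
```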
Genetic Algorithm Optimization Model for Determining the Probability of Failure on Demand of the Safety Instrumented System
A more accurate determination of the Probability of Failure on Demand (PFD) of a Safety Instrumented System (SIS) contributes to higher SIS reliability, thereby ensuring greater safety at lower cost. IEC 61508 and ISA TR.84.02 provide formulas for PFD determination. However, these formulas suffer from an uncertainty issue due to the sources of uncertainty they leave unaddressed: highly redundant system architectures cannot be assessed, a perfect proof test is assumed, and the impact of partial stroke testing (PST) on the system PFD is neglected. Moreover, determining the values of the PFD variables needed to achieve the target risk reduction demands daunting effort and time. This paper proposes a new approach to system PFD determination and PFD variable optimization that reduces the uncertainty problem. Highly redundant systems can be assessed by generalizing the PFD formula to KooN architectures without neglecting the diagnostic coverage factor (DC) or common cause failures (CCF). To capture proof test effectiveness, a Proof Test Coverage (PTC) factor is incorporated into the formula, and the system PFD value is further improved by incorporating PST for the final control element. The newly developed formula is modeled using the Genetic Algorithm (GA) artificial intelligence technique. The GA model saves the time and effort of examining the system PFD and estimates near-optimal values for the PFD variables. The proposed model was applied to an SIS design for a crude oil test separator using MATLAB. Comparison between the proposed model and the PFD formulas provided by IEC 61508 and ISA TR.84.02 showed that the proposed GA model can assess any system structure and simulate industrial reality. Furthermore, the cost and the associated implementation testing activities are reduced.
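A toy sketch of the optimization idea, assuming a textbook simplified 1ooN PFD approximation with a beta-factor for common cause rather than the paper's generalized KooN formula; the failure rate, beta, target, and cost weights are all invented for illustration:

```python
import random

LAMBDA_DU = 2e-6   # dangerous undetected failure rate (/h), assumed
BETA = 0.05        # common cause fraction, assumed
TARGET = 1e-3      # target average PFD, assumed

def pfd_1oo_n(n: int, ti: float) -> float:
    """Simplified average PFD for a 1ooN group with proof-test interval ti."""
    independent = ((1 - BETA) * LAMBDA_DU * ti) ** n / (n + 1)
    common = BETA * LAMBDA_DU * ti / 2
    return independent + common

def fitness(ind):
    n, ti = ind
    cost = 10.0 * n + 8760.0 / ti              # hardware cost + testing effort
    penalty = 1e6 if pfd_1oo_n(n, ti) > TARGET else 0.0
    return cost + penalty                      # lower is better

def crossover(a, b):
    return (a[0], b[1])                        # N from one parent, TI from the other

def mutate(ind):
    n, ti = ind
    if random.random() < 0.3:
        n = max(1, min(4, n + random.choice((-1, 1))))
    ti = max(168.0, min(87600.0, ti * random.uniform(0.7, 1.3)))
    return (n, ti)

random.seed(1)
pop = [(random.randint(1, 4), random.uniform(168.0, 87600.0)) for _ in range(40)]
for _ in range(200):
    pop.sort(key=fitness)
    elite = pop[:10]                           # elitist selection
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(30)]

best = min(pop, key=fitness)
print("N =", best[0], "TI =", round(best[1]), "h, PFD =", f"{pfd_1oo_n(*best):.2e}")
```

The GA's appeal here is exactly what the abstract claims: the fitness function can wrap any PFD expression (KooN, DC, CCF, PTC, PST terms) without the algebraic rework a closed-form optimization would need.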
Optimal discrimination between transient and permanent faults
An important practical problem in fault diagnosis is discriminating between permanent faults and transient faults. In many computer systems, the majority of errors are due to transient faults. Many heuristic methods have been used for discriminating between transient and permanent faults; however, we have found no previous work stating this decision problem in clear probabilistic terms. We present an optimal procedure for discriminating between transient and permanent faults, based on applying Bayesian inference to the observed events (correct and erroneous results). We describe how the assessed probability that a module is permanently faulty must vary with the observed symptoms. We describe and demonstrate our proposed method on a simple application problem, building the appropriate equations and showing numerical examples. The method can be implemented as a run-time diagnosis algorithm at little computational cost; it can also be used to evaluate any heuristic diagnostic procedure by comparison.
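A minimal sketch of the Bayesian recipe the abstract describes, assuming invented likelihoods and prior rather than the paper's calibrated values:

```python
# Update the probability that a module is permanently faulty from a
# stream of observed results. Assumed (illustrative) likelihoods:
P_ERR_IF_PERMANENT = 0.9   # P(erroneous result | permanent fault)
P_ERR_IF_TRANSIENT = 0.05  # P(erroneous result | transient conditions only)

def update(p_perm: float, erroneous: bool) -> float:
    """One Bayes step on a single observed correct/erroneous result."""
    l_perm = P_ERR_IF_PERMANENT if erroneous else 1 - P_ERR_IF_PERMANENT
    l_tran = P_ERR_IF_TRANSIENT if erroneous else 1 - P_ERR_IF_TRANSIENT
    num = l_perm * p_perm
    return num / (num + l_tran * (1 - p_perm))

p = 0.01  # prior: most faults are transient
for erroneous in [True, True, False, True]:  # observed symptom sequence
    p = update(p, erroneous)
    print(f"P(permanent) = {p:.3f}")  # rises on errors, falls on correct results
```

Note how the assessed probability moves with the symptoms, which is the abstract's point: a run of errors pushes the diagnosis toward "permanent", while an interleaved correct result pulls it back, and any heuristic (e.g. a fixed error-count threshold) can be benchmarked against this posterior.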
Assessing Asymmetric Fault-Tolerant Software
The most popular forms of fault tolerance against design faults use "asymmetric" architectures, in which a "primary" part performs the computation and a "secondary" part is in charge of detecting errors and performing some kind of error processing and recovery. In contrast, the most studied forms of software fault tolerance are "symmetric" ones, e.g. N-version programming. The latter are often controversial; the former are not. We discuss how to assess the dependability gains achieved by these methods. Substantial difficulties have been shown to exist for symmetric schemes, but we show that the same difficulties affect asymmetric schemes; indeed, the latter present somewhat subtler problems. In both cases, to predict the dependability of the fault-tolerant system it is not enough to know the dependability of the individual components. We extend to asymmetric architectures the style of probabilistic modeling that has been useful for describing the dependability of "symmetric" architectures, to highlight factors that complicate the assessment. In the light of these models, we finally discuss fault injection approaches to estimating coverage factors. We highlight the limits of what can be predicted and some useful research directions towards clarifying and extending the range of situations in which estimates of the coverage of fault tolerance mechanisms can be trusted.
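A small numerical sketch of the core point for a primary/checker pair, with invented numbers: the marginal failure probabilities of the two components do not determine the system failure probability, because primary failures and checker misses can be positively correlated on "difficult" demands.

```python
# Marginals over the whole demand profile (assumed for illustration):
# the primary fails on 1% of demands; the checker misses 2% of errors.
p_primary_fail = 0.01
p_checker_miss = 0.02

# Naive prediction assuming the two events are independent:
print("independent:", p_primary_fail * p_checker_miss)  # 2.0e-4

# Two demand classes: 5% of demands are "difficult", and there both the
# primary and the checker do much worse. Per-class numbers are chosen so
# the demand-averaged marginals above still hold.
classes = [
    # (class weight, P(primary fails), P(checker misses an error))
    (0.95, 0.005, 0.01),
    (0.05, 0.105, 0.21),
]
p_sys = sum(w * pf * pm for w, pf, pm in classes)
print("correlated :", p_sys)  # ~1.15e-3, almost 6x the independence prediction
```

This is also why the coverage estimated by fault injection over some injection profile need not equal the coverage experienced against the errors the primary actually produces.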
Advanced flight control system study
The architecture, requirements, and system elements of an ultrareliable, advanced flight control system are described. The basic criteria are a functional reliability of 10^-10 failures per hour of flight and scheduled maintenance only every six months. A distributed system architecture is described, including a multiplexed communication system, a reliable bus controller, the use of skewed sensor arrays, and actuator interfaces. A test bed and flight evaluation program are proposed.
Acceptance Criteria for Critical Software Based on Testability Estimates and Test Results
Testability is defined as the probability that a program will fail a test, conditional on the program containing some fault. In this paper, we show that statements about the testability of a program can be described more simply in terms of assumptions on the probability distribution of the failure intensity of the program. We can thus state general acceptance conditions in clear mathematical terms using Bayesian inference. We develop two scenarios: one for software required to be completely fault-free, and another for requirements stated as an upper bound on the acceptable failure probability.
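As a hedged illustration of what such acceptance conditions can look like, assume the simplest prior consistent with the definitions above: the program is fault-free with probability $1 - p$, or faulty with a fixed failure intensity $\theta$ (the testability). These symbols and the two-point prior are our illustrative choices, not necessarily the paper's. After $n$ failure-free tests, Bayes' rule gives

$$
P(\text{fault-free} \mid n \text{ passes}) \;=\; \frac{1-p}{(1-p) + p\,(1-\theta)^n},
\qquad
P(\text{next demand fails} \mid n \text{ passes}) \;=\; \frac{p\,\theta\,(1-\theta)^n}{(1-p) + p\,(1-\theta)^n}.
$$

An acceptance criterion of the first kind requires the left-hand posterior to exceed a stated confidence; one of the second kind accepts the software once the right-hand predicted failure probability falls below the required bound $B$.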
Application and Refinements of the REPS Theory for Safety Critical Software
With the replacement of old analog control systems by software-based digital control systems, there is an urgent need for a method to quantitatively and accurately assess the reliability of safety-critical software systems. This research proposes a systematic, software-metric-based reliability prediction method. The method starts with the measurement of a metric. Measurement results are then either linked directly to software defects through inspections and peer reviews, or linked indirectly to software defects through empirical software engineering models. Three types of defect characteristics can be obtained: 1) the number of defects remaining, 2) the number and exact location of the defects found, and 3) the number and exact location of defects found in an earlier version. Three models, Musa's exponential model, the PIE model, and a mixed Musa-PIE model, are then used to link each of the three categories of defect characteristics with reliability, respectively. In addition, the use of the PIE model requires mapping the identified defects onto an Extended Finite State Machine (EFSM) model. A procedure that can assist in the construction of the EFSM model and increase its repeatability is also provided.
This metric-based software reliability prediction method is then applied to safety-critical software used in the nuclear industry, using eleven software metrics. Reliability prediction results are compared with the actual reliability assessed from operational failure data. Experiences and lessons learned from the application are discussed. Based on the results and findings, four software metrics are recommended.
This dissertation then focuses on one of the four recommended metrics, Test Coverage. A reliability prediction model based on Test Coverage is discussed in detail, and this model is further refined to account for more realistic conditions, such as imperfect debugging and the use of multiple testing phases.
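A sketch of the Musa-exponential link used for the first defect category (remaining-defect count to reliability); the instruction rate, program size, and defect estimate below are assumptions, and only K ≈ 4.2e-7 is Musa's often-cited average fault exposure ratio:

```python
import math

K = 4.2e-7     # fault exposure ratio (failures per fault per cycle), Musa's average
RATE = 1.0e8   # processor instruction rate, instr/s (assumed)
SIZE = 1.0e6   # program size in object instructions (assumed)
N_REMAINING = 3  # defects remaining, e.g. from a metric-based estimate (assumed)

f = RATE / SIZE                     # linear execution frequency, 1/s
failure_rate = f * K * N_REMAINING  # lambda, failures per second of execution

def reliability(t: float) -> float:
    """P(no failure during t seconds of execution), exponential model."""
    return math.exp(-failure_rate * t)

print(f"lambda = {failure_rate:.2e}/s, R(1h) = {reliability(3600):.3f}")
```

The PIE-based path replaces the single defect count with per-defect execution, infection, and propagation probabilities evaluated on the EFSM model, which is why defect locations matter for the second and third categories.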