11,979 research outputs found

    Integrated analysis of error detection and recovery

    An integrated modeling and analysis of error detection and recovery is presented. When fault latency and/or error latency exist, the system may suffer from multiple faults or error propagations which seriously deteriorate the fault-tolerant capability. Several detection models that enable analysis of the effect of detection mechanisms on the subsequent error handling operations and the overall system reliability were developed. Following detection of the faulty unit and reconfiguration of the system, the contaminated processes or tasks have to be recovered. The strategies of error recovery employed depend on the detection mechanisms and the available redundancy. Several recovery methods including the rollback recovery are considered. The recovery overhead is evaluated as an index of the capabilities of the detection and reconfiguration mechanisms

    Genetic Algorithm Optimization Model for Determining the Probability of Failure on Demand of the Safety Instrumented System

    A more accurate determination of the Probability of Failure on Demand (PFD) of a Safety Instrumented System (SIS) contributes to higher SIS reliability, thereby ensuring more safety and lower cost. IEC 61508 and ISA TR.84.02 provide PFD determination formulas. However, these formulas suffer from uncertainty arising from several sources: highly redundant system architectures cannot be assessed, proof tests are assumed to be perfect, and the impact of partial stroke testing (PST) on the system PFD is neglected. In addition, determining the values of the PFD variables needed to achieve the target risk reduction is laborious and time-consuming. This paper proposes a new approach to system PFD determination and PFD variable optimization that helps reduce the uncertainty problem. A more highly redundant system can be assessed by generalizing the PFD formula to a KooN architecture without neglecting the diagnostic coverage factor (DC) and common cause failures (CCF). To simulate proof test effectiveness, the Proof Test Coverage (PTC) factor has been incorporated into the formula. Additionally, the system PFD value has been improved by incorporating PST of the final control element into the formula. The newly developed formula is modelled using the Genetic Algorithm (GA) technique. The GA model saves time and effort in examining the system PFD and estimating near-optimal values for the PFD variables. The proposed model has been applied to the SIS design for a crude oil test separator using MATLAB. The comparison between the proposed model and the PFD formulas provided by IEC 61508 and ISA TR.84.02 showed that the proposed GA model can assess any system structure and simulate industrial reality. Furthermore, the cost and the associated implementation and testing activities are reduced.
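
    To make the role of the Proof Test Coverage factor concrete, the sketch below uses a commonly quoted simplified approximation for a single-channel (1oo1) element. It is not the paper's generalized KooN formula and contains no genetic algorithm; the function name, parameter names, and the example numbers are assumptions chosen for illustration.

```python
def pfd_avg_1oo1(lambda_du, proof_test_interval_h,
                 proof_test_coverage=1.0, mission_time_h=None):
    """Approximate average PFD of a 1oo1 element (illustrative sketch).

    lambda_du             dangerous undetected failure rate, per hour
    proof_test_interval_h time between proof tests, in hours
    proof_test_coverage   fraction of dangerous undetected failures
                          that a proof test reveals (1.0 = perfect test)
    mission_time_h        time after which unrevealed failures are
                          finally removed (assumed: overhaul/replacement)
    """
    if not 0.0 <= proof_test_coverage <= 1.0:
        raise ValueError("proof_test_coverage must be in [0, 1]")
    if mission_time_h is None:
        mission_time_h = proof_test_interval_h

    # Failures the proof test can reveal stay hidden for half a test
    # interval on average.
    revealed = proof_test_coverage * lambda_du * proof_test_interval_h / 2.0
    # Failures the proof test misses stay hidden for half the mission
    # time on average.
    unrevealed = (1.0 - proof_test_coverage) * lambda_du * mission_time_h / 2.0
    return revealed + unrevealed


if __name__ == "__main__":
    # Hypothetical numbers: 1e-6 failures/hour, yearly proof test.
    print(pfd_avg_1oo1(1e-6, 8760.0))                 # perfect proof test
    print(pfd_avg_1oo1(1e-6, 8760.0, 0.9, 87600.0))   # 90 % coverage, 10-year mission
```

    Under these assumptions, an imperfect proof test raises the average PFD because the uncovered share of failures accumulates over the whole mission time rather than over one test interval.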

    Optimal discrimination between transient and permanent faults

    An important practical problem in fault diagnosis is discriminating between permanent faults and transient faults. In many computer systems, the majority of errors are due to transient faults. Many heuristic methods have been used for discriminating between transient and permanent faults; however, we have found no previous work stating this decision problem in clear probabilistic terms. We present an optimal procedure for discriminating between transient and permanent faults, based on applying Bayesian inference to the observed events (correct and erroneous results). We describe how the assessed probability that a module is permanently faulty must vary with observed symptoms. We describe and demonstrate our proposed method on a simple application problem, building the appropriate equations and showing numerical examples. The method can be implemented as a run-time diagnosis algorithm at little computational cost; it can also be used to evaluate any heuristic diagnostic procedure by comparison
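
    The idea of updating the probability that a module is permanently faulty from a stream of correct and erroneous results can be sketched as a two-hypothesis Bayesian update. This is a minimal illustration, not the paper's equations: the prior and the two error probabilities (`p_err_permanent`, `p_err_transient`) are hypothetical values, and observations are assumed independent given the module's state.

```python
def update_permanent_fault_belief(prior, erroneous,
                                  p_err_permanent=0.9,
                                  p_err_transient=0.01):
    """One Bayes step for P(module is permanently faulty).

    prior            current probability of a permanent fault
    erroneous        True if the latest result was erroneous
    p_err_permanent  assumed P(error | permanent fault)
    p_err_transient  assumed P(error | no permanent fault), i.e. the
                     chance that a transient fault corrupts a result
    """
    like_perm = p_err_permanent if erroneous else 1.0 - p_err_permanent
    like_ok = p_err_transient if erroneous else 1.0 - p_err_transient
    numerator = like_perm * prior
    return numerator / (numerator + like_ok * (1.0 - prior))


def posterior_after(observations, prior=0.001, **likelihoods):
    """Fold a sequence of observations (True = erroneous) into the belief."""
    belief = prior
    for erroneous in observations:
        belief = update_permanent_fault_belief(belief, erroneous, **likelihoods)
    return belief


if __name__ == "__main__":
    print(posterior_after([True]))               # one error: still likely transient
    print(posterior_after([True, True, True]))   # repeated errors: permanent fault likely
    print(posterior_after([True, False, False])) # error followed by correct results
```

    A run-time diagnosis rule could then compare the belief against a threshold before declaring the module permanently faulty; the choice of threshold is outside this sketch.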

    Advanced flight control system study

    The architecture, requirements, and system elements of an ultrareliable, advanced flight control system are described. The basic criteria are a functional reliability of 10^-10 per hour of flight and scheduled maintenance only every 6 months. A distributed system architecture is described, including a multiplexed communication system, a reliable bus controller, the use of skewed sensor arrays, and actuator interfaces. A test bed and flight evaluation program are proposed.

    Acceptance Criteria for Critical Software Based on Testability Estimates and Test Results

    Testability is defined as the probability that a program will fail a test, conditional on the program containing some fault. In this paper, we show that statements about the testability of a program can be more simply described in terms of assumptions on the probability distribution of the failure intensity of the program. We can thus state general acceptance conditions in clear mathematical terms using Bayesian inference. We develop two scenarios, one for software for which the reliability requirements are that the software must be completely fault-free, and another for requirements stated as an upper bound on the acceptable failure probability

    APPLICATION AND REFINEMENTS OF THE REPS THEORY FOR SAFETY CRITICAL SOFTWARE

    With the replacement of old analog control systems with software-based digital control systems, there is an urgent need for developing a method to quantitatively and accurately assess the reliability of safety critical software systems. This research focuses on proposing a systematic software metric-based reliability prediction method. The method starts with the measurement of a metric. Measurement results are then either directly linked to software defects through inspections and peer reviews or indirectly linked to software defects through empirical software engineering models. Three types of defect characteristics can be obtained, namely, 1) the number of defects remaining, 2) the number and the exact location of the defects found, and 3) the number and the exact location of defects found in an earlier version. Three models, Musa's exponential model, the PIE model and a mixed Musa-PIE model, are then used to link each of the three categories of defect characteristics with reliability respectively. In addition, the use of the PIE model requires mapping defects identified to an Extended Finite State Machine (EFSM) model. A procedure that can assist in the construction of the EFSM model and increase its repeatability is also provided. This metric-based software reliability prediction method is then applied to a safety-critical software system used in the nuclear industry using eleven software metrics. Reliability prediction results are compared with the real reliability assessed by using operational failure data. Experiences and lessons learned from the application are discussed. Based on the results and findings, four software metrics are recommended. This dissertation then focuses on one of the four recommended metrics, Test Coverage. A reliability prediction model based on Test Coverage is discussed in detail and this model is further refined to be able to take into consideration more realistic conditions, such as imperfect debugging and the use of multiple testing phases.
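
    As a rough illustration of how an exponential model links defect content to reliability, the sketch below implements the textbook form of Musa's basic execution-time model. The dissertation may parameterize Musa's exponential model differently (for example in terms of remaining defects and a fault exposure ratio), so the symbols `lambda0` (initial failure intensity) and `nu0` (total expected failures) and the example values are assumptions.

```python
import math


def failure_intensity(tau, lambda0, nu0):
    """Failure intensity after execution time tau (failures per unit time)."""
    return lambda0 * math.exp(-lambda0 * tau / nu0)


def expected_failures(tau, lambda0, nu0):
    """Expected cumulative number of failures experienced by time tau."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))


def reliability(t, tau, lambda0, nu0):
    """Probability of no failure during an interval of length t after tau.

    Assumes no further fault removal during the interval, so the
    failure intensity stays at its value at time tau.
    """
    return math.exp(-failure_intensity(tau, lambda0, nu0) * t)


if __name__ == "__main__":
    # Hypothetical values: 10 failures per CPU-hour initially, 100 failures in total.
    for tau in (0.0, 10.0, 50.0):
        print(tau,
              failure_intensity(tau, 10.0, 100.0),
              expected_failures(tau, 10.0, 100.0),
              reliability(1.0, tau, 10.0, 100.0))
```

    In this form, the failure intensity decays exponentially with testing time, so predicted reliability over a fixed interval improves as more execution time is spent finding and removing faults.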