26,557 research outputs found
Design diversity: an update from research on reliability modelling
Diversity between redundant subsystems is, in various forms, a common design approach for improving system dependability. Its value in the case of software-based systems is still controversial. This paper gives an overview of reliability modelling work we carried out in recent projects on design diversity, presented in the context of previous knowledge and practice. These results provide additional insight for decisions in applying diversity and in assessing diverseredundant systems. A general observation is that, just as diversity is a very general design approach, the models of diversity can help conceptual understanding of a range of different situations. We summarise results in the general modelling of common-mode failure, in inference from observed failure data, and in decision-making for diversity in development.
Assessing the Reliability of Diverse Fault-Tolerant Systems
Design diversity between redundant channels is a way of improving the dependability of software-based systems, but it does not alleviate the difficulties of dependability assessment
Recommended from our members
Assessing the reliability of diverse fault-tolerant software-based systems
We discuss a problem in the safety assessment of automatic control and protection systems. There is an increasing dependence on software for performing safety-critical functions, like the safety shut-down of dangerous plants. Software brings increased risk of design defects and thus systematic failures; redundancy with diversity between redundant channels is a possible defence. While diversity techniques can improve the dependability of software-based systems, they do not alleviate the difficulties of assessing whether such a system is safe enough for operation. We study this problem for a simple safety protection system consisting of two diverse channels performing the same function. The problem is evaluating its probability of failure in demand. Assuming failure independence between dangerous failures of the channels is unrealistic. One can instead use evidence from the observation of the whole system's behaviour under realistic test conditions. Standard inference procedures can then estimate system reliability, but they take no advantage of a system’s fault-tolerant structure. We show how to extend these techniques to take account of fault tolerance by a conceptually straightforward application of Bayesian inference. Unfortunately, the method is computationally complex and requires the conceptually difficult step of specifying 'prior' distributions for the parameters of interest. This paper presents the correct inference procedure, exemplifies possible pitfalls in its application and clarifies some non-intuitive issues about reliability assessment for fault-tolerant software
Recommended from our members
Conservative reasoning about epistemic uncertainty for the probability of failure on demand of a 1-out-of-2 software-based system in which one channel is "possibly perfect"
In earlier work, (Littlewood and Rushby 2011) (henceforth LR), an analysis was presented of a 1-out-of-2 system in which one channel was “possibly perfect”. It was shown that, at the aleatory level, the system pfd could be bounded above by the product of the pfd of channel A and the pnp (probability of non-perfection)of channel B. This was presented as a way of avoiding the well-known difficulty that for two certainly-fallible channels, system pfd cannot be expressed simply as a function of the channel pfds, and in particular not as a product of these. One price paid in this new approach is that the result is conservative – perhaps greatly so. Furthermore, a complete analysis requires that account be taken of epistemic uncertainty – here concerning the numeric values of the two parameters pfdA and pnpB. This introduces some difficulties, particularly concerning the estimation of dependence between an assessor’s beliefs about the parameters. The work reported here avoids these difficulties by obtaining results that require only an assessor’s marginal beliefs about the individual channels, i.e. they do not require knowledge of the dependence between these belief
Reasoning About the Reliability of Multi-version, Diverse Real-Time Systems
This paper is concerned with the development of reliable real-time systems for use in high integrity applications. It advocates the use of diverse replicated channels, but does not require the dependencies between the channels to be evaluated. Rather it develops and extends the approach of Little wood and Rush by (for general systems) by investigating a two channel system in which one channel, A, is produced to a high level of reliability (i.e. has a very low failure rate), while the other, B, employs various forms of static analysis to sustain an argument that it is perfect (i.e. it will never miss a deadline). The first channel is fully functional, the second contains a more restricted computational model and contains only the critical computations. Potential dependencies between the channels (and their verification) are evaluated in terms of aleatory and epistemic uncertainty. At the aleatory level the events ''A fails" and ''B is imperfect" are independent. Moreover, unlike the general case, independence at the epistemic level is also proposed for common forms of implementation and analysis for real-time systems and their temporal requirements (deadlines). As a result, a systematic approach is advocated that can be applied in a real engineering context to produce highly reliable real-time systems, and to support numerical claims about the level of reliability achieved
Recommended from our members
Software fault-freeness and reliability predictions
Many software development practices aim at ensuring that software is correct, or fault-free. In safety critical applications, requirements are in terms of probabilities of certain behaviours, e.g. as associated to the Safety Integrity Levels of IEC 61508. The two forms of reasoning - about evidence of correctness and about probabilities of certain failures -are rarely brought together explicitly. The desirability of using claims of correctness has been argued by many authors, but not been taken up in practice. We address how to combine evidence concerning probability of failure together with evidence pertaining to likelihood of fault-freeness, in a Bayesian framework. We present novel results to make this approach practical, by guaranteeing reliability predictions that are conservative (err on the side of pessimism), despite the difficulty of stating prior probability distributions for reliability parameters. This approach seems suitable for practical application to assessment of certain classes of safety critical systems
Recommended from our members
Conservative reasoning about epistemic uncertainty for the probability of failure on demand of a 1-out-of-2 software-based system in which one channel is “possibly perfect”
In earlier work, (Littlewood and Rushby 2012) (henceforth LR), an analysis was presented of a 1-out-of-2 software-based system in which one channel was “possibly perfect”. It was shown that, at the aleatory level, the system pfd (probability of failure on demand) could be bounded above by the product of the pfd of channel A and the pnp (probability of non-perfection) of channel B. This result was presented as a way of avoiding the well-known difficulty that for two certainly-fallible channels, failures of the two will be dependent, i.e. the system pfd cannot be expressed simply as a product of the channel pfds. A price paid in this new approach for avoiding the issue of failure dependence is that the result is conservative. Furthermore, a complete analysis requires that account be taken of epistemic uncertainty – here concerning the numeric values of the two parameters pfdA and pnpB. Unfortunately this introduces a different difficult problem of dependence: estimating the dependence between an assessor’s beliefs about the parameters. The work reported here avoids this problem by obtaining results that require only an assessor’s marginal beliefs about the individual channels, i.e. they do not require knowledge of the dependence between these beliefs. The price paid is further conservatism in the results
Recommended from our members
The effect of testing on reliability of fault-tolerant software
Previous models have investigated the impact upondiversity - and hence upon the reliability of fault-tolerantsoftware built from 'diverse' versions - of the variation in'difficulty' of demands over the demand space. Thesemodels are essentially static, taking a single snapshotview of the system. In this paper we consider ageneralisation in which the individual versions areallowed to evolve - and their reliability to grow - throughdebugging. In particular, we examine the trade-off thatoccurs in testing between, on the one hand, the increasingreliability of individual versions, and on the other handthe possible diminution of diversity
Recommended from our members
Assessing Asymmetric Fault-Tolerant Software
The most popular forms of fault tolerance against design faults use "asymmetric" architectures in which a "primary" part performs the computation and a "secondary" part is in charge of detecting errors and performing some kind of error processing and recovery. In contrast, the most studied forms of software fault tolerance are "symmetric" ones, e.g. N-version programming. The latter are often controversial, the former are not. We discuss how to assess the dependability gains achieved by these methods. Substantial difficulties have been shown to exist for symmetric schemes, but we show that the same difficulties affect asymmetric schemes. Indeed, the latter present somewhat subtler problems. In both cases, to predict the dependability of the fault-tolerant system it is not enough to know the dependability of the individual components. We extend to asymmetric architectures the style of probabilistic modeling that has been useful for describing the dependability of "symmetric" architectures, to highlight factors that complicate the assessment. In the light of these models, we finally discuss fault injection approaches to estimating coverage factors. We highlight the limits of what can be predicted and some useful research directions towards clarifying and extending the range of situations in which estimates of coverage of fault tolerance mechanisms can be trusted
- …