270 research outputs found
Recommended from our members
Assessing Asymmetric Fault-Tolerant Software
The most popular forms of fault tolerance against design faults use "asymmetric" architectures in which a "primary" part performs the computation and a "secondary" part is in charge of detecting errors and performing some kind of error processing and recovery. In contrast, the most studied forms of software fault tolerance are "symmetric" ones, e.g. N-version programming. The latter are often controversial, the former are not. We discuss how to assess the dependability gains achieved by these methods. Substantial difficulties have been shown to exist for symmetric schemes, but we show that the same difficulties affect asymmetric schemes. Indeed, the latter present somewhat subtler problems. In both cases, to predict the dependability of the fault-tolerant system it is not enough to know the dependability of the individual components. We extend to asymmetric architectures the style of probabilistic modeling that has been useful for describing the dependability of "symmetric" architectures, to highlight factors that complicate the assessment. In the light of these models, we finally discuss fault injection approaches to estimating coverage factors. We highlight the limits of what can be predicted and some useful research directions towards clarifying and extending the range of situations in which estimates of coverage of fault tolerance mechanisms can be trusted
Recommended from our members
An Empirical Study of the Effectiveness of 'Forcing Diversity' Based on a Large Population of Diverse Programs
Use of diverse software components is a viable defence against common-mode failures in redundant softwarebased systems. Various forms of "Diversity-Seeking Decisions" (“DSDs”) can be applied to the process of developing, or procuring, redundant components, to improve the chances of the resulting components not failing on the same demands. An open question is how effective these decisions, and their combinations, are for achieving large enough reliability gains. Using a large population of software programs, we studied experimentally the effectiveness of specific "DSDs" (and their combinations) mandating differences between redundant components. Some of these combinations produced much better improvements in system probability of failure per demand (PFD) than "uncontrolled" diversity did. Yet, our findings suggest that the gains from such "DSDs" vary significantly between them and between the application problems studied. The relationship between DSDs and system PFD is complex and does not allow for simple universal rules
(e.g. "the more diversity the better") to apply
Recommended from our members
Assessing the reliability of diverse fault-tolerant software-based systems
We discuss a problem in the safety assessment of automatic control and protection systems. There is an increasing dependence on software for performing safety-critical functions, like the safety shut-down of dangerous plants. Software brings increased risk of design defects and thus systematic failures; redundancy with diversity between redundant channels is a possible defence. While diversity techniques can improve the dependability of software-based systems, they do not alleviate the difficulties of assessing whether such a system is safe enough for operation. We study this problem for a simple safety protection system consisting of two diverse channels performing the same function. The problem is evaluating its probability of failure in demand. Assuming failure independence between dangerous failures of the channels is unrealistic. One can instead use evidence from the observation of the whole system's behaviour under realistic test conditions. Standard inference procedures can then estimate system reliability, but they take no advantage of a system’s fault-tolerant structure. We show how to extend these techniques to take account of fault tolerance by a conceptually straightforward application of Bayesian inference. Unfortunately, the method is computationally complex and requires the conceptually difficult step of specifying 'prior' distributions for the parameters of interest. This paper presents the correct inference procedure, exemplifies possible pitfalls in its application and clarifies some non-intuitive issues about reliability assessment for fault-tolerant software
Recommended from our members
A note on reliability estimation of functionally diverse systems
It has been argued that functional diversity might be a plausible means of claiming independence of failures between two versions of a system. We present a model of functional diversity, in the spirit of earlier models of diversity such as those of Eckhardt and Lee, and Hughes. In terms of the model, we show that the claims for independence between functionally diverse systems seem rather unrealistic. Instead, it seems likely that functionally diverse systems will exhibit positively correlated failures, and thus will be less reliable than an assumption of independence would suggest. The result does not, of course, suggest that functional diversity is not worthwhile; instead, it places upon the evaluator of such a system the onus to estimate the degree of dependence so as to evaluate the reliability of the system
Choosing effective methods for design diversity - How to progress from intuition to science
Design diversity is a popular defence against design faults in safety critical systems. Design diversity is at times pursued by simply isolating the development teams of the different versions, but it is presumably better to "force" diversity, by appropriate prescriptions to the teams. There are many ways of forcing diversity. Yet, managers who have to choose a cost-effective combination of these have little guidance except their own intuition. We argue the need for more scientifically based recommendations, and outline the problems with producing them. We focus on what we think is the standard basis for most recommendations: the belief that, in order to produce failure diversity among versions, project decisions should aim at causing "diversity" among the faults in the versions. We attempt to clarify what these beliefs mean, in which cases they may be justified and how they can be checked or disproved experimentally
Assessing the Reliability of Diverse Fault-Tolerant Systems
Design diversity between redundant channels is a way of improving the dependability of software-based systems, but it does not alleviate the difficulties of dependability assessment
Recommended from our members
Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers
If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present
Recommended from our members
Fault diversity among off-the-shelf SQL database servers
Fault tolerance is often the only viable way of obtaining the required system dependability from systems built out of "off-the-shelf" (OTS) products. We have studied a sample of bug reports from four off-the-shelf SQL servers so as to estimate the possible advantages of software fault tolerance - in the form of modular redundancy with diversity - in complex off-the-shelf software. We checked whether these bugs would cause coincident failures in more than one of the servers. We found that very few bugs affected two of the four servers, and none caused failures in more than two. We also found that only four of these bugs would cause identical, undetectable failures in two servers. Therefore, a fault-tolerant server, built with diverse off-the-shelf servers, seems to have a good chance of delivering improvements in availability and failure rates compared with the individual off-the-shelf servers or their replicated, nondiverse configurations
On Systematic Design of Protectors for Employing OTS Items
Off-the-shelf (OTS) components are increasingly used in application areas with stringent dependability requirements. Component wrapping is a well known structuring technique used in many areas. We propose a general approach to developing protective wrappers that assist in integrating OTS items with a focus on the overall system dependability. The wrappers are viewed as redundant software used to detect errors or suspicious activity and to execute appropriate recovery when possible; wrapper development is considered as a part of system integration activities. Wrappers are to be rigorously specified and executed at run time as a means of protecting OTS items against faults in the rest of the system, and the system against the OTS item's faults. Possible symptoms of erroneous behaviour to be detected by a protective wrapper and possible actions to be undertaken in response are listed and discussed. The information required for wrapper development is provided by traceability analysis. Possible approaches to implementing “protectors” in the standard current component technologies are briefly outline
- …