Modeling the probability of failure on demand (pfd) of a 1-out-of-2 system in which one channel is “quasi-perfect”
Our earlier work proposed ways of overcoming some of the difficulties of lack of independence in reliability modeling of 1-out-of-2 software-based systems. Firstly, it is well known that aleatory independence between the failures of two channels A and B cannot be assumed, so system pfd is not a simple product of channel pfds. However, it has been shown that the probability of system failure can be bounded conservatively by a simple product of pfdA and pnpB (probability not perfect) in those special cases where channel B is sufficiently simple to be possibly perfect. Whilst this “solves” the problem of aleatory dependence, the issue of epistemic dependence remains: an assessor’s beliefs about the unknown pfdA and pnpB will not, in general, be independent. Recent work has partially overcome this problem by requiring only marginal beliefs – at the price of further conservatism. Here we generalize these results. Instead of “perfection” we introduce the notion of “quasi-perfection”: a small pfd practically equivalent to perfection (e.g. yielding a very small chance of failure in the entire life of a fleet of systems). We present a conservative argument supporting claims about system pfd. We propose further work, e.g. to conduct “what if?” calculations to understand exactly how conservative our approach might be in practice, and suggest further simplifications.
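In symbols, the conservative bound referred to above can be written as

\[
pfd_{\mathrm{sys}} \;\le\; pfd_A \cdot pnp_B ,
\]

where pfdA is the probability of failure on demand of channel A and pnpB is the probability that channel B is not perfect. The intuition is that a perfect channel B never fails, so the system can only fail on a demand when B is imperfect and A fails; the conservative argument justifies the product form without assuming aleatory independence between the channels' failures.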
Conservative Claims for the Probability of Perfection of a Software-based System Using Operational Experience of Previous Similar Systems
We begin by briefly discussing the reasons why claims of probability of non-perfection (pnp) may sometimes be useful in reasoning about the reliability of software-based systems for safety-critical applications. We identify two ways in which this approach may make the system assessment problem easier. The first concerns the need to assess the chance of lifetime freedom from failure of a single system. The second concerns the need to assess the reliability of multi-channel software-diverse fault tolerant systems – in this paper, 1-out-of-2 systems. In earlier work (Littlewood and Rushby 2012, Littlewood and Povyakalo 2013) it was proposed that, in certain applications, claims for possible perfection of one of the channels in such a system may be feasible. It was shown that in such a case there is a particularly simple conservative expression for system pfd (probability of failure on demand), involving the pfd of one channel and the pnp of the other. In this paper we address the problem of how to assess such a pnp. In previous work (Zhao 2015) we have addressed this problem when the evidence available is only extensive failure-free working of the system in question. Here we consider the case in which there is, in addition, evidence of the previous success of the software development procedures used to build the system: specifically, several previous similar systems built using the same process have exhibited failure-free working during extensive operational exposure.
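As a toy numerical illustration of the simplest case mentioned above – claiming a pnp from extensive failure-free working alone (the Zhao 2015 setting) – the following sketch assumes a two-point prior in which an imperfect version of the software would have pfd at least q. The function and parameter names are illustrative; this is not the paper's model, which additionally uses evidence from previous similar systems:

    def posterior_pnp(prior_pnp: float, q: float, n: int) -> float:
        """Toy conservative update of the probability of non-perfection
        (pnp) after n independent failure-free demands.

        Assumptions (illustrative only): the software is perfect with
        probability 1 - prior_pnp; if imperfect, its pfd is at least q,
        so the chance of surviving n demands is at most (1 - q) ** n.
        This makes the returned value an upper (conservative) bound on
        the posterior pnp.
        """
        surviving_doubt = prior_pnp * (1.0 - q) ** n  # imperfect, yet lucky so far
        return surviving_doubt / (surviving_doubt + (1.0 - prior_pnp))

    # Example: 50% prior doubt, q = 1e-3, 5000 failure-free demands
    print(posterior_pnp(0.5, 1e-3, 5000))  # ~ 0.0067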
On the probability of perfection of software-based systems
The probability of perfection becomes of interest once its role in the reliability assessment of software-based systems is recognized. It is important not only in its own right, but also in the reliability assessment of 1-out-of-2 diverse systems. By “perfection” we mean that the software will never fail in a specific operating environment. If we assume that failures of a software system can occur if and only if it contains faults, then perfection means that the system is “fault-free”. Such perfection is possible for sufficiently simple software. Perfection can never be certain, however, so the interest lies in claims for the probability of perfection.
In this thesis, firstly, two different probabilities of perfection – an objective parameter characterizing a population property, and a subjective confidence in the perfection of the specific software of interest – are distinguished and discussed. Then a conservative Bayesian method is used to make claims about the probability of perfection from various types of evidence, i.e. failure-free testing evidence, process evidence and formal proof evidence. A “quasi-perfection” notion is also introduced as a potentially useful way of covering some shortcomings of perfection models. A possible framework to incorporate the various models is discussed at the end. There are generally two themes in this thesis: tackling the failure dependence issue in the reliability assessment of 1-out-of-2 diverse systems at both the aleatory and epistemic levels; and reducing the well-known difficulty of specifying complete Bayesian priors to reasoning with only partial priors. Both are solved at the price of conservatism.
In summary, this thesis provides three parallel sets of (quasi-)perfection models, each of which could be used individually as a conservative end-to-end argument reasoning from various types of evidence to the reliability of a software-based system. Although in some cases the models here provide very conservative results, some ways of dealing with the excessive conservatism are proposed. In other cases, the very conservative results could serve as warnings/support to safety engineers/regulators in the face of claims based on reasoning that is less rigorous than the reasoning in this thesis.
Conservative Confidence Bounds in Safety, from Generalised Claims of Improvement & Statistical Evidence
“Proven-in-use”, “globally-at-least-equivalent” and “stress-tested” are concepts that come up in diverse contexts in acceptance, certification or licensing of critical systems. Their common feature is that dependability claims for a system in a certain operational environment are supported, in part, by evidence – viz. of successful operation – concerning different, though related, system[s] and/or environment[s], together with an auxiliary argument that the target system/environment offers the same, or improved, safety. We propose a formal probabilistic (Bayesian) organisation for these arguments. Through specific examples of evidence for the “improvement” argument above, we demonstrate scenarios in which formalising such arguments substantially increases confidence in the target system, and show why this is not always the case. Example scenarios concern vehicles and nuclear plants. Besides supporting stronger claims, the mathematical formalisation imposes precise statements of the bases for “improvement” claims: seemingly similar forms of prior beliefs are sometimes revealed to imply substantial differences in the claims they can support.
What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems
Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among these challenges, finding a rigorous, yet practical, way of achieving safety guarantees is known to be one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.
Demonstrating software reliability using possibly correlated tests: Insights from a conservative Bayesian approach
This paper presents Bayesian techniques for conservative claims about software reliability, particularly when evidence suggests the software's executions are not statistically independent. We formalise informal notions of “doubting” that the executions are independent, and incorporate such doubts into reliability assessments. We develop techniques that reveal the extent to which independence assumptions can undermine conservatism in assessments, and identify conditions under which this impact is not significant. These techniques – novel extensions of conservative Bayesian inference (CBI) approaches – give conservative confidence bounds on the software's failure probability per execution. With illustrations in two application areas – nuclear power-plant safety and autonomous vehicle (AV) safety – our analyses reveal: (1) the confidence an assessor should possess before subjecting a system to operational testing; otherwise, such testing is futile – favourable operational testing evidence will eventually decrease one's confidence in the system being sufficiently reliable; (2) the independence assumption supports conservative claims sometimes; (3) in some scenarios, observing a system operate without failure gives less confidence in the system than if some failures had been observed; (4) building confidence in a system is very sensitive to failures – each additional failure means significantly more operational testing is required in order to support a reliability claim.
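To convey the flavour of CBI in its simplest form – assuming independent executions, precisely the assumption this paper scrutinises, and with illustrative names and numbers rather than the paper's actual theorems – suppose the only prior knowledge is the partial constraint P(pfd <= p0) >= theta. The least favourable prior then places mass theta at p0 and the rest just above the claimed bound p:

    def cbi_confidence(theta: float, p0: float, p: float, n: int) -> float:
        """Worst-case posterior confidence that pfd <= p (with p > p0)
        after n independent failure-free executions, given only the
        partial prior constraint P(pfd <= p0) >= theta.

        Least favourable prior: mass theta at p0, mass 1 - theta just
        above p. No fuller prior satisfying the constraint gives lower
        posterior confidence, hence the bound is conservative.
        """
        assert p > p0
        good = theta * (1.0 - p0) ** n        # "good" prior mass surviving the data
        bad = (1.0 - theta) * (1.0 - p) ** n  # "bad" prior mass surviving the data
        return good / (good + bad)

    # Example: 80% prior confidence that pfd <= 1e-5; claim pfd <= 1e-4
    # after 20,000 failure-free executions.
    print(cbi_confidence(0.8, 1e-5, 1e-4, 20_000))  # ~ 0.96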
Assessing the Safety and Reliability of Autonomous Vehicles from Road Testing
There is an urgent societal need to assess whether autonomous vehicles (AVs) are safe enough. From published quantitative safety and reliability assessments of AVs, we know that, given the goal of predicting very low rates of accidents, road testing alone requires infeasible numbers of miles to be driven. However, previous analyses do not consider any knowledge prior to road testing – knowledge which could bring substantial advantages if the AV design allows strong expectations of safety before road testing. We present the advantages of a new variant of Conservative Bayesian Inference (CBI), which uses prior knowledge while avoiding optimistic biases. We then study the trend of disengagements (take-overs by human drivers) by applying Software Reliability Growth Models (SRGMs) to data from Waymo’s public road testing over 51 months, in view of the practice of software updates during this testing. Our approach is not to trust any specific SRGM, but to assess forecast accuracy and then improve forecasts. We show that, coupled with accuracy assessment and recalibration techniques, SRGMs could be a valuable test planning aid.
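For readers unfamiliar with SRGMs, the sketch below fits one classic model, Goel-Okumoto, to cumulative disengagement counts. The data are synthetic and purely illustrative (not Waymo's), and the paper's approach is precisely not to trust any single model like this without forecast-accuracy assessment and recalibration:

    import numpy as np
    from scipy.optimize import curve_fit

    def goel_okumoto(t, a, b):
        """Goel-Okumoto mean value function: expected cumulative number
        of disengagements by time t (a = eventual total, b = rate)."""
        return a * (1.0 - np.exp(-b * t))

    # Synthetic monthly cumulative disengagement counts (illustrative only).
    months = np.arange(1, 13)
    cum_counts = np.array([9, 17, 23, 28, 33, 36, 39, 41, 43, 44, 45, 46])

    (a_hat, b_hat), _ = curve_fit(goel_okumoto, months, cum_counts, p0=(60.0, 0.1))

    # One-step-ahead forecast for month 13; in practice such forecasts
    # should be checked for accuracy and recalibrated before use.
    print(a_hat, b_hat, goel_okumoto(13, a_hat, b_hat))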
On Reliability Assessment When a Software-based System Is Replaced by a Thought-to-be-Better One
The failure history of pre-existing systems can inform a reliability assessment of a new system. Such assessments – consisting of arguments based on evidence from older systems – are attractive and have been used for quite some time for, typically, mechanical/hardware-only systems. But their application to software-based systems brings some challenges. In this paper, we present a conservative, Bayesian approach to software reliability assessment – one that combines reliability evidence from an old system with an assessor’s confidence in a newer system being an improved replacement for the old one. We demonstrate, via different scenarios, what a thought-to-be-better replacement formally means in practice, and what it allows one to believe about actual reliability improvement. The results can be used directly in a reliability assessment, or to caution system stakeholders and industry regulators against using other models that give optimistic assessments. For instance, even if one is certain that some new software must be more reliable than an old product, using the reliability distribution for the old software as a prior distribution when assessing the new system gives optimistic, not conservative, predictions for the posterior reliability of the new system after seeing operational testing evidence.
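One way to see, in outline, how confidence in “improvement” can enter such an assessment (an illustrative decomposition under a strong, explicitly assumed independence, not the paper's full model): writing c for the assessor's confidence that the new system is at least as reliable as the old,

\[
P(pfd_{\mathrm{new}} \le p) \;\ge\; P\bigl(pfd_{\mathrm{new}} \le pfd_{\mathrm{old}} \,\wedge\, pfd_{\mathrm{old}} \le p\bigr) \;=\; c \cdot P(pfd_{\mathrm{old}} \le p),
\]

where the final equality holds only if the “improvement” event is judged independently of the old system's reliability – exactly the kind of convenient assumption a conservative treatment must scrutinise.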
Assessing Confidence with Assurance 2.0
An assurance case is intended to provide justifiable confidence in the truth of its top claim, which typically concerns safety or security. A natural question is then "how much" confidence does the case provide?
In this report, we explore issues in assessing confidence for assurance cases developed using the rigorous approach we call Assurance 2.0. We argue that confidence cannot be reduced to a single attribute or measurement. Instead, we suggest it should be based on attributes that draw on three different perspectives: positive, negative, and residual doubts.
Positive Perspectives consider the extent to which the evidence and overall argument of the case combine to make a positive statement justifying belief in its claims. We set a high bar for justification, requiring it to be indefeasible. The primary positive measure for this is soundness, which interprets the argument as a logical proof and delivers a yes/no measurement. The interior steps of an Assurance 2.0 case can be evaluated as logical axioms, but the evidential steps at the leaves derive logical claims epistemically – from observations or measurements about the system and its environment – and must be treated as premises. Confidence in these can be expressed probabilistically and we use confirmation measures to ensure that the probabilistic "weight" of evidence crosses some threshold.
In addition, probabilities can be aggregated from evidence through the steps of the argument using probability logics to yield what we call probabilistic valuations for the claims (in contrast to soundness, which is a logical valuation). The aggregated probability attached to the top claim can be interpreted as a numerical measure of confidence. We apply probabilistic valuations only to sound cases, and this avoids some of the difficulties that attend probabilistic methods that stand alone. The primary uses for probabilistic valuations are with less critical systems, where we trade assurance effort against confidence, and in assessing residual doubts.
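As one concrete example of the kind of probability-logic rule such aggregation can use (illustrative only; the report does not commit Clarissa to this particular rule here): if a sound argument step derives claim C from premises E1 and E2, the Fréchet inequality gives a conservative valuation that needs no independence assumption about the evidence:

\[
P(C) \;\ge\; P(E_1 \wedge E_2) \;\ge\; \max\bigl(0,\; P(E_1) + P(E_2) - 1\bigr).
\]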
Negative Perspectives record doubts and challenges to the case, typically expressed as defeaters, and their exploration and resolution. Assurance developers must guard against confirmation bias and should vigorously explore potential defeaters as they develop the case, and should record them and their resolution to avoid rework and to aid reviewers.
Residual Doubts: the world is uncertain so not all potential defeaters can be resolved. For example, we may design a system to tolerate two faults and have good reasons and evidence to suppose that is sufficient to cover the exposure on any expected mission. But doubts remain: what if more than two faults do arrive? Here we can explore consequences and likelihoods and thereby assess risk (their product). Some of these residual risks may be unacceptable and thereby prompt a review, but others may be considered acceptable or unavoidable. It is crucial however that these judgments are conscious ones and that they are recorded in the assurance case.
This report examines each of these three perspectives in detail and indicates how Clarissa, our prototype toolset for Assurance 2.0, assists in their evaluation.
Assessing Safety-Critical Systems from Operational Testing: A Study on Autonomous Vehicles
Context: Demonstrating high reliability and safety for safety-critical systems (SCSs) remains a hard problem. Diverse evidence needs to be combined in a rigorous way: in particular, results of operational testing with other evidence from design and verification. Growing use of machine learning in SCSs, by precluding most established methods for gaining assurance, makes evidence from operational testing even more important for supporting safety and reliability claims.
Objective: We revisit the problem of using operational testing to demonstrate high reliability. We use Autonomous Vehicles (AVs) as a current example. AVs are making their debut on public roads: methods for assessing whether an AV is safe enough are urgently needed. We demonstrate how to answer 5 questions that would arise in assessing an AV type, starting with those proposed by a highly-cited study.
Method: We apply new theorems extending our Conservative Bayesian Inference (CBI) approach, which exploit the rigour of Bayesian methods while reducing the risk of involuntary misuse associated (we argue) with now-common applications of Bayesian inference; we define additional conditions needed for applying these methods to AVs.
Results: Prior knowledge can bring substantial advantages if the AV design allows strong expectations of safety before road testing. We also show how naive attempts at conservative assessment may lead to over-optimism instead; why extrapolating the trend of disengagements (take-overs by human drivers) is not suitable for safety claims; and how to use knowledge that an AV has moved to a “less stressful” environment.
Conclusion: While some reliability targets will remain too high to be practically verifiable, our CBI approach removes a major source of doubt: it allows use of prior knowledge without inducing dangerously optimistic biases. For certain ranges of required reliability and prior beliefs, CBI thus supports feasible, sound arguments. Useful conservative claims can be derived from limited prior knowledge.
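A standard back-of-envelope calculation (the “rule of three”, not specific to this paper) shows why some targets are impractical to verify by operational testing alone: with zero failures in n independent demands, 95% confidence that the probability of failure per demand is below p requires

\[
(1-p)^n \le 0.05 \;\Longleftrightarrow\; n \ge \frac{\ln 0.05}{\ln(1-p)} \approx \frac{3}{p},
\]

so a target of, say, p = 10^-7 needs roughly 3 x 10^7 failure-free demands – hence the value of sound prior knowledge.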