
    Cross-layer system reliability assessment framework for hardware faults

    System reliability estimation during early design phases facilitates informed decisions for the integration of effective protection mechanisms against different classes of hardware faults. When not all system abstraction layers (technology, circuit, microarchitecture, software) are factored into such an estimation model, the delivered reliability reports are bound to be excessively pessimistic and thus lead to unacceptably expensive, over-designed systems. We propose a scalable, cross-layer methodology and a supporting suite of tools for accurate yet fast estimation of computing system reliability. The backbone of the methodology is a component-based Bayesian model, which effectively calculates system reliability based on the masking probabilities of individual hardware and software components while considering their complex interactions. Our detailed experimental evaluation across different technologies, microarchitectures, and benchmarks demonstrates that the proposed model delivers very accurate reliability estimations (FIT rates) compared to statistically significant but slow fault injection campaigns at the microarchitecture level.
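
    As a rough illustration of how such a component-based masking model can combine per-layer masking probabilities into a system-level FIT estimate, the sketch below assumes made-up raw FIT rates and masking probabilities for two components and treats the layers as independent; it is not the paper's actual model or data.

```python
# Illustrative sketch only: a minimal component-based masking model in the
# spirit of the abstract. Component names, raw FIT rates, and per-layer
# masking probabilities are invented assumptions, not the paper's data.

LAYERS = ("technology", "circuit", "microarchitecture", "software")

# raw_fit: raw fault rate of the component (failures per 1e9 device-hours)
# masking[layer]: probability that a raw fault is masked at that layer
components = {
    "register_file": {"raw_fit": 120.0,
                      "masking": {"technology": 0.30, "circuit": 0.20,
                                  "microarchitecture": 0.55, "software": 0.40}},
    "alu":           {"raw_fit": 45.0,
                      "masking": {"technology": 0.25, "circuit": 0.15,
                                  "microarchitecture": 0.35, "software": 0.50}},
}

def effective_fit(raw_fit: float, masking: dict) -> float:
    """A fault contributes to failures only if it escapes masking at every layer."""
    escape = 1.0
    for layer in LAYERS:
        escape *= 1.0 - masking[layer]
    return raw_fit * escape

system_fit = sum(effective_fit(c["raw_fit"], c["masking"])
                 for c in components.values())
print(f"estimated system FIT: {system_fit:.2f}")
```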

    Expert Elicitation for Reliable System Design

    This paper reviews the role of expert judgement in supporting reliability assessments within the systems engineering design process. Generic design processes are described to give context, and the nature of the reliability assessments required in the different systems engineering phases is discussed. It is argued that, as far as meeting reliability requirements is concerned, the whole design process is more akin to a statistical control process than to a straightforward statistical problem of assessing an unknown distribution. This leads to features of the expert judgement problem in the design context that are substantially different from those seen, for example, in risk assessment. In particular, the role of experts in problem structuring and in developing failure mitigation options is much more prominent, and there is a need to take into account the reliability potential of future mitigation measures downstream in the system life cycle. An overview is given of the stakeholders typically involved in large-scale systems engineering design projects, and this is used to argue the need for methods that expose potential judgemental biases in order to generate analyses that can be said to provide rational consensus about uncertainties. Finally, a number of key points are developed with the aim of moving toward a framework that provides a holistic method for tracking reliability assessment through the design process. Comments on this paper appear in arXiv:0708.0285, arXiv:0708.0287, and arXiv:0708.0288, with a rejoinder in arXiv:0708.0293. Published in Statistical Science by the Institute of Mathematical Statistics at http://dx.doi.org/10.1214/088342306000000510.

    Addressing Complexity and Intelligence in Systems Dependability Evaluation

    Engineering and computing systems are increasingly complex, intelligent, and open adaptive. When it comes to the dependability evaluation of such systems, the characteristics of "complexity" and "intelligence" pose particular challenges. The first aspect of complexity is the dependability modelling of large systems with many interconnected components and dynamic behaviours such as Priority, Sequencing, and Repairs. To address this, the thesis proposes a novel hierarchical solution to dynamic fault tree analysis using semi-Markov processes. A second aspect of complexity is the environmental conditions that may impact dependability and their modelling. For instance, weather and logistics can influence maintenance actions and hence the dependability of an offshore wind farm. The thesis proposes a semi-Markov-based maintenance model, the "Butterfly Maintenance Model" (BMM), to model this complexity and accommodate it in dependability evaluation. A third aspect of complexity is the open nature of systems of systems, such as swarms of drones, which makes complete design-time dependability analysis infeasible. To address this aspect, the thesis proposes a dynamic dependability evaluation method using fault trees and Markov models at runtime.

    The challenge of "intelligence" arises because Machine Learning (ML) components do not exhibit programmed behaviour; their behaviour is learned from data. Traditional dependability analysis, however, assumes that systems are programmed or designed. When a system has learned from data, a distributional shift of operational data away from the training data may cause the ML to behave incorrectly, e.g., misclassify objects. To address this, a new approach called SafeML is developed that uses statistical distance measures to monitor the performance of ML against such distributional shifts. The thesis develops the proposed models and evaluates them on case studies, highlighting improvements over the state of the art, limitations, and future work.
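
    The SafeML idea of monitoring ML inputs with statistical distance measures can be sketched with a two-sample Kolmogorov-Smirnov test, one of the distance measures such approaches rely on; the feature data, window size, and significance threshold below are invented for illustration and are not SafeML's actual configuration.

```python
# Minimal sketch of distance-based shift monitoring in the spirit of SafeML:
# compare a window of operational feature values against the training data
# with a two-sample Kolmogorov-Smirnov test. Data and threshold are invented.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)   # data the ML was trained on
operational_window = rng.normal(loc=0.8, scale=1.0, size=500)  # shifted data seen at runtime

stat, p_value = ks_2samp(training_feature, operational_window)
SHIFT_THRESHOLD = 0.05  # assumed significance level

if p_value < SHIFT_THRESHOLD:
    print(f"distributional shift detected (KS={stat:.3f}); ML outputs may be unreliable")
else:
    print(f"no significant shift (KS={stat:.3f}); continue trusting ML outputs")
```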

    A review of applications of fuzzy sets to safety and reliability engineering

    Safety and reliability are rigorously assessed during the design of dependable systems. Probabilistic risk assessment (PRA) processes are comprehensive, structured, and logical methods widely used for this purpose. PRA approaches include, but are not limited to, Fault Tree Analysis (FTA), Failure Mode and Effects Analysis (FMEA), and Event Tree Analysis (ETA). In conventional PRA, failure data about components are required for quantitative analysis. In practice, it is not always possible to fully obtain these data due to the unavailability of primary observations and the consequent scarcity of statistical data about component failures. To handle such situations, fuzzy set theory has been successfully used in novel PRA approaches for safety and reliability evaluation under conditions of uncertainty. This paper presents a review of fuzzy set theory based methodologies applied to safety and reliability engineering, including fuzzy FTA, fuzzy FMEA, fuzzy ETA, fuzzy Bayesian networks, fuzzy Markov chains, and fuzzy Petri nets. We first describe the relevant fundamentals of fuzzy set theory and then review applications of fuzzy set theory to system safety and reliability analysis. The review shows the context in which each technique may be most appropriate and highlights the overall potential usefulness of fuzzy set theory in addressing uncertainty in safety and reliability engineering.
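
    A small worked example of the fuzzy FTA idea mentioned above: basic-event probabilities elicited as triangular fuzzy numbers are propagated through AND/OR gates using component-wise approximations. The events, numbers, and approximate gate formulas are illustrative assumptions, not taken from the review.

```python
# Illustrative sketch of fuzzy fault tree arithmetic with triangular fuzzy
# numbers (low, mode, high) standing in for uncertain basic-event
# probabilities. Values and formulas are assumptions for demonstration.

def and_gate(p, q):
    """AND gate: product of probabilities, applied component-wise (approximation)."""
    return tuple(pi * qi for pi, qi in zip(p, q))

def or_gate(p, q):
    """OR gate: 1 - (1-p)(1-q), applied component-wise (approximation)."""
    return tuple(1.0 - (1.0 - pi) * (1.0 - qi) for pi, qi in zip(p, q))

# Basic events elicited from experts as triangular fuzzy probabilities.
pump_fails   = (0.010, 0.020, 0.040)
valve_sticks = (0.005, 0.010, 0.020)
sensor_fails = (0.001, 0.002, 0.005)

# Top event: (pump fails AND valve sticks) OR sensor fails
top = or_gate(and_gate(pump_fails, valve_sticks), sensor_fails)
print("fuzzy top-event probability (low, mode, high):",
      tuple(round(x, 6) for x in top))
```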

    Bayesian networks with imprecise datasets : application to oscillating water column

    The Bayesian Network approach is a probabilistic method with increasing use in the risk assessment of complex systems. It has proven to be a reliable and powerful tool with the flexibility to include different types of data, from experimental data to expert judgement. The incorporation of system reliability methods allows traditional Bayesian networks to work with random variables with both discrete and continuous distributions. Probabilistic uncertainty comes from the complexity of the reality that scientists try to reproduce by setting up a controlled experiment, while imprecision is related to the quality of the specific instrument making the measurements. This imprecision, or lack of data, can be taken into account through the use of intervals and probability boxes as random variables in the network. System reliability problems involving these kinds of uncertainty have typically been solved with Monte Carlo simulation. However, this method is computationally expensive and prevents real-time analysis of the system represented by the network. In this work, the line sampling algorithm is used as an effective method to improve the efficiency of the reduction process from enhanced to traditional Bayesian networks. This preserves all the advantages without excessively increasing the computational cost of the analysis. As an application example, a risk assessment of an oscillating water column is carried out using data obtained in the laboratory. The proposed method is run using the multipurpose software OpenCossan.
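
    A simplified sketch of why imprecise inputs make the analysis expensive: with an interval-valued parameter, each bound of the failure probability requires its own Monte Carlo run, which is exactly the cost that variance-reduction schemes such as line sampling aim to cut. The limit-state setup and parameter ranges below are invented and unrelated to the oscillating water column case study or OpenCossan.

```python
# Simplified illustration (not the paper's line-sampling code) of how an
# interval-valued parameter yields bounds on the failure probability rather
# than a single value. Distributions and ranges are invented.
import numpy as np

rng = np.random.default_rng(1)

def failure_probability(mean_load: float, n_samples: int = 200_000) -> float:
    """Plain Monte Carlo estimate of P(load > capacity) for a fixed mean load."""
    capacity = rng.normal(10.0, 1.0, n_samples)   # aleatory uncertainty
    load = rng.normal(mean_load, 1.5, n_samples)  # aleatory uncertainty
    return float(np.mean(load > capacity))

# Epistemic imprecision: the mean load is only known to lie in an interval.
mean_load_interval = (6.0, 7.5)

# For a monotone limit state it suffices to evaluate the interval endpoints;
# each evaluation is itself a full Monte Carlo run, which is what makes
# more efficient sampling schemes attractive.
p_lower = failure_probability(mean_load_interval[0])
p_upper = failure_probability(mean_load_interval[1])
print(f"failure probability bounds: [{p_lower:.4f}, {p_upper:.4f}]")
```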

    Improved dynamic dependability assessment through integration with prognostics

    The use of average data for dependability assessments results in an outdated system-level dependability estimation, which can lead to incorrect design decisions. With the increasing availability of online data, there is room to improve traditional dependability assessment techniques. In particular, prognostics is an emerging field that provides asset-specific failure information, which can be reused to improve system-level failure estimation. This paper presents a framework for prognostics-updated dynamic dependability assessment. The dynamic behaviour comes from runtime-updated information, asset inter-dependencies, and time-dependent system behaviour. A case study from the power generation industry is analysed, and the results confirm the validity of the approach for improved near real-time unavailability estimations.
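
    A minimal sketch of the updating idea: handbook-average failure rates give one system unavailability figure, and substituting an asset-specific rate reported by a prognostic algorithm refreshes it at runtime. The rates, repair rate, and series structure below are assumptions for illustration, not the paper's case study.

```python
# Minimal sketch of refreshing a system-level unavailability estimate with
# asset-specific prognostic information. All rates and the series structure
# are invented for illustration.

def steady_state_unavailability(failure_rate: float, repair_rate: float) -> float:
    """Classical two-state result: U = lambda / (lambda + mu)."""
    return failure_rate / (failure_rate + repair_rate)

def series_system_unavailability(unavailabilities) -> float:
    """Series system is down if any component is down (independence assumed)."""
    available = 1.0
    for u in unavailabilities:
        available *= 1.0 - u
    return 1.0 - available

# Generic (handbook-average) failure rates, per hour.
generic = {"gearbox": 1e-4, "generator": 5e-5, "converter": 8e-5}
repair_rate = 1e-2  # assumed common repair rate, per hour

# A prognostic algorithm reports that this particular gearbox is degrading,
# so its asset-specific failure rate is raised at runtime.
asset_specific = dict(generic, gearbox=4e-4)

for label, rates in (("average data", generic), ("prognostics-updated", asset_specific)):
    u = series_system_unavailability(
        steady_state_unavailability(lam, repair_rate) for lam in rates.values())
    print(f"{label}: system unavailability = {u:.3e}")
```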

    A fuzzy Bayesian network approach for risk analysis in process industries

    Fault tree analysis is a widely used method of risk assessment in process industries. However, the classical fault tree approach has its own limitations, such as the inability to deal with uncertain failure data and to consider statistical dependence among failure events. In this paper, we propose a comprehensive framework for risk assessment in process industries under conditions of uncertainty and statistical dependency of events. The proposed approach makes use of expert knowledge and fuzzy set theory for handling the uncertainty in the failure data, and employs Bayesian network modeling for capturing dependency among the events and for robust probabilistic reasoning under uncertainty. The effectiveness of the approach was demonstrated by performing a risk assessment of an ethylene transportation line unit in an ethylene oxide (EO) production plant.
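
    A toy example of the two ingredients combined: expert-elicited triangular fuzzy probabilities are defuzzified to crisp values and fed into a small Bayesian network whose conditional probability table captures dependence between causes. All numbers and the network structure are invented, not drawn from the EO plant study.

```python
# Illustrative sketch combining fuzzy (expert-elicited) failure probabilities,
# defuzzified to crisp values, with a tiny Bayesian network that expresses
# dependence between events. All values are assumptions.

def defuzzify_triangular(low: float, mode: float, high: float) -> float:
    """Centroid of a triangular fuzzy number."""
    return (low + mode + high) / 3.0

# Expert-elicited fuzzy probabilities for root causes.
p_corrosion = defuzzify_triangular(0.01, 0.02, 0.04)
p_overpressure = defuzzify_triangular(0.005, 0.01, 0.03)

# Dependence: a leak is far more likely when both causes are present, which
# a classical fault tree with independent events cannot express directly.
p_leak_given = {(True, True): 0.9, (True, False): 0.3,
                (False, True): 0.2, (False, False): 0.001}

p_leak = 0.0
for corrosion in (True, False):
    for overpressure in (True, False):
        p_c = p_corrosion if corrosion else 1.0 - p_corrosion
        p_o = p_overpressure if overpressure else 1.0 - p_overpressure
        p_leak += p_c * p_o * p_leak_given[(corrosion, overpressure)]

print(f"probability of leak (top event): {p_leak:.4f}")
```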

    Automatic Resource Allocation for High Availability Cloud Services

    This paper proposes an approach to support cloud brokers in finding optimal configurations for the deployment of dependability- and security-sensitive cloud applications. The approach is based on model-driven principles and uses both UML and Bayesian Networks to capture, analyse, and optimise cloud deployment configurations. While the paper is mostly focused on the initial allocation phase, the approach is extensible to the operational phases of the life-cycle. In this way, continuous improvement of cloud applications may be realised by monitoring, enforcing, and re-negotiating cloud resources following detected anomalies and failures.
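
    The configuration-selection idea can be sketched as choosing the cheapest replica count that meets an availability target; the VM availability, cost, and requirement below are invented, and the paper itself derives such analyses from UML models and Bayesian Networks rather than this closed-form shortcut.

```python
# Minimal sketch of the optimisation idea: score candidate deployment
# configurations by availability and cost, keep the cheapest one that meets
# the requirement. All parameters are invented assumptions.

VM_AVAILABILITY = 0.995          # assumed availability of a single replica
REQUIRED_AVAILABILITY = 0.9999   # assumed service-level requirement
COST_PER_REPLICA = 50.0          # assumed monthly cost per replica

def service_availability(replicas: int) -> float:
    """Service is up if at least one independent replica is up."""
    return 1.0 - (1.0 - VM_AVAILABILITY) ** replicas

candidates = range(1, 6)
feasible = [(n, n * COST_PER_REPLICA) for n in candidates
            if service_availability(n) >= REQUIRED_AVAILABILITY]

if feasible:
    best_n, best_cost = min(feasible, key=lambda nc: nc[1])
    print(f"choose {best_n} replicas "
          f"(availability={service_availability(best_n):.6f}, cost={best_cost:.0f})")
else:
    print("no candidate configuration meets the availability requirement")
```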