92 research outputs found

    Novel models and algorithms for systems reliability modeling and optimization

    Get PDF
    Recent growth in the scale and complexity of products and technologies in the defense and other industries is challenging product development, realization, and sustainment costs. Uncontrolled costs and routine budget overruns are causing all parties involved to seek lean product development processes and treatment of reliability, availability, and maintainability of the system as a true design parameter . To this effect, accurate estimation and management of the system reliability of a design during the earliest stages of new product development is not only critical for managing product development and manufacturing costs but also to control life cycle costs (LCC). In this regard, the overall objective of this research study is to develop an integrated framework for design for reliability (DFR) during upfront product development by treating reliability as a design parameter. The aim here is to develop the theory, methods, and tools necessary for: 1) accurate assessment of system reliability and availability and 2) optimization of the design to meet system reliability targets. In modeling the system reliability and availability, we aim to address the limitations of existing methods, in particular the Markov chains method and the Dynamic Bayesian Network approach, by incorporating a Continuous Time Bayesian Network framework for more effective modeling of sub-system/component interactions, dependencies, and various repair policies. We also propose a multi-object optimization scheme to aid the designer in obtaining optimal design(s) with respect to system reliability/availability targets and other system design requirements. In particular, the optimization scheme would entail optimal selection of sub-system and component alternatives. The theory, methods, and tools to be developed will be extensively tested and validated using simulation test-bed data and actual case studies from our industry partners

    Integrated Optimization of IT Service Performance and Availability Using Performability Prediction Models

    Get PDF
    Optimizing the performance and availability of an IT service in the design stage are typically considered as independent tasks. However, since both aspects are related to one another, these activities could be combined by applying performability models, in which both the performance and the availability of a service can be more accurately predicted. In this paper, a design optimization problem for IT services is defined and applied in two scenarios, one of which considers a mechanism in which redundant components can be used both for failover as well as handling overload situations. Results show that including such aspects affecting both availability and performance in prediction models can lead to more cost-effective service designs. Thus, performability prediction models are one opportunity to combine performance and availability management for IT services

    Addressing Complexity and Intelligence in Systems Dependability Evaluation

    Get PDF
    Engineering and computing systems are increasingly complex, intelligent, and open adaptive. When it comes to the dependability evaluation of such systems, there are certain challenges posed by the characteristics of “complexity” and “intelligence”. The first aspect of complexity is the dependability modelling of large systems with many interconnected components and dynamic behaviours such as Priority, Sequencing and Repairs. To address this, the thesis proposes a novel hierarchical solution to dynamic fault tree analysis using Semi-Markov Processes. A second aspect of complexity is the environmental conditions that may impact dependability and their modelling. For instance, weather and logistics can influence maintenance actions and hence dependability of an offshore wind farm. The thesis proposes a semi-Markov-based maintenance model called “Butterfly Maintenance Model (BMM)” to model this complexity and accommodate it in dependability evaluation. A third aspect of complexity is the open nature of system of systems like swarms of drones which makes complete design-time dependability analysis infeasible. To address this aspect, the thesis proposes a dynamic dependability evaluation method using Fault Trees and Markov-Models at runtime.The challenge of “intelligence” arises because Machine Learning (ML) components do not exhibit programmed behaviour; their behaviour is learned from data. However, in traditional dependability analysis, systems are assumed to be programmed or designed. When a system has learned from data, then a distributional shift of operational data from training data may cause ML to behave incorrectly, e.g., misclassify objects. To address this, a new approach called SafeML is developed that uses statistical distance measures for monitoring the performance of ML against such distributional shifts. The thesis develops the proposed models, and evaluates them on case studies, highlighting improvements to the state-of-the-art, limitations and future work

    Efficient Reliability Modelling & Analysis of Complex Systems with Application to Nuclear Power Plant Safety

    Get PDF
    Nuclear power may be our best chance at a permanent solution to the world's energy challenges, owing to its sustainability and environmental friendliness. However, it also poses a great risk to life, property, and the economy, given the possibility of severe accidents during its generation. These accidents are a result of the susceptibility of the generating plants to component failure, human error, extreme environmental events, targeted attacks, and natural disasters. Given the complexity and high interconnectivity of the systems in question, a small glitch, otherwise known as an initiating event, could cascade to catastrophic consequences. It is, therefore, vital that the vulnerability of a plant to these glitches and their ensuing consequences be ascertained, to ensure that the appropriate mitigating actions are taken. The reliability of a system is the likelihood that it survives a defined period and its availability is the likelihood of it being capable of performing its required functions on demand. These quantities are important to a nuclear power plant's safety because, a nuclear power plant by default is equipped with safety systems to inhibit the propagation of an initiating event. An accident ensues if the safety systems required to mitigate some initiating event are unavailable or incapacitated by the initiating event. It is, therefore, easy to see that the reliability, as well as the availability of these systems, shape the safety of the plant. These crucial quantities, currently, are estimated using legacy techniques like static fault and event tree analyses or their derivatives. Despite their popularity and widely acclaimed success, these legacy techniques lack the flexibility to implement fully the operational dynamics of the majority of systems. Most importantly, their ease of application deteriorates with increasing system size and complexity, such that the analyst is often forced to make unrealistic assumptions. These unrealistic assumptions sometimes compromise the accuracy of the results obtained and subsequently, the quality of the risk management decisions reached. Their inadequacy is often amplified if the system is composed of multi-state components or characterised by epistemic uncertainties, induced by vague or imprecise data. The ideal approach, therefore, should be sufficiently robust to not necessitate unrealistic assumptions but flexible enough to accommodate realistic system attributes, while guaranteeing accuracy. This dissertation provides a detailed account of a series of computationally efficient system reliability analysis techniques proposed to address the limitations of the existing probabilistic risk assessment approaches. The proposed techniques are based mainly, on an advanced hybrid event-driven Monte Carlo simulation technique that invokes load-flow principles to resolve, intuitively, the difficulties associated with the topological complexity of systems and the multi-state attributes of their components. In addition to their intuitiveness and relative completeness, a key advantage of the proposed techniques is their general applicability. They have been applied, for instance, to a variety of problems, ranging from the production availability of an offshore oil installation and the maintenance strategy optimization of the IEEE-24 bus test system to the probabilistic risk assessment of station blackout accidents at the Maanshan nuclear power plant in Taiwan. The proposed techniques, therefore, should influence robust decisions in the risk management of not only nuclear power plants but other critical systems as well. They have been incorporated into the open-source uncertainty quantification tool, OpenCossan, to render them readily available to industry and other researchers

    Automated system design optimisation

    Get PDF
    The focus of this thesis is to develop a generic approach for solving reliability design optimisation problems which could be applicable to a diverse range of real engineering systems. The basic problem in optimal reliability design of a system is to explore the means of improving the system reliability within the bounds of available resources. Improving the reliability reduces the likelihood of system failure. The consequences of system failure can vary from minor inconvenience and cost to significant economic loss and personal injury. However any improvements made to the system are subject to the availability of resources, which are very often limited. The objective of the design optimisation problem analysed in this thesis is to minimise system unavailability (or unreliability if an unrepairable system is analysed) through the manipulation and assessment of all possible design alterations available, which are subject to constraints on resources and/or system performance requirements. This thesis describes a genetic algorithm-based technique developed to solve the optimisation problem. Since an explicit mathematical form can not be formulated to evaluate the objective function, the system unavailability (unreliability) is assessed using the fault tree method. Central to the optimisation algorithm are newly developed fault tree modification patterns (FTMPs). They are employed here to construct one fault tree representing all possible designs investigated, from the initial system design specified along with the design choices. This is then altered to represent the individual designs in question during the optimisation process. Failure probabilities for specified design cases are quantified by employing Binary Decision Diagrams (BDDs). A computer programme has been developed to automate the application of the optimisation approach to standard engineering safety systems. Its practicality is demonstrated through the consideration of two systems of increasing complexity; first a High Integrity Protection System (HIPS) followed by a Fire Water Deluge System (FWDS). The technique is then further-developed and applied to solve problems of multi-phased mission systems. Two systems are considered; first an unmanned aerial vehicle (UAV) and secondly a military vessel. The final part of this thesis focuses on continuing the development process by adapting the method to solve design optimisation problems for multiple multi-phased mission systems. Its application is demonstrated by considering an advanced UAV system involving multiple multi-phased flight missions. The applications discussed prove that the technique progressively developed in this thesis enables design optimisation problems to be solved for systems with different levels of complexity. A key contribution of this thesis is the development of a novel generic optimisation technique, embedding newly developed FTMPs, which is capable of optimising the reliability design for potentially any engineering system. Another key and novel contribution of this work is the capability to analyse and provide optimal design solutions for multiple multi-phase mission systems. Keywords: optimisation, system design, multi-phased mission system, reliability, genetic algorithm, fault tree, binary decision diagra

    Synthesizing FDIR Recovery Strategies for Space Systems

    Get PDF
    Dynamic Fault Trees (DFTs) are powerful tools to drive the design of fault tolerant systems. However, semantic pitfalls limit their practical utility for interconnected systems that require complex recovery strategies to maximize their reliability. This thesis discusses the shortcomings of DFTs in the context of analyzing Fault Detection, Isolation and Recovery (FDIR) concepts with a particular focus on the needs of space systems. To tackle these shortcomings, we introduce an inherently non-deterministic model for DFTs. Deterministic recovery strategies are synthesized by transforming these non-deterministic DFTs into Markov automata that represent all possible choices between recovery actions. From the corresponding scheduler, optimized to maximize a given RAMS (Reliability, Availability, Maintainability and Safety) metric, an optimal recovery strategy can then be derived and represented by a model we call recovery automaton. We discuss dedicated techniques for reducing the state space of this recovery automaton and analyze their soundness and completeness. Moreover, modularized approaches to handle the complexity added by the state-based transformation approach are discussed. Furthermore, we consider the non-deterministic approach in a partially observable setting and propose an approach to lift the model for the fully observable case. We give an implementation of our approach within the Model-Based Systems Engineering (MBSE) framework Virtual Satellite. Finally, the implementation is evaluated based on the FFORT benchmark. The results show that basic non-deterministic DFTs generally scale well. However, we also found that semantically enriched non-deterministic DFTs employing repair or delayed observability mechanisms pose a challenge

    Lifetime reliability of multi-core systems: modeling and applications.

    Get PDF
    Huang, Lin.Thesis (M.Phil.)--Chinese University of Hong Kong, 2011.Includes bibliographical references (leaves 218-232).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.ivChapter 1 --- Introduction --- p.1Chapter 1.1 --- Preface --- p.1Chapter 1.2 --- Background --- p.5Chapter 1.3 --- Contributions --- p.6Chapter 1.3.1 --- Lifetime Reliability Modeling --- p.6Chapter 1.3.2 --- Simulation Framework --- p.7Chapter 1.3.3 --- Applications --- p.9Chapter 1.4 --- Thesis Outline --- p.10Chapter I --- Modeling --- p.12Chapter 2 --- Lifetime Reliability Modeling --- p.13Chapter 2.1 --- Notation --- p.13Chapter 2.2 --- Assumption --- p.16Chapter 2.3 --- Introduction --- p.16Chapter 2.4 --- Related Work --- p.19Chapter 2.5 --- System Model --- p.21Chapter 2.5.1 --- Reliability of A Surviving Component --- p.22Chapter 2.5.2 --- Reliability of a Hybrid k-out-of-n:G System --- p.26Chapter 2.6 --- Special Cases --- p.31Chapter 2.6.1 --- Case I: Gracefully Degrading System --- p.31Chapter 2.6.2 --- Case II: Standby Redundant System --- p.33Chapter 2.6.3 --- Case III: l-out-of-3:G System with --- p.34Chapter 2.7 --- Numerical Results --- p.37Chapter 2.7.1 --- Experimental Setup --- p.37Chapter 2.7.2 --- Experimental Results and Discussion --- p.40Chapter 2.8 --- Conclusion --- p.43Chapter 2.9 --- Appendix --- p.44Chapter II --- Simulation Framework --- p.47Chapter 3 --- AgeSim: A Simulation Framework --- p.48Chapter 3.1 --- Introduction --- p.48Chapter 3.2 --- Preliminaries and Motivation --- p.51Chapter 3.2.1 --- Prior Work on Lifetime Reliability Analysis of Processor- Based Systems --- p.51Chapter 3.2.2 --- Motivation of This Work --- p.53Chapter 3.3 --- The Proposed Framework --- p.54Chapter 3.4 --- Aging Rate Calculation --- p.57Chapter 3.4.1 --- Lifetime Reliability Calculation --- p.58Chapter 3.4.2 --- Aging Rate Extraction --- p.60Chapter 3.4.3 --- Discussion on Representative Workload --- p.63Chapter 3.4.4 --- Numerical Validation --- p.65Chapter 3.4.5 --- Miscellaneous --- p.66Chapter 3.5 --- Lifetime Reliability Model for MPSoCs with Redundancy --- p.68Chapter 3.6 --- Case Studies --- p.70Chapter 3.6.1 --- Dynamic Voltage and Frequency Scaling --- p.71Chapter 3.6.2 --- Burst Task Arrival --- p.75Chapter 3.6.3 --- Task Allocation on Multi-Core Processors --- p.77Chapter 3.6.4 --- Timeout Policy on Multi-Core Processors with Gracefully Degrading Redundancy --- p.78Chapter 3.7 --- Conclusion --- p.79Chapter 4 --- Evaluating Redundancy Schemes --- p.83Chapter 4.1 --- Introduction --- p.83Chapter 4.2 --- Preliminaries and Motivation --- p.85Chapter 4.2.1 --- Failure Mechanisms --- p.85Chapter 4.2.2 --- Related Work and Motivation --- p.86Chapter 4.3 --- Proposed Analytical Model for the Lifetime Reliability of Proces- sor Cores --- p.88Chapter 4.3.1 --- "Impact of Temperature, Voltage, and Frequency" --- p.88Chapter 4.3.2 --- Impact of Workloads --- p.92Chapter 4.4 --- Lifetime Reliability Analysis for Multi-core Processors with Vari- ous Redundancy Schemes --- p.95Chapter 4.4.1 --- Gracefully Degrading System (GDS) --- p.95Chapter 4.4.2 --- Processor Rotation System (PRS) --- p.97Chapter 4.4.3 --- Standby Redundant System (SRS) --- p.98Chapter 4.4.4 --- Extension to Heterogeneous System --- p.99Chapter 4.5 --- Experimental Methodology --- p.101Chapter 4.5.1 --- Workload Description --- p.102Chapter 4.5.2 --- Temperature Distribution Extraction --- p.102Chapter 4.5.3 --- Reliability Factors --- p.103Chapter 4.6 --- Results and Discussions --- p.103Chapter 4.6.1 --- Wear-out Rate Computation --- p.103Chapter 4.6.2 --- Comparison on Lifetime Reliability --- p.105Chapter 4.6.3 --- Comparison on Performance --- p.110Chapter 4.6.4 --- Comparison on Expected Computation Amount --- p.112Chapter 4.7 --- Conclusion --- p.118Chapter III --- Applications --- p.119Chapter 5 --- Task Allocation and Scheduling for MPSoCs --- p.120Chapter 5.1 --- Introduction --- p.120Chapter 5.2 --- Prior Work and Motivation --- p.122Chapter 5.2.1 --- IC Lifetime Reliability --- p.122Chapter 5.2.2 --- Task Allocation and Scheduling for MPSoC Designs --- p.124Chapter 5.3 --- Proposed Task Allocation and Scheduling Strategy --- p.126Chapter 5.3.1 --- Problem Definition --- p.126Chapter 5.3.2 --- Solution Representation --- p.128Chapter 5.3.3 --- Cost Function --- p.129Chapter 5.3.4 --- Simulated Annealing Process --- p.130Chapter 5.4 --- Lifetime Reliability Computation for MPSoC Embedded Systems --- p.133Chapter 5.5 --- Efficient MPSoC Lifetime Approximation --- p.138Chapter 5.5.1 --- Speedup Technique I - Multiple Periods --- p.139Chapter 5.5.2 --- Speedup Technique II - Steady Temperature --- p.139Chapter 5.5.3 --- Speedup Technique III - Temperature Pre- calculation --- p.140Chapter 5.5.4 --- Speedup Technique IV - Time Slot Quantity Control --- p.144Chapter 5.6 --- Experimental Results --- p.144Chapter 5.6.1 --- Experimental Setup --- p.144Chapter 5.6.2 --- Results and Discussion --- p.146Chapter 5.7 --- Conclusion and Future Work --- p.152Chapter 6 --- Energy-Efficient Task Allocation and Scheduling --- p.154Chapter 6.1 --- Introduction --- p.154Chapter 6.2 --- Preliminaries and Problem Formulation --- p.157Chapter 6.2.1 --- Related Work --- p.157Chapter 6.2.2 --- Problem Formulation --- p.159Chapter 6.3 --- Analytical Models --- p.160Chapter 6.3.1 --- Performance and Energy Models for DVS-Enabled Pro- cessors --- p.160Chapter 6.3.2 --- Lifetime Reliability Model --- p.163Chapter 6.4 --- Proposed Algorithm for Single-Mode Embedded Systems --- p.165Chapter 6.4.1 --- Task Allocation and Scheduling --- p.165Chapter 6.4.2 --- Voltage Assignment for DVS-Enabled Processors --- p.168Chapter 6.5 --- Proposed Algorithm for Multi-Mode Embedded Systems --- p.169Chapter 6.5.1 --- Feasible Solution Set --- p.169Chapter 6.5.2 --- Searching Procedure for a Single Mode --- p.171Chapter 6.5.3 --- Feasible Solution Set Identification --- p.171Chapter 6.5.4 --- Multi-Mode Combination --- p.177Chapter 6.6 --- Experimental Results --- p.178Chapter 6.6.1 --- Experimental Setup --- p.178Chapter 6.6.2 --- Case Study --- p.180Chapter 6.6.3 --- Sensitivity Analysis --- p.181Chapter 6.6.4 --- Extensive Results --- p.183Chapter 6.7 --- Conclusion --- p.185Chapter 7 --- Customer-Aware Task Allocation and Scheduling --- p.186Chapter 7.1 --- Introduction --- p.186Chapter 7.2 --- Prior Work and Problem Formulation --- p.188Chapter 7.2.1 --- Related Work and Motivation --- p.188Chapter 7.2.2 --- Problem Formulation --- p.191Chapter 7.3 --- Proposed Design-Stage Task Allocation and Scheduling --- p.192Chapter 7.3.1 --- Solution Representation and Moves --- p.193Chapter 7.3.2 --- Cost Function --- p.196Chapter 7.3.3 --- Impact of DVFS --- p.198Chapter 7.4 --- Proposed Algorithm for Online Adjustment --- p.200Chapter 7.4.1 --- Reliability Requirement for Online Adjustment --- p.201Chapter 7.4.2 --- Analytical Model --- p.203Chapter 7.4.3 --- Overall Flow --- p.204Chapter 7.5 --- Experimental Results --- p.205Chapter 7.5.1 --- Experimental Setup --- p.205Chapter 7.5.2 --- Results and Discussion --- p.207Chapter 7.6 --- Conclusion --- p.211Chapter 7.7 --- Appendix --- p.211Chapter 8 --- Conclusion and Future Work --- p.214Chapter 8.1 --- Conclusion --- p.214Chapter 8.2 --- Future Work --- p.215Bibliography --- p.23

    Safety and Reliability - Safe Societies in a Changing World

    Get PDF
    The contributions cover a wide range of methodologies and application areas for safety and reliability that contribute to safe societies in a changing world. These methodologies and applications include: - foundations of risk and reliability assessment and management - mathematical methods in reliability and safety - risk assessment - risk management - system reliability - uncertainty analysis - digitalization and big data - prognostics and system health management - occupational safety - accident and incident modeling - maintenance modeling and applications - simulation for safety and reliability analysis - dynamic risk and barrier management - organizational factors and safety culture - human factors and human reliability - resilience engineering - structural reliability - natural hazards - security - economic analysis in risk managemen

    A GUIDED SIMULATION METHODOLOGY FOR DYNAMIC PROBABILISTIC RISK ASSESSMENT OF COMPLEX SYSTEMS

    Get PDF
    Probabilistic risk assessment (PRA) is a systematic process of examining how engineered systems work to ensure safety. With the growth of the size of the dynamic systems and the complexity of the interactions between hardware, software, and humans, it is extremely difficult to enumerate the risky scenarios by the traditional PRA methods. Over the past 15 years, a host of DPRA methods have been proposed to serve as supplemental tools to traditional PRA to deal with complex dynamic systems. A new dynamic probabilistic risk assessment framework is proposed in this dissertation. In this framework a new exploration strategy is employed. The engineering knowledge of the system is explicitly used to guide the simulation to achieve higher efficiency and accuracy. The engineering knowledge is reflected in the "Planner" which is responsible for generating plans as a high level map to guide the simulation. A scheduler is responsible for guiding the simulation by controlling the timing and occurrence of the random events. During the simulation the possible random events are proposed to the scheduler at branch points. The scheduler decides which events are to be simulated. Scheduler would favor the events with higher values. The value of a proposed event depends on the information gain from exploring that scenario, and the importance factor of the scenario. The information gain is measured by the information entropy, and the importance factor is based on the engineering judgment. The simulation results are recorded and grouped for later studies. The planner may "learn" from the simulation results, and update the plan to guide further simulation. SIMPRA is the software package which implements the new methodology. It provides the users with a friendly interface and a rich DPRA library to aid in the construction of the simulation model. The engineering knowledge can be input into the Planner, which would generate a plan automatically. The scheduler would guide the simulation according to the plan. The simulation generates many accident event sequences and estimates of the end state probabilities
    • …
    corecore