210 research outputs found

    A Markov Chain state transition approach to establishing critical phases for AUV reliability

    The deployment of complex autonomous underwater platforms for marine science comprises a series of sequential steps, each of which is critical to the success of the mission. In this paper we present a state transition approach, in the form of a Markov chain, which models the sequence of steps from pre-launch to operation to recovery. The aim is to identify the states and state transitions that present a higher risk to the vehicle, and hence to the mission, based on evidence and judgment. Developing a Markov chain consists of two separate tasks: the first defines the structure that encodes the sequence of events; the second assigns probabilities to each possible transition. Our model comprises eleven discrete states and includes distance-dependent underway survival statistics. Integrating the Markov model with these underway survival statistics allows us to quantify the likelihood of success during each state and transition, and consequently the likelihood of achieving the desired mission goals. To illustrate this generic process, the fault history of the Autosub3 autonomous underwater vehicle provides the information for the different phases of operation. The method proposed here adds more detail to previous analyses: faults are discriminated according to the phase of the mission in which they took place.
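
    Although the paper's model has eleven states and uses Autosub3 fault data, the underlying computation is standard absorbing-Markov-chain analysis. The Python sketch below shows the idea on a reduced, hypothetical chain with four transient mission phases and two absorbing outcomes; every transition probability is an illustrative assumption, not data from the paper.

    import numpy as np

    # Transient states: pre-launch, launch, survey, recovery.
    # Absorbing states: recovered (success), lost (failure).
    # Rows of P sum to 1; all probabilities are invented for illustration.
    P = np.array([
        [0.00, 0.98, 0.00, 0.00, 0.00, 0.02],  # pre-launch
        [0.00, 0.00, 0.97, 0.00, 0.00, 0.03],  # launch
        [0.00, 0.00, 0.00, 0.95, 0.00, 0.05],  # survey (distance-dependent in the paper)
        [0.00, 0.00, 0.00, 0.00, 0.99, 0.01],  # recovery
        [0.00, 0.00, 0.00, 0.00, 1.00, 0.00],  # recovered (absorbing)
        [0.00, 0.00, 0.00, 0.00, 0.00, 1.00],  # lost (absorbing)
    ])

    Q = P[:4, :4]                     # transient-to-transient block
    R = P[:4, 4:]                     # transient-to-absorbing block
    N = np.linalg.inv(np.eye(4) - Q)  # fundamental matrix
    B = N @ R                         # absorption probabilities per starting state

    print(f"P(vehicle recovered | start at pre-launch) = {B[0, 0]:.4f}")

    In the paper's fuller model, the survey-phase survival entry would come from the distance-dependent underway statistics rather than being a fixed constant.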

    Addressing Complexity and Intelligence in Systems Dependability Evaluation

    Engineering and computing systems are increasingly complex, intelligent, and open adaptive. When it comes to the dependability evaluation of such systems, the characteristics of “complexity” and “intelligence” pose particular challenges. The first aspect of complexity is the dependability modelling of large systems with many interconnected components and dynamic behaviours such as priority, sequencing, and repair. To address this, the thesis proposes a novel hierarchical solution to dynamic fault tree analysis using semi-Markov processes. A second aspect of complexity is the environmental conditions that may impact dependability, and their modelling: for instance, weather and logistics can influence maintenance actions, and hence the dependability of an offshore wind farm. The thesis proposes a semi-Markov-based maintenance model, the “Butterfly Maintenance Model” (BMM), to model this complexity and accommodate it in dependability evaluation. A third aspect of complexity is the open nature of systems of systems, such as swarms of drones, which makes complete design-time dependability analysis infeasible. To address this aspect, the thesis proposes a dynamic dependability evaluation method using fault trees and Markov models at runtime. The challenge of “intelligence” arises because Machine Learning (ML) components do not exhibit programmed behaviour; their behaviour is learned from data. Traditional dependability analysis, however, assumes systems are programmed or designed. When a system has learned from data, a distributional shift of operational data away from the training data may cause the ML to behave incorrectly, e.g., to misclassify objects. To address this, a new approach called SafeML is developed that uses statistical distance measures to monitor the performance of ML against such distributional shifts. The thesis develops the proposed models and evaluates them on case studies, highlighting improvements over the state of the art, as well as limitations and future work.
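
    SafeML monitors for distributional shift with statistical distance measures. Below is a minimal Python sketch of that idea using a two-sample Kolmogorov–Smirnov test on a single feature; the synthetic data and the alarm threshold are illustrative assumptions, not values or APIs from the thesis.

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # training distribution
    field_feature = rng.normal(loc=0.6, scale=1.0, size=500)   # shifted operational data

    statistic, p_value = ks_2samp(train_feature, field_feature)

    SHIFT_THRESHOLD = 0.1  # assumed, application-specific bound on the KS statistic
    if statistic > SHIFT_THRESHOLD:
        print(f"Distributional shift suspected (KS={statistic:.3f}, p={p_value:.2e}); "
              f"treat the classifier's outputs in this window with caution.")
    else:
        print(f"No significant shift detected (KS={statistic:.3f}).")

    In practice a SafeML-style monitor would compare several such distance measures across features and relate the measured distance to an expected drop in model accuracy.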

    System-Level Reliability Modeling and Evaluation in Power Electronic-based Power Systems


    Markov and Semi-markov Chains, Processes, Systems and Emerging Related Fields

    This book covers a broad range of research results in the field of Markov and semi-Markov chains, processes, systems, and related emerging fields. The authors of the included research papers are well-known researchers in their fields. The book presents the state of the art and ideas for further research to theorists in these fields, while also providing readily applicable results for practitioners in diverse areas.

    Compositional dependability analysis of dynamic systems with uncertainty

    Over the past two decades, research has focused on simplifying dependability analysis by looking at how dependability information can be synthesised from system models automatically. This has led to the field of model-based safety assessment (MBSA), which has attracted significant interest from industry, academia, and government agencies. Different model-based safety analysis methods, such as Hierarchically Performed Hazard Origin & Propagation Studies (HiP-HOPS), are increasingly applied by industry for dependability analysis of safety-critical systems. Such systems may feature multiple modes of operation, where the behaviour of the system and the interactions between its components can change according to the mode of operation the system is in. MBSA techniques usually combine different classical safety analysis approaches to allow analysts to perform safety analyses automatically or semi-automatically. For example, HiP-HOPS is a state-of-the-art MBSA approach which enhances an architectural model of a system with logical failure annotations to allow safety studies such as Fault Tree Analysis (FTA) and Failure Modes and Effects Analysis (FMEA); in this way it shows how the failure of a single component, or a combination of failures of different components, can lead to system failure. As systems get more complex and their behaviour becomes more dynamic, capturing this dynamic behaviour and the many possible interactions between components is necessary to develop an accurate failure model. One way of modelling this dynamic behaviour is with a state-transition diagram. Introducing a dynamic model compatible with the existing architectural information of systems can provide significant benefits in terms of accurate representation and expressiveness when analysing the dynamic behaviour of modern large-scale, complex safety-critical systems. The first key contribution of this thesis is therefore a methodology that enables MBSA techniques to model the dynamic behaviour of systems. The thesis demonstrates this methodology using the HiP-HOPS tool as an example, extending HiP-HOPS with state-transition annotations. This extension allows HiP-HOPS to model more complex dynamic scenarios and to perform compositional dynamic dependability analysis of complex systems by generating Pandora temporal fault trees (TFTs). Because TFTs capture state, the techniques used for solving classical fault trees are not suitable for them; they require a state-space solution for the quantification of probability. The thesis therefore proposes two methodologies, based on Petri nets and Bayesian networks, to provide state-space solutions to Pandora TFTs. Uncertainty is another important yet underdeveloped area of MBSA: typical MBSA approaches are not capable of performing quantitative analysis under uncertainty. In addition to the above contributions, the thesis therefore proposes a fuzzy-set-theory-based methodology to quantify Pandora temporal fault trees under uncertainty in the failure data of components. The proposed methodologies are applied to a case study to demonstrate how they can be used in practice. Finally, the overall contributions of the thesis are evaluated by discussing the results produced, and conclusions about the potential benefits of the new techniques are drawn from them.
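
    To see why Pandora's temporal gates force a state-space (or simulation) solution, consider a priority-AND (PAND) gate, which fails only if its inputs fail in a specified order; a classical combinatorial fault tree cannot express this ordering. A minimal Python sketch, with assumed constant failure rates rather than data from the thesis:

    import numpy as np

    rng = np.random.default_rng(42)
    LAMBDA_A, LAMBDA_B = 1e-3, 2e-3  # assumed failure rates (per hour)
    MISSION_TIME = 1000.0            # hours
    N = 1_000_000

    t_a = rng.exponential(1 / LAMBDA_A, N)  # sampled failure times of input A
    t_b = rng.exponential(1 / LAMBDA_B, N)  # sampled failure times of input B

    and_gate = (t_a <= MISSION_TIME) & (t_b <= MISSION_TIME)  # order-insensitive AND
    pand_gate = and_gate & (t_a < t_b)                        # A must fail before B

    print(f"P(AND)  ~ {and_gate.mean():.4f}")
    print(f"P(PAND) ~ {pand_gate.mean():.4f}")  # always <= P(AND)

    The Petri-net and Bayesian-network solutions proposed in the thesis compute such order-dependent probabilities analytically rather than by sampling.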

    What broke where for distributed and parallel applications — a whodunit story

    Detection, diagnosis, and mitigation of performance problems in today's large-scale distributed and parallel systems is a difficult task. These systems are composed of many complex software and hardware components, and when a performance or correctness problem occurs, developers struggle to understand its root cause and fix it in a timely manner. In my thesis, I address these three aspects of performance problems in computer systems. First, we focus on diagnosing performance problems in large-scale parallel applications running on supercomputers, and we developed techniques to localize a performance problem for root-cause analysis. Parallel applications, most of which are complex scientific simulations, can create up to millions of parallel tasks that run on different machines and communicate using the message-passing paradigm. We developed a highly scalable and accurate automated debugging tool called PRODOMETER, which first creates a logical progress-dependency graph of the tasks to show how the problem spread through the system and manifested as a system-wide performance issue, then uses this graph to identify the task where the problem originated, and finally pinpoints the code region corresponding to the origin of the bug. Second, we developed a tool chain that detects performance anomalies using machine-learning techniques while achieving a very low false-positive rate. Our input-aware performance anomaly detection system consists of a scalable data-collection framework that gathers performance metrics from code regions at different granularities, an offline model-creation and prediction-error characterization technique, and a threshold-based anomaly-detection engine for production runs. The system requires few training runs and handles unknown inputs and parameter combinations by dynamically calibrating the anomaly-detection threshold according to the characteristics of the input data and of the models' prediction error. Third, we developed a performance-problem mitigation scheme for erasure-coded distributed storage systems. Repairing failed blocks in such systems takes a long time in network-constrained data centers because, during a repair, data from multiple nodes is gathered at a single node, where a mathematical operation reconstructs the missing part; this severely congests the links toward the destination that will host the newly recreated data. We proposed a novel distributed repair technique, called Partial-Parallel-Repair (PPR), that performs this reconstruction in parallel on multiple nodes, eliminating the network bottleneck and greatly speeding up the repair process. Fourth, we study how, for a class of applications, performance can be improved (or performance problems mitigated) by selectively approximating some of the computations. For many applications, the main computation happens inside a loop that can be logically divided into a few temporal segments, which we call phases. We found that while approximating the initial phases can severely degrade the quality of the results, approximating the later phases has very little impact on the final quality. Based on this observation, we developed an optimization framework that, for a given quality-loss budget, finds the best approximation settings for each phase of the execution.
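
    The aggregation-tree idea behind PPR can be sketched in a few lines of Python, using simple XOR parity in place of the Galois-field arithmetic a real Reed-Solomon code would use; this illustrates only the traffic-spreading idea, not the actual PPR protocol.

    from functools import reduce

    def xor_blocks(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    surviving_blocks = [bytes([i]) * 4 for i in range(1, 9)]  # 8 surviving blocks

    # Conventional repair: the destination pulls all 8 blocks and combines
    # them itself, so its downlink carries 8 block transfers.
    central = reduce(xor_blocks, surviving_blocks)

    # PPR-style repair: combine pairwise, level by level (a reduction tree).
    # Each level's combinations would run on different nodes, in parallel,
    # so no single link carries more than one partial result per level.
    level, levels = surviving_blocks, 0
    while len(level) > 1:
        level = [xor_blocks(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels += 1

    assert level[0] == central  # same reconstructed block either way
    print(f"central repair: 8 transfers into one node; PPR: {levels} tree levels")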

    Methodologies synthesis

    This deliverable deals with the modelling and analysis of interdependencies between critical infrastructures, focussing attention on two interdependent infrastructures studied in the context of CRUTIAL: the electric power infrastructure and the information infrastructures supporting management, control, and maintenance functionality. The main objectives are to: 1) investigate the main challenges to be addressed for the analysis and modelling of interdependencies; 2) review the modelling methodologies and tools that can be used to address these challenges and to support the evaluation of the impact of interdependencies on the dependability and resilience of the service delivered to users; and 3) present the preliminary directions investigated so far by the CRUTIAL consortium for describing and modelling interdependencies.

    Maintenance Strategies Design and Assessment Using a Periodic Complexity Approach

    People are increasingly dependent on devices that deteriorate over time and whose operation becomes more complex. This raises the chance of unexpected failures, which cause inconvenience, cost time and money, and can even cost lives. An efficient maintenance strategy that reduces complexity should therefore be established to ensure the system performs economically, as designed and without interruption. In this research, a comprehensive novel approach is developed for designing and evaluating maintenance strategies that effectively reduce complexity in a cost-efficient way with maximum availability and quality. Applying a maintenance strategy properly requires a rigorous definition of failure, so a new complexity-based mathematical definition of failure is introduced that can model all failure types. A complexity-based metric, the complication rate, is introduced to measure functionality degradation and gradual failure. Maintenance reduces system complexity by resetting the system, thereby introducing periodicity, and a metric for measuring the amount of periodicity introduced by a maintenance strategy is developed. Designing efficient maintenance strategies that improve system performance requires developing the mathematical relationships between maintenance and quality, availability, and cost. The first relation, linking product quality to the maintenance policy, is developed using the virtual-age concept. The aging intensity function is then deployed to relate maintenance to availability, and the relation between maintenance and cost is formulated by investigating the effect of maintenance on each cost element. The final step in maintenance policy design is finding the optimum periodicity level. Two approaches are investigated: a weighted sum integrated with AHP, and a comfort-zones approach. Comfort zones is a newly developed optimization heuristic, based on physical programming, that captures designer preferences and limitations without substantial effort in tweaking or calculating weights. A mining-truck case study is presented to illustrate the application of the developed maintenance design approach and to compare its results with the traditional renewal-reward theory. The developed approach is shown to be more capable of designing a maintenance policy that reduces complexity while simultaneously improving other performance measures. This research shows that considering complexity reduction in maintenance policy design improves system functionality, and that this can be achieved with a simple, industrially applicable approach.
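
    The abstract links product quality to the maintenance policy through the virtual-age concept. Below is a minimal Python sketch of one standard virtual-age formulation (Kijima Type I); the thesis's exact model may differ, and the maintenance interval and restoration factor are illustrative assumptions.

    def virtual_age_trajectory(interval_hours: float,
                               n_cycles: int,
                               restoration: float) -> list[float]:
        """Virtual age immediately after each maintenance action.

        Kijima Type I: v_n = v_(n-1) + restoration * x_n, where x_n is the
        operating time accumulated in cycle n.
        restoration = 0.0 -> perfect repair ("as good as new")
        restoration = 1.0 -> minimal repair ("as bad as old")
        """
        v, trajectory = 0.0, []
        for _ in range(n_cycles):
            v += restoration * interval_hours
            trajectory.append(v)
        return trajectory

    # Example: maintain every 500 h; each action removes 70% of the wear
    # accumulated since the previous action (restoration factor 0.3).
    for n, v in enumerate(virtual_age_trajectory(500.0, 5, restoration=0.3), start=1):
        print(f"after maintenance {n}: calendar age {n * 500:>5} h, virtual age {v:>6.1f} h")

    The gap between calendar age and virtual age is the "system resetting" the abstract describes: the more wear each maintenance action removes, the stronger the periodicity it introduces.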