
    Reliability Evaluation of Common-Cause Failures and Other Interdependencies in Large Reconfigurable Networks

    This work covers the impact of interdependencies and common-cause failures (CCFs) in large repairable networks that can be reconfigured after a fault, with consequent disconnection of the faulted equipment. Typical networks with these characteristics are utilities, e.g. power transmission and distribution systems, telecommunication systems, gas and water utilities, and Wi-Fi networks. The main issues of the research are: (a) identification of the specific interdependencies and CCFs in large repairable networks, and (b) evaluation of their impact on the reliability parameters (load node availability, etc.). The research has identified (1) the system and equipment failure modes that are relevant to interdependencies and CCFs, and their subsequent effects, and (2) the hidden interdependencies and CCFs in control, supervision and protection systems and in automatic change-over systems, which have no impact in normal operation but can cause significant outages when those automatic systems are called on to operate during and after fault conditions. Additionally, methods were introduced to include interdependencies and CCFs in the reliability and availability models. The results of the research include a new generalized approach to modelling repairable networks for reliability analysis, with interdependencies/CCFs as a main contributor. The method covers generalized models for nodes, branches and load nodes; interdependencies and CCFs on networks/components; system interdependencies/CCFs; functional interdependencies/CCFs; and simultaneous and non-simultaneous interdependencies/CCFs. As an example, an important network structure (a "ring" with load nodes) has been analyzed in detail with its generalized interdependency/CCF model.
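
    The abstract does not give the authors' equations, but the effect it describes can be illustrated with the standard beta-factor treatment of CCFs; a minimal sketch for a 1-out-of-2 redundant pair of repairable components (the rates below are invented for illustration, not taken from the paper):

    ```python
    # Beta-factor sketch: a fraction beta of each component's failure rate is
    # a common cause that takes out both redundant components at once.
    # lam, mu and beta are illustrative values, not from the paper.

    def unavailability(failure_rate, repair_rate):
        """Steady-state unavailability of a repairable item."""
        return failure_rate / (failure_rate + repair_rate)

    lam = 1e-4   # total failure rate per hour (assumed)
    mu = 0.1     # repair rate per hour (assumed)
    beta = 0.05  # common-cause fraction of failures (assumed)

    # independent failures of the two parallel components:
    q_ind = unavailability((1 - beta) * lam, mu) ** 2
    # common-cause failure of both components at once:
    q_ccf = unavailability(beta * lam, mu)

    print(f"pair unavailability, independence assumed: {unavailability(lam, mu)**2:.3e}")
    print(f"pair unavailability, with beta-factor CCF: {q_ind + q_ccf:.3e}")
    ```

    Even a small beta dominates the redundant pair's unavailability, which is exactly why hidden interdependencies matter in redundant networks.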

    ANALYTICAL MODELING AND SIMULATION OF RELIABILITY OF A CLOSED HOMOGENEOUS SYSTEM WITH AN ARBITRARY NUMBER OF DATA SOURCES AND LIMITED RESOURCES FOR THEIR PROCESSING

    Continuous development of computer networks and data transmission systems underlines the growing need for adequate mathematical models and methods for analyzing the performance and reliability metrics of these systems, taking into account the performance of their redundant components. We consider a mathematical model of a repairable data transmission system as a model of a closed homogeneous cold standby system with a single repair facility, exponentially distributed lifetimes, and generally distributed repair times of the system's elements. We study the system-level reliability, defined as the stationary probability of failure-free operation of the considered system. The proposed analytical methodology made it possible to evaluate the reliability of the entire system in the presence of failures of its elements. Explicit analytical expressions were obtained for the stationary probability of the system's failure-free operation and for the stationary system state probabilities, which allow analyzing other operational characteristics of the system with respect to the performance of its redundant elements. Explicit analytical expressions for the stationary state probabilities cannot always be obtained; therefore, to obtain results in the case of generally distributed repair times, a discrete-event simulation model was constructed to approximate the analytical model of the system. The simulation algorithm was implemented in R. A comparison of the numerical and graphical results obtained using both approaches showed close agreement, so the proposed simulation model can be used in cases where the analytical solution cannot be obtained explicitly, or as part of a more complex simulation model. We have also studied the sensitivity of the system's reliability characteristics to the shape of the input distributions. The obtained formulas show an explicit dependence of these characteristics on the form of the repair-time distribution functions of the system's elements. However, numerical studies and graphical analysis have shown that this dependence becomes vanishingly small under "fast" restoration of the system's elements.
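
    The authors implemented their simulation in R; as a minimal sketch of the same kind of discrete-event model (written in Python here, with invented parameter values and a lognormal repair law standing in for the general repair distribution):

    ```python
    # Discrete-event sketch of a closed cold-standby system: n homogeneous
    # elements, one operating at a time, exponential lifetimes, and a single
    # repair facility with generally distributed repair times.  All parameter
    # values and the lognormal repair law are assumptions for illustration.
    import math
    import random

    def simulate(n=3, failure_rate=1.0,
                 repair_time=lambda: random.lognormvariate(-2.0, 0.5),
                 horizon=1e6):
        """Estimate the stationary probability of failure-free operation
        (at least one element up)."""
        t, failed, down_time = 0.0, 0, 0.0
        next_repair = math.inf
        while t < horizon:
            # exponential lifetimes are memoryless, so the residual failure
            # time can be resampled at every event
            next_failure = (t + random.expovariate(failure_rate)
                            if failed < n else math.inf)
            t_next = min(next_failure, next_repair)
            if failed == n:                      # whole system down
                down_time += t_next - t
            t = t_next
            if t_next == next_failure:
                failed += 1
                if next_repair == math.inf:      # repair facility was idle
                    next_repair = t + repair_time()
            else:
                failed -= 1
                next_repair = t + repair_time() if failed > 0 else math.inf
        return 1.0 - down_time / horizon

    print(f"P(failure-free operation) ~ {simulate():.5f}")
    ```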

    Automatic phased mission system reliability model generation

    There are many methods for modelling the reliability of systems based on component failure data. This task becomes more complex as systems increase in size, or undertake missions that comprise multiple discrete modes of operation, or phases. Existing techniques require a certain level of expertise in the model generation and calculation processes, meaning that risk and reliability assessments of systems can often be expensive and time-consuming. This is exacerbated as system complexity increases. This thesis presents a novel method which generates reliability models for phased-mission systems, based on Petri nets, from simple input files. The process has been automated with a piece of software designed for engineers with little or no experience in the field of risk and reliability. The software can generate models for both repairable and non-repairable systems, allowing redundant components and maintenance cycles to be included in the model. Further, the software includes a simulator for the generated models. This allows a user with simple input files to perform automatic model generation and simulation with a single piece of software, yielding detailed failure data on components, phases, missions and the overall system. A system can also be simulated across multiple consecutive missions. To assess performance, the software is compared with an analytical approach and found to match within 5% in both the repairable and non-repairable cases. The software documented in this thesis could serve as an aid to engineers designing new systems to validate the reliability of the system, without requiring specialist consultants or additional software, ensuring that the analysis provides results in a timely and cost-effective manner.
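
    To give a flavour of the Petri net machinery such a generator emits, here is a minimal sketch of a stochastic Petri net for a single repairable component; the places, transitions and rates are invented for illustration and are not the thesis's generated models:

    ```python
    # Two-place stochastic Petri net for one repairable component.
    # marking: tokens per place; transitions: (input place, output place, rate).
    import random

    marking = {"working": 1, "failed": 0}
    transitions = {
        "fail":   ("working", "failed", 0.01),   # assumed failure rate per hour
        "repair": ("failed", "working", 0.25),   # assumed repair rate per hour
    }

    def fire_next(marking):
        """Race the enabled exponential transitions; fire the earliest."""
        enabled = [(random.expovariate(rate), src, dst)
                   for (src, dst, rate) in transitions.values()
                   if marking[src] >= 1]
        delay, src, dst = min(enabled)
        marking[src] -= 1
        marking[dst] += 1
        return delay

    t, downtime = 0.0, 0.0
    while t < 1e5:
        was_down = marking["failed"] == 1
        dt = fire_next(marking)
        if was_down:
            downtime += dt
        t += dt
    print(f"estimated component unavailability ~ {downtime / t:.4f}")
    ```

    A full phased-mission model is this idea scaled up: one subnet per component, plus places and transitions encoding the phase-by-phase failure logic.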

    Markov and Semi-Markov Chains, Processes, Systems and Emerging Related Fields

    This book covers a broad range of research results in the field of Markov and semi-Markov chains, processes, systems and related emerging fields. The authors of the included research papers are well-known researchers in their fields. The book presents the state of the art and ideas for further research for theorists, while also providing directly applicable results for practitioners in diverse areas.

    Novel models and algorithms for systems reliability modeling and optimization

    Recent growth in the scale and complexity of products and technologies in the defense and other industries is challenging product development, realization, and sustainment costs. Uncontrolled costs and routine budget overruns are causing all parties involved to seek lean product development processes and treatment of the reliability, availability, and maintainability of the system as a true design parameter. To this end, accurate estimation and management of system reliability during the earliest stages of new product development is critical not only for managing product development and manufacturing costs but also for controlling life cycle costs (LCC). The overall objective of this research study is therefore to develop an integrated framework for design for reliability (DFR) during upfront product development by treating reliability as a design parameter. The aim is to develop the theory, methods, and tools necessary for: 1) accurate assessment of system reliability and availability, and 2) optimization of the design to meet system reliability targets. In modeling system reliability and availability, we aim to address the limitations of existing methods, in particular the Markov chain method and the Dynamic Bayesian Network approach, by incorporating a Continuous Time Bayesian Network framework for more effective modeling of sub-system/component interactions, dependencies, and various repair policies. We also propose a multi-objective optimization scheme to aid the designer in obtaining optimal design(s) with respect to system reliability/availability targets and other system design requirements; in particular, the optimization scheme entails optimal selection of sub-system and component alternatives. The theory, methods, and tools developed will be extensively tested and validated using simulation test-bed data and actual case studies from our industry partners.
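
    Selecting component alternatives under competing objectives can be illustrated with a simple Pareto filter; this is a generic sketch, not the study's optimization scheme, and the alternatives listed are invented:

    ```python
    # Pareto filtering of design alternatives over two objectives:
    # reliability (maximize) and cost (minimize).  All values are invented.

    alternatives = {
        "A": (0.95, 120.0),   # (reliability, unit cost)
        "B": (0.97, 180.0),
        "C": (0.93, 150.0),   # dominated by A: less reliable and costlier
        "D": (0.99, 300.0),
    }

    def dominates(x, y):
        """x dominates y if no worse in both objectives and better in one."""
        return (x[0] >= y[0] and x[1] <= y[1]) and (x[0] > y[0] or x[1] < y[1])

    pareto = {name: obj for name, obj in alternatives.items()
              if not any(dominates(other, obj)
                         for other in alternatives.values()
                         if other is not obj)}
    print(pareto)   # A, B and D survive; C is dominated
    ```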

    RELIABILITY MODEL AND ASSESSMENT OF REDUNDANT ARRAYS OF INEXPENSIVE DISKS (RAID) INCORPORATING LATENT DEFECTS AND NON-HOMOGENEOUS POISSON PROCESS EVENTS.

    Today's most reliable data storage systems are made of redundant arrays of inexpensive disks (RAID). The quantification of RAID system reliability is often based on models that omit critical hard disk drive failure modes, assume all failure and restoration rates are constant (exponential distributions), and assume the RAID group times to failure follow a homogeneous Poisson process (HPP). This paper presents a comprehensive reliability model that accounts for numerous failure causes in today's hard disk drives, allows proper representation of repair and restoration, and does not rely on the assumption of an HPP for the RAID group. The model does not assume hard disk drives have constant transition rates, but allows each hard disk drive "slot" in the RAID group to have its own set of distributions, closed-form or user-defined. Hard disk drive (HDD) failure distributions derived from field usage are presented, showing that failure distributions are commonly non-homogeneous, frequently having increasing hazard rates from time zero. Hard disk drive failure modes and causes are presented and used to develop a model that reflects not only complete failure, but also degraded conditions due to undetected but corrupted data (latent defects). The model can represent user-defined distributions for completion of "background scrubbing" to correct (remove) corrupted data. Sequential Monte Carlo simulation is used to determine the number of double disk failures expected as a function of time. The RAID group can be any size up to 25 drives. The results are presented as mean cumulative failure distributions for the RAID group. Results estimate that the number of double disk failures can be as much as 5000 times greater than predicted over 10 years by the mean-time-to-data-loss method or Markov models when the characteristic lives of the input distributions are the same. Model results are compared to actual field data for two HDD families and two different RAID group sizes and show good correlation. Results show the rate of occurrence of failure for the RAID group may be increasing, decreasing or constant, depending on the parameters used for the four input distributions.
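
    A much simplified Monte Carlo sketch conveys the flavour of counting double disk failures over a mission; the paper's model additionally covers latent defects, scrubbing and per-slot distributions, all omitted here, and the Weibull parameters, rebuild time and group size below are assumptions:

    ```python
    # Count "double disk failures" (a second failure during a rebuild) in a
    # RAID group over a 10-year mission, with Weibull times to failure.
    import random

    def trial(n_disks=8, shape=1.2, scale=1.5e6, rebuild_h=24.0,
              mission_h=87600.0):
        """Number of double-disk-failure events in one simulated mission."""
        ddf = 0
        t = 0.0
        while True:
            # time to the next failure among the n disks (fresh Weibull draws;
            # treating replaced disks this way is a renewal-style approximation)
            t += min(random.weibullvariate(scale, shape) for _ in range(n_disks))
            if t > mission_h:
                return ddf
            # does any of the remaining n-1 disks fail during the rebuild?
            if min(random.weibullvariate(scale, shape)
                   for _ in range(n_disks - 1)) < rebuild_h:
                ddf += 1

    trials = 2000
    mean_ddf = sum(trial() for _ in range(trials)) / trials
    print(f"mean double disk failures per group over 10 years ~ {mean_ddf:.4f}")
    ```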

    HiRel: Hybrid Automated Reliability Predictor (HARP) integrated reliability tool system, (version 7.0). Volume 1: HARP introduction and user's guide

    The Hybrid Automated Reliability Predictor (HARP) integrated Reliability (HiRel) tool system for reliability/availability prediction offers a toolbox of integrated reliability/availability programs that can be used to customize the user's application in a workstation or nonworkstation environment. HiRel consists of interactive graphical input/output programs and four reliability/availability modeling engines that provide analytical and simulative solutions for a wide range of fault-tolerant system architectures, and it is also applicable to electronic systems in general. The tool system was designed to be compatible with most computing platforms and operating systems, and some programs have been beta tested within the aerospace community for over 8 years. Volume 1 provides an introduction to the HARP program. Comprehensive information on HARP mathematical models can be found in the references.

    Reliability studies of a high-power proton accelerator for accelerator-driven system applications for nuclear waste transmutation

    The main effort of the present study is to analyze the availability and reliability of a high-performance linac (linear accelerator) conceived for Accelerator-Driven System (ADS) purposes and to suggest recommendations, in order both to meet the high operability goals and to satisfy the safety requirements dictated by the reactor system. The Reliability Block Diagram (RBD) approach has been adopted for system modelling, in accordance with the present level of definition of the design: component failure modes are assessed in terms of Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR), and reliability and availability figures are derived by applying standard reliability algorithms. The lack of a well-established component database has been identified as the main issue in the accelerator reliability assessment. The results, affected by the conservative character of the study, show a large margin for improvement in the predicted accelerator reliability and availability figures. The paper outlines a viable path towards enhancing accelerator reliability and availability and delineates the most appropriate strategies. The improvement in the reliability characteristics along this path is shown as well.
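
    The RBD/MTBF/MTTR machinery the abstract refers to reduces to simple algebra for independent blocks; a minimal sketch in which the block structure and the MTBF/MTTR figures are invented placeholders, not the study's linac model:

    ```python
    # Steady-state availability A = MTBF / (MTBF + MTTR), combined through
    # series and parallel RBD blocks (independent blocks assumed).

    def availability(mtbf_h, mttr_h):
        return mtbf_h / (mtbf_h + mttr_h)

    def series(*avail):
        out = 1.0
        for a in avail:
            out *= a          # up only if every block is up
        return out

    def parallel(*avail):
        out = 1.0
        for a in avail:
            out *= (1.0 - a)  # down only if every branch is down
        return 1.0 - out

    rf_cavity = availability(mtbf_h=5000.0, mttr_h=8.0)    # assumed figures
    klystron  = availability(mtbf_h=20000.0, mttr_h=24.0)  # assumed figures

    # two klystrons in hot standby feeding one cavity section:
    section = series(rf_cavity, parallel(klystron, klystron))
    print(f"section availability ~ {section:.6f}")
    ```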

    Systems reliability modelling for phased missions with maintenance-free operating periods

    In 1996, a concept was proposed by the UK Ministry of Defence with the intention of making the field of reliability more useful to the end user, particularly within military aerospace. This idea was the Maintenance Free Operating Period (MFOP), a duration of time in which the overall system can complete all of its required missions, with a defined probability of success, without the need for emergency repairs or maintenance. The system can encounter component or subsystem failures, but these must be carried with no effect on the overall mission until repair takes place. It is thought that advanced technologies such as redundant systems, prognostics and diagnostics will play a major role in the successful use of MFOP in practical applications. Many types of system operate missions that are made up of several sequential phases. For a mission to be successful, the system must satisfactorily complete each of the objectives in each of the phases. If the system fails or cannot complete its goals in any one phase, the mission has failed. Each phase requires the system to use different items, and so the failure logic changes from phase to phase. Mission unreliability is defined as the probability that the system fails to function successfully during at least one phase of the mission. An important problem is the efficient calculation of mission unreliability. This thesis investigates the creation of a modelling method that considers as many features as possible of systems undergoing both MFOPs and phased missions. It uses Petri nets, a type of digraph in which tokens representing system states are stored and moved. A simple model is presented, following which a more complex model is developed and explained, encompassing those ideas believed to be important in delivering a long MFOP with a high degree of confidence. A demonstration of the process by which the modelling method could be used to improve the reliability performance of a large system is then shown. The complex model is employed in the form of a Monte Carlo simulation program, which is applied to a life-size system such as may be encountered in the real world. Improvements are suggested and the results of their implementation analysed.
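
    The core MFOP quantity, the probability of completing a maintenance-free period while carrying failures, can be sketched with a toy Monte Carlo model; this is not the thesis's Petri net method, and the 2-out-of-3 redundancy, failure rate and period length are invented:

    ```python
    # Estimate P(system completes an MFOP): failures are carried without
    # repair, and the system survives while at least `needed` of the
    # `fitted` redundant channels remain working.
    import random

    def mfop_success(period_h=500.0, rate=1e-3, needed=2, fitted=3,
                     trials=100_000):
        ok = 0
        for _ in range(trials):
            lifetimes = [random.expovariate(rate) for _ in range(fitted)]
            # no repair inside the MFOP: success iff enough channels
            # outlast the whole period
            if sum(1 for life in lifetimes if life > period_h) >= needed:
                ok += 1
        return ok / trials

    print(f"P(complete the MFOP) ~ {mfop_success():.4f}")
    ```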

    Resilience of an embedded architecture using hardware redundancy

    In the last decade, the dominance of general-purpose computing systems in the market has been overtaken by embedded systems, with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and the consequences of failure can be profound. Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components in performance, and do not provide 100% fault elimination. Without fault-tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn can result in catastrophic failures. In this work we study the concepts of fault tolerance and dependability and extend them, providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles, and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault-tolerant techniques and 2) the effects of radiation on state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce a fault tolerance algorithm and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system, called the syndrome, that is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA's assembler and commercial hardware simulators.
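
    As a generic illustration of the hardware redundancy the title refers to (not the thesis's syndrome-based architecture), triple modular redundancy masks a single faulty channel by majority voting:

    ```python
    # Triple modular redundancy (TMR): three channels compute the same value
    # and a bitwise majority vote masks one faulty channel, e.g. a channel
    # corrupted by a radiation-induced single-event upset.

    def tmr_vote(a: int, b: int, c: int) -> int:
        """Bitwise majority of three redundant channel outputs."""
        return (a & b) | (a & c) | (b & c)

    correct = 0b1011_0010
    flipped = correct ^ 0b0000_1000   # single bit flip in one channel
    assert tmr_vote(correct, correct, flipped) == correct
    print("single upset masked by majority vote")
    ```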