197 research outputs found

    Towards automatic Markov reliability modeling of computer architectures

    The analysis and evaluation of reliability measures using time-varying Markov models is required for Processor-Memory-Switch (PMS) structures that have competing processes such as standby redundancy and repair, or renewal processes such as transient or intermittent faults. The task of generating these models is tedious and prone to human error due to the large number of states and transitions involved in any reasonable system. Therefore model formulation is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model formulation. This paper presents an overview of the Automated Reliability Modeling (ARM) program, under development at NASA Langley Research Center. ARM will accept as input a description of the PMS interconnection graph, the behavior of the PMS components, the fault-tolerant strategies, and the operational requirements. The output of ARM will be the reliability or availability Markov model formulated for direct use by evaluation programs. The advantages of such an approach are (a) utility to a large class of users, not necessarily expert in reliability analysis, and (b) a lower probability of human error in the computation

    A general graphical user interface for automatic reliability modeling

    Reported here is a general Graphical User Interface (GUI) for automatic reliability modeling of Processor Memory Switch (PMS) structures using a Markov model. This GUI is based on a hierarchy of windows. One window has graphical editing capabilities for specifying the system's communication structure, hierarchy, reconfiguration capabilities, and requirements. Other windows have field texts, popup menus, and buttons for specifying parameters and selecting actions. An example application of the GUI is given

    Problems related to the integration of fault tolerant aircraft electronic systems

    Problems related to the design of the hardware for an integrated aircraft electronic system are considered. Taxonomies of concurrent systems are reviewed and a new taxonomy is proposed. An informal methodology intended to identify feasible regions of the taxonomic design space is described. Specific tools are recommended for use in the methodology. Based on the methodology, a preliminary strawman integrated fault tolerant aircraft electronic system is proposed. Next, problems related to the programming and control of integrated aircraft electronic systems are discussed. Issues of system resource management, including the scheduling and allocation of real time periodic tasks in a multiprocessor environment, are treated in detail. The role of software design in integrated fault tolerant aircraft electronic systems is discussed. Conclusions and recommendations for further work are included

    Dependability Models for Designing Disaster Tolerant Cloud Computing Systems

    Hundreds of natural disasters occur in many parts of the world every year, causing billions of dollars in damages. This fact contrasts with the high availability requirement of cloud computing systems, and, to protect such systems from unforeseen catastrophe, a recovery plan requires the utilization of different data centers located far enough apart. However, the time to migrate a VM from one data center to another increases with distance. This work presents dependability models for evaluating distributed cloud computing systems deployed into multiple data centers considering disaster occurrence. Additionally, we present a case study which evaluates several scenarios with different VM migration times and distances between data centers. Keywords: cloud computing; dependability evaluation; stochastic Petri nets

    Automatic specification of reliability models for fault-tolerant computers

    The calculation of reliability measures using Markov models is required for life-critical processor-memory-switch structures that have standby redundancy or that are subject to transient or intermittent faults or repair. The task of specifying these models is tedious and prone to human error because of the large number of states and transitions required in any reasonable system. Therefore, model specification is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model specification. Automation requires a general system description language (SDL). For practicality, this SDL should also provide a high level of abstraction and be easy to learn and use. The first attempt to define and implement an SDL with those characteristics is presented. A program named Automated Reliability Modeling (ARM) was constructed as a research vehicle. The ARM program uses a graphical interface as its SDL, and it outputs a Markov reliability model specification formulated for direct use by programs that generate and evaluate the model

    Model-based sensitivity analysis of IaaS cloud availability

    The increasing shift of various critical services towards Infrastructure-as-a-Service (IaaS) cloud data centers (CDCs) creates a need for analyzing CDCs’ availability, which is affected by various factors including repair policy and system parameters. This paper aims to apply analytical modeling and sensitivity analysis techniques to investigate the impact of these factors on the availability of a large-scale IaaS CDC, which (1) consists of active and two kinds of standby physical machines (PMs), (2) allows PMs to move among active and two kinds of standby PM pools, and (3) allows active and two kinds of standby PMs to have different mean repair times. Two repair policies are considered: (P1) all pools share a repair station and (P2) each pool uses its own repair station. We develop monolithic availability models for each repair policy by using Stochastic Reward Nets and also develop the corresponding scalable two-level models in order to overcome the monolithic model's limitations, caused by the large-scale feature of a CDC and the complicated interactions among CDC components. We also explore how to apply the differential sensitivity analysis technique to conduct parametric sensitivity analysis in the case of interacting sub-models. Numerical results of monolithic models and simulation results are used to verify the approximate accuracy of interacting sub-models, which are further applied to examine the sensitivity of the large-scale CDC availability with respect to repair policy and system parameters

    Automatic phased mission system reliability model generation

    There are many methods for modelling the reliability of systems based on component failure data. This task becomes more complex as systems increase in size, or undertake missions that comprise multiple discrete modes of operation, or phases. Existing techniques require certain levels of expertise in the model generation and calculation processes, meaning that risk and reliability assessments of systems can often be expensive and time-consuming. This is exacerbated as system complexity increases. This thesis presents a novel method which generates reliability models for phased-mission systems, based on Petri nets, from simple input files. The process has been automated with a piece of software designed for engineers with little or no experience in the field of risk and reliability. The software can generate models for both repairable and non-repairable systems, allowing redundant components and maintenance cycles to be included in the model. Further, the software includes a simulator for the generated models. This allows a user with simple input files to perform automatic model generation and simulation with a single piece of software, yielding detailed failure data on components, phases, missions and the overall system. A system can also be simulated across multiple consecutive missions. To assess performance, the software is compared with an analytical approach and found to match within 5% in both the repairable and non-repairable cases. The software documented in this thesis could serve as an aid to engineers designing new systems to validate the reliability of the system. This would not require specialist consultants or additional software, ensuring that the analysis provides results in a timely and cost-effective manner

    Reliability, Availability and Maintainability (RAM) Analysis for Offshore High Pressure Compressor

    Reliability, Availability and Maintainability (RAM) analysis helps in optimizing the performance of equipment. Availability can be improved by enhancing reliability and maintainability. Equipment failures in offshore facilities are difficult to predict; hence the sudden failure of a piece of equipment leads to reduced output, loss of production and high maintenance costs due to unplanned maintenance. This study examined and analysed the failure modes of a high pressure compressor on an offshore platform in order to identify its critical failure mode. Failure and repair data are utilized to determine the reliability and maintainability of the high pressure compressor. Reliability and maintainability analysis was carried out with the aid of ReliaSoft Weibull++ software to obtain the required parameters, while ReliaSoft BlockSim software was used for reliability block diagram (RBD) construction and simulation to obtain the availability of the high pressure compressor. The developed model can improve the performance of the high pressure compressor since it is validated with the actual model. From this RAM analysis, the overall performance of the high pressure compressor can be increased by conducting Root Cause Failure Analysis (RCFA) focusing on the most critical failure mode. Optimizing the maintenance schedule can reduce maintenance costs

    Two probabilistic life-cycle maintenance models for the deteriorating pavement

    © 2018, Polish Academy of Sciences Branch Lublin. All rights reserved. Pavement maintenance management poses a significant challenge for highway agencies, which must cope with pavement deterioration over time and with limited financial resources to keep the road condition at an acceptable level. In this paper two probabilistic maintenance models are proposed and compared for pavement deterioration and maintenance processes to evaluate different maintenance strategies. Firstly, the states of pavement condition are defined using the features of different pavement maintenance works, instead of using the traditional method of cumulative service index rating. Secondly, a Markovian model is presented to describe the pavement deterioration and maintenance process with some constraints on the number of interventions, the effect of interventions, etc. For complex scenarios, however, such as non-Markovian deterioration, dependencies between the different types of interventions and the use of emergency maintenance for roads when the required budget for maintenance is unavailable, a simulation-based Petri-net model is built to investigate the whole life-cycle evolution. Two examples are used to illustrate and compare the proposed models and to demonstrate the merits and disadvantages of each model and its applicable conditions