4,588 research outputs found

    Experimental analysis of computer system dependability

    Get PDF
    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance

    A fault-tolerant multiprocessor architecture for aircraft, volume 1

    Get PDF
    A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed

    Programmable electronic mining systems: best practice recommendations (in nine parts) : part 2, 2.1 System safety

    Get PDF
    "This report (An Introduction to Safety) is the first in a nine-part series of recommendations addressing the functional safety of processor-controlled mining equipment. It is part of a risk-based system safety process encompassing hardware, software, humans and the operating environment for the equipment's life cycle. The reports in this series address the various life cycle stages of inception, design, approval and certification, commissioning, operation, maintenance, and decommissioning. These recommendations were developed as a joint project between the National Institute for Occupational Safety and Health and the Mine Safety and Health Administration. They are intended for use by mining companies, original equipment manufacturers, and aftermarket suppliers to these mining companies. Users of these reports are expected to consider the set in total during the design cycle. 1.0 Safety Introduction - This is an introductory report for the general mining industry. It provides basic system/software safety concepts, discusses the need for mining to address the functional safety of programmable electronics, and includes the benefits of implementing a system/software safety program. 2.1 System Safety and 2.2 Software Safety - These reports draw heavily from International Electrotechnical Commission (IEC) standard 61508 and other recognized standards. The scope is "surface and underground safety mining systems employing embedded, networked, and non-networked programmable electronics." System safety seeks to design safety into all phases of the entire system. Software is a subsystem; thus, software safety is a part of the systems safety. 3.0 Safety File - This report contains the documentation that demonstrates the level of safety built into the system and identifies limitations for the system's use and operation. In essence, it is a "proof of safety" that the system and its operation meets the appropriate level of safety for the intended application. It starts from the beginning of the design, is maintained during the full life cycle of the system, and provides administrative support for the safety program of the full system. 4.0 Safety Assessment - The independent assessment of the Safety file is addressed. It establishes consistent methods to determine the completeness and suitability of safety evidence and justification. This assessment could be done by an independent third party. 5.0 Safety Framework Guidance - It is intended to supplement the safety framework reports with guidance that provides users with additional information. The purpose is to help users in applying the concepts presented. In other words, the safety framework is what needs to be done and the guidance is how it can be done. The guidance information reinforces the concepts, describes various methodologies that can be used and gives examples and references. It also gives information on the benefits and drawbacks of various methodologies. The guidance reports are not intended to promote a single methodology or to be an exhaustive treaty of the subject material. They provide information and references so that the user can more intelligently choose and implement the appropriate methodologies given the user's application and capabilities." - NIOSHTIC-

    Systematic Model-based Design Assurance and Property-based Fault Injection for Safety Critical Digital Systems

    Get PDF
    With advances in sensing, wireless communications, computing, control, and automation technologies, we are witnessing the rapid uptake of Cyber-Physical Systems across many applications including connected vehicles, healthcare, energy, manufacturing, smart homes etc. Many of these applications are safety-critical in nature and they depend on the correct and safe execution of software and hardware that are intrinsically subject to faults. These faults can be design faults (Software Faults, Specification faults, etc.) or physically occurring faults (hardware failures, Single-event-upsets, etc.). Both types of faults must be addressed during the design and development of these critical systems. Several safety-critical industries have widely adopted Model-Based Engineering paradigms to manage the design assurance processes of these complex CPSs. This thesis studies the application of IEC 61508 compliant model-based design assurance methodology on a representative safety-critical digital architecture targeted for the Nuclear power generation facilities. The study presents detailed experiences and results to demonstrate the benefits of Model testing in finding design flaws and its relevance to subsequent verification steps in the workflow. Additionally, to study the impact of physical faults on the digital architecture we develop a novel property-based fault injection method that overcomes few deficiencies of traditional fault injection methods. The model-based fault injection approach presented here guarantees high efficiency and near-exhaustive input/state/fault space coverage, by utilizing formal model checking principles to identify fault activation conditions and prove the fault tolerance features. The fault injection framework facilitates automated integration of fault saboteurs throughout the model to enable exhaustive fault location coverage in the model

    Airborne Advanced Reconfigurable Computer System (ARCS)

    Get PDF
    A digital computer subsystem fault-tolerant concept was defined, and the potential benefits and costs of such a subsystem were assessed when used as the central element of a new transport's flight control system. The derived advanced reconfigurable computer system (ARCS) is a triple-redundant computer subsystem that automatically reconfigures, under multiple fault conditions, from triplex to duplex to simplex operation, with redundancy recovery if the fault condition is transient. The study included criteria development covering factors at the aircraft's operation level that would influence the design of a fault-tolerant system for commercial airline use. A new reliability analysis tool was developed for evaluating redundant, fault-tolerant system availability and survivability; and a stringent digital system software design methodology was used to achieve design/implementation visibility

    Automated Mixed Traffic Vehicle (AMTV) technology and safety study

    Get PDF
    Technology and safety related to the implementation of an Automated Mixed Traffic Vehicle (AMTV) system are discussed. System concepts and technology status were reviewed and areas where further development is needed are identified. Failure and hazard modes were also analyzed and methods for prevention were suggested. The results presented are intended as a guide for further efforts in AMTV system design and technology development for both near term and long term applications. The AMTV systems discussed include a low speed system, and a hybrid system consisting of low speed sections and high speed sections operating in a semi-guideway. The safety analysis identified hazards that may arise in a properly functioning AMTV system, as well as hardware failure modes. Safety related failure modes were emphasized. A risk assessment was performed in order to create a priority order and significant hazards and failure modes were summarized. Corrective measures were proposed for each hazard

    Multilevel Runtime Verification for Safety and Security Critical Cyber Physical Systems from a Model Based Engineering Perspective

    Get PDF
    Advanced embedded system technology is one of the key driving forces behind the rapid growth of Cyber-Physical System (CPS) applications. CPS consists of multiple coordinating and cooperating components, which are often software-intensive and interact with each other to achieve unprecedented tasks. Such highly integrated CPSs have complex interaction failures, attack surfaces, and attack vectors that we have to protect and secure against. This dissertation advances the state-of-the-art by developing a multilevel runtime monitoring approach for safety and security critical CPSs where there are monitors at each level of processing and integration. Given that computation and data processing vulnerabilities may exist at multiple levels in an embedded CPS, it follows that solutions present at the levels where the faults or vulnerabilities originate are beneficial in timely detection of anomalies. Further, increasing functional and architectural complexity of critical CPSs have significant safety and security operational implications. These challenges are leading to a need for new methods where there is a continuum between design time assurance and runtime or operational assurance. Towards this end, this dissertation explores Model Based Engineering methods by which design assurance can be carried forward to the runtime domain, creating a shared responsibility for reducing the overall risk associated with the system at operation. Therefore, a synergistic combination of Verification & Validation at design time and runtime monitoring at multiple levels is beneficial in assuring safety and security of critical CPS. Furthermore, we realize our multilevel runtime monitor framework on hardware using a stream-based runtime verification language

    Automatic instantiation of abstract tests on specific configurations for large critical control systems

    Full text link
    Computer-based control systems have grown in size, complexity, distribution and criticality. In this paper a methodology is presented to perform an abstract testing of such large control systems in an efficient way: an abstract test is specified directly from system functional requirements and has to be instantiated in more test runs to cover a specific configuration, comprising any number of control entities (sensors, actuators and logic processes). Such a process is usually performed by hand for each installation of the control system, requiring a considerable time effort and being an error prone verification activity. To automate a safe passage from abstract tests, related to the so called generic software application, to any specific installation, an algorithm is provided, starting from a reference architecture and a state-based behavioural model of the control software. The presented approach has been applied to a railway interlocking system, demonstrating its feasibility and effectiveness in several years of testing experience

    Software implemented fault tolerance for microprocessor controllers: fault tolerance for microprocessor controllers

    Get PDF
    It is generally accepted that transient faults are a major cause of failure in micro processor systems. Industrial controllers with embedded microprocessors are particularly at risk from this type of failure because their working environments are prone to transient disturbances which can generate transient faults. In order to improve the reliability of processor systems for industrial applications within a limited budget, fault tolerant techniques for uniprocessors are implemented. These techniques aim to identify characteristics of processor operation which are attributed to erroneous behaviour. Once detection is achieved, a programme of restoration activity can be initiated. This thesis initially develops a previous model of erroneous microprocessor behaviour from which characteristics particular to mal-operation are identified. A new technique is proposed, based on software implemented fault tolerance which, by recognizing a particular behavioural characteristic, facilitates the self-detection of erroneous execution. The technique involves inserting detection mechanisms into the target software. This can be quite a complex process and so a prototype software tool called Post-programming Automated Recovery UTility (PARUT) is developed to automate the technique's application. The utility can be used to apply the proposed behavioural fault tolerant technique for a selection of target processors. Fault injection and emulation experiments assess the effectiveness of the proposed fault tolerant technique for three application programs implemented on an 8, 16, and 32- bit processors respectively. The modified application programs are shown to have an improved detection capability and hence reliability when the proposed fault tolerant technique is applied. General assessment of the technique cannot be made, however, because its effectiveness is application specific. The thesis concludes by considering methods of generating non-hazardous application programs at the compilation stage, and design features for incorporation into the architecture of a microprocessor which inherently reduce the hazard, and increase the detection capability of the target software. Particular suggestions are made to add a 'PARUT' phase to the translation process, and to orientate microprocessor design towards the instruction opcode map

    Design of an integrated airframe/propulsion control system architecture

    Get PDF
    The design of an integrated airframe/propulsion control system architecture is described. The design is based on a prevalidation methodology that uses both reliability and performance. A detailed account is given for the testing associated with a subset of the architecture and concludes with general observations of applying the methodology to the architecture
    corecore