14 research outputs found

    System Reliability Evaluation Based on Convex Combination Considering Operation and Maintenance Strategy

    Get PDF
    The approaches to the system reliability evaluation with respect to the cases, where the components are independent or the components have interactive relationships within the system, were proposed in this paper. Starting from the higher requirements on system operational safety and economy, the reliability focused optimal models of multiobjective maintenance strategies were built. For safety-critical systems, the pessimistic maintenance strategies are usually taken, and, in these cases, the system reliability evaluation has also to be tackled pessimistically. For safety-uncritical systems, the optimistic maintenance strategies were usually taken, and, in these circumstances, the system reliability evaluation had also to be tackled optimistically, respectively. Besides, the reasonable maintenance strategies and their corresponding reliability evaluation can be obtained through the convex combination of the above two cases. With a high-speed train system as the example background, the proposed method is verified by combining the actual failure data with the maintenance data. Results demonstrate that the proposed study can provide a new system reliability calculation method and solution to select and optimize the multiobjective operational strategies with the considerations of system safety and economical requirements. The theoretical basis is also provided for scientifically estimating the reliability of a high-speed train system and formulating reasonable maintenance strategies

    Uncertainty and Confidence in Safety Logic

    Get PDF
    Abstract Reasoning about system safety requires reasoning about confidence in safety claims. For example, DO-178B requires developers to determine the correctness of the worst-case execution time of the software. It is not possible to do this beyond any doubt. Therefore, developers and assessors must consider the limitations of execution time evidence and their effect on the confidence that can be placed in execution time figures, timing analysis results, and claims to have met timing-related software safety requirements. In this paper, we survey and assess five existing concepts that might serve as means of describing and reasoning about confidence: safety integrity levels, probability distributions of failure rates, Bayesian Belief Networks, argument integrity levels, and Baconian probability. We define use cases for confidence in safety cases, prescriptive standards, certification of component-based systems, and the reuse of safety elements both in and out of context. From these use cases, we derive requirements for a confidence framework. We assess existing techniques by discussing what is known about how well each confidence metric meets these requirements. Our results show that no existing confidence metric is ideally suited for all uses. We conclude by discussing implications for future standards and for reuse of safety elements

    An Investigation of Proposed Techniques for Quantifying Confidence in Assurance Arguments

    Get PDF
    The use of safety cases in certification raises the question of assurance argument sufficiency and the issue of confidence (or uncertainty) in the argument's claims. Some researchers propose to model confidence quantitatively and to calculate confidence in argument conclusions. We know of little evidence to suggest that any proposed technique would deliver trustworthy results when implemented by system safety practitioners. Proponents do not usually assess the efficacy of their techniques through controlled experiment or historical study. Instead, they present an illustrative example where the calculation delivers a plausible result. In this paper, we review current proposals, claims made about them, and evidence advanced in favor of them. We then show that proposed techniques can deliver implausible results in some cases. We conclude that quantitative confidence techniques require further validation before they should be recommended as part of the basis for deciding whether an assurance argument justifies fielding a critical system

    Operating System Support for Redundant Multithreading

    Get PDF
    Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers proposed various solutions to this issue with different downsides: Specialized hardware components make hardware more expensive in production and consume additional energy at runtime. Fault-tolerant algorithms and libraries enforce specific programming models on the developer. Compiler-based fault tolerance requires the source code for all applications to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs. ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and thereby does not rely on hardware extensions, such as ECC-protected memory. In my thesis I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication. I furthermore present an approach to handle applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead is below 14% for single-threaded applications running in triple-modular redundant mode. The last part of my thesis acknowledges that software-implemented fault tolerance methods often rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB). I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and other fault tolerance mechanisms. Thereafter I show three case studies that evaluate approaches to protecting RCB components and thereby aim to achieve a software stack that is fully protected against hardware errors

    Boomerang: real-time I/O meets legacy systems

    Full text link
    This paper presents Boomerang, an I/O system that integrates a legacy non-real-time OS with one that is customized for timing-sensitive tasks. A relatively small RTOS benefits from the pre-existing libraries, drivers and services of the legacy system. Additionally, timing-critical tasks are isolated from less critical tasks by securely partitioning machine resources among the separate OSes. Boomerang guarantees end-to-end processing delays on input data that requires outputs to be generated within specific time bounds.We show how to construct composable task pipelines in Boomerang that combine functionality spanning a custom RTOS and a legacy Linux system. By dedicating time-critical I/O to the RTOS, we ensure that complementary services provided by Linux are sufficiently predictable to meet end-to-end service guarantees. While Boomerang benefits from spatial isolation, it also outperforms a standalone Linux system using deadline-based CPU reservations for pipeline tasks. We also show how Boomerang outperforms a virtualized system called ACRN, designed for automotive systems.https://ieeexplore.ieee.org/document/9113119Published versio

    Aspect-oriented technology for dependable operating systems

    Get PDF
    Modern computer devices exhibit transient hardware faults that disturb the electrical behavior but do not cause permanent physical damage to the devices. Transient faults are caused by a multitude of sources, such as fluctuation of the supply voltage, electromagnetic interference, and radiation from the natural environment. Therefore, dependable computer systems must incorporate methods of fault tolerance to cope with transient faults. Software-implemented fault tolerance represents a promising approach that does not need expensive hardware redundancy for reducing the probability of failure to an acceptable level. This thesis focuses on software-implemented fault tolerance for operating systems because they are the most critical pieces of software in a computer system: All computer programs depend on the integrity of the operating system. However, the C/C++ source code of common operating systems tends to be already exceedingly complex, so that a manual extension by fault tolerance is no viable solution. Thus, this thesis proposes a generic solution based on Aspect-Oriented Programming (AOP). To evaluate AOP as a means to improve the dependability of operating systems, this thesis presents the design and implementation of a library of aspect-oriented fault-tolerance mechanisms. These mechanisms constitute separate program modules that can be integrated automatically into common off-the-shelf operating systems using a compiler for the AOP language. Thus, the aspect-oriented approach facilitates improving the dependability of large-scale software systems without affecting the maintainability of the source code. The library allows choosing between several error-detection and error-correction schemes, and provides wait-free synchronization for handling asynchronous and multi-threaded operating-system code. This thesis evaluates the aspect-oriented approach to fault tolerance on the basis of two off-the-shelf operating systems. Furthermore, the evaluation also considers one user-level program for protection, as the library of fault-tolerance mechanisms is highly generic and transparent and, thus, not limited to operating systems. Exhaustive fault-injection experiments show an excellent trade-off between runtime overhead and fault tolerance, which can be adjusted and optimized by fine-grained selective placement of the fault-tolerance mechanisms. Finally, this thesis provides evidence for the effectiveness of the approach in detecting and correcting radiation-induced hardware faults: High-energy particle radiation experiments confirm improvements in fault tolerance by almost 80 percent

    Safety and Reliability - Safe Societies in a Changing World

    Get PDF
    The contributions cover a wide range of methodologies and application areas for safety and reliability that contribute to safe societies in a changing world. These methodologies and applications include: - foundations of risk and reliability assessment and management - mathematical methods in reliability and safety - risk assessment - risk management - system reliability - uncertainty analysis - digitalization and big data - prognostics and system health management - occupational safety - accident and incident modeling - maintenance modeling and applications - simulation for safety and reliability analysis - dynamic risk and barrier management - organizational factors and safety culture - human factors and human reliability - resilience engineering - structural reliability - natural hazards - security - economic analysis in risk managemen
    corecore