21,864 research outputs found

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    Testing effort dependent software reliability model for imperfect debugging process considering both detection and correction

    Get PDF
    This paper studies the fault detection process (FDP) and fault correction process (FCP) with the incorporation of testing effort function and imperfect debugging. In order to ensure high reliability, it is essential for software to undergo a testing phase, during which faults can be detected and corrected by debuggers. The testing resource allocation during this phase, which is usually depicted by the testing effort function, considerably influences not only the fault detection rate but also the time to correct a detected fault. In addition, testing is usually far from perfect such that new faults may be introduced. In this paper, we first show how to incorporate testing effort function and fault introduction into FDP and then develop FCP as delayed FDP with a correction effort. Various specific paired FDP and FCP models are obtained based on different assumptions of fault introduction and correction effort. An illustrative example is presented. The optimal release policy under different criteria is also discussed

    Experiments in fault tolerant software reliability

    Get PDF
    The reliability of voting was evaluated in a fault-tolerant software system for small output spaces. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. The investigation of existing fault-tolerance models was continued and formulation of new models was initiated

    Assessment team report on flight-critical systems research at NASA Langley Research Center

    Get PDF
    The quality, coverage, and distribution of effort of the flight-critical systems research program at NASA Langley Research Center was assessed. Within the scope of the Assessment Team's review, the research program was found to be very sound. All tasks under the current research program were at least partially addressing the industry needs. General recommendations made were to expand the program resources to provide additional coverage of high priority industry needs, including operations and maintenance, and to focus the program on an actual hardware and software system that is under development

    Machine learning and its applications in reliability analysis systems

    Get PDF
    In this thesis, we are interested in exploring some aspects of Machine Learning (ML) and its application in the Reliability Analysis systems (RAs). We begin by investigating some ML paradigms and their- techniques, go on to discuss the possible applications of ML in improving RAs performance, and lastly give guidelines of the architecture of learning RAs. Our survey of ML covers both levels of Neural Network learning and Symbolic learning. In symbolic process learning, five types of learning and their applications are discussed: rote learning, learning from instruction, learning from analogy, learning from examples, and learning from observation and discovery. The Reliability Analysis systems (RAs) presented in this thesis are mainly designed for maintaining plant safety supported by two functions: risk analysis function, i.e., failure mode effect analysis (FMEA) ; and diagnosis function, i.e., real-time fault location (RTFL). Three approaches have been discussed in creating the RAs. According to the result of our survey, we suggest currently the best design of RAs is to embed model-based RAs, i.e., MORA (as software) in a neural network based computer system (as hardware). However, there are still some improvement which can be made through the applications of Machine Learning. By implanting the 'learning element', the MORA will become learning MORA (La MORA) system, a learning Reliability Analysis system with the power of automatic knowledge acquisition and inconsistency checking, and more. To conclude our thesis, we propose an architecture of La MORA
    corecore