1,266 research outputs found

    Reliability and maintainability assessment factors for reliable fault-tolerant systems

    A long-term goal of the NASA Langley Research Center is the development of a reliability assessment methodology of sufficient power to enable the credible comparison of the stochastic attributes of one ultrareliable system design against others. This methodology, developed over a 10-year period, is a combined analytic and simulative technique. An analytic component is the Computer Aided Reliability Estimation capability, third generation, or simply CARE III. A simulative component is the Gate Logic Software Simulator capability, or GLOSS. The numerous factors that potentially degrade system reliability are discussed, along with the ways in which the factors peculiar to highly reliable fault-tolerant systems are accounted for in credible reliability assessments. Also presented are the modeling difficulties that result from their inclusion and the ways in which CARE III and GLOSS mitigate the intractability of the heretofore unworkable mathematics.

    Advanced reliability modeling of fault-tolerant computer-based systems

    Two methodologies for the reliability assessment of fault-tolerant digital computer-based systems are discussed. The computer-aided reliability estimation 3 (CARE 3) and gate logic software simulation (GLOSS) are assessment technologies that were developed to mitigate a serious weakness in the design and evaluation process of ultrareliable digital systems. The weak link is the unavailability of a sufficiently powerful modeling technique for comparing the stochastic attributes of one system against others. Some of the more interesting attributes are reliability, system survival, safety, and mission success.

    Integrated analysis of error detection and recovery

    An integrated modeling and analysis of error detection and recovery is presented. When fault latency and/or error latency exist, the system may suffer from multiple faults or error propagations which seriously deteriorate the fault-tolerant capability. Several detection models that enable analysis of the effect of detection mechanisms on the subsequent error handling operations and the overall system reliability were developed. Following detection of the faulty unit and reconfiguration of the system, the contaminated processes or tasks have to be recovered. The strategies of error recovery employed depend on the detection mechanisms and the available redundancy. Several recovery methods including the rollback recovery are considered. The recovery overhead is evaluated as an index of the capabilities of the detection and reconfiguration mechanisms

    Towards automatic Markov reliability modeling of computer architectures

    The analysis and evaluation of reliability measures using time-varying Markov models is required for Processor-Memory-Switch (PMS) structures that have competing processes such as standby redundancy and repair, or renewal processes such as transient or intermittent faults. The task of generating these models is tedious and prone to human error due to the large number of states and transitions involved in any reasonable system. Therefore model formulation is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model formulation. This paper presents an overview of the Automated Reliability Modeling (ARM) program, under development at NASA Langley Research Center. ARM will accept as input a description of the PMS interconnection graph, the behavior of the PMS components, the fault-tolerant strategies, and the operational requirements. The output of ARM will be the reliability or availability Markov model formulated for direct use by evaluation programs. The advantages of such an approach are (a) utility to a large class of users, not necessarily expert in reliability analysis, and (b) a lower probability of human error in the computation.
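As a rough illustration of what automated model formulation means, the sketch below generates the states and transition rates of a Markov reliability model from a short description and then evaluates it numerically. It is not ARM: the abstract does not describe ARM's input format or algorithms, so the function names, the k-of-n description, and the constant failure rate `lam` are all assumptions, and repair, transient faults, and the PMS interconnection graph are left out.

```python
import math

def build_k_of_n_model(n, k, lam):
    """Return {(src, dst): rate} for n identical units, k needed to operate.

    Hypothetical helper, not ARM. State i = number of failed units;
    state "F" is the absorbing system-failure state.
    """
    transitions = {}
    max_tolerated = n - k
    for failed in range(max_tolerated + 1):
        rate = (n - failed) * lam              # any surviving unit may fail next
        dst = failed + 1 if failed < max_tolerated else "F"
        transitions[(failed, dst)] = rate
    return transitions

def reliability(transitions, t, steps=20000):
    """Integrate the state equations with forward Euler; return 1 - P(F at t)."""
    states = {s for pair in transitions for s in pair}
    p = {s: 0.0 for s in states}
    p[0] = 1.0                                 # start with no failed units
    dt = t / steps
    for _ in range(steps):
        delta = {s: 0.0 for s in states}
        for (src, dst), rate in transitions.items():
            flow = rate * p[src] * dt          # probability mass moved this step
            delta[src] -= flow
            delta[dst] += flow
        for s in states:
            p[s] += delta[s]
    return 1.0 - p["F"]

# Triple modular redundancy (2-of-3) with an assumed failure rate.
lam, t = 1e-3, 1000.0
model = build_k_of_n_model(3, 2, lam)
numeric = reliability(model, t)
closed_form = 3 * math.exp(-2 * lam * t) - 2 * math.exp(-3 * lam * t)
print(model)
print(numeric, closed_form)
```

Even this toy case shows why the abstract calls hand formulation error-prone: each added unit type or recovery process multiplies the states and transitions that must be enumerated. The closed-form comparison works only because this model is small enough to solve by hand.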

    Dynamic assertion testing of flight control software

    Digital Flight Control System (DFCS) software was used as a test case for assertion testing. The assertions were written and embedded in the code, then errors were inserted (seeded) one at a time and the code executed. Results indicate that assertion testing is an effective and efficient method of detecting errors in flight software. Most errors are eliminated at an earlier stage in the development than before.
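The procedure described above (embed assertions, seed one error, execute, see whether an assertion fires) can be sketched on a toy routine. Nothing here is taken from the DFCS software: the command limiter, its 25-degree limit, and the helper `detects_error` are hypothetical stand-ins chosen only to show the mechanics.

```python
def limit_command(cmd_deg, limit_deg=25.0):
    """Clamp a surface command to +/- limit_deg (toy routine, not DFCS code)."""
    out = max(-limit_deg, min(limit_deg, cmd_deg))
    # Embedded assertion: the output must respect the actuator limits.
    assert -limit_deg <= out <= limit_deg, "command outside actuator limits"
    return out

def limit_command_seeded(cmd_deg, limit_deg=25.0):
    """Same routine with one seeded error: the lower clamp is omitted."""
    out = min(limit_deg, cmd_deg)              # seeded error
    assert -limit_deg <= out <= limit_deg, "command outside actuator limits"
    return out

def detects_error(func, inputs):
    """Return True if any embedded assertion fires for the given inputs."""
    for value in inputs:
        try:
            func(value)
        except AssertionError:
            return True
    return False

test_inputs = [-40.0, -10.0, 0.0, 10.0, 40.0]
print(detects_error(limit_command, test_inputs))         # False
print(detects_error(limit_command_seeded, test_inputs))  # True
```

Note that Python strips `assert` statements when run with `-O`, so a harness like this has to run with assertions enabled.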

    Improvement of the Fault Tolerance in IoT Based Positioning Systems by Applying for Redundancy in the Controller Layer

    In recent years, positioning applications of Internet-of-Things (IoT) based systems have grown increasingly popular and have proved useful for tracking vehicles and the daily activities of children and the elderly. Data obtained from GPS-based systems may contain errors; taking this into account, the proposed method is based on IoT-based positioning used in place of GPS. This is not, however, a reason to abandon GPS: to enhance reliability, the modern system and the traditional method can be applied in parallel. GPS signals can only be accessed in open spaces, and GPS devices are error-prone primarily when the receiver is located in an urban canyon, owing to congestion and possible interference. The outcome is a redundancy-based model for improving the fault tolerance of IoT-based positioning systems. The simulation results show a 22.5% improvement in the fault tolerance of the IoT-based positioning system after applying the proposed validation mechanism, and a 77.4% improvement after applying a more expensive module redundancy.

    Study of a unified hardware and software fault-tolerant architecture

    A unified architectural concept, called the Fault Tolerant Processor Attached Processor (FTP-AP), that can tolerate hardware as well as software faults is proposed for applications requiring ultrareliable computation capability. An emulation of the FTP-AP architecture, consisting of a breadboard Motorola 68010-based quadruply redundant Fault Tolerant Processor, four VAX 750s as attached processors, and four versions of a transport aircraft yaw damper control law, is used as a testbed in the AIRLAB to examine a number of critical issues. Solutions of several basic problems associated with N-Version software are proposed and implemented on the testbed. These include a confidence voter to resolve coincident errors in N-Version software. A reliability model of N-Version software that is based upon the recent understanding of software failure mechanisms is also developed. The basic FTP-AP architectural concept appears suitable for hosting N-Version application software while at the same time tolerating hardware failures. Architectural enhancements for greater efficiency, software reliability modeling, and N-Version issues that merit further research are identified.
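For readers unfamiliar with N-Version voting, here is a minimal sketch of a plain majority voter over numeric outputs. It is not the confidence voter mentioned in the abstract, whose design is not described there. The function name, the agreement `tolerance`, and the choice to return `None` when no strict majority exists are all assumptions made for illustration.

```python
def majority_vote(outputs, tolerance=1e-3):
    """Return the mean of a strict majority of agreeing outputs, else None.

    Two outputs "agree" if they differ by at most `tolerance` (assumed rule).
    """
    n = len(outputs)
    for candidate in outputs:
        agreeing = [v for v in outputs if abs(v - candidate) <= tolerance]
        if len(agreeing) > n / 2:
            return sum(agreeing) / len(agreeing)
    return None

# Four versions, one faulty: the three agreeing outputs win.
print(majority_vote([0.5, 0.5, 0.5, 9.9]))

# Coincident errors: two versions fail identically, so no strict majority.
print(majority_vote([0.1, 0.1, 0.9, 0.9]))
```

The second call shows the situation the abstract refers to as coincident errors: a simple majority rule cannot decide between two equally sized groups, which is one reason a more elaborate voter may be needed.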

    Fault tolerant architectures for integrated aircraft electronics systems, task 2

    The architectural basis for an advanced fault-tolerant on-board computer to succeed the current generation of fault-tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made, is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand-driven data-flow architecture, which appears to have possible application in fault-tolerant systems, is described, and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported.

    DeSyRe: on-Demand System Reliability

    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect-free and fault-free system would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.