Search CORE

5,932 research outputs found

Validation Methods Research for Fault-Tolerant Avionics and Control Systems: Working Group Meeting, 2

Author: Clary J. B.
Gault J. W.
Trivedi K. S.
Publication venue
Publication date
Field of study

The validation process comprises the activities required to insure the agreement of system realization with system specification. A preliminary validation methodology for fault tolerant systems documented. A general framework for a validation methodology is presented along with a set of specific tasks intended for the validation of two specimen system, SIFT and FTMP. Two major areas of research are identified. First, are those activities required to support the ongoing development of the validation process itself, and second, are those activities required to support the design, development, and understanding of fault tolerant systems

NASA Technical Reports Server

Assessing the Reliability of Diverse Fault-Tolerant Systems

Author: Littlewood B.
Popov P. T.
Strigini L.
Publication venue
Publication date: 01/01/2000
Field of study

Design diversity between redundant channels is a way of improving the dependability of software-based systems, but it does not alleviate the difficulties of dependability assessment

CiteSeerX

City Research Online

Fault tolerant architectures for integrated aircraft electronics systems

Author: Levitt K. N.
Melliar-Smith P. M.
Schwartz R. L.
Publication venue
Publication date
Field of study

Work into possible architectures for future flight control computer systems is described. Ada for Fault-Tolerant Systems, the NETS Network Error-Tolerant System architecture, and voting in asynchronous systems are covered

NASA Technical Reports Server

Reliability analysis of triple modular redundancy system with spare

Author: Al-Kofahi Khalid A.
Publication venue: RIT Scholar Works
Publication date: 01/12/1993
Field of study

Hardware redundant fault-tolerant systems and the different design approaches are discussed. The reliability analysis of fault-tolerant systems is usually done under permanent fault conditions. With statistical data suggesting that up to 90% of system failures are caused by intermittent faults, the reliability analysis of fault-tolerant systems must concentrate more on this class of faults. In this work, a reconfigurable Triple Modular Redundancy (TMR) with spare system that differentiates between permanent and intermittent faults has been built. The reconfiguration process of this system depends on both the current status of its modules and their history. Based on this, a different approach for reliability analysis under intermittent fault conditions using Markov models is presented. This approach shows a much higher system reliability compared to other redundant and non-redundant configurations

RIT Scholar Works

Reliability and maintainability assessment factors for reliable fault-tolerant systems

Author: Bavuso S. J.
Publication venue
Publication date: 01/03/1984
Field of study

A long term goal of the NASA Langley Research Center is the development of a reliability assessment methodology of sufficient power to enable the credible comparison of the stochastic attributes of one ultrareliable system design against others. This methodology, developed over a 10 year period, is a combined analytic and simulative technique. An analytic component is the Computer Aided Reliability Estimation capability, third generation, or simply CARE III. A simulative component is the Gate Logic Software Simulator capability, or GLOSS. The numerous factors that potentially have a degrading effect on system reliability and the ways in which these factors that are peculiar to highly reliable fault tolerant systems are accounted for in credible reliability assessments. Also presented are the modeling difficulties that result from their inclusion and the ways in which CARE III and GLOSS mitigate the intractability of the heretofore unworkable mathematics

NASA Technical Reports Server

Trends in reliability modeling technology for fault tolerant systems

Author: Bavuso S. J.
Publication venue
Publication date
Field of study

Reliability modeling for fault tolerant avionic computing systems was developed. The modeling of large systems involving issues of state size and complexity, fault coverage, and practical computation was discussed. A novel technique which provides the tool for studying the reliability of systems with nonconstant failure rates is presented. The fault latency which may provide a method of obtaining vital latent fault data is measured

NASA Technical Reports Server

Formal Configuration of Fault-Tolerant Systems

Author: Herrmann Linda
Publication venue
Publication date: 28/05/2019
Field of study

Bit flips are known to be a source of strange system behavior, failures, and crashes. They can cause dramatic financial loss, security breaches, or even harm human life. Caused by energized particles arising from, e.g., cosmic rays or heat, they are hardly avoidable. Due to transistor sizes becoming smaller and smaller, modern hardware becomes more and more prone to bit flips. This yields a high scientific interest, and many techniques to make systems more resilient against bit flips are developed. Fault-tolerance techniques are techniques that detect and react to bit flips or their effects. Before using these techniques, they typically need to be configured for the particular system they shall protect, the grade of resilience that shall be achieved, and the environment. State-of-the-art configuration approaches have a high risk of being imprecise, of being affected by undesired side effects, and of yielding questionable resilience measures. In this thesis we encourage the usage of formal methods for resiliency configuration, point out advantages and investigate difficulties. We exemplarily investigate two systems that are equipped with fault-tolerance techniques, and we apply parametric variants of probabilistic model checking to obtain optimal configurations for pre-defined resilience criteria. Probabilistic model checking is an automated formal method that operates on Markov models, i.e., state-based models with probabilistic transitions, where costs or rewards can be assigned to states and transitions. Probabilistic model checking can be used to compute, e.g., the probability of having a failure, the conditional probability of detecting an error in case of bit-flip occurrence, or the overhead that arises due to error detection and correction. Parametric variants of probabilistic model checking allow parameters in the transition probabilities and in the costs and rewards. Instead of computing values for probabilities and overhead, parametric variants compute rational functions. These functions can then be analyzed for optimality. The considered fault-tolerant systems are inspired by the work of project partners. The first system is an inter-process communication protocol as it is used in the Fiasco.OC microkernel. The communication structures provided by the kernel are protected against bit flips by a fault-tolerance technique. The second system is inspired by the redo-based fault-tolerance technique \haft. This technique protects an application against bit flips by partitioning the application's instruction flow into transaction, adding redundance, and redoing single transactions in case of error detection. Driven by these examples, we study challenges when using probabilistic model checking for fault-tolerance configuration and present solutions. We show that small transition probabilities, as they arise in error models, can be a cause of previously known accuracy issues, when using numeric solver in probabilistic model checking. We argue that the use of non-iterative methods is an acceptable alternative. We debate on the usability of the rational functions for finding optimal configurations, and show that for relatively short rational functions the usage of mathematical methods is appropriate. The redo-based fault-tolerance model suffers from the well-known state-explosion problem. We present a new technique, counter-based factorization, that tackles this problem for system models that do not scale because of a counter, as it is the case for this fault-tolerance model. This technique utilizes the chain-like structure that arises from the counter, splits the model into several parts, and computes local characteristics (in terms of rational functions) for these parts. These local characteristics can then be combined to retrieve global resiliency and overhead measures. The rational functions retrieved for the redo-based fault-tolerance model are huge - for small model instances they already have the size of more than one gigabyte. We therefor can not apply precise mathematic methods to these functions. Instead, we use the short, matrix-based representation, that arises from factorization, to point-wise evaluate the functions. Using this approach, we systematically explore the design space of the redo-based fault-tolerance model and retrieve sweet-spot configurations

Technische Universität Dresden: Qucosa

Evaluation of reliability modeling tools for advanced fault tolerant systems

Author: Baker Robert
Scheper Charlotte
Publication venue
Publication date
Field of study

The Computer Aided Reliability Estimation (CARE III) and Automated Reliability Interactice Estimation System (ARIES 82) reliability tools for application to advanced fault tolerance aerospace systems were evaluated. To determine reliability modeling requirements, the evaluation focused on the Draper Laboratories' Advanced Information Processing System (AIPS) architecture as an example architecture for fault tolerance aerospace systems. Advantages and limitations were identified for each reliability evaluation tool. The CARE III program was designed primarily for analyzing ultrareliable flight control systems. The ARIES 82 program's primary use was to support university research and teaching. Both CARE III and ARIES 82 were not suited for determining the reliability of complex nodal networks of the type used to interconnect processing sites in the AIPS architecture. It was concluded that ARIES was not suitable for modeling advanced fault tolerant systems. It was further concluded that subject to some limitations (the difficulty in modeling systems with unpowered spare modules, systems where equipment maintenance must be considered, systems where failure depends on the sequence in which faults occurred, and systems where multiple faults greater than a double near coincident faults must be considered), CARE III is best suited for evaluating the reliability of advanced tolerant systems for air transport

NASA Technical Reports Server