419 research outputs found

    The SIFT hardware/software systems. Volume 1: A detailed description

    Get PDF
    This report contains a detailed description of the software implemented fault-tolerant computer's operating system and hardware subsystems. The Software Implemented Fault-Tolerant (SIFT) computer system was developed as an experimental vehicle for fault-tolerant systems research. The SIFT effort began with broad, in-depth studies stating the reliability and processing requirements for digital computers which would, in the aircraft of the 1990's, control flight-critical functions

    A technique for evaluating the application of the pin-level stuck-at fault model to VLSI circuits

    Get PDF
    Accurate fault models are required to conduct the experiments defined in validation methodologies for highly reliable fault-tolerant computers (e.g., computers with a probability of failure of 10 to the -9 for a 10-hour mission). Described is a technique by which a researcher can evaluate the capability of the pin-level stuck-at fault model to simulate true error behavior symptoms in very large scale integrated (VLSI) digital circuits. The technique is based on a statistical comparison of the error behavior resulting from faults applied at the pin-level of and internal to a VLSI circuit. As an example of an application of the technique, the error behavior of a microprocessor simulation subjected to internal stuck-at faults is compared with the error behavior which results from pin-level stuck-at faults. The error behavior is characterized by the time between errors and the duration of errors. Based on this example data, the pin-level stuck-at fault model is found to deliver less than ideal performance. However, with respect to the class of faults which cause a system crash, the pin-level, stuck-at fault model is found to provide a good modeling capability

    Experimental validation of clock synchronization algorithms

    Get PDF
    The objective of this work is to validate mathematically derived clock synchronization theories and their associated algorithms through experiment. Two theories are considered, the Interactive Convergence Clock Synchronization Algorithm and the Midpoint Algorithm. Special clock circuitry was designed and built so that several operating conditions and failure modes (including malicious failures) could be tested. Both theories are shown to predict conservative upper bounds (i.e., measured values of clock skew were always less than the theory prediction). Insight gained during experimentation led to alternative derivations of the theories. These new theories accurately predict the behavior of the clock system. It is found that a 100 percent penalty is paid to tolerate worst-case failures. It is also shown that under optimal conditions (with minimum error and no failures) the clock skew can be as much as three clock ticks. Clock skew grows to six clock ticks when failures are present. Finally, it is concluded that one cannot rely solely on test procedures or theoretical analysis to predict worst-case conditions

    Advanced techniques in reliability model representation and solution

    Get PDF
    The current tendency of flight control system designs is towards increased integration of applications and increased distribution of computational elements. The reliability analysis of such systems is difficult because subsystem interactions are increasingly interdependent. Researchers at NASA Langley Research Center have been working for several years to extend the capability of Markov modeling techniques to address these problems. This effort has been focused in the areas of increased model abstraction and increased computational capability. The reliability model generator (RMG) is a software tool that uses as input a graphical object-oriented block diagram of the system. RMG uses a failure-effects algorithm to produce the reliability model from the graphical description. The ASSURE software tool is a parallel processing program that uses the semi-Markov unreliability range evaluator (SURE) solution technique and the abstract semi-Markov specification interface to the SURE tool (ASSIST) modeling language. A failure modes-effects simulation is used by ASSURE. These tools were used to analyze a significant portion of a complex flight control system. The successful combination of the power of graphical representation, automated model generation, and parallel computation leads to the conclusion that distributed fault-tolerant system architectures can now be analyzed

    Model reduction by trimming for a class of semi-Markov reliability models and the corresponding error bound

    Get PDF
    Semi-Markov processes have proved to be an effective and convenient tool to construct models of systems that achieve reliability by redundancy and reconfiguration. These models are able to depict complex system architectures and to capture the dynamics of fault arrival and system recovery. A disadvantage of this approach is that the models can be extremely large, which poses both a model and a computational problem. Techniques are needed to reduce the model size. Because these systems are used in critical applications where failure can be expensive, there must be an analytically derived bound for the error produced by the model reduction technique. A model reduction technique called trimming is presented that can be applied to a popular class of systems. Automatic model generation programs were written to help the reliability analyst produce models of complex systems. This method, trimming, is easy to implement and the error bound easy to compute. Hence, the method lends itself to inclusion in an automatic model generator

    Fault-tolerant processing system

    Get PDF
    A fault-tolerant, fiber optic interconnect, or backplane, which serves as a via for data transfer between modules. Fault tolerance algorithms are embedded in the backplane by dividing the backplane into a read bus and a write bus and placing a redundancy management unit (RMU) between the read bus and the write bus so that all data transmitted by the write bus is subjected to the fault tolerance algorithms before the data is passed for distribution to the read bus. The RMU provides both backplane control and fault tolerance

    Interrupt-based Phase-locked Frequency Multiplier

    Get PDF
    A method aud system utilize a processor's digital timer and two interrupts to form a frequency multiplier. The first internipt's processing time window is definable by a first uumber of counts C(sub 1), of the digital timer while the second interrupt's processing time window is definable by a second number of counts C(sub 2) of the digital timer. A count value CV utilized by the systedmethod is based on a desired frequency multiplier N(sub 1), the timer clock rate, and the tiole required for one cycle of an input signal. The first interrupt is triggered upon completion of one cycle ofthe input sigual at which point the processing time window associated therewith begins. The second interrupt is triggered each time the timer's overflow signal is generated at which point the processing time window associated with the second interrupt begins. During the occurrence of the second interrupt's processing. the count value CV is modified to maintain the first interrupt's processing time window approximately centered between two of the second internipt's processing time windows

    User's guide to the Reliability Estimation System Testbed (REST)

    Get PDF
    The Reliability Estimation System Testbed is an X-window based reliability modeling tool that was created to explore the use of the Reliability Modeling Language (RML). RML was defined to support several reliability analysis techniques including modularization, graphical representation, Failure Mode Effects Simulation (FMES), and parallel processing. These techniques are most useful in modeling large systems. Using modularization, an analyst can create reliability models for individual system components. The modules can be tested separately and then combined to compute the total system reliability. Because a one-to-one relationship can be established between system components and the reliability modules, a graphical user interface may be used to describe the system model. RML was designed to permit message passing between modules. This feature enables reliability modeling based on a run time simulation of the system wide effects of a component's failure modes. The use of failure modes effects simulation enhances the analyst's ability to correctly express system behavior when using the modularization approach to reliability modeling. To alleviate the computation bottleneck often found in large reliability models, REST was designed to take advantage of parallel processing on hypercube processors

    User's guide to programming fault injection and data acquisition in the SIFT environment

    Get PDF
    Described are the features, command language, and functional design of the SIFT (Software Implemented Fault Tolerance) fault injection and data acquisition interface software. The document is also intended to assist and guide the SIFT user in defining, developing, and executing SIFT fault injection experiments and the subsequent collection and reduction of that fault injection data. It is also intended to be used in conjunction with the SIFT User's Guide (NASA Technical Memorandum 86289) for reference to SIFT system commands, procedures and functions, and overall guidance in SIFT system programming
    corecore