
    Experimental analysis of computer system dependability

    This paper reviews an area that has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
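Importance sampling, mentioned above as a way to accelerate Monte Carlo simulation, can be illustrated with a minimal sketch. It assumes a toy model that is not from the paper: the time to failure X is exponential with rate `lam`, and we want the rare-event probability P(X > t). Samples are drawn from a heavier-tailed proposal (an exponential with rate 1/t, a heuristic choice assumed here) and reweighted by the likelihood ratio. The function name `is_tail_probability` is hypothetical, and the confidence interval uses a plain normal approximation.

```python
import math
import random


def is_tail_probability(lam: float, t: float, n: int, seed: int = 1):
    """Estimate P(X > t) for X ~ Exponential(rate=lam) by importance sampling.

    Hypothetical sketch: samples come from a proposal Exponential(rate=mu) with
    mu = 1/t (an assumed heuristic), and each sample that exceeds t is weighted
    by the likelihood ratio f(x)/g(x) between target and proposal densities.
    """
    rng = random.Random(seed)
    mu = 1.0 / t
    terms = []
    for _ in range(n):
        x = rng.expovariate(mu)  # draw from the proposal g
        if x > t:
            # f(x)/g(x) for two exponential densities
            terms.append((lam / mu) * math.exp(-(lam - mu) * x))
        else:
            terms.append(0.0)
    mean = sum(terms) / n
    var = sum((v - mean) ** 2 for v in terms) / (n - 1)
    half_width = 1.96 * math.sqrt(var / n)  # normal-approximation 95% CI
    return mean, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    est, ci = is_tail_probability(lam=1.0, t=20.0, n=100_000)
    print(f"estimate={est:.3e}  95% CI=({ci[0]:.3e}, {ci[1]:.3e})")
    print(f"exact   ={math.exp(-20.0):.3e}")
```

With t = 20 the target probability is about 2e-9, so plain Monte Carlo with 100,000 draws would almost certainly observe no failures at all; under the proposal, roughly a third of the draws land in the failure region, and the weights correct for the bias.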

    Common spaceborne multicomputer operating system and development environment

    A preliminary technical specification for a multicomputer operating system is developed. The operating system is targeted for spaceborne flight missions and provides a broad range of real-time functionality, dynamic remote code-patching capability, and system fault tolerance and long-term survivability features. Dataflow concepts are used for representing application algorithms. Functional features are included to ensure real-time predictability for a class of algorithms that require data-driven execution on an iterative, steady-state basis. The development environment supports the development of algorithm code, design of control parameters, performance analysis, simulation of real-time dataflow applications, and compiling and downloading of the resulting application.
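Data-driven dataflow execution, as described above, means a node fires only when all of its input data are present. The following is a minimal, hypothetical sketch of that firing rule for one steady-state iteration; the classes and functions (`Node`, `run_iteration`) and the example graph are illustrative assumptions, not the specified operating system's interface, and a real-time system would also bound queue sizes and firing times.

```python
from collections import deque
from typing import Callable, Dict, List


class Node:
    """A dataflow actor that fires only when every input queue holds a token."""

    def __init__(self, name: str, inputs: List[str], fn: Callable[..., float]):
        self.name = name
        self.inputs = inputs  # names of upstream producers
        self.fn = fn
        self.queues: Dict[str, deque] = {src: deque() for src in inputs}

    def ready(self) -> bool:
        return all(self.queues[src] for src in self.inputs)

    def fire(self) -> float:
        # Consume exactly one token from each input, then compute.
        args = [self.queues[src].popleft() for src in self.inputs]
        return self.fn(*args)


def run_iteration(nodes: Dict[str, Node], consumers: Dict[str, List[str]],
                  source_values: Dict[str, float]) -> Dict[str, float]:
    """One steady-state iteration: inject source tokens, then fire nodes as data arrives."""
    outputs: Dict[str, float] = {}
    pending = deque()
    for src, value in source_values.items():
        outputs[src] = value
        pending.append(src)
    while pending:
        producer = pending.popleft()
        for name in consumers.get(producer, []):
            node = nodes[name]
            node.queues[producer].append(outputs[producer])
            if node.ready():  # data-driven firing rule
                outputs[name] = node.fire()
                pending.append(name)
    return outputs


if __name__ == "__main__":
    nodes = {
        "sum": Node("sum", ["a", "b"], lambda x, y: x + y),
        "scale": Node("scale", ["sum"], lambda s: 2 * s),
    }
    consumers = {"a": ["sum"], "b": ["sum"], "sum": ["scale"]}
    for a, b in [(1.0, 2.0), (3.0, 4.0)]:
        print(run_iteration(nodes, consumers, {"a": a, "b": b})["scale"])
```

Running the same graph once per input sample is what "iterative, steady-state" execution amounts to in this toy: the schedule is fixed by data availability rather than by an explicit program counter.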

    Validation of a fault-tolerant clock synchronization system

    A validation method for the synchronization subsystem of a fault-tolerant computer system is investigated. The method combines formal design verification with experimental testing. The design proof reduces the correctness of the clock synchronization system to the correctness of a set of axioms, which are experimentally validated. Since the reliability requirements are often extreme, requiring the estimation of extremely large quantiles, an asymptotic approach to estimation in the tail of a distribution is employed.
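The abstract does not say which asymptotic tail estimator is used. As a stand-in, the sketch below uses a peaks-over-threshold approach with an exponential tail, a common and simple choice assumed here rather than taken from the paper: excesses over a high empirical threshold u are fitted with an exponential distribution, and the fit is extrapolated to a quantile far beyond the observed data. The function name `tail_quantile` and the 95% threshold are hypothetical; the measured quantity (for example, an observed clock skew) is treated as a generic random variable.

```python
import math
import random
from typing import List


def tail_quantile(samples: List[float], p: float, threshold_frac: float = 0.95) -> float:
    """Extrapolate the p-quantile (p very close to 1) beyond the observed data.

    Assumption: excesses over a high threshold u are approximately exponential,
    so P(X > x) ~= (k/n) * exp(-(x - u) / beta) for x > u.
    """
    xs = sorted(samples)
    n = len(xs)
    u = xs[int(threshold_frac * n)]  # empirical high threshold
    excesses = [x - u for x in xs if x > u]
    k = len(excesses)
    beta = sum(excesses) / k  # maximum-likelihood mean excess
    # Solve (k/n) * exp(-(x - u) / beta) = 1 - p for x.
    return u + beta * math.log(k / (n * (1.0 - p)))


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.expovariate(1.0) for _ in range(100_000)]
    q = tail_quantile(data, 1 - 1e-9)
    print(f"estimated 1-1e-9 quantile: {q:.2f}  exact: {math.log(1e9):.2f}")
```

The point of the exercise is that 100,000 observations say nothing directly about a 1e-9 quantile; the estimate is only as good as the tail model that bridges the gap, which is why such extrapolations need the model itself to be justified.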

    Middleware Fault Tolerance Support for the BOSS Embedded Operating System

    Critical embedded systems need dependable operating systems and applications. Despite all efforts to prevent and remove faults in system development, residual software faults usually persist. Therefore, critical systems need some form of fault tolerance to deal with these faults and also with hardware faults at operation time. This work proposes fault-tolerant support mechanisms for the BOSS embedded operating system, based on the application of proven fault tolerance strategies by middleware control software that transparently delivers the added functionality to the application software. Special attention is paid to complexity control and resource constraints, targeting the needs of the embedded market.

    Funder: Fundação para a Ciência e a Tecnologia (FCT).
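The abstract refers to "proven fault tolerance strategies" without naming them. One classic strategy for tolerating residual software faults is the recovery block, sketched below as a call-level wrapper that middleware could place between the application and its alternates. This is an illustrative assumption, not the BOSS middleware: `recovery_block` and `AllAlternatesFailed` are hypothetical names, and the alternates are assumed side-effect free, whereas a real recovery block would checkpoint and restore state before each retry.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllAlternatesFailed(Exception):
    """Raised when no alternate produces a result that passes the acceptance test."""


def recovery_block(alternates: Sequence[Callable[[], T]],
                   acceptance_test: Callable[[T], bool]) -> T:
    """Try each alternate in order and return the first accepted result.

    Alternates are assumed side-effect free, so no state rollback is done here.
    An alternate that raises is treated like one whose result is rejected.
    """
    for alternate in alternates:
        try:
            result = alternate()
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(result):
            return result
    raise AllAlternatesFailed("no alternate passed the acceptance test")


def primary_sqrt() -> float:
    return -3.0  # simulated residual software fault


def alternate_sqrt() -> float:
    return math.sqrt(9.0)


if __name__ == "__main__":
    def accept(r: float) -> bool:
        return r >= 0 and abs(r * r - 9.0) < 1e-9

    print(recovery_block([primary_sqrt, alternate_sqrt], accept))
```

The application calls one function and never sees the faulty primary result, which is the kind of transparency the abstract describes; the cost is the acceptance test, which must be simple enough not to share the primary's faults.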

    Evaluation of fault-tolerant parallel-processor architectures over long space missions

    The impact of a five-year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^-7. The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP), is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.
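To get a feel for how demanding these two requirements are, the arithmetic below converts each into a maximum allowable failure rate. It assumes a deliberately simplified model that is not the paper's: the system, or each processor, fails at a constant rate (exponential lifetimes), with no redundancy or repair. The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8766.0  # 365.25 days * 24 h


def max_rate_for_survival(reliability: float, hours: float) -> float:
    """Largest constant failure rate (per hour) with exp(-lam * t) >= reliability."""
    return -math.log(reliability) / hours


def max_rate_for_failure_prob(p_fail: float, hours: float) -> float:
    """Largest constant failure rate (per hour) with 1 - exp(-lam * t) <= p_fail."""
    return -math.log1p(-p_fail) / hours  # log1p keeps precision for tiny p_fail


if __name__ == "__main__":
    mission = max_rate_for_survival(0.99, 5 * HOURS_PER_YEAR)
    burst = max_rate_for_failure_prob(1e-7, 0.5)
    system_limit = min(mission, burst)
    print(f"5-year survival bound:   {mission:.3e} failures/h")
    print(f"half-hour failure bound: {burst:.3e} failures/h")
    # With 256 processors in series (no redundancy), the system rate is 256x
    # the per-processor rate, so each processor would need:
    print(f"per-processor bound:     {system_limit / 256:.3e} failures/h")
```

Under this model the two bounds come out at roughly 2.29e-7 and 2.0e-7 failures per hour, and dividing by 256 for a nonredundant configuration pushes the per-processor requirement below 1e-9 per hour. That gap is what the redundancy and reconfiguration features of a fault-tolerant architecture are meant to close.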

    Reconfigurable G and C computer study for space station use. Volume 1 - Technical summary Final report, 29 Dec. 1969 - 31 Jan. 1971

    Technical summary of reconfigurable guidance and control computer for space station application - Vol.

    Study of fault-tolerant software technology

    Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications for hardware, operating systems, and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research stage. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to implement software fault tolerance effectively and efficiently.
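N-version programming is one of the standard fault-tolerant software techniques in this literature: several independently developed versions compute the same result and a voter selects the majority. The sketch below assumes exact-match voting on hashable results and a strict majority over all versions; `n_version_vote` is a hypothetical name, not an API from the paper, and floating-point outputs would normally need an inexact (tolerance-based) voter instead.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def n_version_vote(versions: Sequence[Callable[[], T]]) -> Optional[T]:
    """Run every version and return the result produced by a strict majority.

    Returns None when no single result is produced by more than half of the
    versions. A version that raises an exception simply casts no vote.
    """
    results = []
    for version in versions:
        try:
            results.append(version())
        except Exception:
            pass  # a crashed version casts no vote
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(versions) // 2 else None


if __name__ == "__main__":
    versions = [lambda: 6 * 7, lambda: sum([7] * 6), lambda: 41]  # third has a fault
    print(n_version_vote(versions))
```

With three versions, any single faulty or crashed version is outvoted; the technique depends on the versions failing independently, which is exactly the assumption the quantitative models of fault-tolerant software try to assess.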

    The design and construction of a data path chip set for a fault tolerant parallel processor

    Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1991. Includes bibliographical references (pp. 165-167). By Charles E. Sakamaki.

    The Raincore API for clusters of networking elements

    Clustering technology offers a way to increase overall reliability and performance of Internet information flow by strengthening one link in the chain without adding others. We have implemented this technology in a distributed computing architecture for network elements. The architecture, called Raincore, originated in the Reliable Array of Independent Nodes, or RAIN, research collaboration between the California Institute of Technology and the US National Aeronautics and Space Administration's Jet Propulsion Laboratory. The RAIN project focused on developing high-performance, fault-tolerant, portable clustering technology for spaceborne computing. The technology that emerged from this project became the basis for a spinoff company, Rainfinity, which has the exclusive intellectual property rights to the RAIN technology. The authors describe the Raincore conceptual architecture and distributed services, which are designed to make it easy for developers to port their applications to run on top of a cluster of networking elements. We include two applications: a Web server prototype that was part of the original RAIN research project and a commercial firewall cluster product from Rainfinity.
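The abstract does not describe how Raincore divides traffic among cluster members. As a purely hypothetical illustration of one way a cluster of networking elements (for example, firewall nodes) could share flows and tolerate the loss of a member, the sketch below uses rendezvous (highest-random-weight) hashing, a technique swapped in here rather than taken from Raincore; `owner` and the node names are assumptions, not part of the Raincore API.

```python
import hashlib
from typing import Iterable


def _score(node: str, flow_id: str) -> int:
    """Deterministic pseudo-random weight for a (node, flow) pair."""
    digest = hashlib.sha256(f"{node}|{flow_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(flow_id: str, live_nodes: Iterable[str]) -> str:
    """Pick the node responsible for a flow: the live node with the highest weight.

    Raises ValueError if no nodes are live.
    """
    return max(live_nodes, key=lambda node: _score(node, flow_id))


if __name__ == "__main__":
    cluster = ["fw1", "fw2", "fw3"]
    flows = [f"flow-{i}" for i in range(6)]
    before = {f: owner(f, cluster) for f in flows}
    survivors = [n for n in cluster if n != "fw2"]  # simulate fw2 failing
    after = {f: owner(f, survivors) for f in flows}
    for f in flows:
        print(f, before[f], "->", after[f])
```

The useful property for fault tolerance is that when a node fails, only the flows it owned are reassigned; every other flow keeps its owner, so surviving nodes do not have to rebuild state for traffic they were already handling.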

    NASA Formal Methods Workshop, 1990

    The workshop brought together researchers involved in the NASA formal methods research effort for detailed technical interchange and provided a mechanism for interaction with representatives from the FAA and the aerospace industry. The workshop also included speakers from industry to brief the formal methods researchers on the current state of practice in flight-critical system design, verification, and certification. The goals were to define and characterize the verification problem for ultra-reliable, life-critical flight control systems and the current state of practice in industry; to determine the proper role of formal methods in addressing these problems; and to assess the state of the art and recent progress toward applying formal methods to this area.