3,626 research outputs found

    Experimental analysis of computer system dependability

    Get PDF
    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance

    Advanced information processing system for advanced launch system: Avionics architecture synthesis

    Get PDF
    The Advanced Information Processing System (AIPS) is a fault-tolerant distributed computer system architecture that was developed to meet the real time computational needs of advanced aerospace vehicles. One such vehicle is the Advanced Launch System (ALS) being developed jointly by NASA and the Department of Defense to launch heavy payloads into low earth orbit at one tenth the cost (per pound of payload) of the current launch vehicles. An avionics architecture that utilizes the AIPS hardware and software building blocks was synthesized for ALS. The AIPS for ALS architecture synthesis process starting with the ALS mission requirements and ending with an analysis of the candidate ALS avionics architecture is described

    Composing Graph Theory and Deep Neural Networks to Evaluate SEU Type Soft Error Effects

    Full text link
    Rapidly shrinking technology node and voltage scaling increase the susceptibility of Soft Errors in digital circuits. Soft Errors are radiation-induced effects while the radiation particles such as Alpha, Neutrons or Heavy Ions, interact with sensitive regions of microelectronic devices/circuits. The particle hit could be a glancing blow or a penetrating strike. A well apprehended and characterized way of analyzing soft error effects is the fault-injection campaign, but that typically acknowledged as time and resource-consuming simulation strategy. As an alternative to traditional fault injection-based methodologies and to explore the applicability of modern graph based neural network algorithms in the field of reliability modeling, this paper proposes a systematic framework that explores gate-level abstractions to extract and exploit relevant feature representations at low-dimensional vector space. The framework allows the extensive prediction analysis of SEU type soft error effects in a given circuit. A scalable and inductive type representation learning algorithm on graphs called GraphSAGE has been utilized for efficiently extracting structural features of the gate-level netlist, providing a valuable database to exercise a downstream machine learning or deep learning algorithm aiming at predicting fault propagation metrics. Functional Failure Rate (FFR): the predicted fault propagating metric of SEU type fault within the gate-level circuit abstraction of the 10-Gigabit Ethernet MAC (IEEE 802.3) standard circuit.Comment: 5 pages for conference, Number of figures: 3, Conference: 2020 9th Mediterranean Conference on Embedded Computing (MECO

    inSense: A Variation and Fault Tolerant Architecture for Nanoscale Devices

    Get PDF
    Transistor technology scaling has been the driving force in improving the size, speed, and power consumption of digital systems. As devices approach atomic size, however, their reliability and performance are increasingly compromised due to reduced noise margins, difficulties in fabrication, and emergent nano-scale phenomena. Scaled CMOS devices, in particular, suffer from process variations such as random dopant fluctuation (RDF) and line edge roughness (LER), transistor degradation mechanisms such as negative-bias temperature instability (NBTI) and hot-carrier injection (HCI), and increased sensitivity to single event upsets (SEUs). Consequently, future devices may exhibit reduced performance, diminished lifetimes, and poor reliability. This research proposes a variation and fault tolerant architecture, the inSense architecture, as a circuit-level solution to the problems induced by the aforementioned phenomena. The inSense architecture entails augmenting circuits with introspective and sensory capabilities which are able to dynamically detect and compensate for process variations, transistor degradation, and soft errors. This approach creates ``smart\u27\u27 circuits able to function despite the use of unreliable devices and is applicable to current CMOS technology as well as next-generation devices using new materials and structures. Furthermore, this work presents an automated prototype implementation of the inSense architecture targeted to CMOS devices and is evaluated via implementation in ISCAS \u2785 benchmark circuits. The automated prototype implementation is functionally verified and characterized: it is found that error detection capability (with error windows from ≈\approx30-400ps) can be added for less than 2\% area overhead for circuits of non-trivial complexity. Single event transient (SET) detection capability (configurable with target set-points) is found to be functional, although it generally tracks the standard DMR implementation with respect to overheads

    Fault tolerant architectures for integrated aircraft electronics systems, task 2

    Get PDF
    The architectural basis for an advanced fault tolerant on-board computer to succeed the current generation of fault tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand driven data-flow architecture, which appears to have possible application in fault tolerant systems is described and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported

    Design of an integrated airframe/propulsion control system architecture

    Get PDF
    The design of an integrated airframe/propulsion control system architecture is described. The design is based on a prevalidation methodology that uses both reliability and performance. A detailed account is given for the testing associated with a subset of the architecture and concludes with general observations of applying the methodology to the architecture

    SIRU development. Volume 1: System development

    Get PDF
    A complete description of the development and initial evaluation of the Strapdown Inertial Reference Unit (SIRU) system is reported. System development documents the system mechanization with the analytic formulation for fault detection and isolation processing structure; the hardware redundancy design and the individual modularity features; the computational structure and facilities; and the initial subsystem evaluation results
    • …
    corecore