13,292 research outputs found

    A Low-Cost FPGA-Based Test and Diagnosis Architecture for SRAMs

    Get PDF
    The continues improvement of manufacturing technologies allows the realization of integrated circuits containing an ever increasing number of transistors. A major part of these devices is devoted to realize SRAM blocks. Test and diagnosis of SRAM circuits are therefore an important challenge for improving quality of next generation integrated circuits. This paper proposes a flexible platform for testing and diagnosis of SRAM circuits. The architecture is based on the use of a low cost FPGA based board allowing high diagnosability while keeping costs at a very low leve

    Optimal discrimination between transient and permanent faults

    Get PDF
    An important practical problem in fault diagnosis is discriminating between permanent faults and transient faults. In many computer systems, the majority of errors are due to transient faults. Many heuristic methods have been used for discriminating between transient and permanent faults; however, we have found no previous work stating this decision problem in clear probabilistic terms. We present an optimal procedure for discriminating between transient and permanent faults, based on applying Bayesian inference to the observed events (correct and erroneous results). We describe how the assessed probability that a module is permanently faulty must vary with observed symptoms. We describe and demonstrate our proposed method on a simple application problem, building the appropriate equations and showing numerical examples. The method can be implemented as a run-time diagnosis algorithm at little computational cost; it can also be used to evaluate any heuristic diagnostic procedure by compariso

    Online Fault Classification in HPC Systems through Machine Learning

    Full text link
    As High-Performance Computing (HPC) systems strive towards the exascale goal, studies suggest that they will experience excessive failure rates. For this reason, detecting and classifying faults in HPC systems as they occur and initiating corrective actions before they can transform into failures will be essential for continued operation. In this paper, we propose a fault classification method for HPC systems based on machine learning that has been designed specifically to operate with live streamed data. We cast the problem and its solution within realistic operating constraints of online use. Our results show that almost perfect classification accuracy can be reached for different fault types with low computational overhead and minimal delay. We have based our study on a local dataset, which we make publicly available, that was acquired by injecting faults to an in-house experimental HPC system.Comment: Accepted for publication at the Euro-Par 2019 conferenc

    Experimental evaluation of two software countermeasures against fault attacks

    Get PDF
    Injection of transient faults can be used as a way to attack embedded systems. On embedded processors such as microcontrollers, several studies showed that such a transient fault injection with glitches or electromagnetic pulses could corrupt either the data loads from the memory or the assembly instructions executed by the circuit. Some countermeasure schemes which rely on temporal redundancy have been proposed to handle this issue. Among them, several schemes add this redundancy at assembly instruction level. In this paper, we perform a practical evaluation for two of those countermeasure schemes by using a pulsed electromagnetic fault injection process on a 32-bit microcontroller. We provide some necessary conditions for an efficient implementation of those countermeasure schemes in practice. We also evaluate their efficiency and highlight their limitations. To the best of our knowledge, no experimental evaluation of the security of such instruction-level countermeasure schemes has been published yet.Comment: 6 pages, 2014 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), Arlington : United States (2014

    Fault detection and diagnosis of a plastic film extrusion process

    Get PDF
    This paper presents a new approach to the design of a model-based fault detection and diagnosis system for application to a plastic film extrusion process. The design constructs a residual generator via parity relations. A multi-objective optimisation problem must be solved in order for the residual to be sensitive to faults but insensitive to disturbances and modelling errors. In this paper, we exploit a genetic algorithm for solving this multi-objective optimisation problem and the resulting fault detection and diagnosis system is applied to a first-principles model of a plastic film extrusion process. Simulation results demonstrate that various types of faults can be detected and diagnosed successfully

    A two-level structure for advanced space power system automation

    Get PDF
    The tasks to be carried out during the three-year project period are: (1) performing extensive simulation using existing mathematical models to build a specific knowledge base of the operating characteristics of space power systems; (2) carrying out the necessary basic research on hierarchical control structures, real-time quantitative algorithms, and decision-theoretic procedures; (3) developing a two-level automation scheme for fault detection and diagnosis, maintenance and restoration scheduling, and load management; and (4) testing and demonstration. The outlines of the proposed system structure that served as a master plan for this project, work accomplished, concluding remarks, and ideas for future work are also addressed

    An On-line BIST RAM Architecture with Self Repair Capabilities

    Get PDF
    The emerging field of self-repair computing is expected to have a major impact on deployable systems for space missions and defense applications, where high reliability, availability, and serviceability are needed. In this context, RAM (random access memories) are among the most critical components. This paper proposes a built-in self-repair (BISR) approach for RAM cores. The proposed design, introducing minimal and technology-dependent overheads, can detect and repair a wide range of memory faults including: stuck-at, coupling, and address faults. The test and repair capabilities are used on-line, and are completely transparent to the external user, who can use the memory without any change in the memory-access protocol. Using a fault-injection environment that can emulate the occurrence of faults inside the module, the effectiveness of the proposed architecture in terms of both fault detection and repairing capability was verified. Memories of various sizes have been considered to evaluate the area-overhead introduced by this proposed architectur
    corecore