13,292 research outputs found
A Low-Cost FPGA-Based Test and Diagnosis Architecture for SRAMs
The continues improvement of manufacturing technologies allows the realization of integrated circuits containing an ever increasing number of transistors. A major part of these devices is devoted to realize SRAM blocks. Test and diagnosis of SRAM circuits are therefore an important challenge for improving quality of next generation integrated circuits. This paper proposes a flexible platform for testing and diagnosis of SRAM circuits. The architecture is based on the use of a low cost FPGA based board allowing high diagnosability while keeping costs at a very low leve
Optimal discrimination between transient and permanent faults
An important practical problem in fault diagnosis is discriminating between permanent faults and transient faults. In many computer systems, the majority of errors are due to transient faults. Many heuristic methods have been used for discriminating between transient and permanent faults; however, we have found no previous work stating this decision problem in clear probabilistic terms. We present an optimal procedure for discriminating between transient and permanent faults, based on applying Bayesian inference to the observed events (correct and erroneous results). We describe how the assessed probability that a module is permanently faulty must vary with observed symptoms. We describe and demonstrate our proposed method on a simple application problem, building the appropriate equations and showing numerical examples. The method can be implemented as a run-time diagnosis algorithm at little computational cost; it can also be used to evaluate any heuristic diagnostic procedure by compariso
Online Fault Classification in HPC Systems through Machine Learning
As High-Performance Computing (HPC) systems strive towards the exascale goal,
studies suggest that they will experience excessive failure rates. For this
reason, detecting and classifying faults in HPC systems as they occur and
initiating corrective actions before they can transform into failures will be
essential for continued operation. In this paper, we propose a fault
classification method for HPC systems based on machine learning that has been
designed specifically to operate with live streamed data. We cast the problem
and its solution within realistic operating constraints of online use. Our
results show that almost perfect classification accuracy can be reached for
different fault types with low computational overhead and minimal delay. We
have based our study on a local dataset, which we make publicly available, that
was acquired by injecting faults to an in-house experimental HPC system.Comment: Accepted for publication at the Euro-Par 2019 conferenc
Experimental evaluation of two software countermeasures against fault attacks
Injection of transient faults can be used as a way to attack embedded
systems. On embedded processors such as microcontrollers, several studies
showed that such a transient fault injection with glitches or electromagnetic
pulses could corrupt either the data loads from the memory or the assembly
instructions executed by the circuit. Some countermeasure schemes which rely on
temporal redundancy have been proposed to handle this issue. Among them,
several schemes add this redundancy at assembly instruction level. In this
paper, we perform a practical evaluation for two of those countermeasure
schemes by using a pulsed electromagnetic fault injection process on a 32-bit
microcontroller. We provide some necessary conditions for an efficient
implementation of those countermeasure schemes in practice. We also evaluate
their efficiency and highlight their limitations. To the best of our knowledge,
no experimental evaluation of the security of such instruction-level
countermeasure schemes has been published yet.Comment: 6 pages, 2014 IEEE International Symposium on Hardware-Oriented
Security and Trust (HOST), Arlington : United States (2014
Fault detection and diagnosis of a plastic film extrusion process
This paper presents a new approach to the design of a model-based fault detection and diagnosis system for application to a plastic film extrusion process. The design constructs a residual generator via parity relations. A multi-objective optimisation problem must be solved in order for the residual to be sensitive to faults but insensitive to disturbances and modelling errors. In this paper, we exploit a genetic algorithm for solving this multi-objective optimisation problem and the resulting fault detection and diagnosis system is applied to a first-principles model of a plastic film extrusion process. Simulation results demonstrate that various types of faults can be detected and diagnosed successfully
A two-level structure for advanced space power system automation
The tasks to be carried out during the three-year project period are: (1) performing extensive simulation using existing mathematical models to build a specific knowledge base of the operating characteristics of space power systems; (2) carrying out the necessary basic research on hierarchical control structures, real-time quantitative algorithms, and decision-theoretic procedures; (3) developing a two-level automation scheme for fault detection and diagnosis, maintenance and restoration scheduling, and load management; and (4) testing and demonstration. The outlines of the proposed system structure that served as a master plan for this project, work accomplished, concluding remarks, and ideas for future work are also addressed
An On-line BIST RAM Architecture with Self Repair Capabilities
The emerging field of self-repair computing is expected to have a major impact on deployable systems for space missions and defense applications, where high reliability, availability, and serviceability are needed. In this context, RAM (random access memories) are among the most critical components. This paper proposes a built-in self-repair (BISR) approach for RAM cores. The proposed design, introducing minimal and technology-dependent overheads, can detect and repair a wide range of memory faults including: stuck-at, coupling, and address faults. The test and repair capabilities are used on-line, and are completely transparent to the external user, who can use the memory without any change in the memory-access protocol. Using a fault-injection environment that can emulate the occurrence of faults inside the module, the effectiveness of the proposed architecture in terms of both fault detection and repairing capability was verified. Memories of various sizes have been considered to evaluate the area-overhead introduced by this proposed architectur
- …