36 research outputs found
An efficient fault syndromes simulator for SRAM memories
Testing and diagnosis techniques play a key role in the advance of semiconductor memory technology. The challenge of failure detection has created intensive investigation on efficient testing and diagnosis algorithm for better fault coverage and diagnostic resolution. At present, March test algorithm is used to detect and diagnose all faults related to Random Access Memories. However, the test and diagnosis process are mainly done manually. Due to this, a systematic approach for developing and evaluating memory test algorithm is required. This work is focused on incorporating the March based test algorithm using a software simulator tool for implementing a fast and systematic memory testing algorithm. The simulator allows a user through a GUI to select a March based test algorithm depending on the desired fault coverage and diagnostic resolution. Experimental results show that using the simulator for testing is more efficient than that of the traditional testing algorithm. This new simulator makes it possible for a detailed list of stuck-at faults, transition faults and coupling faults covered by each algorithm and its percentage to be displayed after a set of test algorithms has been chosen. The percentage of diagnostic resolution is also displayed. This proves that the simulator reduces the trade-off between test time, fault coverage and diagnostic resolution. Moreover, the chosen algorithm can be applied to incorporate with memory built-in self-test and diagnosis, to have a better fault coverage and diagnostic resolution. Universities and industry involved in memory Built- in-Self test, Built-in-Self repair and Built-in-Self diagnose will benefit by saving a few years on researching an efficient algorithm to be implemented in their designs
A fault syndromes simulator for random access memories
Testing and diagnosis techniques play a key role in the advance of semiconductor memory technology. The challenge of failure detection has created intensive investigation on efficient testing and diagnosis algorithm for better fault coverage and diagnostic resolution. At present, the March test algorithm is used to detect and diagnose all faults related to Random Access Memories. This algorithm also allows the faults to be located and identified. However, the test and diagnosis process is mainly done manually. Due to this, a systematic approach for developing and evaluating memory test algorithm is required. This work is focused on incorporating the March based test algorithm using a software simulator tool for implementing a fast and systematic memory testing algorithm. The simulator allows a user through a GUI to select a March based test algorithm depending on the desired fault coverage and diagnostic resolution. Experimental results show that using the simulator for testing is more efficient than that of the traditional testing algorithm. This new simulator makes it possible for a detailed list of coupling faults and stuck-at faults covered by each algorithm and its percentage to be displayed after a set of test algorithms has been chosen. The percentage of diagnostic resolution is also displayed. This proves that the simulator reduces the trade-off between test time, fault coverage and diagnostic resolution. Moreover, the chosen algorithm can be applied to incorporate with memory built-in self-test and diagnosis, to have a better fault coverage and diagnostic resolution. Universities and industry involved in memory Built-in-Self test, Built-in-Self repair and Built-in-Self diagnose will benefit by saving a few years on finding an efficient algorithm to be implemented in their designs
Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications
© 2018 IEEE. Personal use of this material is permitted. Permissíon from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertisíng or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.[EN] Currently, faults suffered by SRAM memory
systems have increased due to the aggressive CMOS integration
density. Thus, the probability of occurrence of single-cell
upsets (SCUs) or multiple-cell upsets (MCUs) augments. One
of the main causes of MCUs in space applications is cosmic
radiation. A common solution is the use of error correction
codes (ECCs). Nevertheless, when using ECCs in space applications,
they must achieve a good balance between error coverage
and redundancy, and their encoding/decoding circuits must be
efficient in terms of area, power, and delay. Different codes have
been proposed to tolerate MCUs. For instance, Matrix codes use
Hamming codes and parity checks in a bi-dimensional layout to
correct and detect some patterns of MCUs. Recently presented,
column¿line¿code (CLC) has been designed to tolerate MCUs
in space applications. CLC is a modified Matrix code, based
on extended Hamming codes and parity checks. Nevertheless,
a common property of these codes is the high redundancy
introduced. In this paper, we present a series of new lowredundant
ECCs able to correct MCUs with reduced area, power,
and delay overheads. Also, these new codes maintain, or even
improve, memory error coverage with respect to Matrix and
CLC codes.This work was supported by the Spanish Government under the research Project TIN2016-81075-R.Gracia-Morán, J.; Saiz-Adalid, L.; Gil Tomás, DA.; Gil, P. (2018). Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 26(10):2132-2142. https://doi.org/10.1109/TVLSI.2018.2837220S21322142261
Fault Detection with Optimum March Test Algorithm
This paper presents a research work aimed to detect previously-undetected faults, either Write Disturb Faults (WDFs) or Deceptive Read Destructive Faults (DRDFs) or both in March Algorithm such as MATS++(6N), March C-(10N), March SR(14N), and March CL(12N). The main focus of this research is to improve fault coverage on Single Cell Faults as well as Static Double Cell Faults detection, using specified test algorithm. Transition Coupling Faults (CFtrs), Write Destructive Coupling Faults (CFwds) and Deceptive Read Destructive Faults (CFdrds) are types of faults mainly used in this research. The experiment result published in [1] shows BIST (Built-In-Self-Test) implementation with the new algorithm. It provides the same test length but with bigger area overhead, we therefore proposed a new 14N March Test Algorithm with fault coverage of more than 95% using solid 0s and 1s Data Background (DB). This paper reveals the design methodology to generate DB covers all memories function by applying non-transition data, transition data, and single read and double read data. The automation hardware was designed to give the flexibility to the user to generate other new March Algorithm prior to the selected algorithm and analyzed the performance in terms of fault detection and power consumption
Innovative Techniques for Testing and Diagnosing SoCs
We rely upon the continued functioning of many electronic devices for our everyday welfare,
usually embedding integrated circuits that are becoming even cheaper and smaller
with improved features. Nowadays, microelectronics can integrate a working computer
with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC).
SoCs are also employed on automotive safety-critical applications, but need to be tested
thoroughly to comply with reliability standards, in particular the ISO26262 functional
safety for road vehicles.
The goal of this PhD. thesis is to improve SoC reliability by proposing innovative
techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals,
and GPUs. The proposed approaches in the sequence appearing in this thesis are described
as follows:
1. Embedded Memory Diagnosis: Memories are dense and complex circuits which
are susceptible to design and manufacturing errors. Hence, it is important to understand
the fault occurrence in the memory array. In practice, the logical and physical
array representation differs due to an optimized design which adds enhancements to
the device, namely scrambling. This part proposes an accurate memory diagnosis
by showing the efforts of a software tool able to analyze test results, unscramble
the memory array, map failing syndromes to cell locations, elaborate cumulative
analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing
syndromes were analyzed as case studies gathered on an industrial automotive
32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually,
and results were confirmed by real photos taken from a microscope.
2. Functional Test Pattern Generation: The key for a successful test is the pattern applied
to the device. They can be structural or functional; the former usually benefits
from embedded test modules targeting manufacturing errors and is only effective
before shipping the component to the client. The latter, on the other hand, can be
applied during mission minimally impacting on performance but is penalized due
to high generation time. However, functional test patterns may benefit for having
different goals in functional mission mode. Part III of this PhD thesis proposes
three different functional test pattern generation methods for CPU cores embedded
in SoCs, targeting different test purposes, described as follows:
a. Functional Stress Patterns: Are suitable for optimizing functional stress during
I
Operational-life Tests and Burn-in Screening for an optimal device reliability
characterization
b. Functional Power Hungry Patterns: Are suitable for determining functional
peak power for strictly limiting the power of structural patterns during manufacturing
tests, thus reducing premature device over-kill while delivering high test
coverage
c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns
with functional ones, allowing its execution periodically during mission.
In addition, an external hardware communicating with a devised SBST was proposed.
It helps increasing in 3% the fault coverage by testing critical Hardly
Functionally Testable Faults not covered by conventional SBST patterns.
An automatic functional test pattern generation exploiting an evolutionary algorithm
maximizing metrics related to stress, power, and fault coverage was employed
in the above-mentioned approaches to quickly generate the desired patterns. The
approaches were evaluated on two industrial cases developed by STMicroelectronics;
8051-based and a 32-bit Power Architecture SoCs. Results show that generation
time was reduced upto 75% in comparison to older methodologies while
increasing significantly the desired metrics.
3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices
are suitable for generating structural patterns, testing and activating mitigation techniques,
and validating robust hardware and software applications. GPGPUs are
known for fast parallel computation used in high performance computing and advanced
driver assistance where reliability is the key point. Moreover, GPGPU manufacturers
do not provide design description code due to content secrecy. Therefore,
commercial fault injectors using the GPGPU model is unfeasible, making radiation
tests the only resource available, but are costly. In the last part of this thesis, we
propose a software implemented fault injector able to inject bit-flip in memory elements
of a real GPGPU. It exploits a software debugger tool and combines the
C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in
program variables. The goal is to validate robust parallel algorithms by studying
fault propagation or activating redundancy mechanisms they possibly embed. The
effectiveness of the tool was evaluated on two robust applications: redundant parallel
matrix multiplication and floating point Fast Fourier Transform
A Multi-level Approach to Evaluate the Impact of GPU Permanent Faults on CNN's Reliability
Graphics processing units (GPUs) are widely used to accelerate Artificial Intelligence applications, such as those based on Convolutional Neural Networks (CNNs). Since in some domains in which CNNs are heavily employed (e.g., automotive and robotics) the expected lifetime of GPUs is over ten years, it is of paramount importance to study the impact of permanent faults (e.g. due to aging). Crucially, while the impact of transient faults on GPUs running CNNs has been widely studied, an accurate evaluation of the impact of permanent faults is still lacking. Performing this evaluation is challenging due to the complexity of GPU devices and the software implementing a CNN. In this work, we propose a methodology that combines the accuracy of gate-level fault simulation with the speed and flexibility of software fault injection to evaluate the effects of permanent hardware faults affecting a GPU. First, we profile the executed low-level GPU instructions during the CNN inference. Then, using extensive gate-level fault injection campaigns, we provide an accurate analysis of the effects of permanent faults on the internal modules executing the targeted instructions. Finally, we propagate these effects using fast software-based fault injection. The method allows, for the first time, to estimate the percentage of permanent faults leading the CNN to produce wrong results (i.e., changing the result of its work). The method's feasibility, which allows for flexibly trade-off accuracy with the required computational effort, is shown using LeNet running on an Ampere Nvidia GPU as a case study. The method reduces the computational effort for the evaluation by several orders of magnitude with respect to plain gate- and RTL-level faults simulation
Dependability Assessment of NAND Flash-memory for Mission-critical Applications
It is a matter of fact that NAND flash memory devices are well established in consumer market. However, it is not true that the same architectures adopted in the consumer market are suitable for mission critical applications like space. In fact, USB flash drives, digital cameras, MP3 players are usually adopted to store "less significant" data which are not changing frequently (e.g., MP3s, pictures, etc.). Therefore, in spite of NAND flash's drawbacks, a modest complexity is usually needed in the logic of commercial flash drives. On the other hand, mission critical applications have different reliability requirements from commercial scenarios. Moreover, they are usually playing in a hostile environment (e.g., the space) which contributes to worsen all the issues. We aim at providing practical valuable guidelines, comparisons and tradeoffs among the huge number of dimensions of fault tolerant methodologies for NAND flash applied to critical environments. We hope that such guidelines will be useful for our ongoing research and for all the interested reader
Dependability Assessment of NAND Flash-memory for Mission-critical Applications
It is a matter of fact that NAND flash memory devices are well established in consumer market. However, it is not true that the same architectures adopted in the consumer market are suitable for mission critical applications like space. In fact, USB
flash drives, digital cameras, MP3 players are usually adopted to store "less significant" data which are not changing frequently (e.g., MP3s, pictures, etc.). Therefore, in spite
of NAND flash’s drawbacks, a modest complexity is usually needed in the logic of commercial flash drives. On the other hand, mission critical applications have different reliability requirements from commercial scenarios. Moreover, they are usually playing in a hostile environment (e.g., the space) which contributes to worsen all the issues.
We aim at providing practical valuable guidelines, comparisons and tradeoffs among the huge number of dimensions of fault tolerant methodologies for NAND flash applied to critical environments. We hope that such guidelines will be useful for our ongoing research and for all the interested readers