36 research outputs found

    An efficient fault syndromes simulator for SRAM memories

    Get PDF
    Testing and diagnosis techniques play a key role in the advance of semiconductor memory technology. The challenge of failure detection has created intensive investigation on efficient testing and diagnosis algorithm for better fault coverage and diagnostic resolution. At present, March test algorithm is used to detect and diagnose all faults related to Random Access Memories. However, the test and diagnosis process are mainly done manually. Due to this, a systematic approach for developing and evaluating memory test algorithm is required. This work is focused on incorporating the March based test algorithm using a software simulator tool for implementing a fast and systematic memory testing algorithm. The simulator allows a user through a GUI to select a March based test algorithm depending on the desired fault coverage and diagnostic resolution. Experimental results show that using the simulator for testing is more efficient than that of the traditional testing algorithm. This new simulator makes it possible for a detailed list of stuck-at faults, transition faults and coupling faults covered by each algorithm and its percentage to be displayed after a set of test algorithms has been chosen. The percentage of diagnostic resolution is also displayed. This proves that the simulator reduces the trade-off between test time, fault coverage and diagnostic resolution. Moreover, the chosen algorithm can be applied to incorporate with memory built-in self-test and diagnosis, to have a better fault coverage and diagnostic resolution. Universities and industry involved in memory Built- in-Self test, Built-in-Self repair and Built-in-Self diagnose will benefit by saving a few years on researching an efficient algorithm to be implemented in their designs

    A fault syndromes simulator for random access memories

    Get PDF
    Testing and diagnosis techniques play a key role in the advance of semiconductor memory technology. The challenge of failure detection has created intensive investigation on efficient testing and diagnosis algorithm for better fault coverage and diagnostic resolution. At present, the March test algorithm is used to detect and diagnose all faults related to Random Access Memories. This algorithm also allows the faults to be located and identified. However, the test and diagnosis process is mainly done manually. Due to this, a systematic approach for developing and evaluating memory test algorithm is required. This work is focused on incorporating the March based test algorithm using a software simulator tool for implementing a fast and systematic memory testing algorithm. The simulator allows a user through a GUI to select a March based test algorithm depending on the desired fault coverage and diagnostic resolution. Experimental results show that using the simulator for testing is more efficient than that of the traditional testing algorithm. This new simulator makes it possible for a detailed list of coupling faults and stuck-at faults covered by each algorithm and its percentage to be displayed after a set of test algorithms has been chosen. The percentage of diagnostic resolution is also displayed. This proves that the simulator reduces the trade-off between test time, fault coverage and diagnostic resolution. Moreover, the chosen algorithm can be applied to incorporate with memory built-in self-test and diagnosis, to have a better fault coverage and diagnostic resolution. Universities and industry involved in memory Built-in-Self test, Built-in-Self repair and Built-in-Self diagnose will benefit by saving a few years on finding an efficient algorithm to be implemented in their designs

    Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications

    Full text link
    © 2018 IEEE. Personal use of this material is permitted. Permissíon from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertisíng or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.[EN] Currently, faults suffered by SRAM memory systems have increased due to the aggressive CMOS integration density. Thus, the probability of occurrence of single-cell upsets (SCUs) or multiple-cell upsets (MCUs) augments. One of the main causes of MCUs in space applications is cosmic radiation. A common solution is the use of error correction codes (ECCs). Nevertheless, when using ECCs in space applications, they must achieve a good balance between error coverage and redundancy, and their encoding/decoding circuits must be efficient in terms of area, power, and delay. Different codes have been proposed to tolerate MCUs. For instance, Matrix codes use Hamming codes and parity checks in a bi-dimensional layout to correct and detect some patterns of MCUs. Recently presented, column¿line¿code (CLC) has been designed to tolerate MCUs in space applications. CLC is a modified Matrix code, based on extended Hamming codes and parity checks. Nevertheless, a common property of these codes is the high redundancy introduced. In this paper, we present a series of new lowredundant ECCs able to correct MCUs with reduced area, power, and delay overheads. Also, these new codes maintain, or even improve, memory error coverage with respect to Matrix and CLC codes.This work was supported by the Spanish Government under the research Project TIN2016-81075-R.Gracia-Morán, J.; Saiz-Adalid, L.; Gil Tomás, DA.; Gil, P. (2018). Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 26(10):2132-2142. https://doi.org/10.1109/TVLSI.2018.2837220S21322142261

    Single event upset hardened embedded domain specific reconfigurable architecture

    Get PDF

    Fault Detection with Optimum March Test Algorithm

    Get PDF
    This paper presents a research work aimed to detect previously-undetected faults, either Write Disturb Faults (WDFs) or Deceptive Read Destructive Faults (DRDFs) or both in March Algorithm such as MATS++(6N), March C-(10N), March SR(14N), and March CL(12N). The main focus of this research is to improve fault coverage on Single Cell Faults as well as Static Double Cell Faults detection, using specified test algorithm. Transition Coupling Faults (CFtrs), Write Destructive Coupling Faults (CFwds) and Deceptive Read Destructive Faults (CFdrds) are types of faults mainly used in this research. The experiment result published in [1] shows BIST (Built-In-Self-Test) implementation with the new algorithm. It provides the same test length but with bigger area overhead, we therefore proposed a new 14N March Test Algorithm with fault coverage of more than 95% using solid 0s and 1s Data Background (DB). This paper reveals the design methodology to generate DB covers all memories function by applying non-transition data, transition data, and single read and double read data. The automation hardware was designed to give the flexibility to the user to generate other new March Algorithm prior to the selected algorithm and analyzed the performance in terms of fault detection and power consumption

    Innovative Techniques for Testing and Diagnosing SoCs

    Get PDF
    We rely upon the continued functioning of many electronic devices for our everyday welfare, usually embedding integrated circuits that are becoming even cheaper and smaller with improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC). SoCs are also employed on automotive safety-critical applications, but need to be tested thoroughly to comply with reliability standards, in particular the ISO26262 functional safety for road vehicles. The goal of this PhD. thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches in the sequence appearing in this thesis are described as follows: 1. Embedded Memory Diagnosis: Memories are dense and complex circuits which are susceptible to design and manufacturing errors. Hence, it is important to understand the fault occurrence in the memory array. In practice, the logical and physical array representation differs due to an optimized design which adds enhancements to the device, namely scrambling. This part proposes an accurate memory diagnosis by showing the efforts of a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, elaborate cumulative analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing syndromes were analyzed as case studies gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually, and results were confirmed by real photos taken from a microscope. 2. Functional Test Pattern Generation: The key for a successful test is the pattern applied to the device. They can be structural or functional; the former usually benefits from embedded test modules targeting manufacturing errors and is only effective before shipping the component to the client. The latter, on the other hand, can be applied during mission minimally impacting on performance but is penalized due to high generation time. However, functional test patterns may benefit for having different goals in functional mission mode. Part III of this PhD thesis proposes three different functional test pattern generation methods for CPU cores embedded in SoCs, targeting different test purposes, described as follows: a. Functional Stress Patterns: Are suitable for optimizing functional stress during I Operational-life Tests and Burn-in Screening for an optimal device reliability characterization b. Functional Power Hungry Patterns: Are suitable for determining functional peak power for strictly limiting the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns with functional ones, allowing its execution periodically during mission. In addition, an external hardware communicating with a devised SBST was proposed. It helps increasing in 3% the fault coverage by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns. An automatic functional test pattern generation exploiting an evolutionary algorithm maximizing metrics related to stress, power, and fault coverage was employed in the above-mentioned approaches to quickly generate the desired patterns. The approaches were evaluated on two industrial cases developed by STMicroelectronics; 8051-based and a 32-bit Power Architecture SoCs. Results show that generation time was reduced upto 75% in comparison to older methodologies while increasing significantly the desired metrics. 3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications. GPGPUs are known for fast parallel computation used in high performance computing and advanced driver assistance where reliability is the key point. Moreover, GPGPU manufacturers do not provide design description code due to content secrecy. Therefore, commercial fault injectors using the GPGPU model is unfeasible, making radiation tests the only resource available, but are costly. In the last part of this thesis, we propose a software implemented fault injector able to inject bit-flip in memory elements of a real GPGPU. It exploits a software debugger tool and combines the C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in program variables. The goal is to validate robust parallel algorithms by studying fault propagation or activating redundancy mechanisms they possibly embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating point Fast Fourier Transform

    A Multi-level Approach to Evaluate the Impact of GPU Permanent Faults on CNN's Reliability

    Get PDF
    Graphics processing units (GPUs) are widely used to accelerate Artificial Intelligence applications, such as those based on Convolutional Neural Networks (CNNs). Since in some domains in which CNNs are heavily employed (e.g., automotive and robotics) the expected lifetime of GPUs is over ten years, it is of paramount importance to study the impact of permanent faults (e.g. due to aging). Crucially, while the impact of transient faults on GPUs running CNNs has been widely studied, an accurate evaluation of the impact of permanent faults is still lacking. Performing this evaluation is challenging due to the complexity of GPU devices and the software implementing a CNN. In this work, we propose a methodology that combines the accuracy of gate-level fault simulation with the speed and flexibility of software fault injection to evaluate the effects of permanent hardware faults affecting a GPU. First, we profile the executed low-level GPU instructions during the CNN inference. Then, using extensive gate-level fault injection campaigns, we provide an accurate analysis of the effects of permanent faults on the internal modules executing the targeted instructions. Finally, we propagate these effects using fast software-based fault injection. The method allows, for the first time, to estimate the percentage of permanent faults leading the CNN to produce wrong results (i.e., changing the result of its work). The method's feasibility, which allows for flexibly trade-off accuracy with the required computational effort, is shown using LeNet running on an Ampere Nvidia GPU as a case study. The method reduces the computational effort for the evaluation by several orders of magnitude with respect to plain gate- and RTL-level faults simulation

    Dependability Assessment of NAND Flash-memory for Mission-critical Applications

    Get PDF
    It is a matter of fact that NAND flash memory devices are well established in consumer market. However, it is not true that the same architectures adopted in the consumer market are suitable for mission critical applications like space. In fact, USB flash drives, digital cameras, MP3 players are usually adopted to store "less significant" data which are not changing frequently (e.g., MP3s, pictures, etc.). Therefore, in spite of NAND flash's drawbacks, a modest complexity is usually needed in the logic of commercial flash drives. On the other hand, mission critical applications have different reliability requirements from commercial scenarios. Moreover, they are usually playing in a hostile environment (e.g., the space) which contributes to worsen all the issues. We aim at providing practical valuable guidelines, comparisons and tradeoffs among the huge number of dimensions of fault tolerant methodologies for NAND flash applied to critical environments. We hope that such guidelines will be useful for our ongoing research and for all the interested reader

    Dependability Assessment of NAND Flash-memory for Mission-critical Applications

    Get PDF
    It is a matter of fact that NAND flash memory devices are well established in consumer market. However, it is not true that the same architectures adopted in the consumer market are suitable for mission critical applications like space. In fact, USB flash drives, digital cameras, MP3 players are usually adopted to store "less significant" data which are not changing frequently (e.g., MP3s, pictures, etc.). Therefore, in spite of NAND flash’s drawbacks, a modest complexity is usually needed in the logic of commercial flash drives. On the other hand, mission critical applications have different reliability requirements from commercial scenarios. Moreover, they are usually playing in a hostile environment (e.g., the space) which contributes to worsen all the issues. We aim at providing practical valuable guidelines, comparisons and tradeoffs among the huge number of dimensions of fault tolerant methodologies for NAND flash applied to critical environments. We hope that such guidelines will be useful for our ongoing research and for all the interested readers
    corecore