813 research outputs found
Memory built-in self-repair and correction for improving yield: a review
Nanometer memories are highly prone to defects due to dense structure, necessitating memory built-in self-repair as a must-have feature to improve yield. Today’s system-on-chips contain memories occupying an area as high as 90% of the chip area. Shrinking technology uses stricter design rules for memories, making them more prone to manufacturing defects. Further, using 3D-stacked memories makes the system vulnerable to newer defects such as those coming from through-silicon-vias (TSV) and micro bumps. The increased memory size is also resulting in an increase in soft errors during system operation. Multiple memory repair techniques based on redundancy and correction codes have been presented to recover from such defects and prevent system failures. This paper reviews recently published memory repair methodologies, including various built-in self-repair (BISR) architectures, repair analysis algorithms, in-system repair, and soft repair handling using error correcting codes (ECC). It provides a classification of these techniques based on method and usage. Finally, it reviews evaluation methods used to determine the effectiveness of the repair algorithms. The paper aims to present a survey of these methodologies and prepare a platform for developing repair methods for upcoming-generation memories
Innovative Techniques for Testing and Diagnosing SoCs
We rely upon the continued functioning of many electronic devices for our everyday welfare,
usually embedding integrated circuits that are becoming even cheaper and smaller
with improved features. Nowadays, microelectronics can integrate a working computer
with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC).
SoCs are also employed on automotive safety-critical applications, but need to be tested
thoroughly to comply with reliability standards, in particular the ISO26262 functional
safety for road vehicles.
The goal of this PhD. thesis is to improve SoC reliability by proposing innovative
techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals,
and GPUs. The proposed approaches in the sequence appearing in this thesis are described
as follows:
1. Embedded Memory Diagnosis: Memories are dense and complex circuits which
are susceptible to design and manufacturing errors. Hence, it is important to understand
the fault occurrence in the memory array. In practice, the logical and physical
array representation differs due to an optimized design which adds enhancements to
the device, namely scrambling. This part proposes an accurate memory diagnosis
by showing the efforts of a software tool able to analyze test results, unscramble
the memory array, map failing syndromes to cell locations, elaborate cumulative
analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing
syndromes were analyzed as case studies gathered on an industrial automotive
32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually,
and results were confirmed by real photos taken from a microscope.
2. Functional Test Pattern Generation: The key for a successful test is the pattern applied
to the device. They can be structural or functional; the former usually benefits
from embedded test modules targeting manufacturing errors and is only effective
before shipping the component to the client. The latter, on the other hand, can be
applied during mission minimally impacting on performance but is penalized due
to high generation time. However, functional test patterns may benefit for having
different goals in functional mission mode. Part III of this PhD thesis proposes
three different functional test pattern generation methods for CPU cores embedded
in SoCs, targeting different test purposes, described as follows:
a. Functional Stress Patterns: Are suitable for optimizing functional stress during
I
Operational-life Tests and Burn-in Screening for an optimal device reliability
characterization
b. Functional Power Hungry Patterns: Are suitable for determining functional
peak power for strictly limiting the power of structural patterns during manufacturing
tests, thus reducing premature device over-kill while delivering high test
coverage
c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns
with functional ones, allowing its execution periodically during mission.
In addition, an external hardware communicating with a devised SBST was proposed.
It helps increasing in 3% the fault coverage by testing critical Hardly
Functionally Testable Faults not covered by conventional SBST patterns.
An automatic functional test pattern generation exploiting an evolutionary algorithm
maximizing metrics related to stress, power, and fault coverage was employed
in the above-mentioned approaches to quickly generate the desired patterns. The
approaches were evaluated on two industrial cases developed by STMicroelectronics;
8051-based and a 32-bit Power Architecture SoCs. Results show that generation
time was reduced upto 75% in comparison to older methodologies while
increasing significantly the desired metrics.
3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices
are suitable for generating structural patterns, testing and activating mitigation techniques,
and validating robust hardware and software applications. GPGPUs are
known for fast parallel computation used in high performance computing and advanced
driver assistance where reliability is the key point. Moreover, GPGPU manufacturers
do not provide design description code due to content secrecy. Therefore,
commercial fault injectors using the GPGPU model is unfeasible, making radiation
tests the only resource available, but are costly. In the last part of this thesis, we
propose a software implemented fault injector able to inject bit-flip in memory elements
of a real GPGPU. It exploits a software debugger tool and combines the
C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in
program variables. The goal is to validate robust parallel algorithms by studying
fault propagation or activating redundancy mechanisms they possibly embed. The
effectiveness of the tool was evaluated on two robust applications: redundant parallel
matrix multiplication and floating point Fast Fourier Transform
Multi-core devices for safety-critical systems: a survey
Multi-core devices are envisioned to support the development of next-generation safety-critical systems, enabling the on-chip integration of functions of different criticality. This integration provides multiple system-level potential benefits such as cost, size, power, and weight reduction. However, safety certification becomes a challenge and several fundamental safety technical requirements must be addressed, such as temporal and spatial independence, reliability, and diagnostic coverage. This survey provides a categorization and overview at different device abstraction levels (nanoscale, component, and device) of selected key research contributions that support the compliance with these fundamental safety requirements.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P, Basque Government under grant KK-2019-00035 and the HiPEAC Network of Excellence. The Spanish Ministry of Economy and Competitiveness has also partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717).Peer ReviewedPostprint (author's final draft
Recommended from our members
Prognostics and health management of light emitting diodes
Prognostics is an engineering process of diagnosing, predicting the remaining useful life and estimating the reliability of systems and products. Prognostics and Health Management (PHM) has emerged in the last decade as one of the most efficient approaches in failure prevention, reliability estimation and remaining useful life predictions of various engineering systems and products. Light Emitting Diodes (LEDs) are optoelectronic micro-devices that are now replacing traditional incandescent and fluorescent lighting, as they have many advantages including higher reliability, greater energy efficiency, long life time and faster switching speed. Even though LEDs have high reliability and long life time, manufacturers and lighting systems designers still need to assess the reliability of LED lighting systems and the failures in the LED.
This research provides both experimental and theoretical results that demonstrate the use of prognostics and health monitoring techniques for high power LEDs subjected to harsh operating conditions. Data driven, model driven and fusion prognostics approaches are developed to monitor and identify LED failures, based on the requirement for the light output power. The approaches adopted in this work are validated and can be used to assess the life of an LED lighting system after their deployment based on the power of the light output emitted. The data driven techniques are only based on monitoring selected operational and performance indicators using sensors whereas the model driven technique is based on sensor data as well as on a developed empirical model. Fusion approach is also developed using the data driven and the model driven approaches to the LED. Real-time implementation of developed approaches are also investigated and discussed
Computationally efficient, real-time, and embeddable prognostic techniques for power electronics
Power electronics are increasingly important in new generation vehicles as critical safety mechanical subsystems are being replaced with more electronic components. Hence, it is vital that the health of these power electronic components is monitored for safety and reliability on a platform. The aim of this paper is to develop a prognostic approach for predicting the remaining useful life of power electronic components. The developed algorithms must also be embeddable and computationally efficient to support on-board real-time decision making. Current state-of-the-art prognostic algorithms, notably those based on Markov models, are computationally intensive and not applicable to real-time embedded applications. In this paper, an isolated-gate bipolar transistor (IGBT) is used as a case study for prognostic development. The proposed approach is developed by analyzing failure mechanisms and statistics of IGBT degradation data obtained from an accelerated aging experiment. The approach explores various probability distributions for modeling discrete degradation profiles of the IGBT component. This allows the stochastic degradation model to be efficiently simulated, in this particular example ~1000 times more efficiently than Markov approaches
Ensuring a High Quality Digital Device through Design for Testability
An electronic device is reliable if it is available for use most of the times throughout its life. The reliability can be affected by mishandling and use under abnormal operating conditions. High quality product cannot be achieved without proper verification and testing during the product development cycle. If the design is difficult to test, then it is very likely that most of the faults will not be detected before it is shipped to the customer. This paper describes how product quality can be improved by making the hardware design testable. Various designs for testability techniqueswere discussed. A three bit counter circuit was used to illustrate the benefits of design for testability by using scan chain methodology
- …