1 research outputs found
Innovative Techniques for Testing and Diagnosing SoCs
We rely upon the continued functioning of many electronic devices for our everyday welfare,
usually embedding integrated circuits that are becoming even cheaper and smaller
with improved features. Nowadays, microelectronics can integrate a working computer
with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC).
SoCs are also employed on automotive safety-critical applications, but need to be tested
thoroughly to comply with reliability standards, in particular the ISO26262 functional
safety for road vehicles.
The goal of this PhD. thesis is to improve SoC reliability by proposing innovative
techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals,
and GPUs. The proposed approaches in the sequence appearing in this thesis are described
as follows:
1. Embedded Memory Diagnosis: Memories are dense and complex circuits which
are susceptible to design and manufacturing errors. Hence, it is important to understand
the fault occurrence in the memory array. In practice, the logical and physical
array representation differs due to an optimized design which adds enhancements to
the device, namely scrambling. This part proposes an accurate memory diagnosis
by showing the efforts of a software tool able to analyze test results, unscramble
the memory array, map failing syndromes to cell locations, elaborate cumulative
analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing
syndromes were analyzed as case studies gathered on an industrial automotive
32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually,
and results were confirmed by real photos taken from a microscope.
2. Functional Test Pattern Generation: The key for a successful test is the pattern applied
to the device. They can be structural or functional; the former usually benefits
from embedded test modules targeting manufacturing errors and is only effective
before shipping the component to the client. The latter, on the other hand, can be
applied during mission minimally impacting on performance but is penalized due
to high generation time. However, functional test patterns may benefit for having
different goals in functional mission mode. Part III of this PhD thesis proposes
three different functional test pattern generation methods for CPU cores embedded
in SoCs, targeting different test purposes, described as follows:
a. Functional Stress Patterns: Are suitable for optimizing functional stress during
I
Operational-life Tests and Burn-in Screening for an optimal device reliability
characterization
b. Functional Power Hungry Patterns: Are suitable for determining functional
peak power for strictly limiting the power of structural patterns during manufacturing
tests, thus reducing premature device over-kill while delivering high test
coverage
c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns
with functional ones, allowing its execution periodically during mission.
In addition, an external hardware communicating with a devised SBST was proposed.
It helps increasing in 3% the fault coverage by testing critical Hardly
Functionally Testable Faults not covered by conventional SBST patterns.
An automatic functional test pattern generation exploiting an evolutionary algorithm
maximizing metrics related to stress, power, and fault coverage was employed
in the above-mentioned approaches to quickly generate the desired patterns. The
approaches were evaluated on two industrial cases developed by STMicroelectronics;
8051-based and a 32-bit Power Architecture SoCs. Results show that generation
time was reduced upto 75% in comparison to older methodologies while
increasing significantly the desired metrics.
3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices
are suitable for generating structural patterns, testing and activating mitigation techniques,
and validating robust hardware and software applications. GPGPUs are
known for fast parallel computation used in high performance computing and advanced
driver assistance where reliability is the key point. Moreover, GPGPU manufacturers
do not provide design description code due to content secrecy. Therefore,
commercial fault injectors using the GPGPU model is unfeasible, making radiation
tests the only resource available, but are costly. In the last part of this thesis, we
propose a software implemented fault injector able to inject bit-flip in memory elements
of a real GPGPU. It exploits a software debugger tool and combines the
C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in
program variables. The goal is to validate robust parallel algorithms by studying
fault propagation or activating redundancy mechanisms they possibly embed. The
effectiveness of the tool was evaluated on two robust applications: redundant parallel
matrix multiplication and floating point Fast Fourier Transform