126 research outputs found
Innovative Techniques for Testing and Diagnosing SoCs
We rely upon the continued functioning of many electronic devices for our everyday welfare,
usually embedding integrated circuits that are becoming even cheaper and smaller
with improved features. Nowadays, microelectronics can integrate a working computer
with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC).
SoCs are also employed on automotive safety-critical applications, but need to be tested
thoroughly to comply with reliability standards, in particular the ISO26262 functional
safety for road vehicles.
The goal of this PhD. thesis is to improve SoC reliability by proposing innovative
techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals,
and GPUs. The proposed approaches in the sequence appearing in this thesis are described
as follows:
1. Embedded Memory Diagnosis: Memories are dense and complex circuits which
are susceptible to design and manufacturing errors. Hence, it is important to understand
the fault occurrence in the memory array. In practice, the logical and physical
array representation differs due to an optimized design which adds enhancements to
the device, namely scrambling. This part proposes an accurate memory diagnosis
by showing the efforts of a software tool able to analyze test results, unscramble
the memory array, map failing syndromes to cell locations, elaborate cumulative
analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing
syndromes were analyzed as case studies gathered on an industrial automotive
32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually,
and results were confirmed by real photos taken from a microscope.
2. Functional Test Pattern Generation: The key for a successful test is the pattern applied
to the device. They can be structural or functional; the former usually benefits
from embedded test modules targeting manufacturing errors and is only effective
before shipping the component to the client. The latter, on the other hand, can be
applied during mission minimally impacting on performance but is penalized due
to high generation time. However, functional test patterns may benefit for having
different goals in functional mission mode. Part III of this PhD thesis proposes
three different functional test pattern generation methods for CPU cores embedded
in SoCs, targeting different test purposes, described as follows:
a. Functional Stress Patterns: Are suitable for optimizing functional stress during
I
Operational-life Tests and Burn-in Screening for an optimal device reliability
characterization
b. Functional Power Hungry Patterns: Are suitable for determining functional
peak power for strictly limiting the power of structural patterns during manufacturing
tests, thus reducing premature device over-kill while delivering high test
coverage
c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns
with functional ones, allowing its execution periodically during mission.
In addition, an external hardware communicating with a devised SBST was proposed.
It helps increasing in 3% the fault coverage by testing critical Hardly
Functionally Testable Faults not covered by conventional SBST patterns.
An automatic functional test pattern generation exploiting an evolutionary algorithm
maximizing metrics related to stress, power, and fault coverage was employed
in the above-mentioned approaches to quickly generate the desired patterns. The
approaches were evaluated on two industrial cases developed by STMicroelectronics;
8051-based and a 32-bit Power Architecture SoCs. Results show that generation
time was reduced upto 75% in comparison to older methodologies while
increasing significantly the desired metrics.
3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices
are suitable for generating structural patterns, testing and activating mitigation techniques,
and validating robust hardware and software applications. GPGPUs are
known for fast parallel computation used in high performance computing and advanced
driver assistance where reliability is the key point. Moreover, GPGPU manufacturers
do not provide design description code due to content secrecy. Therefore,
commercial fault injectors using the GPGPU model is unfeasible, making radiation
tests the only resource available, but are costly. In the last part of this thesis, we
propose a software implemented fault injector able to inject bit-flip in memory elements
of a real GPGPU. It exploits a software debugger tool and combines the
C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in
program variables. The goal is to validate robust parallel algorithms by studying
fault propagation or activating redundancy mechanisms they possibly embed. The
effectiveness of the tool was evaluated on two robust applications: redundant parallel
matrix multiplication and floating point Fast Fourier Transform
New techniques for functional testing of microprocessor based systems
Electronic devices may be affected by failures, for example due to physical defects. These defects may be introduced during the manufacturing process, as well as during the normal operating life of the device due to aging. How to detect all these defects is not a trivial task, especially in complex systems such as processor cores. Nevertheless, safety-critical applications do not tolerate failures, this is the reason why testing such devices is needed so to guarantee a correct behavior at any time. Moreover, testing is a key parameter for assessing the quality of a manufactured product.
Consolidated testing techniques are based on special Design for Testability (DfT) features added in the original design to facilitate test effectiveness. Design, integration, and usage of the available DfT for testing purposes are fully supported by commercial EDA tools, hence approaches based on DfT are the standard solutions adopted by silicon vendors for testing their devices.
Tests exploiting the available DfT such as scan-chains manipulate the internal state of the system, differently to the normal functional mode, passing through unreachable configurations. Alternative solutions that do not violate such functional mode are defined as functional tests.
In microprocessor based systems, functional testing techniques include software-based self-test (SBST), i.e., a piece of software (referred to as test program) which is uploaded in the system available memory and executed, with the purpose of exciting a specific part of the system and observing the effects of possible defects affecting it. SBST has been widely-studies by the research community for years, but its adoption by the industry is quite recent.
My research activities have been mainly focused on the industrial perspective of SBST. The problem of providing an effective development flow and guidelines for integrating SBST in the available operating systems have been tackled and results have been provided on microprocessor based systems for the automotive domain. Remarkably, new algorithms have been also introduced with respect to state-of-the-art approaches, which can be systematically implemented to enrich SBST suites of test programs for modern microprocessor based systems. The proposed development flow and algorithms are being currently employed in real electronic control units for automotive products.
Moreover, a special hardware infrastructure purposely embedded in modern devices for interconnecting the numerous on-board instruments has been interest of my research as well. This solution is known as reconfigurable scan networks (RSNs) and its practical adoption is growing fast as new standards have been created. Test and diagnosis methodologies have been proposed targeting specific RSN features, aimed at checking whether the reconfigurability of such networks has not been corrupted by defects and, in this case, at identifying the defective elements of the network. The contribution of my work in this field has also been included in the first suite of public-domain benchmark networks
Observability Driven Path Generation for Delay Test
This research describes an approach for path generation using an observability metric for delay test. K Longest Path Per Gate (KLPG) tests are generated for sequential circuits. A transition launched from a scan flip-flop (SFF) is captured into another SFF during at-speed clock cycles, that is, clock cycles at the rated design speed. The generated path is a ‘longest path’ suitable for delay test. The path generation algorithm then utilizes observability of the fan-out gates in the consecutive, lower-speed clock cycles, known as coda cycles, to generate paths ending at a SFF, to capture the transition from the at-speed cycles. For a given clocking scheme defined by the number of coda cycles, if the final flip-flop is not scan-enabled, the path generation algorithm attempts to generate a different path that ends at a SFF, located in a different branch of the circuit fan-out, indicated by lower observability. The paths generated over multiple cycles are sequentially justified using Boolean satisfiability. The observability metric optimizes the path generation in the coda cycles by always attempting to grow the path through the branch with the best observability and never generating a path that ends at a non-scan flip-flop.
The algorithm has been developed in C++. The experiments have been performed on an Intel Core i7 machine with 64GB RAM. Various ISCAS benchmark circuits have been used with various KLPG configurations for code evaluation. Multiple configurations have been used for the experiments. The combinations of the values of K [1, 2, 3, 4, 5] and number of coda cycles [1, 2, 3] have been used to characterize the implementation. A sublinear rise is run time has been observed with increasing K values. The total number of tested paths rise with K and falls with number of coda cycles, due to the increasing number of constraints on the path, particularly due to the fixed inputs
High Quality Compact Delay Test Generation
Delay testing is used to detect timing defects and ensure that a circuit meets its
timing specifications. The growing need for delay testing is a result of the advances in
deep submicron (DSM) semiconductor technology and the increase in clock frequency.
Small delay defects that previously were benign now produce delay faults, due to
reduced timing margins. This research focuses on the development of new test methods
for small delay defects, within the limits of affordable test generation cost and pattern
count.
First, a new dynamic compaction algorithm has been proposed to generate
compacted test sets for K longest paths per gate (KLPG) in combinational circuits or
scan-based sequential circuits. This algorithm uses a greedy approach to compact paths
with non-conflicting necessary assignments together during test generation. Second, to
make this dynamic compaction approach practical for industrial use, a recursive learning
algorithm has been implemented to identify more necessary assignments for each path,
so that the path-to-test-pattern matching using necessary assignments is more accurate.
Third, a realistic low cost fault coverage metric targeting both global and local delay
faults has been developed. The metric suggests the test strategy of generating a different
number of longest paths for each line in the circuit while maintaining high fault coverage.
The number of paths and type of test depends on the timing slack of the paths under this
metric. Experimental results for ISCAS89 benchmark circuits and three industry circuits
show that the pattern count of KLPG can be significantly reduced using the proposed
methods. The pattern count is comparable to that of transition fault test, while achieving
higher test quality. Finally, the proposed ATPG methodology has been applied to an
industrial quad-core microprocessor. FMAX testing has been done on many devices and
silicon data has shown the benefit of KLPG test
Automatic test pattern generation for asynchronous circuits
The testability of integrated circuits becomes worse with transistor dimensions reaching nanometer
scales. Testing, the process of ensuring that circuits are fabricated without defects, becomes
inevitably part of the design process; a technique called design for test (DFT). Asynchronous
circuits have a number of desirable properties making them suitable for the challenges posed
by modern technologies, but are severely limited by the unavailability of EDA tools for DFT
and automatic test-pattern generation (ATPG).
This thesis is motivated towards developing test generation methodologies for asynchronous
circuits. In total four methods were developed which are aimed at two different fault models:
stuck-at faults at the basic logic gate level and transistor-level faults. The methods were
evaluated using a set of benchmark circuits and compared favorably to previously published
work.
First, ABALLAST is a partial-scan DFT method adapting the well-known BALLAST technique
for asynchronous circuits where balanced structures are used to guide the selection of
the state-holding elements that will be scanned. The test inputs are automatically provided
by a novel test pattern generator, which uses time frame unrolling to deal with the remaining,
non-scanned sequential C-elements. The second method, called AGLOB, uses algorithms
from strongly-connected components in graph graph theory as a method for finding the optimal
position of breaking the loops in the asynchronous circuit and adding scan registers. The
corresponding ATPG method converts cyclic circuits into acyclic for which standard tools can
provide test patterns. These patterns are then automatically converted for use in the original
cyclic circuits. The third method, ASCP, employs a new cycle enumeration method to find the
loops present in a circuit. Enumerated cycles are then processed using an efficient set covering
heuristic to select the scan elements for the circuit to be tested.Applying these methods to
the benchmark circuits shows an improvement in fault coverage compared to previous work,
which, for some circuits, was substantial. As no single method consistently outperforms the
others in all benchmarks, they are all valuable as a designer’s suite of tools for testing. Moreover,
since they are all scan-based, they are compatible and thus can be simultaneously used in
different parts of a larger circuit.
In the final method, ATRANTE, the main motivation of developing ATPG is supplemented by
transistor level test generation. It is developed for asynchronous circuits designed using a State
Transition Graph (STG) as their specification. The transistor-level circuit faults are efficiently
mapped onto faults that modify the original STG. For each potential STG fault, the ATPG tool
provides a sequence of test vectors that expose the difference in behavior to the output ports.
The fault coverage obtained was 52-72 % higher than the coverage obtained using the gate
level tests. Overall, four different design for test (DFT) methods for automatic test pattern generation
(ATPG) for asynchronous circuits at both gate and transistor level were introduced in this thesis.
A circuit extraction method for representing the asynchronous circuits at a higher level of
abstraction was also implemented.
Developing new methods for the test generation of asynchronous circuits in this thesis facilitates
the test generation for asynchronous designs using the CAD tools available for testing the
synchronous designs. Lessons learned and the research questions raised due to this work will
impact the future work to probe the possibilities of developing robust CAD tools for testing the
future asynchronous designs
Methodology to accelerate diagnostic coverage assessment: MADC
Tese (doutorado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica, Florianópolis, 2016.Os veÃculos da atualidade vêm integrando um número crescente de eletrônica embarcada, com o objetivo de permitir uma experiência mais segura aos motoristas. Logo, a garantia da segurança fÃsica é um requisito que precisa ser observada por completo durante o processo de desenvolvimento. O padrão ISO 26262 provê medidas para garantir que esses requisitos não sejam negligenciados. Injeção de falhas é fortemente recomendada quando da verificação do funcionamento dos mecanismos de segurança implementados, assim como sua capacidade de cobertura associada ao diagnóstico de falhas existentes. A análise exaustiva não é obrigatória, mas evidências de que o máximo esforço foi feito para acurar a cobertura de diagnóstico precisam ser apresentadas, principalmente durante a avalição dos nÃveis de segurança associados a arquitetura implementada em hardware. Estes nÃveis dão suporte à s alegações de que o projeto obedece à s métricas de segurança da integridade fÃsica exigida em aplicações automotivas. Os nÃveis de integridade variam de A à D, sendo este último o mais rigoroso. Essa Tese explora o estado-da-arte em soluções de verificação, e tem por objetivo construir uma metodologia que permita acelerar a verificação da cobertura de diagnóstico alcançado. Diferentemente de outras técnicas voltadas à aceleração de injeção de falhas, a metodologia proposta utiliza uma plataforma de hardware dedicada à verificação, com o intuito de maximizar o desempenho relativo a simulação de falhas. Muitos aspectos relativos a ISO 26262 são observados de forma que a presente contribuição possa ser apreciada no segmento automotivo. Por fim, uma arquitetura OpenRISC é utilizada para confirmar os resultados alcançados com essa solução proposta pertencente ao estado-da-arte.Abstract : Modern vehicles are integrating a growing number of electronics to provide a safer experience for the driver. Therefore, safety is a non-negotiable requirement that must be considered through the vehicle development process. The ISO 26262 standard provides guidance to ensure that such requirements are implemented. Fault injection is highly recommended for the functional verification of safety mechanisms or to evaluate their diagnostic coverage capability. An exhaustive analysis is not required, but evidence of best effort through the diagnostic coverage assessment needs to be provided when performing quantitative evaluation of hardware architectural metrics. These metrics support that the automotive safety integrity level ? ranging from A (lowest) to D (strictest) levels ? was obeyed. This thesis explores the most advanced verification solutions in order to build a methodology to accelerate the diagnostic coverage assessment. Different from similar techniques for fault injection acceleration, the proposed methodology does not require any modification of the design model to enable acceleration. Many functional safety requisites in the ISO 26262 are considered thus allowing the contribution presented to be a suitable solution for the automotive segment. An OpenRISC architecture is used to confirm the results achieved by this state-of-the-art solution
FPGA Based Design for Accelerated Fault-testing of Integrated Circuits
In the past few decades, integrated circuits have become a major part of everyday life. Every circuit that is created needs to be tested for faults so faulty circuits are not sent to end-users. The creation of these tests is time consuming, costly and difficult to perform on larger circuits. This research presents a novel method for fault detection and test pattern reduction in integrated circuitry under test. By leveraging the FPGA\u27s reconfigurability and parallel processing capabilities, a speed up in fault detection can be achieved over previous computer simulation techniques. This work presents the following contributions to the field of Stuck-At-Fault detection: We present a new method for inserting faults into a circuit net list. Given any circuit netlist, our tool can insert multiplexers into a circuit at correct internal nodes to aid in fault emulation on reconfigurable hardware. We present a parallel method of fault emulation. The benefit of the FPGA is not only its ability to implement any circuit, but its ability to process data in parallel. This research utilizes this to create a more efficient emulation method that implements numerous copies of the same circuit in the FPGA. A new method to organize the most efficient faults. Most methods for determinin the minimum number of inputs to cover the most faults require sophisticated softwareprograms that use heuristics. By utilizing hardware, this research is able to process data faster and use a simpler method for an efficient way of minimizing inputs
New Techniques for On-line Testing and Fault Mitigation in GPUs
L'abstract è presente nell'allegato / the abstract is in the attachmen
Resilience of an embedded architecture using hardware redundancy
In the last decade the dominance of the general computing systems market has being replaced by embedded systems with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and failure can be profound.
Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components with regards to performance and do not provide 100% fault elimination. Without fault tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn, can result in catastrophic failures.
In this work we study the concepts of fault tolerance and dependability and
extend these concepts providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault tolerant techniques and of 2) the effects of radiation in state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce an algorithm of fault tolerance and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system called syndrome that is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA’s assembler and commercial hardware simulators
Fault Tolerant Electronic System Design
Due to technology scaling, which means reduced transistor size, higher density, lower voltage and more aggressive clock frequency, VLSI devices may become more sensitive against soft errors. Especially for those devices used in safety- and mission-critical applications, dependability and reliability are becoming increasingly important constraints during the development of system on/around them. Other phenomena (e.g., aging and wear-out effects) also have negative impacts on reliability of modern circuits. Recent researches show that even at sea level, radiation particles can still induce soft errors in electronic systems.
On one hand, processor-based system are commonly used in a wide variety of applications, including safety-critical and high availability missions, e.g., in the automotive, biomedical and aerospace domains. In these fields, an error may produce catastrophic consequences. Thus, dependability is a primary target that must be achieved taking into account tight constraints in terms of cost, performance, power and time to market. With standards and regulations (e.g., ISO-26262, DO-254, IEC-61508) clearly specify the targets to be achieved and the methods to prove their achievement, techniques working at system level are particularly attracting.
On the other hand, Field Programmable Gate Array (FPGA) devices are becoming more and more attractive, also in safety- and mission-critical applications due to the high performance, low power consumption and the flexibility for reconfiguration they provide. Two types of FPGAs are commonly used, based on their configuration memory cell technology, i.e., SRAM-based and Flash-based FPGA. For SRAM-based FPGAs, the SRAM cells of the configuration memory highly susceptible to radiation induced effects which can leads to system failure; and for Flash-based FPGAs, even though their non-volatile configuration memory cells are almost immune to Single Event Upsets induced by energetic particles, the floating gate switches and the logic cells in the configuration tiles can still suffer from Single Event Effects when hit by an highly charged particle. So analysis and mitigation techniques for Single Event Effects on FPGAs are becoming increasingly important in the design flow especially when reliability is one of the main requirements
- …