128 research outputs found
New techniques for functional testing of microprocessor based systems
Electronic devices may be affected by failures, for example due to physical defects. These defects may be introduced during the manufacturing process, as well as during the normal operating life of the device due to aging. How to detect all these defects is not a trivial task, especially in complex systems such as processor cores. Nevertheless, safety-critical applications do not tolerate failures, this is the reason why testing such devices is needed so to guarantee a correct behavior at any time. Moreover, testing is a key parameter for assessing the quality of a manufactured product.
Consolidated testing techniques are based on special Design for Testability (DfT) features added in the original design to facilitate test effectiveness. Design, integration, and usage of the available DfT for testing purposes are fully supported by commercial EDA tools, hence approaches based on DfT are the standard solutions adopted by silicon vendors for testing their devices.
Tests exploiting the available DfT such as scan-chains manipulate the internal state of the system, differently to the normal functional mode, passing through unreachable configurations. Alternative solutions that do not violate such functional mode are defined as functional tests.
In microprocessor based systems, functional testing techniques include software-based self-test (SBST), i.e., a piece of software (referred to as test program) which is uploaded in the system available memory and executed, with the purpose of exciting a specific part of the system and observing the effects of possible defects affecting it. SBST has been widely-studies by the research community for years, but its adoption by the industry is quite recent.
My research activities have been mainly focused on the industrial perspective of SBST. The problem of providing an effective development flow and guidelines for integrating SBST in the available operating systems have been tackled and results have been provided on microprocessor based systems for the automotive domain. Remarkably, new algorithms have been also introduced with respect to state-of-the-art approaches, which can be systematically implemented to enrich SBST suites of test programs for modern microprocessor based systems. The proposed development flow and algorithms are being currently employed in real electronic control units for automotive products.
Moreover, a special hardware infrastructure purposely embedded in modern devices for interconnecting the numerous on-board instruments has been interest of my research as well. This solution is known as reconfigurable scan networks (RSNs) and its practical adoption is growing fast as new standards have been created. Test and diagnosis methodologies have been proposed targeting specific RSN features, aimed at checking whether the reconfigurability of such networks has not been corrupted by defects and, in this case, at identifying the defective elements of the network. The contribution of my work in this field has also been included in the first suite of public-domain benchmark networks
Innovative Techniques for Testing and Diagnosing SoCs
We rely upon the continued functioning of many electronic devices for our everyday welfare,
usually embedding integrated circuits that are becoming even cheaper and smaller
with improved features. Nowadays, microelectronics can integrate a working computer
with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC).
SoCs are also employed on automotive safety-critical applications, but need to be tested
thoroughly to comply with reliability standards, in particular the ISO26262 functional
safety for road vehicles.
The goal of this PhD. thesis is to improve SoC reliability by proposing innovative
techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals,
and GPUs. The proposed approaches in the sequence appearing in this thesis are described
as follows:
1. Embedded Memory Diagnosis: Memories are dense and complex circuits which
are susceptible to design and manufacturing errors. Hence, it is important to understand
the fault occurrence in the memory array. In practice, the logical and physical
array representation differs due to an optimized design which adds enhancements to
the device, namely scrambling. This part proposes an accurate memory diagnosis
by showing the efforts of a software tool able to analyze test results, unscramble
the memory array, map failing syndromes to cell locations, elaborate cumulative
analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing
syndromes were analyzed as case studies gathered on an industrial automotive
32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually,
and results were confirmed by real photos taken from a microscope.
2. Functional Test Pattern Generation: The key for a successful test is the pattern applied
to the device. They can be structural or functional; the former usually benefits
from embedded test modules targeting manufacturing errors and is only effective
before shipping the component to the client. The latter, on the other hand, can be
applied during mission minimally impacting on performance but is penalized due
to high generation time. However, functional test patterns may benefit for having
different goals in functional mission mode. Part III of this PhD thesis proposes
three different functional test pattern generation methods for CPU cores embedded
in SoCs, targeting different test purposes, described as follows:
a. Functional Stress Patterns: Are suitable for optimizing functional stress during
I
Operational-life Tests and Burn-in Screening for an optimal device reliability
characterization
b. Functional Power Hungry Patterns: Are suitable for determining functional
peak power for strictly limiting the power of structural patterns during manufacturing
tests, thus reducing premature device over-kill while delivering high test
coverage
c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns
with functional ones, allowing its execution periodically during mission.
In addition, an external hardware communicating with a devised SBST was proposed.
It helps increasing in 3% the fault coverage by testing critical Hardly
Functionally Testable Faults not covered by conventional SBST patterns.
An automatic functional test pattern generation exploiting an evolutionary algorithm
maximizing metrics related to stress, power, and fault coverage was employed
in the above-mentioned approaches to quickly generate the desired patterns. The
approaches were evaluated on two industrial cases developed by STMicroelectronics;
8051-based and a 32-bit Power Architecture SoCs. Results show that generation
time was reduced upto 75% in comparison to older methodologies while
increasing significantly the desired metrics.
3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices
are suitable for generating structural patterns, testing and activating mitigation techniques,
and validating robust hardware and software applications. GPGPUs are
known for fast parallel computation used in high performance computing and advanced
driver assistance where reliability is the key point. Moreover, GPGPU manufacturers
do not provide design description code due to content secrecy. Therefore,
commercial fault injectors using the GPGPU model is unfeasible, making radiation
tests the only resource available, but are costly. In the last part of this thesis, we
propose a software implemented fault injector able to inject bit-flip in memory elements
of a real GPGPU. It exploits a software debugger tool and combines the
C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in
program variables. The goal is to validate robust parallel algorithms by studying
fault propagation or activating redundancy mechanisms they possibly embed. The
effectiveness of the tool was evaluated on two robust applications: redundant parallel
matrix multiplication and floating point Fast Fourier Transform
Fault-Independent Test-Generation for Software-Based Self-Testing
Software-based self-test (SBST) is being widely used
in both manufacturing and in-the-field testing of processor-based
devices and Systems-on-Chips. Unfortunately, the stuck-at fault
model is increasingly inadequate to match the new and different
types of defects in the most recent semiconductor technologies,
while the explicit and separate targeting of every fault model
in SBST is cumbersome due to the high complexity of the
test-generation process, the lack of automation tools, and the
high CPU-intensity of the fault-simulation process. Moreover,
defects in advanced semiconductor technologies are not always
covered by the most commonly used fault-models, and the
probability of defect-escapes increases even more. To overcome
these shortcomings we propose the first fault-independent method
for generating software-based self-test procedures. The proposed
method is almost fully automated, it offers high coverage of non-
modeled faults by means of a novel SBST-oriented probabilistic
metric, and it is very fast as it omits the time-consuming test-
generation/fault-simulation processes. Extensive experiments on
the OpenRISC OR1200 processor show the advantages of the
proposed method
New Techniques to Reduce the Execution Time of Functional Test Programs
The compaction of test programs for processor-based systems is of utmost practical importance: Software-Based Self-Test (SBST) is nowadays increasingly adopted, especially for in-field test of safety-critical applications, and both the size and the execution time of the test are critical parameters. However, while compacting the size of binary test sequences has been thoroughly studied over the years, the reduction of the execution time of test programs is still a rather unexplored area of research. This paper describes a family of algorithms able to automatically enhance an existing test program, reducing the time required to run it and, as a side effect, its size. The proposed solutions are based on instruction removal and restoration, which is shown to be computationally more efficient than instruction removal alone. Experimental results demonstrate the compaction capabilities, and allow analyzing computational costs and effectiveness of the different algorithms
Fault-Independent Test-Generation for Software-Based Self-Testing
Software-based self-test (SBST) is being widely used in both manufacturing and in-the-field testing of processor-based devices and Systems-on-Chips. Unfortunately, the stuck-at fault model is increasingly inadequate to match the new and different types of defects in the most recent semiconductor technologies, while the explicit and separate targeting of every fault model in SBST is cumbersome due to the high complexity of the test-generation process, the lack of automation tools, and the high CPU-intensity of the fault-simulation process. Moreover, defects in advanced semiconductor technologies are not always covered by the most commonly used fault-models, and the probability of defect-escapes increases even more. To overcome these shortcomings we propose the first fault-independent SBST method. The proposed method is almost fully automated, it offers high coverage of non-modeled faults by means of a novel SBST-oriented probabilistic metric, and it is very fast as it omits the time-consuming test-generation/fault-simulation processes. Extensive experiments on the OpenRISC OR1200 processor show the advantages of the proposed method
Observation mechanisms for in-field software-based self-test
When electronic systems are used in safety critical applications, as in the space,
avionic, automotive or biomedical areas, it is required to maintain a very low
probability of failures due to faults of any kind. Standards and regulations play
a significant role, forcing companies to devise and adopt solutions able to achieve
predefined targets in terms of dependability. Different techniques can be used to
reduce fault occurrence or to minimize the probability that those faults produce
critical failures (e.g., by introducing redundancy).
Unfortunately, most of these techniques have a severe impact on the cost of
the resulting product and, in some cases, the probability of failures is too large
anyway. Hence, a solution commonly used in several scenarios lies on periodically
performing a test able to detect the occurrence of any fault before it produces
a failure (in-field test). This solution is normally based on forcing the processor
inside the Device Under Test to execute a properly written test program, which is
able to activate possible faults and to make their effects visible in some observable
locations. This approach is also called Software-Based Self-Test, or SBST.
If compared with testing in an end of manufacturing scenario, in-field testing
has strong limitations in terms of access to the system inputs and outputs
because Design for Testability structures and testing equipment are usually not
available. As a consequence there are reduced possibilities to activate the faults
and to observe their effects.
This reduced observability particularly affects the ability to detect performance
faults, i.e. faults that modify the timing but not the final value of computations.
This kind of faults are hard to detect by only observing the final content of
predefined memory locations, that is the usual test result observation method used
in-field.
Initially, the present work was focused on fault tolerance techniques against
transient faults induced by ionizing radiation, the so called Single Event Upsets
(SEUs). The main contribution of this early stage of the thesis lies in the experimental
validation of the feasibility of achieving a safe system by using an
architecture that combines task-level redundancy with already available IP cores,
thus minimizing the development time. Task execution is replicated and Memory
Protection is used to guarantee that any SEU may affect one and only one
of the replicas. A proof of concept implementation was developed and validated
using fault injection. Results outline the effectiveness of the architecture, and the
overhead analysis shows that the proposed architecture is effective in reducing the
resource occupation with respect to N-modular redundancy, at an affordable cost
in terms of application execution time.
The main part of the thesis is focused on in-field software-based self-test of
permanent faults. A set of observation methods exploiting existing or ad-hoc
hardware is proposed, aimed at obtaining a better coverage, in particular of performance
faults. An extensive quantitative evaluation of the proposed methods
is presented, including a comparison with the observation methods traditionally
used in end of manufacturing and in-field testing.
Results show that the proposed methods are a good complement to the traditionally
used final memory content observation. Moreover, they show that an
adequate combination of these complementary methods allows for achieving nearly
the same fault coverage achieved when continuously observing all the processor
outputs, which is an observation method commonly used for production test but
usually not available in-field.
A very interesting by-product of what is described above is a detailed description
of how to compute the fault coverage achieved by functional in-field tests
using a conventional fault simulator, a tool that is usually applied in an end of
manufacturing testing scenario.
Finally, another relevant result in the testing area is a method to detect permanent
faults inside the cache coherence logic integrated in each cache controller
of a multi-core system, based on the concurrent execution of a test program by
the different cores in a coordinated manner. By construction, the method achieves
full fault coverage of the static faults in the addressed logic.Cuando se utilizan sistemas electrónicos en aplicaciones críticas como en las áreas biomédica, aeroespacial o automotriz, se requiere mantener una muy baja probabilidad de malfuncionamientos debidos a cualquier tipo de fallas. Los estándares y normas juegan un papel importante, forzando a los desarrolladores a diseñar y adoptar soluciones que sean capaces de alcanzar objetivos predefinidos en cuanto a seguridad y confiabilidad. Pueden utilizarse diferentes técnicas para reducir la ocurrencia de fallas o para minimizar la probabilidad de que esas fallas produzcan mal funcionamientos críticos, por ejemplo a través de la incorporación de redundancia. Lamentablemente, muchas de esas técnicas afectan en gran medida el costo de los productos y, en algunos casos, la probabilidad de malfuncionamiento sigue siendo demasiado alta. En consecuencia, una solución usada a menudo en varios escenarios consiste en realizar periódicamente un test que sea capaz de detectar la ocurrencia de una falla antes de que esta produzca un mal funcionamiento (test en campo). En general, esta solución se basa en forzar a un procesador existente dentro del dispositivo bajo prueba a ejecutar un programa de test que sea capaz de activar las posibles fallas y de hacer que sus efectos sean visibles en puntos observables. A esta metodología también se la llama auto-test basado en software, o en inglés Software-Based Self-Test (SBST). Si se lo compara con un escenario de test de fin de fabricación, el test en campo tiene fuertes limitaciones en términos de posibilidad de acceso a las entradas y salidas del sistema, porque usualmente no se dispone de equipamiento de test ni de la infraestructura de Design for Testability. En consecuencia se tiene menos posibilidades de activar las fallas y de observar sus efectos. Esta observabilidad reducida afecta particularmente la habilidad para detectar fallas de performance, es decir fallas que modifican la temporización pero no el resultado final de los cálculos. Este tipo de fallas es difícil de detectar por la sola observación del contenido final de lugares de memoria, que es el método usual que se utiliza para observar los resultados de un test en campo. Inicialmente, el presente trabajo estuvo enfocado en técnicas para tolerar fallas transitorias inducidas por radiación ionizante, llamadas en inglés Single Event Upsets (SEUs). La principal contribución de esa etapa inicial de la tesis reside en la validación experimental de la viabilidad de obtener un sistema seguro, utilizando una arquitectura que combina redundancia a nivel de tareas con el uso de módulos hardware (IP cores) ya disponibles, que minimiza en consecuencia el tiempo de desarrollo. Se replica la ejecución de las tareas y se utiliza protección de memoria para garantizar que un SEU pueda afectar a lo sumo a una sola de las réplicas. Se desarrolló una implementación para prueba de concepto que fue validada mediante inyección de fallas. Los resultados muestran la efectividad de la arquitectura, y el análisis de los recursos utilizados muestra que la arquitectura propuesta es efectiva en reducir la ocupación con respecto a la redundancia modular con N réplicas, a un costo accesible en términos de tiempo de ejecución. La parte principal de esta tesis se enfoca en el área de auto-test en campo basado en software para la detección de fallas permanentes. Se propone un conjunto de métodos de observación utilizando hardware existente o ad-hoc, con el fin de obtener una mejor cobertura, en particular de las fallas de performance. Se presenta una extensa evaluación cuantitativa de los métodos propuestos, que incluye una comparación con los métodos tradicionalmente utilizados en tests de fin de fabricación y en campo. Los resultados muestran que los métodos propuestos son un buen complemento del método tradicionalmente usado que consiste en observar el valor final del contenido de memoria. Además muestran que una adecuada combinación de estos métodos complementarios permite alcanzar casi los mismos valores de cobertura de fallas que se obtienen mediante la observación continua de todas las salidas del procesador, método comúnmente usado en tests de fin de fabricación, pero que usualmente no está disponible en campo. Un subproducto muy interesante de lo arriba expuesto es la descripción detallada del procedimiento para calcular la cobertura de fallas lograda mediante tests funcionales en campo por medio de un simulador de fallas convencional, una herramienta que usualmente se aplica en escenarios de test de fin de fabricación. Finalmente, otro resultado relevante en el área de test es un método para detectar fallas permanentes dentro de la lógica de coherencia de cache que está integrada en el controlador de cache de cada procesador en un sistema multi procesador. El método está basado en la ejecución de un programa de test en forma coordinada por parte de los diferentes procesadores. Por construcción, el método cubre completamente las fallas de la lógica mencionad
Fault Detection Methodology for Caches in Reliable Modern VLSI Microprocessors based on Instruction Set Architectures
Η παρούσα διδακτορική διατριβή εισάγει μία χαμηλού κόστους μεθοδολογία για την
ανίχνευση ελαττωμάτων σε μικρές ενσωματωμένες κρυφές μνήμες που βασίζεται σε
σύγχρονες Αρχιτεκτονικές Συνόλου Εντολών και εφαρμόζεται με λογισμικό
αυτοδοκιμής. Η προτεινόμενη μεθοδολογία εφαρμόζει αλγορίθμους March μέσω
λογισμικού για την ανίχνευση τόσο ελαττωμάτων αποθήκευσης όταν εφαρμόζεται σε
κρυφές μνήμες που περιέχουν μόνο στατικές μνήμες τυχαίας προσπέλασης όπως για
παράδειγμα κρυφές μνήμες επιπέδου 1, όσο και ελαττωμάτων σύγκρισης όταν
εφαρμόζεται σε κρυφές μνήμες που περιέχουν εκτός από SRAM μνήμες και μνήμες
διευθυνσιοδοτούμενες μέσω περιεχομένου, όπως για παράδειγμα πλήρως
συσχετιστικές κρυφές μνήμες αναζήτησης μετάφρασης. Η προτεινόμενη μεθοδολογία
εφαρμόζεται και στις τρεις οργανώσεις συσχετιστικότητας κρυφής μνήμης και είναι
ανεξάρτητη της πολιτικής εγγραφής στο επόμενο επίπεδο της ιεραρχίας. Η
μεθοδολογία αξιοποιεί υπάρχοντες ισχυρούς μηχανισμούς των μοντέρνων ISAs
χρησιμοποιώντας ειδικές εντολές, που ονομάζονται στην παρούσα διατριβή Εντολές
Άμεσης Προσπέλασης Κρυφής Μνήμης (Direct Cache Access Instructions - DCAs).
Επιπλέον, η προτεινόμενη μεθοδολογία εκμεταλλεύεται τους έμφυτους μηχανισμούς
καταγραφής απόδοσης και τους μηχανισμούς χειρισμού παγίδων που είναι διαθέσιμοι
στους σύγχρονους επεξεργαστές. Επιπρόσθετα, η προτεινόμενη μεθοδολογία
εφαρμόζει την λειτουργία σύγκρισης των αλγορίθμων March όταν αυτή απαιτείται
(για μνήμες CAM) και επαληθεύει το αποτέλεσμα του ελέγχου μέσω σύντομης
απόκρισης, ώστε να είναι συμβατή με τις απαιτήσεις του ελέγχου εντός
λειτουργίας. Τέλος, στη διατριβή προτείνεται μία βελτιστοποίηση της
μεθοδολογίας για πολυνηματικές, πολυπύρηνες αρχιτεκτονικές.The present PhD thesis introduces a low cost fault detection methodology for
small embedded cache memories that is based on modern Instruction Set
Architectures and is applied with Software-Based Self-Test (SBST) routines. The
proposed methodology applies March tests through software to detect both
storage faults when applied to caches that comprise Static Random Access
Memories (SRAM) only, e.g. L1 caches, and comparison faults when applied to
caches that apart from SRAM memories comprise Content Addressable Memories
(CAM) too, e.g. Translation Lookaside Buffers (TLBs). The proposed methodology
can be applied to all three cache associativity organizations: direct mapped,
set-associative and full-associative and it does not depend on the cache write
policy. The methodology leverages existing powerful mechanisms of modern ISAs
by utilizing instructions that we call in this PhD thesis Direct Cache Access
(DCA) instructions. Moreover, our methodology exploits the native performance
monitoring hardware and the trap handling mechanisms which are available in
modern microprocessors. Moreover, the proposed Methodology applies March
compare operations when needed (for CAM arrays) and verifies the test result
with a compact response to comply with periodic on-line testing needs. Finally,
a multithreaded optimization of the proposed methodology that targets
multithreaded, multicore architectures is also presented in this thesi
Effectiveness of the addition of therapeutic alliance with minimal intervention in the treatment of patients with chronic, nonspecific low back pain and low risk of involvement of psychosocial factors: a study protocol for a randomized controlled trial (TalkBack trial)
SPIRIT 2013 Checklist: Recommended items to address in a clinical trial protocol and related documents*. (DOC 121 kb
New Techniques for On-line Testing and Fault Mitigation in GPUs
L'abstract è presente nell'allegato / the abstract is in the attachmen
- …