17 research outputs found
AFSM-based deterministic hardware TPG
This paper proposes a new approach for designing a cost-effective, on-chip, hardware pattern generator of deterministic test sequences. Given a pre-computed test pattern (obtained by an ATPG tool) with predetermined fault coverage, a hardware Test Pattern Generator (TPG) based on Autonomous Finite State Machines (AFSM) structure is synthesized to generate it. This new approach exploits "don't care" bits of the deterministic test patterns to lower area overhead of the TPG. Simulations using benchmark circuits show that the hardware components cost is considerably less when compared with alternative solution
Built-In Self Test (BIST) for Realistic Delay Defects
Testing of delay defects is necessary in deep submicron (DSM) technologies. High coverage delay tests produced by automatic test pattern generation (ATPG) can be applied during wafer and package tests, but are difficult to apply during the board test, due to limited chip access. Delay testing at the board level is increasingly important to diagnose failures caused by supply noise or temperature in the board environment. An alternative to ATPG is the built-in self test (BIST). In combination with the insertion of test points, BIST is able to achieve high coverage of stuck-at and transition faults. The quality of BIST patterns on small delay defects is an open question. In this work we analyze the application of BIST to small delay defects using resistive short and open models in order to estimate the coverage and correlate the coverage to traditional delay fault models
Recommended from our members
Efficient verification/testing of system-on-chip through fault grading and analog behavioral modeling
textThis dissertation presents several cost-effective production test solutions using fault grading and mixed-signal design verification cases enabled by analog behavioral modeling. Although the latest System-on-Chip (SOC) is getting denser, faster, and more complex, the manufacturing technology is dominated by subtle defects that are introduced by small-scale technology. Thus, SOC requires more mature testing strategies. By performing various types of testing, better quality SoC can be manufactured, but test resources are too limited to accommodate all those tests. To create the most efficient production test flow, any redundant or ineffective tests need to be removed or minimized.
Chapter 3 proposes new method of test data volume reduction by combining the nonlinear property of feedback shift register (FSR) and dictionary coding. Instead of using the nonlinear FSR for actual hardware implementation, the expanded test set by nonlinear expansion is used as the one-column test sets and provides big reduction ratio for the test data volume. The experimental results show the combined method reduced the total test data volume and increased the fault coverage. Due to the increased number of test patterns, total test time is increased.
Chapter 4 addresses a whole process of functional fault grading. Fault grading has always been a ”desire-to-have” flow because it can bring up significant value for cost saving and yield analysis. However, it is very hard to perform the fault grading on the complex large scale SOC. A commercial tool called Z01X is used as a fault grading platform, and whole fault grading process is coordinated and each detailed execution is performed. Simulation- based functional fault grading identifies the quality of the given functional tests against the static faults and transition delay faults. With the structural tests and functional tests, functional fault grading can indicate the way to achieve the same test coverage by spending minimal test time. Compared to the consumed time and resource for fault grading, the contribution to the test time saving might not be acceptable as very promising, but the fault grading data can be reused for yield analysis and test flow optimization. For the final production testing, confident decisions on the functional test selection can be made based on the fault grading results.
Chapter 5 addresses the challenges of Package-on-Package (POP) testing. Because POP devices have pins on both the top and the bottom of the package, the increased test pins require more test channels to detect packaging defects. Boundary scan chain testing is used to detect those continuity defects by relying on leakage current from the power supply. This proposed test scheme does not require direct test channels on the top pins. Based on the counting algorithm, minimal numbers of test cycles are generated, and the test achieved full test coverage for any combinations of pin-to-pin shortage defects on the top pins of the POP package. The experimental results show about 10 times increased leakage current from the shorted defect. Also, it can be expanded to multi-site testing with less test channels for high-volume production.
Fault grading is applied within different structural test categories in Chapter 6. Stuck-at faults can be considered as TDFs having infinite delay. Hence, the TDF Automatic Test Pattern Generation (ATPG) tests can detect both TDFs and stuck-at faults. By removing the stuck-at faults being detected by the given TDF ATPG tests, the tests that target stuck-at faults can be reduced, and the reduced stuck-at fault set results in fewer stuck-at ATPG patterns. The structural test time is reduced while keeping the same test coverage. This TDF grading is performed with the same ATPG tool used to generate the stuck-at and TDF ATPG tests.
To expedite the mixed-signal design verification of complex SoC, analog behavioral modeling methods and strategies are addressed in Chapter 7 and case studies for detailed verification with actual mixed-signal design are ad- dressed in Chapter 8. Analog modeling effort can enhance verification quality for a mixed-signal design with less turnaround time, and it enables compatible integration of the mixed-signal design cores into the SoC. The modeling process may reveal any potential design errors or incorrect testbench setup, and it results in minimizing unnecessary debugging time for quality devices.
Two mixed-signal design cases were verified by me using the analog models. A fully hierarchical digital-to-analog converter (DAC) model is implemented and silicon mismatches caused by process variation are modeled and inserted into the DAC model, and the calibration algorithm for the DAC is successfully verified by model-based simulation at the full DAC-level. When the mismatch amount is increased and exceeded the calibration capability of the DAC, the simulation results show the increased calibration error with some outliers. This verification method can identify the saturation range of the DAC and predict the yield of the devices from process variation.
A phase-locked loop (PLL) design cases were also verified by me using the analog model. Both open-loop PLL model and closed-loop PLL model cases are presented. Quick bring-up of open-loop PLL model provides low simulation overhead for widely-used PLLs in the SOC and enables early starting of design verification for the upper-level design using the PLL generated clocks. Accurate closed-loop PLL model is implemented for DCO-based PLL design, and the mixed-simulation with analog models and schematic designs enables flexible analog verification. Only focused analog design block is set to the schematic design and the rest of the analog design is replaced by the analog model. Then, this scaled-down SPICE simulation is performed about 10 times to 100 times faster than full-scale SPICE simulation. The analog model of the focused block is compared with the scaled-down SPICE simulation result and the quality of the model is iteratively enhanced. Hence, the analog model enables both compatible integration and flexible analog design verification.
This dissertation contributes to reduce test time and to enhance test quality, and helps to set up efficient production testing flows. Depending on the size and performance of CUT, proper testing schemes can maximize the efficiency of production testing. The topics covered in this dissertation can be used in optimizing the test flow and selecting the final production tests to achieve maximum test capability. In addition, the strategies and benefits of analog behavioral modeling techniques that I implemented are presented, and actual verification cases shows the effectiveness of analog modeling for better quality SoC products.Electrical and Computer Engineerin
Techniques for Improving Security and Trustworthiness of Integrated Circuits
The integrated circuit (IC) development process is becoming increasingly vulnerable to malicious activities because untrusted parties could be involved in this IC development flow. There are four typical problems that impact the security and trustworthiness of ICs used in military, financial, transportation, or other critical systems: (i) Malicious inclusions and alterations, known as hardware Trojans, can be inserted into a design by modifying the design during GDSII development and fabrication. Hardware Trojans in ICs may cause malfunctions, lower the reliability of ICs, leak confidential information to adversaries or even destroy the system under specifically designed conditions. (ii) The number of circuit-related counterfeiting incidents reported by component manufacturers has increased significantly over the past few years with recycled ICs contributing the largest percentage of the total reported counterfeiting incidents. Since these recycled ICs have been used in the field before, the performance and reliability of such ICs has been degraded by aging effects and harsh recycling process. (iii) Reverse engineering (RE) is process of extracting a circuit’s gate-level netlist, and/or inferring its functionality. The RE causes threats to the design because attackers can steal and pirate a design (IP piracy), identify the device technology, or facilitate other hardware attacks. (iv) Traditional tools for uniquely identifying devices are vulnerable to non-invasive or invasive physical attacks. Securing the ID/key is of utmost importance since leakage of even a single device ID/key could be exploited by an adversary to hack other devices or produce pirated devices. In this work, we have developed a series of design and test methodologies to deal with these four challenging issues and thus enhance the security, trustworthiness and reliability of ICs. The techniques proposed in this thesis include: a path delay fingerprinting technique for detection of hardware Trojans, recycled ICs, and other types counterfeit ICs including remarked, overproduced, and cloned ICs with their unique identifiers; a Built-In Self-Authentication (BISA) technique to prevent hardware Trojan insertions by untrusted fabrication facilities; an efficient and secure split manufacturing via Obfuscated Built-In Self-Authentication (OBISA) technique to prevent reverse engineering by untrusted fabrication facilities; and a novel bit selection approach for obtaining the most reliable bits for SRAM-based physical unclonable function (PUF) across environmental conditions and silicon aging effects
Heuristics Based Test Overhead Reduction Techniques in VLSI Circuits
The electronic industry has evolved at a mindboggling pace over the last five decades. Moore’s Law [1] has enabled the chip makers to push the limits of the physics to shrink the feature sizes on Silicon (Si) wafers over the years. A constant push for power-performance-area (PPA) optimization has driven the higher transistor density trends. The defect density in advanced process nodes has posed a challenge in achieving sustainable yield. Maintaining a low Defect-per-Million (DPM) target for a product to be viable with stringent Time-to-Market (TTM) has become one of the most important aspects of the chip manufacturing process. Design-for-Test (DFT) plays an instrumental role in enabling low DPM. DFT however impacts the PPA of a chip. This research describes an approach of minimizing the scan test overhead in a chip based on circuit topology heuristics. These heuristics are applied on a full-scan design to convert a subset of the scan flip-flops (SFF) into D flip-flops (DFF). The K Longest Path per Gate (KLPG) [2] automatic test pattern generation (ATPG) algorithm is used to generate tests for robust paths in the circuit. Observability driven multi cycle path generation [3][4] and test are used in this work to minimize coverage loss caused by the SFF conversion process. The presence of memory arrays in a design exacerbates the coverage loss due to the shadow cast by the array on its neighboring logic. A specialized behavioral modeling for the memory array is required to enable test coverage of the shadow logic. This work develops a memory model integrated into the ATPG engine for this purpose. Multiple clock domains pose challenges in the path generation process. The inter-domain clocking relationship and corresponding logic sensitization are modeled in our work to generate synchronous inter-domain paths over multiple clock cycles. Results are demonstrated on ISCAS89 and ITC99 benchmark circuits. Power saving benefit is quantified using an open-source standard-cell library
Runtime Monitoring for Dependable Hardware Design
Mit dem Voranschreiten der Technologieskalierung und der Globalisierung der Produktion von integrierten Schaltkreisen eröffnen sich eine Fülle von Schwachstellen bezüglich der Verlässlichkeit von Computerhardware. Jeder Mikrochip wird aufgrund von Produktionsschwankungen mit einem einzigartigen Charakter geboren, welcher sich durch seine Arbeitsbedingungen, Belastung und Umgebung in individueller Weise entwickelt. Daher sind deterministische Modelle, welche zur Entwurfszeit die Verlässlichkeit prognostizieren, nicht mehr ausreichend um Integrierte Schaltkreise mit Nanometertechnologie sinnvoll abbilden zu können. Der Bedarf einer Laufzeitanalyse des Zustandes steigt und mit ihm die notwendigen Maßnahmen zum Erhalt der Zuverlässigkeit.
Transistoren sind anfällig für auslastungsbedingte Alterung, die die Laufzeit der Schaltung erhöht und mit ihr die Möglichkeit einer Fehlberechnung. Hinzu kommen spezielle Abläufe die das schnelle Altern des Chips befördern und somit seine zuverlässige Lebenszeit reduzieren. Zusätzlich können strahlungsbedingte Laufzeitfehler (Soft-Errors) des Chips abnormales Verhalten kritischer Systeme verursachen. Sowohl das Ausbreiten als auch das Maskieren dieser Fehler wiederum sind abhängig von der Arbeitslast des Systems. Fabrizierten Chips können ebenfalls vorsätzlich während der Produktion boshafte Schaltungen, sogenannte Hardwaretrojaner, hinzugefügt werden. Dies kompromittiert die Sicherheit des Chips. Da diese Art der Manipulation vor ihrer Aktivierung kaum zu erfassen ist, ist der Nachweis von Trojanern auf einem Chip direkt nach der Produktion extrem schwierig.
Die Komplexität dieser Verlässlichkeitsprobleme machen ein einfaches Modellieren der Zuverlässigkeit und Gegenmaßnahmen ineffizient. Sie entsteht aufgrund verschiedener Quellen, eingeschlossen der Entwicklungsparameter (Technologie, Gerät, Schaltung und Architektur), der Herstellungsparameter, der Laufzeitauslastung und der Arbeitsumgebung. Dies motiviert das Erforschen von maschinellem Lernen und Laufzeitmethoden, welche potentiell mit dieser Komplexität arbeiten können.
In dieser Arbeit stellen wir Lösungen vor, die in der Lage sind, eine verlässliche Ausführung von Computerhardware mit unterschiedlichem Laufzeitverhalten und Arbeitsbedingungen zu gewährleisten. Wir entwickelten Techniken des maschinellen Lernens um verschiedene Zuverlässigkeitseffekte zu modellieren, zu überwachen und auszugleichen. Verschiedene Lernmethoden werden genutzt, um günstige Überwachungspunkte zur Kontrolle der Arbeitsbelastung zu finden. Diese werden zusammen mit Zuverlässigkeitsmetriken, aufbauend auf Ausfallsicherheit und generellen Sicherheitsattributen, zum Erstellen von Vorhersagemodellen genutzt. Des Weiteren präsentieren wir eine kosten-optimierte Hardwaremonitorschaltung, welche die Überwachungspunkte zur Laufzeit auswertet. Im Gegensatz zum aktuellen Stand der Technik, welcher mikroarchitektonische Überwachungspunkte ausnutzt, evaluieren wir das Potential von Arbeitsbelastungscharakteristiken auf der Logikebene der zugrundeliegenden Hardware. Wir identifizieren verbesserte Features auf Logikebene um feingranulare Laufzeitüberwachung zu ermöglichen. Diese Logikanalyse wiederum hat verschiedene Stellschrauben um auf höhere Genauigkeit und niedrigeren Overhead zu optimieren.
Wir untersuchten die Philosophie, Überwachungspunkte auf Logikebene mit Hilfe von Lernmethoden zu identifizieren und günstigen Monitore zu implementieren um eine adaptive Vorbeugung gegen statisches Altern, dynamisches Altern und strahlungsinduzierte Soft-Errors zu schaffen und zusätzlich die Aktivierung von Hardwaretrojanern zu erkennen.
DiesbezĂĽglich haben wir ein Vorhersagemodell entworfen, welches den Arbeitslasteinfluss auf alterungsbedingte Verschlechterungen des Chips mitverfolgt und dazu genutzt werden kann, dynamisch zur Laufzeit vorbeugende Techniken, wie Task-Mitigation, Spannungs- und Frequenzskalierung zu benutzen.
Dieses Vorhersagemodell wurde in Software implementiert, welche verschiedene Arbeitslasten aufgrund ihrer Alterungswirkung einordnet. Um die Widerstandsfähigkeit gegenüber beschleunigter Alterung sicherzustellen, stellen wir eine Überwachungshardware vor, welche einen Teil der kritischen Flip-Flops beaufsichtigt, nach beschleunigter Alterung Ausschau hält und davor warnt, wenn ein zeitkritischer Pfad unter starker Alterungsbelastung steht. Wir geben die Implementierung einer Technik zum Reduzieren der durch das Ausführen spezifischer Subroutinen auftretenden Belastung von zeitkritischen Pfaden. Zusätzlich schlagen wir eine Technik zur Abschätzung von online Soft-Error-Schwachstellen von Speicherarrays und Logikkernen vor, welche auf der Überwachung einer kleinen Gruppe Flip-Flops des Entwurfs basiert.
Des Weiteren haben wir eine Methode basierend auf Anomalieerkennung entwickelt, um Arbeitslastsignaturen von Hardwaretrojanern während deren Aktivierung zur Laufzeit zu erkennen und somit eine letzte Verteidigungslinie zu bilden. Basierend auf diesen Experimenten demonstriert diese Arbeit das Potential von fortgeschrittener Feature-Extraktion auf Logikebene und lernbasierter Vorhersage basierend auf Laufzeitdaten zur Verbesserung der Zuverlässigkeit von Harwareentwürfen
Quiescent current testing of CMOS data converters
Power supply quiescent current (IDDQ) testing has been very effective in VLSI circuits designed in CMOS processes detecting physical defects such as open and shorts and bridging defects. However, in sub-micron VLSI circuits, IDDQ is masked by the increased subthreshold (leakage) current of MOSFETs affecting the efficiency of I¬DDQ testing. In this work, an attempt has been made to perform robust IDDQ testing in presence of increased leakage current by suitably modifying some of the test methods normally used in industry. Digital CMOS integrated circuits have been tested successfully using IDDQ and IDDQ methods for physical defects. However, testing of analog circuits is still a problem due to variation in design from one specific application to other. The increased leakage current further complicates not only the design but also testing. Mixed-signal integrated circuits such as the data converters are even more difficult to test because both analog and digital functions are built on the same substrate. We have re-examined both IDDQ and IDDQ methods of testing digital CMOS VLSI circuits and added features to minimize the influence of leakage current. We have designed built-in current sensors (BICS) for on-chip testing of analog and mixed-signal integrated circuits. We have also combined quiescent current testing with oscillation and transient current test techniques to map large number of manufacturing defects on a chip. In testing, we have used a simple method of injecting faults simulating manufacturing defects invented in our VLSI research group. We present design and testing of analog and mixed-signal integrated circuits with on-chip BICS such as an operational amplifier, 12-bit charge scaling architecture based digital-to-analog converter (DAC), 12-bit recycling architecture based analog-to-digital converter (ADC) and operational amplifier with floating gate inputs. The designed circuits are fabricated in 0.5 μm and 1.5 μm n-well CMOS processes and tested. Experimentally observed results of the fabricated devices are compared with simulations from SPICE using MOS level 3 and BSIM3.1 model parameters for 1.5 μm and 0.5 μm n-well CMOS technologies, respectively. We have also explored the possibility of using noise in VLSI circuits for testing defects and present the method we have developed
Innovative Techniques for Testing and Diagnosing SoCs
We rely upon the continued functioning of many electronic devices for our everyday welfare,
usually embedding integrated circuits that are becoming even cheaper and smaller
with improved features. Nowadays, microelectronics can integrate a working computer
with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC).
SoCs are also employed on automotive safety-critical applications, but need to be tested
thoroughly to comply with reliability standards, in particular the ISO26262 functional
safety for road vehicles.
The goal of this PhD. thesis is to improve SoC reliability by proposing innovative
techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals,
and GPUs. The proposed approaches in the sequence appearing in this thesis are described
as follows:
1. Embedded Memory Diagnosis: Memories are dense and complex circuits which
are susceptible to design and manufacturing errors. Hence, it is important to understand
the fault occurrence in the memory array. In practice, the logical and physical
array representation differs due to an optimized design which adds enhancements to
the device, namely scrambling. This part proposes an accurate memory diagnosis
by showing the efforts of a software tool able to analyze test results, unscramble
the memory array, map failing syndromes to cell locations, elaborate cumulative
analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing
syndromes were analyzed as case studies gathered on an industrial automotive
32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually,
and results were confirmed by real photos taken from a microscope.
2. Functional Test Pattern Generation: The key for a successful test is the pattern applied
to the device. They can be structural or functional; the former usually benefits
from embedded test modules targeting manufacturing errors and is only effective
before shipping the component to the client. The latter, on the other hand, can be
applied during mission minimally impacting on performance but is penalized due
to high generation time. However, functional test patterns may benefit for having
different goals in functional mission mode. Part III of this PhD thesis proposes
three different functional test pattern generation methods for CPU cores embedded
in SoCs, targeting different test purposes, described as follows:
a. Functional Stress Patterns: Are suitable for optimizing functional stress during
I
Operational-life Tests and Burn-in Screening for an optimal device reliability
characterization
b. Functional Power Hungry Patterns: Are suitable for determining functional
peak power for strictly limiting the power of structural patterns during manufacturing
tests, thus reducing premature device over-kill while delivering high test
coverage
c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns
with functional ones, allowing its execution periodically during mission.
In addition, an external hardware communicating with a devised SBST was proposed.
It helps increasing in 3% the fault coverage by testing critical Hardly
Functionally Testable Faults not covered by conventional SBST patterns.
An automatic functional test pattern generation exploiting an evolutionary algorithm
maximizing metrics related to stress, power, and fault coverage was employed
in the above-mentioned approaches to quickly generate the desired patterns. The
approaches were evaluated on two industrial cases developed by STMicroelectronics;
8051-based and a 32-bit Power Architecture SoCs. Results show that generation
time was reduced upto 75% in comparison to older methodologies while
increasing significantly the desired metrics.
3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices
are suitable for generating structural patterns, testing and activating mitigation techniques,
and validating robust hardware and software applications. GPGPUs are
known for fast parallel computation used in high performance computing and advanced
driver assistance where reliability is the key point. Moreover, GPGPU manufacturers
do not provide design description code due to content secrecy. Therefore,
commercial fault injectors using the GPGPU model is unfeasible, making radiation
tests the only resource available, but are costly. In the last part of this thesis, we
propose a software implemented fault injector able to inject bit-flip in memory elements
of a real GPGPU. It exploits a software debugger tool and combines the
C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in
program variables. The goal is to validate robust parallel algorithms by studying
fault propagation or activating redundancy mechanisms they possibly embed. The
effectiveness of the tool was evaluated on two robust applications: redundant parallel
matrix multiplication and floating point Fast Fourier Transform
Characterization of Interconnection Delays in FPGAS Due to Single Event Upsets and Mitigation
RÉSUMÉ L’utilisation incessante de composants électroniques à géométrie toujours plus faible a engendré de nouveaux défis au fil des ans. Par exemple, des semi-conducteurs à mémoire et à microprocesseur plus avancés sont utilisés dans les systèmes avioniques qui présentent une susceptibilité importante aux phénomènes de rayonnement cosmique. L'une des principales implications des rayons cosmiques, observée principalement dans les satellites en orbite, est l'effet d'événements singuliers (SEE). Le rayonnement atmosphérique suscite plusieurs préoccupations concernant la sécurité et la fiabilité de l'équipement avionique, en particulier pour les systèmes qui impliquent des réseaux de portes programmables (FPGA). Les FPGA à base de cellules de mémoire statique (SRAM) présentent une solution attrayante pour mettre en oeuvre des systèmes complexes dans le domaine de l’avionique. Les expériences de rayonnement réalisées sur les FPGA ont dévoilé la vulnérabilité de ces dispositifs contre un type particulier de SEE, à savoir, les événements singuliers de changement d’état (SEU). Un SEU est considérée comme le changement de l'état d'un élément bistable (c'est-à -dire, un bit-flip) dû à l'effet d'un ion, d'un proton ou d’un neutron énergétique. Cet effet est non destructif et peut être corrigé en réécrivant la partie de la SRAM affectée.
Les changements de délai (DC) potentiels dus aux SEU affectant la mémoire de configuration de routage ont été récemment confirmés. Un des objectifs de cette thèse consiste à caractériser plus précisément les DC dans les FPGA causés par les SEU. Les DC observés expérimentalement sont présentés et la modélisation au niveau circuit de ces DC est proposée. Les circuits impliqués dans la propagation du délai sont validés en effectuant une modélisation précise des blocs internes à l'intérieur du FPGA et en exécutant des simulations. Les résultats montrent l’origine des DC qui sont en accord avec les mesures expérimentales de délais. Les modèles proposés au niveau circuit sont, aux meilleures de notre connaissance, le premier travail qui confirme et explique les délais combinatoires dans les FPGA.
La conception d'un circuit moniteur de délai pour la détection des DC a été faite dans la deuxième partie de cette thèse. Ce moniteur permet de détecter un changement de délai sur les sections critiques du circuit et de prévenir les pannes de synchronisation engendrées par les SEU sans utiliser la redondance modulaire triple (TMR).----------ABSTRACT
The unrelenting demand for electronic components with ever diminishing feature size have emerged new challenges over the years. Among them, more advanced memory and microprocessor semiconductors are being used in avionic systems that exhibit a substantial susceptibility to cosmic radiation phenomena. One of the main implications of cosmic rays, which was primarily observed in orbiting satellites, is single-event effect (SEE). Atmospheric radiation causes several concerns regarding the safety and reliability of avionics equipment, particularly for systems that involve field programmable gate arrays (FPGA). SRAM-based FPGAs, as an attractive solution to implement systems in aeronautic sector, are very susceptible to SEEs in particular Single Event Upset (SEU). An SEU is considered as the change of the state of a bistable element (i.e., bit-flip) due to the effect of an energetic ion or proton. This effect is non-destructive and may be fixed by rewriting the affected part.
Sensitivity evaluation of SRAM-based FPGAs to a physical impact such as potential delay changes (DC) has not been addressed thus far in the literature. DCs induced by SEU can affect the functionality of the logic circuits by disturbing the race condition on critical paths. The objective of this thesis is toward the characterization of DCs in SRAM-based FPGAs due to transient ionizing radiation. The DCs observed experimentally are presented and the circuit-level modeling of those DCs is proposed. Circuits involved in delay propagation are reverse-engineered by performing precise modeling of internal blocks inside the FPGA and executing simulations. The results show the root cause of DCs that are in good agreement with experimental delay measurements. The proposed circuit level models are, to the best of our knowledge, the first work on modeling of combinational delays in FPGAs.In addition, the design of a delay monitor circuit for DC detection is investigated in the second part of this thesis. This monitor allowed to show experimentally cumulative DCs on interconnects in FPGA. To this end, by avoiding the use of triple modular redundancy (TMR), a mitigation technique for DCs is proposed and the system downtime is minimized. A method is also proposed to decrease the clock frequency after DC detection without interrupting the process
Multi-level simulation of nano-electronic digital circuits on GPUs
Simulation of circuits and faults is an essential part in design and test validation tasks of contemporary nano-electronic digital integrated CMOS circuits.
Shrinking technology processes with smaller feature sizes and strict performance and reliability requirements demand not only detailed validation of the functional properties of a design, but also accurate validation of non-functional aspects including the timing behavior. However, due to the rising complexity of the circuit behavior and the steady growth of the designs with respect to the transistor count, timing-accurate simulation of current designs requires a lot of computational effort which can only be handled by proper abstraction and a high degree of parallelization.
This work presents a simulation model for scalable and accurate timing simulation of digital circuits on data-parallel graphics processing unit (GPU) accelerators.
By providing compact modeling and data-structures as well as through exploiting multiple dimensions of parallelism, the simulation model enables not only fast and timing-accurate simulation at logic level, but also massively-parallel simulation with switch level accuracy.
The model facilitates extensions for fast and efficient fault simulation of small delay faults at logic level, as well as first-order parametric and parasitic faults at switch level.
With the parallelization on GPUs, detailed and scalable simulation is enabled that is applicable even to multi-million gate designs.
This way, comprehensive analyses of realistic timing-related faults in presence of process- and parameter variations are enabled for the first time.
Additional simulation efficiency is achieved by merging the presented methods in a unified simulation model, that allows to combine the unique advantages of the different levels of abstraction in a mixed-abstraction multi-level simulation flow to reach even higher speedups.
Experimental results show that the implemented parallel approach achieves unprecedented simulation throughput as well as high speedup compared to conventional timing simulators.
The underlying model scales for multi-million gate designs and gives detailed insights into the timing behavior of digital CMOS circuits, thereby enabling large-scale applications to aid even highly complex design and test validation tasks