37 research outputs found

    Studying the effects of intermittent faults on a microcontroller

    Full text link
    As CMOS technology scales to the nanometer range, designers have to deal with a growing number and variety of fault types. Particularly, intermittent faults are expected to be an important issue in modern VLSI circuits. The complexity of manufacturing processes, producing residues and parameter variations, together with special aging mechanisms, may increase the presence of such faults. This work presents a case study of the impact of intermittent faults on the behavior of a commercial microcontroller. In order to carry out an exhaustive reliability assessment, the methodology used lies in VHDL-based fault injection technique. In this way, a set of intermittent fault models at logic and register transfer abstraction levels have been generated and injected in the VHDL model of the system. From the simulation traces, the occurrences of failures and latent errors have been logged. The impact of intermittent faults has been also compared to that got when injecting transient and permanent faults. Finally, some injection experiments have been reproduced in a RISC microprocessor and compared with those of the microcontroller. © 2012 Elsevier Ltd. All rights reserved.This work has been funded by the Spanish Government under the Research Project TIN2009-13825.Gil Tomás, DA.; Gracia-Morán, J.; Baraza Calvo, JC.; Saiz-Adalid, L.; Gil Vicente, PJ. (2012). Studying the effects of intermittent faults on a microcontroller. Microelectronics Reliability. 52(11):2837-2846. https://doi.org/10.1016/j.microrel.2012.06.004S28372846521

    New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs

    Full text link
    Tesis por compendio[EN] Relevance of electronics towards safety of common devices has only been growing, as an ever growing stake of the functionality is assigned to them. But of course, this comes along the constant need for higher performances to fulfill such functionality requirements, while keeping power and budget low. In this scenario, industry is struggling to provide a technology which meets all the performance, power and price specifications, at the cost of an increased vulnerability to several types of known faults or the appearance of new ones. To provide a solution for the new and growing faults in the systems, designers have been using traditional techniques from safety-critical applications, which offer in general suboptimal results. In fact, modern embedded architectures offer the possibility of optimizing the dependability properties by enabling the interaction of hardware, firmware and software levels in the process. However, that point is not yet successfully achieved. Advances in every level towards that direction are much needed if flexible, robust, resilient and cost effective fault tolerance is desired. The work presented here focuses on the hardware level, with the background consideration of a potential integration into a holistic approach. The efforts in this thesis have focused several issues: (i) to introduce additional fault models as required for adequate representativity of physical effects blooming in modern manufacturing technologies, (ii) to provide tools and methods to efficiently inject both the proposed models and classical ones, (iii) to analyze the optimum method for assessing the robustness of the systems by using extensive fault injection and later correlation with higher level layers in an effort to cut development time and cost, (iv) to provide new detection methodologies to cope with challenges modeled by proposed fault models, (v) to propose mitigation strategies focused towards tackling such new threat scenarios and (vi) to devise an automated methodology for the deployment of many fault tolerance mechanisms in a systematic robust way. The outcomes of the thesis constitute a suite of tools and methods to help the designer of critical systems in his task to develop robust, validated, and on-time designs tailored to his application.[ES] La relevancia que la electrónica adquiere en la seguridad de los productos ha crecido inexorablemente, puesto que cada vez ésta copa una mayor influencia en la funcionalidad de los mismos. Pero, por supuesto, este hecho viene acompañado de una necesidad constante de mayores prestaciones para cumplir con los requerimientos funcionales, al tiempo que se mantienen los costes y el consumo en unos niveles reducidos. En este escenario, la industria está realizando esfuerzos para proveer una tecnología que cumpla con todas las especificaciones de potencia, consumo y precio, a costa de un incremento en la vulnerabilidad a múltiples tipos de fallos conocidos o la introducción de nuevos. Para ofrecer una solución a los fallos nuevos y crecientes en los sistemas, los diseñadores han recurrido a técnicas tradicionalmente asociadas a sistemas críticos para la seguridad, que ofrecen en general resultados sub-óptimos. De hecho, las arquitecturas empotradas modernas ofrecen la posibilidad de optimizar las propiedades de confiabilidad al habilitar la interacción de los niveles de hardware, firmware y software en el proceso. No obstante, ese punto no está resulto todavía. Se necesitan avances en todos los niveles en la mencionada dirección para poder alcanzar los objetivos de una tolerancia a fallos flexible, robusta, resiliente y a bajo coste. El trabajo presentado aquí se centra en el nivel de hardware, con la consideración de fondo de una potencial integración en una estrategia holística. Los esfuerzos de esta tesis se han centrado en los siguientes aspectos: (i) la introducción de modelos de fallo adicionales requeridos para la representación adecuada de efectos físicos surgentes en las tecnologías de manufactura actuales, (ii) la provisión de herramientas y métodos para la inyección eficiente de los modelos propuestos y de los clásicos, (iii) el análisis del método óptimo para estudiar la robustez de sistemas mediante el uso de inyección de fallos extensiva, y la posterior correlación con capas de más alto nivel en un esfuerzo por recortar el tiempo y coste de desarrollo, (iv) la provisión de nuevos métodos de detección para cubrir los retos planteados por los modelos de fallo propuestos, (v) la propuesta de estrategias de mitigación enfocadas hacia el tratamiento de dichos escenarios de amenaza y (vi) la introducción de una metodología automatizada de despliegue de diversos mecanismos de tolerancia a fallos de forma robusta y sistemática. Los resultados de la presente tesis constituyen un conjunto de herramientas y métodos para ayudar al diseñador de sistemas críticos en su tarea de desarrollo de diseños robustos, validados y en tiempo adaptados a su aplicación.[CA] La rellevància que l'electrònica adquireix en la seguretat dels productes ha crescut inexorablement, puix cada volta més aquesta abasta una major influència en la funcionalitat dels mateixos. Però, per descomptat, aquest fet ve acompanyat d'un constant necessitat de majors prestacions per acomplir els requeriments funcionals, mentre es mantenen els costos i consums en uns nivells reduïts. Donat aquest escenari, la indústria està fent esforços per proveir una tecnologia que complisca amb totes les especificacions de potència, consum i preu, tot a costa d'un increment en la vulnerabilitat a diversos tipus de fallades conegudes, i a la introducció de nous tipus. Per oferir una solució a les noves i creixents fallades als sistemes, els dissenyadors han recorregut a tècniques tradicionalment associades a sistemes crítics per a la seguretat, que en general oferixen resultats sub-òptims. De fet, les arquitectures empotrades modernes oferixen la possibilitat d'optimitzar les propietats de confiabilitat en habilitar la interacció dels nivells de hardware, firmware i software en el procés. Tot i això eixe punt no està resolt encara. Es necessiten avanços a tots els nivells en l'esmentada direcció per poder assolir els objectius d'una tolerància a fallades flexible, robusta, resilient i a baix cost. El treball ací presentat se centra en el nivell de hardware, amb la consideració de fons d'una potencial integració en una estratègia holística. Els esforços d'esta tesi s'han centrat en els següents aspectes: (i) la introducció de models de fallada addicionals requerits per a la representació adequada d'efectes físics que apareixen en les tecnologies de fabricació actuals, (ii) la provisió de ferramentes i mètodes per a la injecció eficient del models proposats i dels clàssics, (iii) l'anàlisi del mètode òptim per estudiar la robustesa de sistemes mitjançant l'ús d'injecció de fallades extensiva, i la posterior correlació amb capes de més alt nivell en un esforç per retallar el temps i cost de desenvolupament, (iv) la provisió de nous mètodes de detecció per cobrir els reptes plantejats pels models de fallades proposats, (v) la proposta d'estratègies de mitigació enfocades cap al tractament dels esmentats escenaris d'amenaça i (vi) la introducció d'una metodologia automatitzada de desplegament de diversos mecanismes de tolerància a fallades de forma robusta i sistemàtica. Els resultats de la present tesi constitueixen un conjunt de ferramentes i mètodes per ajudar el dissenyador de sistemes crítics en la seua tasca de desenvolupament de dissenys robustos, validats i a temps adaptats a la seua aplicació.Espinosa García, J. (2016). New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/73146TESISCompendi

    Effects of intermittent faults on the reliability of a Reduced Instruction Set Computing (RISC) microprocessor

    Full text link
    © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.With the scaling of complementary metal-oxide-semiconductor (CMOS) technology to the submicron range, designers have to deal with a growing number and variety of fault types. In this way, intermittent faults are gaining importance in modern very large scale integration (VLSI) circuits. The presence of these faults is increasing due to the complexity of manufacturing processes (which produce residues and parameter variations), together with special aging mechanisms. This work presents a case study of the impact of intermittent faults on the behavior of a reduced instruction set computing (RISC) microprocessor. We have carried out an exhaustive reliability assessment by using very-high-speed-integrated-circuit hardware description language (VHDL)-based fault injection. In this way, we have been able to modify different intermittent fault parameters, to select various targets, and even, to compare the impact of intermittent faults with those induced by transient and permanent faults.This work was supported by the Spanish Government under the Research Project TIN2009-13825 and by the Universitat Politecnica de Valencia under the Project SP20120806. Associate Editor: L. Cui.Gracia-Morán, J.; Baraza Calvo, JC.; Gil Tomás, DA.; Saiz-Adalid, L.; Gil, P. (2014). Effects of intermittent faults on the reliability of a Reduced Instruction Set Computing (RISC) microprocessor. IEEE Transactions on Reliability. 63(1):144-153. https://doi.org/10.1109/TR.2014.2299711S14415363

    Enhancement of fault injection techniques based on the modification of VHDL code

    Full text link
    Deep submicrometer devices are expected to be increasingly sensitive to physical faults. For this reason, fault-tolerance mechanisms are more and more required in VLSI circuits. So, validating their dependability is a prior concern in the design process. Fault injection techniques based on the use of hardware description languages offer important advantages with regard to other techniques. First, as this type of techniques can be applied during the design phase of the system, they permit reducing the time-to-market. Second, they present high controllability and reachability. Among the different techniques, those based on the use of saboteurs and mutants are especially attractive due to their high fault modeling capability. However, implementing automatically these techniques in a fault injection tool is difficult. Especially complex are the insertion of saboteurs and the generation of mutants. In this paper, we present new proposals to implement saboteurs and mutants for models in VHDL which are easy-to-automate, and whose philosophy can be generalized to other hardware description languages.Baraza Calvo, JC.; Gracia-Morán, J.; Blanc Clavero, S.; Gil Tomás, DA.; Gil Vicente, PJ. (2008). Enhancement of fault injection techniques based on the modification of VHDL code. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 16(6):693-706. doi:10.1109/TVLSI.2008.2000254S69370616

    Intermittent resistive faults in digital cmos circuits

    Get PDF
    A major threat in extremely dependable high-end process node integrated systems in e.g. Avionics are no failures found (NFF). One category of NFFs is the intermittent resistive fault, often originating from bad (e.g. Via or TSV-based) interconnections. This paper will show the impact of these faults on the behavior of a digital CMOS circuit via simulation. As the occurrence rate of this kind of defects can take e.g. One month, while the duration of the defect can be as short as 50 nanoseconds, to evoke and detect these faults is a huge scientific challenge. An on-chip data logging system with time stamp and stored environmental conditions, along with the detection, will drastically improve the task of maintenance of avionics and reduce the current high debugging costs

    Speeding-up model-based fault injection of deep-submicron CMOS fault models through dynamic and partially reconfigurable FPGAS

    Full text link
    Actualmente, las tecnologías CMOS submicrónicas son básicas para el desarrollo de los modernos sistemas basados en computadores, cuyo uso simplifica enormemente nuestra vida diaria en una gran variedad de entornos, como el gobierno, comercio y banca electrónicos, y el transporte terrestre y aeroespacial. La continua reducción del tamaño de los transistores ha permitido reducir su consumo y aumentar su frecuencia de funcionamiento, obteniendo por ello un mayor rendimiento global. Sin embargo, estas mismas características que mejoran el rendimiento del sistema, afectan negativamente a su confiabilidad. El uso de transistores de tamaño reducido, bajo consumo y alta velocidad, está incrementando la diversidad de fallos que pueden afectar al sistema y su probabilidad de aparición. Por lo tanto, existe un gran interés en desarrollar nuevas y eficientes técnicas para evaluar la confiabilidad, en presencia de fallos, de sistemas fabricados mediante tecnologías submicrónicas. Este problema puede abordarse por medio de la introducción deliberada de fallos en el sistema, técnica conocida como inyección de fallos. En este contexto, la inyección basada en modelos resulta muy interesante, ya que permite evaluar la confiabilidad del sistema en las primeras etapas de su ciclo de desarrollo, reduciendo por tanto el coste asociado a la corrección de errores. Sin embargo, el tiempo de simulación de modelos grandes y complejos imposibilita su aplicación en un gran número de ocasiones. Esta tesis se centra en el uso de dispositivos lógicos programables de tipo FPGA (Field-Programmable Gate Arrays) para acelerar los experimentos de inyección de fallos basados en simulación por medio de su implementación en hardware reconfigurable. Para ello, se extiende la investigación existente en inyección de fallos basada en FPGA en dos direcciones distintas: i) se realiza un estudio de las tecnologías submicrónicas existentes para obtener un conjunto representativo de modelos de fallos transitoriosAndrés Martínez, DD. (2007). Speeding-up model-based fault injection of deep-submicron CMOS fault models through dynamic and partially reconfigurable FPGAS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1943Palanci

    Fault-tolerant satellite computing with modern semiconductors

    Get PDF
    Miniaturized satellites enable a variety space missions which were in the past infeasible, impractical or uneconomical with traditionally-designed heavier spacecraft. Especially CubeSats can be launched and manufactured rapidly at low cost from commercial components, even in academic environments. However, due to their low reliability and brief lifetime, they are usually not considered suitable for life- and safety-critical services, complex multi-phased solar-system-exploration missions, and missions with a longer duration. Commercial electronics are key to satellite miniaturization, but also responsible for their low reliability: Until 2019, there existed no reliable or fault-tolerant computer architectures suitable for very small satellites. To overcome this deficit, a novel on-board-computer architecture is described in this thesis.Robustness is assured without resorting to radiation hardening, but through software measures implemented within a robust-by-design multiprocessor-system-on-chip. This fault-tolerant architecture is component-wise simple and can dynamically adapt to changing performance requirements throughout a mission. It can support graceful aging by exploiting FPGA-reconfiguration and mixed-criticality.  Experimentally, we achieve 1.94W power consumption at 300Mhz with a Xilinx Kintex Ultrascale+ proof-of-concept, which is well within the powerbudget range of current 2U CubeSats. To our knowledge, this is the first COTS-based, reproducible on-board-computer architecture that can offer strong fault coverage even for small CubeSats.European Space AgencyComputer Systems, Imagery and Medi

    Fallos intermitentes: análisis de causas y efectos, nuevos modelos de fallos y técnicas de mitigación

    Full text link
    [EN] From the first integrated circuit was developed to very large scale integration (VLSI) technology, the hardware of computer systems has had an immense evolution. Moore's Law, which predicts that the number of transistors that can be integrated on a chip doubles every year, has been accomplished for decades thanks to the aggressive reduction of transistors size. This has allowed increasing its frequency, achieving higher performance with lower consumption, but at the expense of a reliability penalty. The number of defects are raising due to variations in the increasingly complex manufacturing process. Intermittent faults, one of the fundamental issues affecting the reliability of current and future digital VLSI circuits technologies, are studied in this thesis. In the past, intermittent faults have been considered the prelude to permanent faults. Nowadays, the occurrence of intermittent faults caused by variations in the manufacturing process not affecting permanently has increased. Errors induced by intermittent and transient faults manifest similarly, although intermittent faults are usually grouped in bursts and they are activated repeatedly and non-deterministically in the same place. In addition, intermittent faults can be activated and deactivated by changes in temperature, voltage and frequency. In this thesis, the effects of intermittent faults in digital systems have been analyzed by using simulation-based fault injection. This methodology allows introducing faults in a controlled manner. After an extensive literature review to understand the physical mechanisms of intermittent faults, new intermittent fault models at gate and register transfer levels have been proposed. These new fault models have been used to analyze the effects of intermittent faults in different microprocessors models, as well as the influence of several parameters. To mitigate these effects, various fault tolerance techniques have been studied in this thesis, in order to determine whether they are suitable to tolerate intermittent faults. Results show that the error detection mechanisms work properly, but the error recovery mechanisms need to be improved. Error correction codes (ECC) is a well-known fault tolerance technique. This thesis proposes a new family of ECCs specially designed to tolerate faults when the fault rate is not equal in all bits in a word, such as in the presence of intermittent faults. As these faults may also present a fault rate variable along time, a fault tolerance mechanism whose behavior adapts to the temporal evolution of error conditions can use the new ECCs proposed.[ES] Desde la invención del primer circuito integrado hasta la tecnología de muy alta escala de integración (VLSI), el hardware de los sistemas informáticos ha evolucionado enormemente. La Ley de Moore, que vaticina que el número de transistores que se pueden integrar en un chip se duplica cada año, se ha venido cumpliendo durante décadas gracias a la agresiva reducción del tamaño de los transistores. Esto ha permitido aumentar su frecuencia de trabajo, logrando mayores prestaciones con menor consumo, pero a costa de penalizar la confiabilidad, ya que aumentan los defectos producidos por variaciones en el cada vez más complejo proceso de fabricación. En la presente tesis se aborda el estudio de uno de los problemas fundamentales que afectan a la confiabilidad en las actuales y futuras tecnologías de circuitos integrados digitales VLSI: los fallos intermitentes. En el pasado, los fallos intermitentes se consideraban el preludio de fallos permanentes. En la actualidad, ha aumentado la aparición de fallos intermitentes provocados por variaciones en el proceso de fabricación que no afectan permanentemente. Los errores inducidos por fallos intermitentes se manifiestan de forma similar a los provocados por fallos transitorios, salvo que los fallos intermitentes suelen agruparse en ráfagas y se activan repetitivamente y de forma no determinista en el mismo lugar. Además, los fallos intermitentes se pueden activar y desactivar por cambios de temperatura, tensión y frecuencia. En esta tesis se han analizado los efectos de los fallos intermitentes en sistemas digitales utilizando inyección de fallos basada en simulación, que permite introducir fallos en el sistema de forma controlada. Tras un amplio estudio bibliográfico para entender los mecanismos físicos de los fallos intermitentes, se han propuesto nuevos modelos de fallo en los niveles de puerta lógica y de transferencia de registros, que se han utilizado para analizar los efectos de los fallos intermitentes y la influencia de diversos factores. Para mitigar esos efectos, en esta tesis se han estudiado distintas técnicas de tolerancia a fallos, con el objetivo de determinar si son adecuadas para tolerar fallos intermitentes, ya que las técnicas existentes están generalmente diseñadas para tolerar fallos transitorios o permanentes. Los resultados muestran que los mecanismos de detección funcionan adecuadamente, pero hay que mejorar los de recuperación. Una técnica de tolerancia a fallos existente son los códigos correctores de errores (ECC). Esta tesis propone nuevos ECC diseñados para tolerar fallos cuando su tasa no es la misma en todos los bits de una palabra, como en el caso de los fallos intermitentes. Éstos, además, pueden presentar una tasa de fallo variable en el tiempo, por lo que sería necesario un mecanismo de tolerancia a fallos cuyo comportamiento se adapte a la evolución temporal de las condiciones de error, y que utilice los nuevos ECC propuestos.[CA] Des de la invenció del primer circuit integrat fins a la tecnologia de molt alta escala d'integració (VLSI), el maquinari dels sistemes informàtics ha evolucionat enormement. La Llei de Moore, que vaticina que el nombre de transistors que es poden integrar en un xip es duplica cada any, s'ha vingut complint durant dècades gràcies a l'agressiva reducció de la mida dels transistors. Això ha permès augmentar la seua freqüència de treball, aconseguint majors prestacions amb menor consum, però a costa de penalitzar la fiabilitat, ja que augmenten els defectes produïts per variacions en el cada vegada més complex procés de fabricació. En la present tesi s'aborda l'estudi d'un dels problemes fonamentals que afecten la fiabilitat en les actuals i futures tecnologies de circuits integrats digitals VLSI: les fallades intermitents. En el passat, les fallades intermitents es consideraven el preludi de fallades permanents. En l'actualitat, ha augmentat l'aparició de fallades intermitents provocades per variacions en el procés de fabricació que no afecten permanentment. Els errors induïts per fallades intermitents es manifesten de forma similar als provocats per fallades transitòries, llevat que les fallades intermitents solen agrupar-se en ràfegues i s'activen repetidament i de forma no determinista en el mateix lloc. A més, les fallades intermitents es poden activar i desactivar per canvis de temperatura, tensió i freqüència. En aquesta tesi s'han analitzat els efectes de les fallades intermitents en sistemes digitals utilitzant injecció de fallades basada en simulació, que permet introduir errors en el sistema de forma controlada. Després d'un ampli estudi bibliogràfic per entendre els mecanismes físics de les fallades intermitents, s'han proposat nous models de fallada en els nivells de porta lògica i de transferència de registres, que s'han utilitzat per analitzar els efectes de les fallades intermitents i la influència de diversos factors. Per mitigar aquests efectes, en aquesta tesi s'han estudiat diferents tècniques de tolerància a fallades, amb l'objectiu de determinar si són adequades per tolerar fallades intermitents, ja que les tècniques existents estan generalment dissenyades per tolerar fallades transitòries o permanents. Els resultats mostren que els mecanismes de detecció funcionen adequadament, però cal millorar els de recuperació. Una tècnica de tolerància a fallades existent són els codis correctors d'errors (ECC). Aquesta tesi proposa nous ECC dissenyats per tolerar fallades quan la seua taxa no és la mateixa en tots els bits d'una paraula, com en el cas de les fallades intermitents. Aquests, a més, poden presentar una taxa de fallada variable en el temps, pel que seria necessari un mecanisme de tolerància a fallades on el comportament s'adapte a l'evolució temporal de les condicions d'error, i que utilitze els nous ECC proposats.Saiz Adalid, LJ. (2015). Fallos intermitentes: análisis de causas y efectos, nuevos modelos de fallos y técnicas de mitigación [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59452TESI

    Multilevel Runtime Verification for Safety and Security Critical Cyber Physical Systems from a Model Based Engineering Perspective

    Get PDF
    Advanced embedded system technology is one of the key driving forces behind the rapid growth of Cyber-Physical System (CPS) applications. CPS consists of multiple coordinating and cooperating components, which are often software-intensive and interact with each other to achieve unprecedented tasks. Such highly integrated CPSs have complex interaction failures, attack surfaces, and attack vectors that we have to protect and secure against. This dissertation advances the state-of-the-art by developing a multilevel runtime monitoring approach for safety and security critical CPSs where there are monitors at each level of processing and integration. Given that computation and data processing vulnerabilities may exist at multiple levels in an embedded CPS, it follows that solutions present at the levels where the faults or vulnerabilities originate are beneficial in timely detection of anomalies. Further, increasing functional and architectural complexity of critical CPSs have significant safety and security operational implications. These challenges are leading to a need for new methods where there is a continuum between design time assurance and runtime or operational assurance. Towards this end, this dissertation explores Model Based Engineering methods by which design assurance can be carried forward to the runtime domain, creating a shared responsibility for reducing the overall risk associated with the system at operation. Therefore, a synergistic combination of Verification & Validation at design time and runtime monitoring at multiple levels is beneficial in assuring safety and security of critical CPS. Furthermore, we realize our multilevel runtime monitor framework on hardware using a stream-based runtime verification language

    Innovative Techniques for Testing and Diagnosing SoCs

    Get PDF
    We rely upon the continued functioning of many electronic devices for our everyday welfare, usually embedding integrated circuits that are becoming even cheaper and smaller with improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC). SoCs are also employed on automotive safety-critical applications, but need to be tested thoroughly to comply with reliability standards, in particular the ISO26262 functional safety for road vehicles. The goal of this PhD. thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches in the sequence appearing in this thesis are described as follows: 1. Embedded Memory Diagnosis: Memories are dense and complex circuits which are susceptible to design and manufacturing errors. Hence, it is important to understand the fault occurrence in the memory array. In practice, the logical and physical array representation differs due to an optimized design which adds enhancements to the device, namely scrambling. This part proposes an accurate memory diagnosis by showing the efforts of a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, elaborate cumulative analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing syndromes were analyzed as case studies gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually, and results were confirmed by real photos taken from a microscope. 2. Functional Test Pattern Generation: The key for a successful test is the pattern applied to the device. They can be structural or functional; the former usually benefits from embedded test modules targeting manufacturing errors and is only effective before shipping the component to the client. The latter, on the other hand, can be applied during mission minimally impacting on performance but is penalized due to high generation time. However, functional test patterns may benefit for having different goals in functional mission mode. Part III of this PhD thesis proposes three different functional test pattern generation methods for CPU cores embedded in SoCs, targeting different test purposes, described as follows: a. Functional Stress Patterns: Are suitable for optimizing functional stress during I Operational-life Tests and Burn-in Screening for an optimal device reliability characterization b. Functional Power Hungry Patterns: Are suitable for determining functional peak power for strictly limiting the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns with functional ones, allowing its execution periodically during mission. In addition, an external hardware communicating with a devised SBST was proposed. It helps increasing in 3% the fault coverage by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns. An automatic functional test pattern generation exploiting an evolutionary algorithm maximizing metrics related to stress, power, and fault coverage was employed in the above-mentioned approaches to quickly generate the desired patterns. The approaches were evaluated on two industrial cases developed by STMicroelectronics; 8051-based and a 32-bit Power Architecture SoCs. Results show that generation time was reduced upto 75% in comparison to older methodologies while increasing significantly the desired metrics. 3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications. GPGPUs are known for fast parallel computation used in high performance computing and advanced driver assistance where reliability is the key point. Moreover, GPGPU manufacturers do not provide design description code due to content secrecy. Therefore, commercial fault injectors using the GPGPU model is unfeasible, making radiation tests the only resource available, but are costly. In the last part of this thesis, we propose a software implemented fault injector able to inject bit-flip in memory elements of a real GPGPU. It exploits a software debugger tool and combines the C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in program variables. The goal is to validate robust parallel algorithms by studying fault propagation or activating redundancy mechanisms they possibly embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating point Fast Fourier Transform
    corecore