1,198 research outputs found

    New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs

    Full text link
    Tesis por compendio[EN] Relevance of electronics towards safety of common devices has only been growing, as an ever growing stake of the functionality is assigned to them. But of course, this comes along the constant need for higher performances to fulfill such functionality requirements, while keeping power and budget low. In this scenario, industry is struggling to provide a technology which meets all the performance, power and price specifications, at the cost of an increased vulnerability to several types of known faults or the appearance of new ones. To provide a solution for the new and growing faults in the systems, designers have been using traditional techniques from safety-critical applications, which offer in general suboptimal results. In fact, modern embedded architectures offer the possibility of optimizing the dependability properties by enabling the interaction of hardware, firmware and software levels in the process. However, that point is not yet successfully achieved. Advances in every level towards that direction are much needed if flexible, robust, resilient and cost effective fault tolerance is desired. The work presented here focuses on the hardware level, with the background consideration of a potential integration into a holistic approach. The efforts in this thesis have focused several issues: (i) to introduce additional fault models as required for adequate representativity of physical effects blooming in modern manufacturing technologies, (ii) to provide tools and methods to efficiently inject both the proposed models and classical ones, (iii) to analyze the optimum method for assessing the robustness of the systems by using extensive fault injection and later correlation with higher level layers in an effort to cut development time and cost, (iv) to provide new detection methodologies to cope with challenges modeled by proposed fault models, (v) to propose mitigation strategies focused towards tackling such new threat scenarios and (vi) to devise an automated methodology for the deployment of many fault tolerance mechanisms in a systematic robust way. The outcomes of the thesis constitute a suite of tools and methods to help the designer of critical systems in his task to develop robust, validated, and on-time designs tailored to his application.[ES] La relevancia que la electrónica adquiere en la seguridad de los productos ha crecido inexorablemente, puesto que cada vez ésta copa una mayor influencia en la funcionalidad de los mismos. Pero, por supuesto, este hecho viene acompañado de una necesidad constante de mayores prestaciones para cumplir con los requerimientos funcionales, al tiempo que se mantienen los costes y el consumo en unos niveles reducidos. En este escenario, la industria está realizando esfuerzos para proveer una tecnología que cumpla con todas las especificaciones de potencia, consumo y precio, a costa de un incremento en la vulnerabilidad a múltiples tipos de fallos conocidos o la introducción de nuevos. Para ofrecer una solución a los fallos nuevos y crecientes en los sistemas, los diseñadores han recurrido a técnicas tradicionalmente asociadas a sistemas críticos para la seguridad, que ofrecen en general resultados sub-óptimos. De hecho, las arquitecturas empotradas modernas ofrecen la posibilidad de optimizar las propiedades de confiabilidad al habilitar la interacción de los niveles de hardware, firmware y software en el proceso. No obstante, ese punto no está resulto todavía. Se necesitan avances en todos los niveles en la mencionada dirección para poder alcanzar los objetivos de una tolerancia a fallos flexible, robusta, resiliente y a bajo coste. El trabajo presentado aquí se centra en el nivel de hardware, con la consideración de fondo de una potencial integración en una estrategia holística. Los esfuerzos de esta tesis se han centrado en los siguientes aspectos: (i) la introducción de modelos de fallo adicionales requeridos para la representación adecuada de efectos físicos surgentes en las tecnologías de manufactura actuales, (ii) la provisión de herramientas y métodos para la inyección eficiente de los modelos propuestos y de los clásicos, (iii) el análisis del método óptimo para estudiar la robustez de sistemas mediante el uso de inyección de fallos extensiva, y la posterior correlación con capas de más alto nivel en un esfuerzo por recortar el tiempo y coste de desarrollo, (iv) la provisión de nuevos métodos de detección para cubrir los retos planteados por los modelos de fallo propuestos, (v) la propuesta de estrategias de mitigación enfocadas hacia el tratamiento de dichos escenarios de amenaza y (vi) la introducción de una metodología automatizada de despliegue de diversos mecanismos de tolerancia a fallos de forma robusta y sistemática. Los resultados de la presente tesis constituyen un conjunto de herramientas y métodos para ayudar al diseñador de sistemas críticos en su tarea de desarrollo de diseños robustos, validados y en tiempo adaptados a su aplicación.[CA] La rellevància que l'electrònica adquireix en la seguretat dels productes ha crescut inexorablement, puix cada volta més aquesta abasta una major influència en la funcionalitat dels mateixos. Però, per descomptat, aquest fet ve acompanyat d'un constant necessitat de majors prestacions per acomplir els requeriments funcionals, mentre es mantenen els costos i consums en uns nivells reduïts. Donat aquest escenari, la indústria està fent esforços per proveir una tecnologia que complisca amb totes les especificacions de potència, consum i preu, tot a costa d'un increment en la vulnerabilitat a diversos tipus de fallades conegudes, i a la introducció de nous tipus. Per oferir una solució a les noves i creixents fallades als sistemes, els dissenyadors han recorregut a tècniques tradicionalment associades a sistemes crítics per a la seguretat, que en general oferixen resultats sub-òptims. De fet, les arquitectures empotrades modernes oferixen la possibilitat d'optimitzar les propietats de confiabilitat en habilitar la interacció dels nivells de hardware, firmware i software en el procés. Tot i això eixe punt no està resolt encara. Es necessiten avanços a tots els nivells en l'esmentada direcció per poder assolir els objectius d'una tolerància a fallades flexible, robusta, resilient i a baix cost. El treball ací presentat se centra en el nivell de hardware, amb la consideració de fons d'una potencial integració en una estratègia holística. Els esforços d'esta tesi s'han centrat en els següents aspectes: (i) la introducció de models de fallada addicionals requerits per a la representació adequada d'efectes físics que apareixen en les tecnologies de fabricació actuals, (ii) la provisió de ferramentes i mètodes per a la injecció eficient del models proposats i dels clàssics, (iii) l'anàlisi del mètode òptim per estudiar la robustesa de sistemes mitjançant l'ús d'injecció de fallades extensiva, i la posterior correlació amb capes de més alt nivell en un esforç per retallar el temps i cost de desenvolupament, (iv) la provisió de nous mètodes de detecció per cobrir els reptes plantejats pels models de fallades proposats, (v) la proposta d'estratègies de mitigació enfocades cap al tractament dels esmentats escenaris d'amenaça i (vi) la introducció d'una metodologia automatitzada de desplegament de diversos mecanismes de tolerància a fallades de forma robusta i sistemàtica. Els resultats de la present tesi constitueixen un conjunt de ferramentes i mètodes per ajudar el dissenyador de sistemes crítics en la seua tasca de desenvolupament de dissenys robustos, validats i a temps adaptats a la seua aplicació.Espinosa García, J. (2016). New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/73146TESISCompendi

    Artificial Resilience in neuromorphic systems

    Get PDF
    Biological beings are intrinsically resilient. This means that they are able to continue to perform a task even if they are partially damaged or if some parts of them don’t work as expected. This is true also for the human brain. The research in these last years, however, has been concentrated on Artificial Intelligence (AI), to try to emulate the capabilities of the brain to improve itself, learn- ing from experience. Artificial Resilience (AR) is something not explored in detail yet. This four pages abstract present a Ph.D. path dedicated to the extensive study of Artificial Resilience in all its aspects. The study will target neuromorphic systems, in particu- lar Spiking Neural Networks, an emerging type of neural network models that try to mimic the behavior of a biological brain in a faithful way. In addition to this they are in general more suitable for an hardware acceleration. The goal of the Ph.D. is to realize a com- plete neuromorphic accelerator, configurable and resilient, and to apply it to improve the resilience of other electronic systems. Such an accelerator will be able to target area- and power-constrained applications in mission-critical environments, providing a more efficient alternative to classical techniques like Error Correction Codes (ECC) or redundancy to improve the robustness of a complex electronic system

    GRU-based denoising autoencoder for detection and clustering of unknown single and concurrent faults during system integration testing of automotive software systems

    Get PDF
    Recently, remarkable successes have been achieved in the quality assurance of automotive software systems (ASSs) through the utilization of real-time hardware-in-the-loop (HIL) simulation. Based on the HIL platform, safe, flexible and reliable realistic simulation during the system development process can be enabled. However, notwithstanding the test automation capability, large amounts of recordings data are generated as a result of HIL test executions. Expert knowledge-based approaches to analyze the generated recordings, with the aim of detecting and identifying the faults, are costly in terms of time, effort and difficulty. Therefore, in this study, a novel deep learning-based methodology is proposed so that the faults of automotive sensor signals can be efficiently and automatically detected and identified without human intervention. Concretely, a hybrid GRU-based denoising autoencoder (GRU-based DAE) model with the k-means algorithm is developed for the fault-detection and clustering problem in sequential data. By doing so, based on the real-time historical data, not only individual faults but also unknown simultaneous faults under noisy conditions can be accurately detected and clustered. The applicability and advantages of the proposed method for the HIL testing process are demonstrated by two automotive case studies. To be specific, a high-fidelity gasoline engine and vehicle dynamic system along with an entire vehicle model are considered to verify the performance of the proposed model. The superiority of the proposed architecture compared to other autoencoder variants is presented in the results in terms of reconstruction error under several noise levels. The validation results indicate that the proposed model can perform high detection and clustering accuracy of unknown faults compared to stand-alone techniques

    Innovative Techniques for Testing and Diagnosing SoCs

    Get PDF
    We rely upon the continued functioning of many electronic devices for our everyday welfare, usually embedding integrated circuits that are becoming even cheaper and smaller with improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC). SoCs are also employed on automotive safety-critical applications, but need to be tested thoroughly to comply with reliability standards, in particular the ISO26262 functional safety for road vehicles. The goal of this PhD. thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches in the sequence appearing in this thesis are described as follows: 1. Embedded Memory Diagnosis: Memories are dense and complex circuits which are susceptible to design and manufacturing errors. Hence, it is important to understand the fault occurrence in the memory array. In practice, the logical and physical array representation differs due to an optimized design which adds enhancements to the device, namely scrambling. This part proposes an accurate memory diagnosis by showing the efforts of a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, elaborate cumulative analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing syndromes were analyzed as case studies gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually, and results were confirmed by real photos taken from a microscope. 2. Functional Test Pattern Generation: The key for a successful test is the pattern applied to the device. They can be structural or functional; the former usually benefits from embedded test modules targeting manufacturing errors and is only effective before shipping the component to the client. The latter, on the other hand, can be applied during mission minimally impacting on performance but is penalized due to high generation time. However, functional test patterns may benefit for having different goals in functional mission mode. Part III of this PhD thesis proposes three different functional test pattern generation methods for CPU cores embedded in SoCs, targeting different test purposes, described as follows: a. Functional Stress Patterns: Are suitable for optimizing functional stress during I Operational-life Tests and Burn-in Screening for an optimal device reliability characterization b. Functional Power Hungry Patterns: Are suitable for determining functional peak power for strictly limiting the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns with functional ones, allowing its execution periodically during mission. In addition, an external hardware communicating with a devised SBST was proposed. It helps increasing in 3% the fault coverage by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns. An automatic functional test pattern generation exploiting an evolutionary algorithm maximizing metrics related to stress, power, and fault coverage was employed in the above-mentioned approaches to quickly generate the desired patterns. The approaches were evaluated on two industrial cases developed by STMicroelectronics; 8051-based and a 32-bit Power Architecture SoCs. Results show that generation time was reduced upto 75% in comparison to older methodologies while increasing significantly the desired metrics. 3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications. GPGPUs are known for fast parallel computation used in high performance computing and advanced driver assistance where reliability is the key point. Moreover, GPGPU manufacturers do not provide design description code due to content secrecy. Therefore, commercial fault injectors using the GPGPU model is unfeasible, making radiation tests the only resource available, but are costly. In the last part of this thesis, we propose a software implemented fault injector able to inject bit-flip in memory elements of a real GPGPU. It exploits a software debugger tool and combines the C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in program variables. The goal is to validate robust parallel algorithms by studying fault propagation or activating redundancy mechanisms they possibly embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating point Fast Fourier Transform

    Cross layer reliability estimation for digital systems

    Get PDF
    Forthcoming manufacturing technologies hold the promise to increase multifuctional computing systems performance and functionality thanks to a remarkable growth of the device integration density. Despite the benefits introduced by this technology improvements, reliability is becoming a key challenge for the semiconductor industry. With transistor size reaching the atomic dimensions, vulnerability to unavoidable fluctuations in the manufacturing process and environmental stress rise dramatically. Failing to meet a reliability requirement may add excessive re-design cost to recover and may have severe consequences on the success of a product. %Worst-case design with large margins to guarantee reliable operation has been employed for long time. However, it is reaching a limit that makes it economically unsustainable due to its performance, area, and power cost. One of the open challenges for future technologies is building ``dependable'' systems on top of unreliable components, which will degrade and even fail during normal lifetime of the chip. Conventional design techniques are highly inefficient. They expend significant amount of energy to tolerate the device unpredictability by adding safety margins to a circuit's operating voltage, clock frequency or charge stored per bit. Unfortunately, the additional cost introduced to compensate unreliability are rapidly becoming unacceptable in today's environment where power consumption is often the limiting factor for integrated circuit performance, and energy efficiency is a top concern. Attention should be payed to tailor techniques to improve the reliability of a system on the basis of its requirements, ending up with cost-effective solutions favoring the success of the product on the market. Cross-layer reliability is one of the most promising approaches to achieve this goal. Cross-layer reliability techniques take into account the interactions between the layers composing a complex system (i.e., technology, hardware and software layers) to implement efficient cross-layer fault mitigation mechanisms. Fault tolerance mechanism are carefully implemented at different layers starting from the technology up to the software layer to carefully optimize the system by exploiting the inner capability of each layer to mask lower level faults. For this purpose, cross-layer reliability design techniques need to be complemented with cross-layer reliability evaluation tools, able to precisely assess the reliability level of a selected design early in the design cycle. Accurate and early reliability estimates would enable the exploration of the system design space and the optimization of multiple constraints such as performance, power consumption, cost and reliability. This Ph.D. thesis is devoted to the development of new methodologies and tools to evaluate and optimize the reliability of complex digital systems during the early design stages. More specifically, techniques addressing hardware accelerators (i.e., FPGAs and GPUs), microprocessors and full systems are discussed. All developed methodologies are presented in conjunction with their application to real-world use cases belonging to different computational domains

    No Fault Found events in maintenance engineering Part 2: Root causes, technical developments and future research

    Get PDF
    This is the second half of a two paper series covering aspects of the no fault found (NFF) phenomenon, which is highly challenging and is becoming even more important due to increasing complexity and criticality of technical systems. Part 1 introduced the fundamental concept of unknown failures from an organizational, behavioral and cultural stand point. It also reported an industrial outlook to the problem, recent procedural standards, whilst discussing the financial implications and safety concerns. In this issue, the authors examine the technical aspects, reviewing the common causes of NFF failures in electronic, software and mechanical systems. This is followed by a survey on technological techniques actively being used to reduce the consequence of such instances. After discussing improvements in testability, the article identifies gaps in literature and points out the core areas that should be focused in the future. Special attention is paid to the recent trends on knowledge sharing and troubleshooting tools; with potential research on technical diagnosis being enumerated

    Reliability-aware and energy-efficient system level design for networks-on-chip

    Get PDF
    2015 Spring.Includes bibliographical references.With CMOS technology aggressively scaling into the ultra-deep sub-micron (UDSM) regime and application complexity growing rapidly in recent years, processors today are being driven to integrate multiple cores on a chip. Such chip multiprocessor (CMP) architectures offer unprecedented levels of computing performance for highly parallel emerging applications in the era of digital convergence. However, a major challenge facing the designers of these emerging multicore architectures is the increased likelihood of failure due to the rise in transient, permanent, and intermittent faults caused by a variety of factors that are becoming more and more prevalent with technology scaling. On-chip interconnect architectures are particularly susceptible to faults that can corrupt transmitted data or prevent it from reaching its destination. Reliability concerns in UDSM nodes have in part contributed to the shift from traditional bus-based communication fabrics to network-on-chip (NoC) architectures that provide better scalability, performance, and utilization than buses. In this thesis, to overcome potential faults in NoCs, my research began by exploring fault-tolerant routing algorithms. Under the constraint of deadlock freedom, we make use of the inherent redundancy in NoCs due to multiple paths between packet sources and sinks and propose different fault-tolerant routing schemes to achieve much better fault tolerance capabilities than possible with traditional routing schemes. The proposed schemes also use replication opportunistically to optimize the balance between energy overhead and arrival rate. As 3D integrated circuit (3D-IC) technology with wafer-to-wafer bonding has been recently proposed as a promising candidate for future CMPs, we also propose a fault-tolerant routing scheme for 3D NoCs which outperforms the existing popular routing schemes in terms of energy consumption, performance and reliability. To quantify reliability and provide different levels of intelligent protection, for the first time, we propose the network vulnerability factor (NVF) metric to characterize the vulnerability of NoC components to faults. NVF determines the probabilities that faults in NoC components manifest as errors in the final program output of the CMP system. With NVF aware partial protection for NoC components, almost 50% energy cost can be saved compared to the traditional approach of comprehensively protecting all NoC components. Lastly, we focus on the problem of fault-tolerant NoC design, that involves many NP-hard sub-problems such as core mapping, fault-tolerant routing, and fault-tolerant router configuration. We propose a novel design-time (RESYN) and a hybrid design and runtime (HEFT) synthesis framework to trade-off energy consumption and reliability in the NoC fabric at the system level for CMPs. Together, our research in fault-tolerant NoC routing, reliability modeling, and reliability aware NoC synthesis substantially enhances NoC reliability and energy-efficiency beyond what is possible with traditional approaches and state-of-the-art strategies from prior work

    Dependability-driven Strategies to Improve the Design and Verification of Safety-Critical HDL-based Embedded Systems

    Full text link
    [ES] La utilización de sistemas empotrados en cada vez más ámbitos de aplicación está llevando a que su diseño deba enfrentarse a mayores requisitos de rendimiento, consumo de energía y área (PPA). Asimismo, su utilización en aplicaciones críticas provoca que deban cumplir con estrictos requisitos de confiabilidad para garantizar su correcto funcionamiento durante períodos prolongados de tiempo. En particular, el uso de dispositivos lógicos programables de tipo FPGA es un gran desafío desde la perspectiva de la confiabilidad, ya que estos dispositivos son muy sensibles a la radiación. Por todo ello, la confiabilidad debe considerarse como uno de los criterios principales para la toma de decisiones a lo largo del todo flujo de diseño, que debe complementarse con diversos procesos que permitan alcanzar estrictos requisitos de confiabilidad. Primero, la evaluación de la robustez del diseño permite identificar sus puntos débiles, guiando así la definición de mecanismos de tolerancia a fallos. Segundo, la eficacia de los mecanismos definidos debe validarse experimentalmente. Tercero, la evaluación comparativa de la confiabilidad permite a los diseñadores seleccionar los componentes prediseñados (IP), las tecnologías de implementación y las herramientas de diseño (EDA) más adecuadas desde la perspectiva de la confiabilidad. Por último, la exploración del espacio de diseño (DSE) permite configurar de manera óptima los componentes y las herramientas seleccionados, mejorando así la confiabilidad y las métricas PPA de la implementación resultante. Todos los procesos anteriormente mencionados se basan en técnicas de inyección de fallos para evaluar la robustez del sistema diseñado. A pesar de que existe una amplia variedad de técnicas de inyección de fallos, varias problemas aún deben abordarse para cubrir las necesidades planteadas en el flujo de diseño. Aquellas soluciones basadas en simulación (SBFI) deben adaptarse a los modelos de nivel de implementación, teniendo en cuenta la arquitectura de los diversos componentes de la tecnología utilizada. Las técnicas de inyección de fallos basadas en FPGAs (FFI) deben abordar problemas relacionados con la granularidad del análisis para poder localizar los puntos débiles del diseño. Otro desafío es la reducción del coste temporal de los experimentos de inyección de fallos. Debido a la alta complejidad de los diseños actuales, el tiempo experimental dedicado a la evaluación de la confiabilidad puede ser excesivo incluso en aquellos escenarios más simples, mientras que puede ser inviable en aquellos procesos relacionados con la evaluación de múltiples configuraciones alternativas del diseño. Por último, estos procesos orientados a la confiabilidad carecen de un soporte instrumental que permita cubrir el flujo de diseño con toda su variedad de lenguajes de descripción de hardware, tecnologías de implementación y herramientas de diseño. Esta tesis aborda los retos anteriormente mencionados con el fin de integrar, de manera eficaz, estos procesos orientados a la confiabilidad en el flujo de diseño. Primeramente, se proponen nuevos métodos de inyección de fallos que permiten una evaluación de la confiabilidad, precisa y detallada, en diferentes niveles del flujo de diseño. Segundo, se definen nuevas técnicas para la aceleración de los experimentos de inyección que mejoran su coste temporal. Tercero, se define dos estrategias DSE que permiten configurar de manera óptima (desde la perspectiva de la confiabilidad) los componentes IP y las herramientas EDA, con un coste experimental mínimo. Cuarto, se propone un kit de herramientas que automatiza e incorpora con eficacia los procesos orientados a la confiabilidad en el flujo de diseño semicustom. Finalmente, se demuestra la utilidad y eficacia de las propuestas mediante un caso de estudio en el que se implementan tres procesadores empotrados en un FPGA de Xilinx serie 7.[CA] La utilització de sistemes encastats en cada vegada més àmbits d'aplicació està portant al fet que el seu disseny haja d'enfrontar-se a majors requisits de rendiment, consum d'energia i àrea (PPA). Així mateix, la seua utilització en aplicacions crítiques provoca que hagen de complir amb estrictes requisits de confiabilitat per a garantir el seu correcte funcionament durant períodes prolongats de temps. En particular, l'ús de dispositius lògics programables de tipus FPGA és un gran desafiament des de la perspectiva de la confiabilitat, ja que aquests dispositius són molt sensibles a la radiació. Per tot això, la confiabilitat ha de considerar-se com un dels criteris principals per a la presa de decisions al llarg del tot flux de disseny, que ha de complementar-se amb diversos processos que permeten aconseguir estrictes requisits de confiabilitat. Primer, l'avaluació de la robustesa del disseny permet identificar els seus punts febles, guiant així la definició de mecanismes de tolerància a fallades. Segon, l'eficàcia dels mecanismes definits ha de validar-se experimentalment. Tercer, l'avaluació comparativa de la confiabilitat permet als dissenyadors seleccionar els components predissenyats (IP), les tecnologies d'implementació i les eines de disseny (EDA) més adequades des de la perspectiva de la confiabilitat. Finalment, l'exploració de l'espai de disseny (DSE) permet configurar de manera òptima els components i les eines seleccionats, millorant així la confiabilitat i les mètriques PPA de la implementació resultant. Tots els processos anteriorment esmentats es basen en tècniques d'injecció de fallades per a poder avaluar la robustesa del sistema dissenyat. A pesar que existeix una àmplia varietat de tècniques d'injecció de fallades, diverses problemes encara han d'abordar-se per a cobrir les necessitats plantejades en el flux de disseny. Aquelles solucions basades en simulació (SBFI) han d'adaptar-se als models de nivell d'implementació, tenint en compte l'arquitectura dels diversos components de la tecnologia utilitzada. Les tècniques d'injecció de fallades basades en FPGAs (FFI) han d'abordar problemes relacionats amb la granularitat de l'anàlisi per a poder localitzar els punts febles del disseny. Un altre desafiament és la reducció del cost temporal dels experiments d'injecció de fallades. A causa de l'alta complexitat dels dissenys actuals, el temps experimental dedicat a l'avaluació de la confiabilitat pot ser excessiu fins i tot en aquells escenaris més simples, mentre que pot ser inviable en aquells processos relacionats amb l'avaluació de múltiples configuracions alternatives del disseny. Finalment, aquests processos orientats a la confiabilitat manquen d'un suport instrumental que permeta cobrir el flux de disseny amb tota la seua varietat de llenguatges de descripció de maquinari, tecnologies d'implementació i eines de disseny. Aquesta tesi aborda els reptes anteriorment esmentats amb la finalitat d'integrar, de manera eficaç, aquests processos orientats a la confiabilitat en el flux de disseny. Primerament, es proposen nous mètodes d'injecció de fallades que permeten una avaluació de la confiabilitat, precisa i detallada, en diferents nivells del flux de disseny. Segon, es defineixen noves tècniques per a l'acceleració dels experiments d'injecció que milloren el seu cost temporal. Tercer, es defineix dues estratègies DSE que permeten configurar de manera òptima (des de la perspectiva de la confiabilitat) els components IP i les eines EDA, amb un cost experimental mínim. Quart, es proposa un kit d'eines (DAVOS) que automatitza i incorpora amb eficàcia els processos orientats a la confiabilitat en el flux de disseny semicustom. Finalment, es demostra la utilitat i eficàcia de les propostes mitjançant un cas d'estudi en el qual s'implementen tres processadors encastats en un FPGA de Xilinx serie 7.[EN] Embedded systems are steadily extending their application areas, dealing with increasing requirements in performance, power consumption, and area (PPA). Whenever embedded systems are used in safety-critical applications, they must also meet rigorous dependability requirements to guarantee their correct operation during an extended period of time. Meeting these requirements is especially challenging for those systems that are based on Field Programmable Gate Arrays (FPGAs), since they are very susceptible to Single Event Upsets. This leads to increased dependability threats, especially in harsh environments. In such a way, dependability should be considered as one of the primary criteria for decision making throughout the whole design flow, which should be complemented by several dependability-driven processes. First, dependability assessment quantifies the robustness of hardware designs against faults and identifies their weak points. Second, dependability-driven verification ensures the correctness and efficiency of fault mitigation mechanisms. Third, dependability benchmarking allows designers to select (from a dependability perspective) the most suitable IP cores, implementation technologies, and electronic design automation (EDA) tools. Finally, dependability-aware design space exploration (DSE) allows to optimally configure the selected IP cores and EDA tools to improve as much as possible the dependability and PPA features of resulting implementations. The aforementioned processes rely on fault injection testing to quantify the robustness of the designed systems. Despite nowadays there exists a wide variety of fault injection solutions, several important problems still should be addressed to better cover the needs of a dependability-driven design flow. In particular, simulation-based fault injection (SBFI) should be adapted to implementation-level HDL models to take into account the architecture of diverse logic primitives, while keeping the injection procedures generic and low-intrusive. Likewise, the granularity of FPGA-based fault injection (FFI) should be refined to the enable accurate identification of weak points in FPGA-based designs. Another important challenge, that dependability-driven processes face in practice, is the reduction of SBFI and FFI experimental effort. The high complexity of modern designs raises the experimental effort beyond the available time budgets, even in simple dependability assessment scenarios, and it becomes prohibitive in presence of alternative design configurations. Finally, dependability-driven processes lack an instrumental support covering the semicustom design flow in all its variety of description languages, implementation technologies, and EDA tools. Existing fault injection tools only partially cover the individual stages of the design flow, being usually specific to a particular design representation level and implementation technology. This work addresses the aforementioned challenges by efficiently integrating dependability-driven processes into the design flow. First, it proposes new SBFI and FFI approaches that enable an accurate and detailed dependability assessment at different levels of the design flow. Second, it improves the performance of dependability-driven processes by defining new techniques for accelerating SBFI and FFI experiments. Third, it defines two DSE strategies that enable the optimal dependability-aware tuning of IP cores and EDA tools, while reducing as much as possible the robustness evaluation effort. Fourth, it proposes a new toolkit (DAVOS) that automates and seamlessly integrates the aforementioned dependability-driven processes into the semicustom design flow. Finally, it illustrates the usefulness and efficiency of these proposals through a case study consisting of three soft-core embedded processors implemented on a Xilinx 7-series SoC FPGA.Tuzov, I. (2020). Dependability-driven Strategies to Improve the Design and Verification of Safety-Critical HDL-based Embedded Systems [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/159883TESI
    • …
    corecore