2,232 research outputs found

    On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

    Get PDF
    Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology node scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact the NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) model of NN accelerators, in particular, fault characterization and mitigation. By following a High-Level Synthesis (HLS) approach, first, we characterize the vulnerability of various components of RTL NN. We observed that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., data representation model and the parallelism degree of the underlying accelerator. Second, motivated by characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, by 47.3% better than state-of-the-art methods.Comment: 8 pages, 6 figure

    Evaluating Built-in ECC of FPGA on-chip Memories for the Mitigation of Undervolting Faults

    Get PDF
    Voltage underscaling below the nominal level is an effective solution for improving energy efficiency in digital circuits, e.g., Field Programmable Gate Arrays (FPGAs). However, further undervolting below a safe voltage level and without accompanying frequency scaling leads to timing related faults, potentially undermining the energy savings. Through experimental voltage underscaling studies on commercial FPGAs, we observed that the rate of these faults exponentially increases for on-chip memories, or Block RAMs (BRAMs). To mitigate these faults, we evaluated the efficiency of the built-in Error-Correction Code (ECC) and observed that more than 90% of the faults are correctable and further 7% are detectable (but not correctable). This efficiency is the result of the single-bit type of these faults, which are then effectively covered by the Single-Error Correction and Double-Error Detection (SECDED) design of the built-in ECC. Finally, motivated by the above experimental observations, we evaluated an FPGA-based Neural Network (NN) accelerator under low-voltage operations, while built-in ECC is leveraged to mitigate undervolting faults and thus, prevent NN significant accuracy loss. In consequence, we achieve 40% of the BRAM power saving through undervolting below the minimum safe voltage level, with a negligible NN accuracy loss, thanks to the substantial fault coverage by the built-in ECC.Comment: 6 pages, 2 figure

    Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning

    Full text link
    Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee an error-free operation after SEU recovering if the affected configuration bits do belong to feedback loops of the implemented circuits. In this paper, we a) provide a netlist-based circuit analysis technique to distinguish so-called critical configuration bits from essential bits in order to identify configuration bits which will need also state-restoring actions after a recovered SEU and which not. Furthermore, b) an alternative classification approach using fault injection is developed in order to compare both classification techniques. Moreover, c) we will propose a floorplanning approach for reducing the effective number of scrubbed frames and d), experimental results will give evidence that our optimization methodology not only allows to detect errors earlier but also to minimize the Mean-Time-To-Repair (MTTR) of a circuit considerably. In particular, we show that by using our approach, the MTTR for datapath-intensive circuits can be reduced by up to 48.5% in comparison to standard approaches

    Tracing Fault Effects in FPGA Systems

    Get PDF
    The paper presents the extent of fault effects in FPGA based systems and concentrates on transient faults (induced by single event upsets – SEUs) within the configuration memory of FPGA. An original method of detailed analysis of fault effect propagation is presented. It is targeted at microprocessor based FPGA systems using the developed fault injection technique. The fault injection is performed at HDL description level of the microprocessor using special simulators and developed supplementary programs. The proposed methodology is illustrated for soft PicoBlaze microprocessor running 3 programs. The presented results reveal some problems with fault handling at the software level.

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Survey of Soft Error Mitigation Techniques Applied to LEON3 Soft Processors on SRAM-Based FPGAs

    Get PDF
    Soft-core processors implemented in SRAM-based FPGAs are an attractive option for applications to be employed in radiation environments due to their flexibility, relatively-low application development costs, and reconfigurability features enabling them to adapt to the evolving mission needs. Despite the advantages soft-core processors possess, they are seldom used in critical applications because they are more sensitive to radiation than their hard-core counterparts. For instance, both the logic and signal routing circuitry of a soft-core processor as well as its user memory are susceptible to radiation-induced faults. Therefore, soft-core processors must be appropriately hardened against ionizing-radiation to become a feasible design choice for harsh environments and thus to reap all their benefits. This survey henceforth discusses various techniques to protect the configuration and user memories of an LEON3 soft processor, which is one of the most widely used soft-core processors in radiation environments, as reported in the state-of-the-art literature, with the objective of facilitating the choice of right fault-mitigation solution for any given soft-core processor

    Dependability-driven Strategies to Improve the Design and Verification of Safety-Critical HDL-based Embedded Systems

    Full text link
    [ES] La utilización de sistemas empotrados en cada vez más ámbitos de aplicación está llevando a que su diseño deba enfrentarse a mayores requisitos de rendimiento, consumo de energía y área (PPA). Asimismo, su utilización en aplicaciones críticas provoca que deban cumplir con estrictos requisitos de confiabilidad para garantizar su correcto funcionamiento durante períodos prolongados de tiempo. En particular, el uso de dispositivos lógicos programables de tipo FPGA es un gran desafío desde la perspectiva de la confiabilidad, ya que estos dispositivos son muy sensibles a la radiación. Por todo ello, la confiabilidad debe considerarse como uno de los criterios principales para la toma de decisiones a lo largo del todo flujo de diseño, que debe complementarse con diversos procesos que permitan alcanzar estrictos requisitos de confiabilidad. Primero, la evaluación de la robustez del diseño permite identificar sus puntos débiles, guiando así la definición de mecanismos de tolerancia a fallos. Segundo, la eficacia de los mecanismos definidos debe validarse experimentalmente. Tercero, la evaluación comparativa de la confiabilidad permite a los diseñadores seleccionar los componentes prediseñados (IP), las tecnologías de implementación y las herramientas de diseño (EDA) más adecuadas desde la perspectiva de la confiabilidad. Por último, la exploración del espacio de diseño (DSE) permite configurar de manera óptima los componentes y las herramientas seleccionados, mejorando así la confiabilidad y las métricas PPA de la implementación resultante. Todos los procesos anteriormente mencionados se basan en técnicas de inyección de fallos para evaluar la robustez del sistema diseñado. A pesar de que existe una amplia variedad de técnicas de inyección de fallos, varias problemas aún deben abordarse para cubrir las necesidades planteadas en el flujo de diseño. Aquellas soluciones basadas en simulación (SBFI) deben adaptarse a los modelos de nivel de implementación, teniendo en cuenta la arquitectura de los diversos componentes de la tecnología utilizada. Las técnicas de inyección de fallos basadas en FPGAs (FFI) deben abordar problemas relacionados con la granularidad del análisis para poder localizar los puntos débiles del diseño. Otro desafío es la reducción del coste temporal de los experimentos de inyección de fallos. Debido a la alta complejidad de los diseños actuales, el tiempo experimental dedicado a la evaluación de la confiabilidad puede ser excesivo incluso en aquellos escenarios más simples, mientras que puede ser inviable en aquellos procesos relacionados con la evaluación de múltiples configuraciones alternativas del diseño. Por último, estos procesos orientados a la confiabilidad carecen de un soporte instrumental que permita cubrir el flujo de diseño con toda su variedad de lenguajes de descripción de hardware, tecnologías de implementación y herramientas de diseño. Esta tesis aborda los retos anteriormente mencionados con el fin de integrar, de manera eficaz, estos procesos orientados a la confiabilidad en el flujo de diseño. Primeramente, se proponen nuevos métodos de inyección de fallos que permiten una evaluación de la confiabilidad, precisa y detallada, en diferentes niveles del flujo de diseño. Segundo, se definen nuevas técnicas para la aceleración de los experimentos de inyección que mejoran su coste temporal. Tercero, se define dos estrategias DSE que permiten configurar de manera óptima (desde la perspectiva de la confiabilidad) los componentes IP y las herramientas EDA, con un coste experimental mínimo. Cuarto, se propone un kit de herramientas que automatiza e incorpora con eficacia los procesos orientados a la confiabilidad en el flujo de diseño semicustom. Finalmente, se demuestra la utilidad y eficacia de las propuestas mediante un caso de estudio en el que se implementan tres procesadores empotrados en un FPGA de Xilinx serie 7.[CA] La utilització de sistemes encastats en cada vegada més àmbits d'aplicació està portant al fet que el seu disseny haja d'enfrontar-se a majors requisits de rendiment, consum d'energia i àrea (PPA). Així mateix, la seua utilització en aplicacions crítiques provoca que hagen de complir amb estrictes requisits de confiabilitat per a garantir el seu correcte funcionament durant períodes prolongats de temps. En particular, l'ús de dispositius lògics programables de tipus FPGA és un gran desafiament des de la perspectiva de la confiabilitat, ja que aquests dispositius són molt sensibles a la radiació. Per tot això, la confiabilitat ha de considerar-se com un dels criteris principals per a la presa de decisions al llarg del tot flux de disseny, que ha de complementar-se amb diversos processos que permeten aconseguir estrictes requisits de confiabilitat. Primer, l'avaluació de la robustesa del disseny permet identificar els seus punts febles, guiant així la definició de mecanismes de tolerància a fallades. Segon, l'eficàcia dels mecanismes definits ha de validar-se experimentalment. Tercer, l'avaluació comparativa de la confiabilitat permet als dissenyadors seleccionar els components predissenyats (IP), les tecnologies d'implementació i les eines de disseny (EDA) més adequades des de la perspectiva de la confiabilitat. Finalment, l'exploració de l'espai de disseny (DSE) permet configurar de manera òptima els components i les eines seleccionats, millorant així la confiabilitat i les mètriques PPA de la implementació resultant. Tots els processos anteriorment esmentats es basen en tècniques d'injecció de fallades per a poder avaluar la robustesa del sistema dissenyat. A pesar que existeix una àmplia varietat de tècniques d'injecció de fallades, diverses problemes encara han d'abordar-se per a cobrir les necessitats plantejades en el flux de disseny. Aquelles solucions basades en simulació (SBFI) han d'adaptar-se als models de nivell d'implementació, tenint en compte l'arquitectura dels diversos components de la tecnologia utilitzada. Les tècniques d'injecció de fallades basades en FPGAs (FFI) han d'abordar problemes relacionats amb la granularitat de l'anàlisi per a poder localitzar els punts febles del disseny. Un altre desafiament és la reducció del cost temporal dels experiments d'injecció de fallades. A causa de l'alta complexitat dels dissenys actuals, el temps experimental dedicat a l'avaluació de la confiabilitat pot ser excessiu fins i tot en aquells escenaris més simples, mentre que pot ser inviable en aquells processos relacionats amb l'avaluació de múltiples configuracions alternatives del disseny. Finalment, aquests processos orientats a la confiabilitat manquen d'un suport instrumental que permeta cobrir el flux de disseny amb tota la seua varietat de llenguatges de descripció de maquinari, tecnologies d'implementació i eines de disseny. Aquesta tesi aborda els reptes anteriorment esmentats amb la finalitat d'integrar, de manera eficaç, aquests processos orientats a la confiabilitat en el flux de disseny. Primerament, es proposen nous mètodes d'injecció de fallades que permeten una avaluació de la confiabilitat, precisa i detallada, en diferents nivells del flux de disseny. Segon, es defineixen noves tècniques per a l'acceleració dels experiments d'injecció que milloren el seu cost temporal. Tercer, es defineix dues estratègies DSE que permeten configurar de manera òptima (des de la perspectiva de la confiabilitat) els components IP i les eines EDA, amb un cost experimental mínim. Quart, es proposa un kit d'eines (DAVOS) que automatitza i incorpora amb eficàcia els processos orientats a la confiabilitat en el flux de disseny semicustom. Finalment, es demostra la utilitat i eficàcia de les propostes mitjançant un cas d'estudi en el qual s'implementen tres processadors encastats en un FPGA de Xilinx serie 7.[EN] Embedded systems are steadily extending their application areas, dealing with increasing requirements in performance, power consumption, and area (PPA). Whenever embedded systems are used in safety-critical applications, they must also meet rigorous dependability requirements to guarantee their correct operation during an extended period of time. Meeting these requirements is especially challenging for those systems that are based on Field Programmable Gate Arrays (FPGAs), since they are very susceptible to Single Event Upsets. This leads to increased dependability threats, especially in harsh environments. In such a way, dependability should be considered as one of the primary criteria for decision making throughout the whole design flow, which should be complemented by several dependability-driven processes. First, dependability assessment quantifies the robustness of hardware designs against faults and identifies their weak points. Second, dependability-driven verification ensures the correctness and efficiency of fault mitigation mechanisms. Third, dependability benchmarking allows designers to select (from a dependability perspective) the most suitable IP cores, implementation technologies, and electronic design automation (EDA) tools. Finally, dependability-aware design space exploration (DSE) allows to optimally configure the selected IP cores and EDA tools to improve as much as possible the dependability and PPA features of resulting implementations. The aforementioned processes rely on fault injection testing to quantify the robustness of the designed systems. Despite nowadays there exists a wide variety of fault injection solutions, several important problems still should be addressed to better cover the needs of a dependability-driven design flow. In particular, simulation-based fault injection (SBFI) should be adapted to implementation-level HDL models to take into account the architecture of diverse logic primitives, while keeping the injection procedures generic and low-intrusive. Likewise, the granularity of FPGA-based fault injection (FFI) should be refined to the enable accurate identification of weak points in FPGA-based designs. Another important challenge, that dependability-driven processes face in practice, is the reduction of SBFI and FFI experimental effort. The high complexity of modern designs raises the experimental effort beyond the available time budgets, even in simple dependability assessment scenarios, and it becomes prohibitive in presence of alternative design configurations. Finally, dependability-driven processes lack an instrumental support covering the semicustom design flow in all its variety of description languages, implementation technologies, and EDA tools. Existing fault injection tools only partially cover the individual stages of the design flow, being usually specific to a particular design representation level and implementation technology. This work addresses the aforementioned challenges by efficiently integrating dependability-driven processes into the design flow. First, it proposes new SBFI and FFI approaches that enable an accurate and detailed dependability assessment at different levels of the design flow. Second, it improves the performance of dependability-driven processes by defining new techniques for accelerating SBFI and FFI experiments. Third, it defines two DSE strategies that enable the optimal dependability-aware tuning of IP cores and EDA tools, while reducing as much as possible the robustness evaluation effort. Fourth, it proposes a new toolkit (DAVOS) that automates and seamlessly integrates the aforementioned dependability-driven processes into the semicustom design flow. Finally, it illustrates the usefulness and efficiency of these proposals through a case study consisting of three soft-core embedded processors implemented on a Xilinx 7-series SoC FPGA.Tuzov, I. (2020). Dependability-driven Strategies to Improve the Design and Verification of Safety-Critical HDL-based Embedded Systems [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/159883TESI

    Fault-tolerant polyphase filters-based decimators for SRAM-based FPGA implementations

    Get PDF
    To reduce the oversampling rate of baseband signals, decimation is widely used in digital communication systems. Polyphase filters (PPFs) can be used to efficiently implement decimators. SRAM-based FPGAs provide large amounts of resources combined with flexibility and are a popular option for the implementation of communication receivers. However, they are sensitive to soft errors, which limit their application in harsh environments, such as space. An initial reliability study on SRAM-based FPGA implemented decimation shows that the soft errors on around 5% of the critical bits in the configuration memory of the decimator would degrade the decimated signal dramatically. Based on this result, this paper proposes an efficient fault tolerance scheme, in which the high correlation between adjacent PPFs outputs is utilized to tolerate the fault of a single-phase filter, and a duplicate and comparison structure is used to protect the fault tolerance logic. Hardware implementation and fault injection experiments show that the proposed scheme can drastically reduce the number of critical bits that cause severe output degradation with 1.5x resource usage and 0.75x maximum frequency relative to the unprotected decimator. Therefore, the proposed scheme can be an alternative to Triple Modular Redundancy that more than triples the use of resources.This work is supported by Natural Science Funds of China (Grant No. 62171313) and is partially supported by the ACHILLES project PID2019-104207RB-I00 funded by the Spanish Ministry of Science and Innovation and by the Madrid Community research project TAPIR-CM grant no. P2018/TCS-4496
    corecore