9 research outputs found

    On the Reliability of Neural Networks Implemented on SRAM-based FPGAs for Low-cost Satellites

    Recent development in the neural network inference frameworks on Field-Programmable Gate Arrays (FPGAs) enables the rapid deployment of neural network applications on low-power FPGA devices. FPGAs are a promising platform for implementing neural network capabilities on board satellites thanks to the high energy efficiency of quantised neural networks on FPGAs. Furthermore, the reconfigurability of FPGAs allows neural network accelerators to share the FPGA with other onboard computer systems for reduced hardware complexity. However, the reliability against radiation-induced upsets of existing neural network inference frameworks on commercial FPGA devices was not previously studied. The reliability of neural network applications on FPGA is complicated by the perceptrons’ inherent algorithm-based fault tolerance, quantisation techniques, the varying sensitivity of non-neural layers like pooling layers, the architecture of the accelerator, and the software stack. This thesis explores the effect of single event upsets (SEUs) in potential spaceborne FPGA-based neural network applications using fully connected and convolutional networks, on applications using binary, 4-bit and 8-bit quantisation levels, and on applications created from both FINN and Vitis AI frameworks. We study the failure modes in neural network applications caused by SEUs, including loss of accuracy, reduction of throughput/timeout, and catastrophic system failure on FPGA SoC. We conducted fault injection experiments on fully connected and convolutional neural networks (CNNs) trained for classifying images from the MNIST handwritten digits dataset and the Airbus ship detection dataset. We found that SEUs have an insignificant impact on fully-connected binary networks trained on the MNIST dataset. 
However, the more complex CNN applications created from the FINN and Vitis AI frameworks showed much higher sensitivity to SEUs and had more failure modes, including loss of accuracy, hardware hang-up, and even catastrophic failure in the OS of SoC devices due to erroneous driver behaviour. We found that the SEU cross-section of model-specific neural network accelerators like FINN can be reduced significantly by quantising the network to a lower precision. We also studied the efficacy of fault-tolerant design techniques, including full TMR and partial TMR, on the binary neural network and FINN accelerator.

    Toward Fault-Tolerant Applications on Reconfigurable Systems-on-Chip

    The abstract is in the attachment.

    Development of low-overhead soft error mitigation technique for safety critical neural networks applications

    Deep Neural Networks (DNNs) have been widely applied in healthcare applications. DNN-based healthcare applications are safety-critical systems that require high-reliability implementation due to a high risk of human death or injury in case of malfunction. Several DNN accelerators are used to execute these DNN models, and GPUs are currently the most prominent and dominant DNN accelerators. However, GPUs are prone to soft errors that dramatically impact GPU behavior; such errors may corrupt data values or logic operations, resulting in Silent Data Corruption (SDC). SDC that occurs in GPU hardware components propagates from the physical level to the application level and results in misclassification of objects in DNN models, leading to disastrous consequences. The Food and Drug Administration (FDA) reported that 1078 of the adverse events (10.1%) were unintended errors (i.e., soft errors) encountered, including 52 injuries and two deaths. Several traditional techniques have been proposed to protect electronic devices from soft errors by replicating the DNN models. However, these techniques cause significant overheads in area, performance, and energy, making them challenging to implement in healthcare systems that have strict deadlines. To address this issue, this study developed a Selective Mitigation Technique based on the standard Triple Modular Redundancy (S-MTTM-R) to determine the model’s vulnerable parts, distinguishing Malfunction and Light-Malfunction errors. A comprehensive vulnerability analysis was performed using the SASSIFI fault injector on the AlexNet and DenseNet201 CNN models at the level of layers, kernels, and instructions, to show both models’ resilience and to identify and harden the most vulnerable portions; faults were injected while the models ran on NVIDIA GPUs. The experimental results showed that S-MTTM-R achieved a significant improvement in error masking.
No-Malfunction outcomes improved from 54.90%, 67.85%, and 59.36% to 62.80%, 82.10%, and 80.76% in the three modes RF, IOA, and IOV, respectively, for AlexNet. For DenseNet, No-Malfunction outcomes improved from 43.70%, 67.70%, and 54.68% to 59.90%, 84.75%, and 83.07% in the three modes RF, IOA, and IOV, respectively. Importantly, S-MTTM-R decreased the percentage of errors that cause misclassification (Malfunction) from 3.70% to 0.38% and from 5.23% to 0.23% for AlexNet and DenseNet, respectively. The performance analysis showed that S-MTTM-R achieved lower overhead than the well-known protection techniques: Algorithm-Based Fault Tolerance (ABFT), Double Modular Redundancy (DMR), and Triple Modular Redundancy (TMR). In light of these results, the study found strong evidence that the developed S-MTTM-R successfully mitigated soft errors for DNN models on GPUs with low overheads in energy, performance, and area, indicating a remarkable improvement in model reliability in the healthcare domain.

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, this book deals with the different reliability challenges across different levels, starting from the physical level all the way to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with the latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.

    The 1991 3rd NASA Symposium on VLSI Design

    Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2.

    The Fifth NASA Symposium on VLSI Design

    The fifth annual NASA Symposium on VLSI Design had 13 sessions, including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next-generation advances that will serve as a basis for future VLSI design.

    Mu2e Technical Design Report

    The Mu2e experiment at Fermilab will search for charged lepton flavor violation via the coherent conversion process mu- N --> e- N with a sensitivity approximately four orders of magnitude better than the current world's best limits for this process. The experiment's sensitivity offers discovery potential over a wide array of new physics models and probes mass scales well beyond the reach of the LHC. We describe herein the preliminary design of the proposed Mu2e experiment. This document was created in partial fulfillment of the requirements necessary to obtain DOE CD-2 approval. Comment: compressed file, 888 pages, 621 figures, 126 tables; full resolution available at http://mu2e.fnal.gov; corrected typo in background summary, Table 3.