217 research outputs found

    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    Get PDF
    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc

    An experimental study of reduced-voltage operation in modern FPGAs for neural network acceleration

    Get PDF
    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect ofenvironmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W ) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.The work done for this paper was partially supported by a HiPEAC Collaboration Grant funded by the H2020 HiPEAC Project under grant agreement No. 779656. The research leading to these results has received funding from the European Union’s Horizon 2020 Programme under the LEGaTO Project (www.legato-project.eu), grant agreement No. 780681.Peer ReviewedPostprint (author's final draft

    Review of Fault Mitigation Approaches for Deep Neural Networks for Computer Vision in Autonomous Driving

    Get PDF
    The aim of this work is to identify and present challenges and risks related to the employment of DNNs in Computer Vision for Autonomous Driving. Nowadays one of the major technological challenges is to choose the right technology among the abundance that is available on the market. Specifically, in this thesis it is collected a synopsis of the state-of-the-art architectures, techniques and methodologies adopted for building fault-tolerant hardware and ensuring robustness in DNNs-based Computer Vision applications for Autonomous Driving

    On the Reliability of Neural Networks Implemented on SRAM-based FPGAs for Low-cost Satellites

    Full text link
    Recent development in the neural network inference frameworks on Field-Programmable Gate Arrays (FPGAs) enables the rapid deployment of neural network applications on low-power FPGA devices. FPGAs are a promising platform for implementing neural network capabilities on board satellites thanks to the high energy efficiency of quantised neural networks on FPGAs. Furthermore, the reconfigurability of FPGAs allows neural network accelerators to share the FPGA with other onboard computer systems for reduced hardware complexity. However, the reliability against radiation-induced upsets of existing neural network inference frameworks on commercial FPGA devices was not previously studied. The reliability of neural network applications on FPGA is complicated by the perceptrons’ inherent algorithm-based fault tolerance, quantisation techniques, the varying sensitivity of non-neural layers like pooling layers, the architecture of the accelerator, and the software stack. This thesis explores the effect of single event upsets (SEUs) in potential spaceborne FPGA-based neural network applications using fully connected and convolutional networks, on applications using binary, 4-bit and 8-bit quantisation levels, and on applications created from both FINN and Vitis AI frameworks. We study the failure modes in neural network applications caused by SEUs, including loss of accuracy, reduction of throughput/timeout, and catastrophic system failure on FPGA SoC. We conducted fault injection experiments on fully connected and convolutional neural networks (CNNs) trained for classifying images from the MNIST handwritten digits dataset and the Airbus ship detection dataset. We found that SEUs have an insignificant impact on fully-connected binary networks trained on the MNIST dataset. However, the more complex CNN applications created from the FINN and Vitis-AI frameworks showed much higher sensitivity to SEUs and had more failure modes, including loss of accuracy, hardware hang-up, and even catastrophic failure in the OS of SoC devices due to erroneous driver behaviour. We found that the SEU cross-section of model-specific neural network accelerators like FINN can be reduced significantly by quantising the network to a lower precision. We also studied the efficacy of fault-tolerant design techniques, including full TMR and partial TMR, on the binary neural network and FINN accelerator

    PyGFI: Analyzing and Enhancing Robustness of Graph Neural Networks Against Hardware Errors

    Full text link
    Graph neural networks (GNNs) have recently emerged as a promising learning paradigm in learning graph-structured data and have demonstrated wide success across various domains such as recommendation systems, social networks, and electronic design automation (EDA). Like other deep learning (DL) methods, GNNs are being deployed in sophisticated modern hardware systems, as well as dedicated accelerators. However, despite the popularity of GNNs and the recent efforts of bringing GNNs to hardware, the fault tolerance and resilience of GNNs have generally been overlooked. Inspired by the inherent algorithmic resilience of DL methods, this paper conducts, for the first time, a large-scale and empirical study of GNN resilience, aiming to understand the relationship between hardware faults and GNN accuracy. By developing a customized fault injection tool on top of PyTorch, we perform extensive fault injection experiments on various GNN models and application datasets. We observe that the error resilience of GNN models varies by orders of magnitude with respect to different models and application datasets. Further, we explore a low-cost error mitigation mechanism for GNN to enhance its resilience. This GNN resilience study aims to open up new directions and opportunities for future GNN accelerator design and architectural optimization
    • …
    corecore