1,984 research outputs found

    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    Get PDF
    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc

    Power Efficient Data-Aware SRAM Cell for SRAM-Based FPGA Architecture

    Get PDF
    The design of low-power SRAM cell becomes a necessity in today\u27s FPGAs, because SRAM is a critical component in FPGA design and consumes a large fraction of the total power. The present chapter provides an overview of various factors responsible for power consumption in FPGA and discusses the design techniques of low-power SRAM-based FPGA at system level, device level, and architecture levels. Finally, the chapter proposes a data-aware dynamic SRAM cell to control the power consumption in the cell. Stack effect has been adopted in the design to reduce the leakage current. The various peripheral circuits like address decoder circuit, write/read enable circuits, and sense amplifier have been modified to implement a power-efficient SRAM-based FPGA

    Degradation in FPGAs: Monitoring, Modeling and Mitigation

    Get PDF
    This dissertation targets the transistor aging degradation as well as the associated thermal challenges in FPGAs (since there is an exponential relation between aging and chip temperature). The main objectives are to perform experimentation, analysis and device-level model abstraction for modeling the degradation in FPGAs, then to monitor the FPGA to keep track of aging rates and ultimately to propose an aging-aware FPGA design flow to mitigate the aging

    High performance communication on reconfigurable clusters

    Get PDF
    High Performance Computing (HPC) has matured to where it is an essential third pillar, along with theory and experiment, in most domains of science and engineering. Communication latency is a key factor that is limiting the performance of HPC, but can be addressed by integrating communication into accelerators. This integration allows accelerators to communicate with each other without CPU interactions, and even bypassing the network stack. Field Programmable Gate Arrays (FPGAs) are the accelerators that currently best integrate communication with computation. The large number of Multi-gigabit Transceivers (MGTs) on most high-end FPGAs can provide high-bandwidth and low-latency inter-FPGA connections. Additionally, the reconfigurable FPGA fabric enables tight coupling between computation kernel and network interface. Our thesis is that an application-aware communication infrastructure for a multi-FPGA system makes substantial progress in solving the HPC communication bottleneck. This dissertation aims to provide an application-aware solution for communication infrastructure for FPGA-centric clusters. Specifically, our solution demonstrates application-awareness across multiple levels in the network stack, including low-level link protocols, router microarchitectures, routing algorithms, and applications. We start by investigating the low-level link protocol and the impact of its latency variance on performance. Our results demonstrate that, although some link jitter is always present, we can still assume near-synchronous communication on an FPGA-cluster. This provides the necessary condition for statically-scheduled routing. We then propose two novel router microarchitectures for two different kinds of workloads: a wormhole Virtual Channel (VC)-based router for workloads with dynamic communication, and a statically-scheduled Virtual Output Queueing (VOQ)-based router for workloads with static communication. For the first (VC-based) router, we propose a framework that generates application-aware router configurations. Our results show that, by adding application-awareness into router configuration, the network performance of FPGA clusters can be substantially improved. For the second (VOQ-based) router, we propose a novel offline collective routing algorithm. This shows a significant advantage over a state-of-the-art collective routing algorithm. We apply our communication infrastructure to a critical strong-scaling HPC kernel, the 3D FFT. The experimental results demonstrate that the performance of our design is faster than that on CPUs and GPUs by at least one order of magnitude (achieving strong scaling for the target applications). Surprisingly, the FPGA cluster performance is similar to that of an ASIC-cluster. We also implement the 3D FFT on another multi-FPGA platform: the Microsoft Catapult II cloud. Its performance is also comparable or superior to CPU and GPU HPC clusters. The second application we investigate is Molecular Dynamics Simulation (MD). We model MD on both FPGA clouds and clusters. We find that combining processing and general communication in the same device leads to extremely promising performance and the prospect of MD simulations well into the us/day range with a commodity cloud

    Analysis of performance variation in 16nm FinFET FPGA devices

    Get PDF

    Run-time power and performance scaling in 28 nm FPGAs

    Get PDF

    Timing Measurement Platform for Arbitrary Black-Box Circuits Based on Transition Probability

    No full text

    A Framework for the Detection of Crosstalk Noise in FPGAs

    Get PDF
    In recent years, crosstalk noise has emerged a serious problem because more and more devices and wires have been packed on electronic chips. As integrated circuits are migrated to more advanced technologies, it has become clear that crosstalk noise is the important phenomenon that must be taken into account. Despite of being more immune to crosstalk noise than their ASIC (application specific integrated circuit) counterparts, the dense interconnected structures of FPGAs (field programmable gate arrays) invite more vulnerabilities with crosstalk noise. Due to the lack of electrical detail concerning FPGA devices it is quite difficult to test the faults affected by crosstalk noise. This paper proposes a new approach for detecting the effects such as glitches and delays in transition that are due to crosstalk noise in FPGAs. This approach is similar to the BIST (built-in self test) technique in that it incorporates the test pattern generator to generate the test vectors and the analyzer to analyze the crosstalk faults without any overhead for testing
    corecore