    Adaptive Voltage Scaling with In-Situ Detectors in Commercial FPGAs

    Energy proportional computing with OpenCL on a FPGA-based overlay architecture

    Extending the PCIe Interface with Parallel Compression/Decompression Hardware for Energy and Performance Optimization

    PCIe is a high-performance interface used to move data from a central host PC to an accelerator such as a Field-Programmable Gate Array (FPGA). This interface allows a system to perform fast data transfers in High-Performance Computing (HPC) and provides a performance boost. However, HPC systems normally operate on large datasets, and in these situations PCIe can become a bottleneck. To address this issue, we propose an open-source hardware compression/decompression system that adapts to continuously streamed data with low latency and high throughput. We implement compressor and decompressor engines on an FPGA, scale up with multiple engines working in parallel, and evaluate the energy reduction and performance for different numbers of engines. To alleviate the performance bottleneck in the processor acting as a controller, we propose a hardware scheduler that fairly distributes the datasets among the engines. Our design reduces the transmission time over PCIe, and the results show an energy reduction of up to 48% in the PCIe transfers, thanks to the decrease in the number of bits that have to be transmitted. The latency overhead is kept to a minimum and is user-selectable, depending on the tolerances of the intended application.
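    The scheduler's role is easiest to see in software form. Below is a minimal Python sketch of fair work distribution across parallel compression engines, with zlib standing in for the FPGA compressor hardware; the shared-queue policy, the function names, and the engine count are illustrative assumptions, not the paper's actual hardware design.

```python
from queue import Queue
from threading import Thread
import zlib  # software stand-in for the FPGA compressor engines


def engine_worker(jobs, results):
    # Each worker models one hardware compressor engine: it pulls
    # a data chunk from the shared queue and compresses it.
    while True:
        item = jobs.get()
        if item is None:  # poison pill: shut this engine down
            break
        idx, chunk = item
        results[idx] = zlib.compress(chunk)
        jobs.task_done()


def compress_parallel(chunks, num_engines=4):
    # A shared work queue gives a simple fair distribution:
    # whichever engine is free takes the next chunk, so no
    # single engine becomes a bottleneck.
    jobs = Queue()
    results = [None] * len(chunks)
    workers = [Thread(target=engine_worker, args=(jobs, results))
               for _ in range(num_engines)]
    for w in workers:
        w.start()
    for idx, chunk in enumerate(chunks):
        jobs.put((idx, chunk))
    jobs.join()  # wait until every chunk has been compressed
    for _ in workers:
        jobs.put(None)
    for w in workers:
        w.join()
    return results


# Example: compress 16 chunks of repetitive data with 4 engines.
data = [bytes(1024) for _ in range(16)]
out = compress_parallel(data, num_engines=4)
print(sum(len(c) for c in out), "bytes after compression")
```

    With a shared queue, whichever engine finishes first picks up the next chunk, so the load balances itself without the controller tracking per-engine state; this is one simple way to approximate the fair distribution the abstract describes.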

    Run-time power gating in hybrid ARM-FPGA devices

    Entropy-Based Early-Exit in a FPGA-Based Low-Precision Neural Network

    In this paper, we investigate the application of early-exit strategies to fully quantized neural networks mapped to low-complexity FPGA SoC devices. The challenge of the accuracy drop caused by quantizing the first convolutional layer and the fully connected layers to low bitwidths has been resolved. We apply an early-exit strategy to a network model that combines extremely low-bitwidth weights and activations with binary arithmetic precision, evaluated on the ImageNet dataset. We use entropy calculations to decide which branch of the early-exit network to take. The experiments show an improvement in inference speed of 1.52× using the early-exit system, compared with using a single primary neural network, with a slight accuracy decrease of 1.64%.
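    The entropy-based exit rule can be sketched compactly: compute the entropy of the early branch's softmax output and accept its prediction only when that entropy falls below a threshold. The Python sketch below assumes softmax outputs and a hypothetical threshold value; the callables, function names, and threshold are illustrative, not taken from the paper.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the class logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def entropy(probs, eps=1e-12):
    # Shannon entropy of the predicted class distribution;
    # low entropy means the early branch is confident.
    return float(-np.sum(probs * np.log(probs + eps)))


def early_exit_infer(x, early_branch, main_network, threshold=0.5):
    # Run the cheap early branch first and accept its answer
    # only when its output entropy is below the threshold.
    p = softmax(early_branch(x))
    if entropy(p) < threshold:
        return int(np.argmax(p)), "early"
    # Otherwise fall through to the full primary network.
    return int(np.argmax(softmax(main_network(x)))), "main"


# Toy usage with stand-in models (random logits over 10 classes):
rng = np.random.default_rng(0)
early = lambda x: rng.normal(size=10)
main = lambda x: rng.normal(size=10)
print(early_exit_infer(None, early, main, threshold=1.0))
```

    A lower threshold makes the exit more conservative: fewer samples leave early, trading back some of the speedup for accuracy.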