2 research outputs found

    CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

    Full text link
    Deploying deep learning models on embedded systems has been challenging due to limited computing resources. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects, and therefore, require specialized convolutions to aggregate spatial information. To address this need, recent work introduces dynamic deformable convolution to augment regular convolutions. However, this will lead to inefficient memory accesses of inputs with existing hardware. In this work, we harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions. We show the speed-accuracy tradeoffs for a set of algorithm modifications including irregular-access versus limited-range and fixed-shape. We then Co-Design a Network CoDeNet with the modified deformable convolution and quantize it to 4-bit weights and 8-bit activations. With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50 on the standard object detection dataset, Pascal VOC. With our higher accuracy implementation, our model gets to 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters-20.9x smaller but 10% more accurate than Tiny-YOLO.Comment: Github repo: https://github.com/DequanWang/CoDeNet arXiv:2002.08357 is the preliminary version of this pape

    FPGA-based Neural Network Accelerator for Millimeter-Wave Radio-over-Fiber Systems

    Full text link
    With the rapidly-developing high-speed wireless communications, the 60 GHz millimeter-wave frequency range and radio-over-fiber systems have been investigated as a promising solution to deliver mm-wave signals. Neural networks have been studied to improve the mm-wave RoF system performances at the receiver side by suppressing linear and nonlinear impairments. However, previous neural network studies in mm-wave RoF systems focus on the off-line implementation with high-end GPUs , which is not practical for low power-consumption, low-cost and limited computation platform applications. To solve this issue, we investigate neural network hardware accelerator implementations using the field programmable gate array (FPGA), taking advantage of the low power consumption, parallel computation capability, and reconfigurablity features of FPGA. Convolutional neural network (CNN) and binary convolutional neural network (BCNN) hardware accelerators are demonstrated. In addition, to satisfy the low-latency requirement in mm-wave RoF systems and to enable the use of low-cost compact FPGA devices, a novel inner parallel optimization method is proposed. Compared with the embedded processor (ARM Cortex A9) execution latency, the CNN/BCNN FPGA-based hardware accelerator reduces their latency by over 92%. Compared with non-optimized FPGA implementations, the proposed optimization method reduces the processing latency by over 44% for CNN and BCNN. Compared with the GPU implementation, the latency of CNN implementation with the proposed optimization method is reduced by 85.49%, while the power consumption is reduced by 86.91%. Although the latency of BCNN implementation with the proposed optimization method is larger compared with the GPU implementation, the power consumption is reduced by 86.14%. The FPGA-based neural network hardware accelerators provide a promising solution for mm-wave RoF systems.Comment: 13 pages, 6 figure
    corecore