CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs
Deploying deep learning models on embedded systems has been challenging due
to limited computing resources. The majority of existing work focuses on
accelerating image classification, while other fundamental vision problems,
such as object detection, have not been adequately addressed. Compared with
image classification, detection problems are more sensitive to the spatial
variance of objects, and therefore, require specialized convolutions to
aggregate spatial information. To address this need, recent work introduces
dynamic deformable convolution to augment regular convolutions. However, the
irregular input access patterns of deformable convolution are inefficient on
existing hardware. In
this work, we harness the flexibility of FPGAs to develop a novel object
detection pipeline with deformable convolutions. We show the speed-accuracy
tradeoffs for a set of algorithmic modifications, including replacing
irregular-access deformable sampling with limited-range and fixed-shape
variants. We then co-design a network, CoDeNet, with the modified deformable
convolution and quantize it to 4-bit weights and 8-bit
activations. With our high-efficiency implementation, our solution reaches 26.9
frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50
on the standard object detection dataset, Pascal VOC. With our higher accuracy
implementation, our model gets to 67.1 AP50 on Pascal VOC with only 2.9 MB of
parameters, 20.9x smaller but 10% more accurate than Tiny-YOLO.
Comment: GitHub repo: https://github.com/DequanWang/CoDeNet; arXiv:2002.08357
is the preliminary version of this paper.
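The W4A8 scheme mentioned in the abstract (4-bit weights, 8-bit activations) can be illustrated with a minimal NumPy sketch of uniform symmetric quantization. This helper is hypothetical and for illustration only; CoDeNet's actual quantizer (e.g. per-channel scales or learned ranges) may differ:

```python
import numpy as np

def quantize(x, num_bits):
    """Uniform symmetric quantization to signed `num_bits` integers.

    Hypothetical helper for illustration; the paper's actual
    quantization scheme may differ.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 7 for 4-bit, 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax         # per-tensor scale from the data
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 3, 3, 3)).astype(np.float32)  # conv weights
a = rng.standard_normal((1, 3, 8, 8)).astype(np.float32)   # activations

wq, w_scale = quantize(w, num_bits=4)   # 4-bit weights: integers in [-8, 7]
aq, a_scale = quantize(a, num_bits=8)   # 8-bit activations: [-128, 127]
```

Storing `wq` as 4-bit integers is what yields the sub-megabyte model sizes the abstract reports; the float tensor is recovered (approximately) as `wq * w_scale`.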
FPGA-based Neural Network Accelerator for Millimeter-Wave Radio-over-Fiber Systems
With the rapid development of high-speed wireless communications, the 60 GHz
millimeter-wave (mm-wave) frequency range and radio-over-fiber (RoF) systems
have been investigated as a promising solution to deliver mm-wave signals.
Neural
networks have been studied to improve mm-wave RoF system performance at the
receiver side by suppressing linear and nonlinear impairments. However,
previous neural network studies in mm-wave RoF systems have focused on
off-line implementations with high-end GPUs, which are impractical for
low-power, low-cost platforms with limited computation resources. To
solve this issue, we investigate neural network hardware accelerator
implementations using field-programmable gate arrays (FPGAs), taking
advantage of their low power consumption, parallel computation capability,
and reconfigurability. Convolutional neural network (CNN) and
binary convolutional neural network (BCNN) hardware accelerators are
demonstrated. In addition, to satisfy the low-latency requirement in mm-wave
RoF systems and to enable the use of low-cost compact FPGA devices, a novel
inner parallel optimization method is proposed. Compared with execution on an
embedded processor (ARM Cortex-A9), the FPGA-based CNN/BCNN hardware
accelerators reduce latency by over 92%. Compared with non-optimized FPGA
implementations, the proposed optimization method reduces the processing
latency by over 44% for CNN and BCNN. Compared with the GPU implementation, the
latency of CNN implementation with the proposed optimization method is reduced
by 85.49%, while the power consumption is reduced by 86.91%. Although the
latency of the BCNN implementation with the proposed optimization method is
higher than that of the GPU implementation, its power consumption is reduced
by 86.14%. The FPGA-based neural network hardware accelerators provide a
promising
solution for mm-wave RoF systems.
Comment: 13 pages, 6 figures.
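The binary convolutions at the heart of the BCNN accelerator replace multiply-accumulate with XNOR and popcount, which is what makes them cheap on FPGA fabric. A minimal NumPy sketch of this standard identity (illustrative only; the function name and bit packing are assumptions, not the paper's implementation):

```python
import numpy as np

def binary_dot(a_bits, w_bits, n):
    """Dot product of two {-1, +1} vectors stored as 0/1 bits.

    For binarized values, each elementwise product is +1 when the bits
    match and -1 when they differ, so the sum over n elements equals
    n - 2 * (number of mismatches), i.e. n - 2 * popcount(a XOR w).
    """
    mismatches = np.count_nonzero(a_bits != w_bits)
    return n - 2 * mismatches

# Reference signed vectors with values in {-1, +1}
rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=64)
w = rng.choice([-1, 1], size=64)

# Bit-packed form: +1 -> 1, -1 -> 0
a_bits = (a > 0).astype(np.uint8)
w_bits = (w > 0).astype(np.uint8)

result = binary_dot(a_bits, w_bits, a.size)
```

On an FPGA the XOR and popcount map directly to LUTs, avoiding DSP multipliers entirely, which is consistent with the low power figures the abstract reports.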