HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs
We present both a novel Convolutional Neural Network (CNN) accelerator
architecture and a network compiler for FPGAs that outperforms all prior work.
Instead of having generic processing elements that together process one layer
at a time, our network compiler statically partitions available device
resources and builds custom-tailored hardware for each layer of a CNN. By
building hardware for each layer we can pack our controllers into fewer lookup
tables and use dedicated routing. These efficiencies enable our accelerator to
utilize 2x the DSPs and operate at more than 2x the frequency of prior work on
sparse CNN acceleration on FPGAs. We evaluate the performance of our
architecture on both sparse ResNet-50 and dense MobileNet ImageNet classifiers
on a Stratix 10 2800 FPGA. We find that the sparse ResNet-50 model achieves a
throughput of 4550 images/s at a batch size of 1, which is nearly 4x the
throughput of NVIDIA's fastest machine learning targeted GPU, the V100, and
outperforms all prior work on FPGAs.
Comment: 8 pages, 11 figures
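
The abstract says the compiler statically partitions device resources across the layers of a CNN. The Python sketch below illustrates why per-layer partitioning matters in a layer pipeline; it is not the HPIPE compiler's actual algorithm. The function names, the proportional-split heuristic, and the assumption of one multiply-accumulate (MAC) per DSP per cycle are all hypothetical choices made for illustration.

```python
from math import ceil


def partition_dsps(layer_macs, total_dsps):
    """Split a DSP budget across layers in proportion to each layer's work.

    Hypothetical heuristic: floor division keeps the total within budget
    unless the one-DSP minimum applies to very small layers.
    """
    total_macs = sum(layer_macs)
    return [max(1, (total_dsps * m) // total_macs) for m in layer_macs]


def pipeline_interval(layer_macs, alloc):
    """Cycles per image in steady state.

    Assumes one MAC per DSP per cycle; the slowest stage sets the pace
    of the whole layer pipeline.
    """
    return max(ceil(m / d) for m, d in zip(layer_macs, alloc))


# Toy workload: MACs per image for three layers (made-up numbers).
layer_macs = [100, 300, 600]

balanced = partition_dsps(layer_macs, total_dsps=100)
even = [33, 33, 33]  # naive equal split of roughly the same budget

print(balanced, pipeline_interval(layer_macs, balanced))
print(even, pipeline_interval(layer_macs, even))
```

In this toy case the proportional split gives every stage the same cycle count, whereas the equal split leaves the largest layer as the bottleneck and the smaller layers idle for part of each interval.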