FPGA Implementations of 3D-SIMD Processor Architecture for Deep Neural Networks Using Relative Indexed Compressed Sparse Filter Encoding Format and Stacked Filters Stationary Flow
It is a challenging task to deploy computationally and memory-intensive
state-of-the-art deep neural networks (DNNs) on embedded systems with limited
hardware resources and power budgets. Recently developed techniques like Deep
Compression make it possible to fit large DNNs, such as AlexNet and VGGNet,
fully in on-chip SRAM. However, sparse networks compressed using existing
encoding formats, like CSR or CSC, complicate the computation at runtime due to
their irregular memory access characteristics. In [1], we introduced a
computation dataflow, stacked filters stationary dataflow (SFS), and a
corresponding data encoding format, relative indexed compressed sparse filter
format (CSF), to make the most of data sparsity and simplify data handling at
execution time.
In this paper we present FPGA implementations of these methods. We implement
several compact streaming fully connected (FC) and convolutional (CONV) neural
network processors to demonstrate their efficiency. Compared with the
state-of-the-art results [2,3,4], our methods achieve at least a 2x improvement
in computation efficiency per processing element (PE) on most layers. In
particular, our methods achieve an 8x improvement on AlexNet layer CONV4 with
384 filters and an 11x improvement on VGG16 layer CONV5-3 with 512 filters.
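The abstract contrasts CSF with conventional sparse formats such as CSR. As a point of reference only, the minimal Python sketch below shows standard CSR encoding and a sparse matrix-vector product, which is the core operation of an FC layer (y = Wx). It does not implement the CSF format or the SFS dataflow of [1], whose details are not given here; the function names `to_csr` and `csr_matvec` are hypothetical. The gather `x[col_idx[k]]` illustrates the data-dependent, irregular memory access that the abstract attributes to CSR and CSC.

```python
from typing import List, Tuple


def to_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Encode a dense matrix in standard CSR form: (values, col_idx, row_ptr).

    Hypothetical helper for illustration; this is plain CSR, not CSF.
    """
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # keep only nonzero weights
                col_idx.append(j)  # absolute column index of each nonzero
        row_ptr.append(len(values))  # offset where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values: List[float], col_idx: List[int],
               row_ptr: List[int], x: List[float]) -> List[float]:
    """Compute y = A @ x from the CSR arrays of A."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Which input element is fetched depends on stored index data,
            # so the access pattern is only known at runtime (irregular).
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    W = [[0, 2, 0, 0],
         [1, 0, 0, 3],
         [0, 0, 0, 0]]
    vals, cols, ptr = to_csr(W)
    print(vals, cols, ptr)  # [2, 1, 3] [1, 0, 3] [0, 1, 3, 3]
    print(csr_matvec(vals, cols, ptr, [1.0, 1.0, 1.0, 1.0]))  # [2.0, 4.0, 0.0]
```

Because each row's nonzeros may point to arbitrary input positions, hardware cannot know which input elements to fetch until the indices are decoded, which is the kind of runtime overhead that a format designed around the dataflow aims to reduce.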