Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
After the tremendous success of convolutional neural networks in image
classification, object detection, speech recognition, etc., there is now rising
demand for deployment of these compute-intensive ML models on tightly
power-constrained embedded and mobile systems at low cost as well as for pushing the
throughput in data centers. This has triggered a wave of research towards
specialized hardware accelerators. Their performance is often constrained by
I/O bandwidth, and their energy consumption is dominated by I/O transfers to
off-chip memory. We introduce and evaluate a novel, hardware-friendly
compression scheme for the feature maps present within convolutional neural
networks. We show that an average compression ratio of 4.4x relative to
uncompressed data and a gain of 60% over existing methods can be achieved for
ResNet-34 with a compression block requiring <300 bits of sequential cells and
minimal combinational logic.