After the tremendous success of convolutional neural networks in image
classification, object detection, speech recognition, and other tasks, there is
now a rising demand to deploy these compute-intensive ML models at low cost on
tightly power-constrained embedded and mobile systems, as well as to push
throughput in data centers. This has triggered a wave of research into
specialized hardware accelerators. Their performance is often constrained by
I/O bandwidth, and their energy consumption is dominated by I/O transfers to
off-chip memory. We introduce and evaluate a novel, hardware-friendly
compression scheme for the feature maps present within convolutional neural
networks. We show that an average compression ratio of 4.4x relative to
uncompressed data and a gain of 60% over existing methods can be achieved for
ResNet-34 with a compression block requiring <300 bit of sequential cells and
minimal combinational logic.