An Inter-Layer Weight Prediction and Quantization for Deep Neural Networks based on a Smoothly Varying Weight Hypothesis
Because deep neural networks are often deployed in resource-constrained
environments, network compression has become an important part of deep neural
network research. In this paper, we propose a new compression method,
\textit{Inter-Layer Weight Prediction} (ILWP), together with a quantization
method that quantizes the predicted residuals between the weights in all
convolution layers, based on the inter-frame prediction used in conventional
video coding schemes. Furthermore, we observe a phenomenon that we call the
\textit{Smoothly Varying Weight Hypothesis} (SVWH): the weights in
adjacent convolution layers share strong similarity in shapes and values, i.e.,
the weights tend to vary smoothly across layers. Based on SVWH, we
propose a second ILWP and quantization method that quantizes the predicted
residuals between the weights in adjacent convolution layers. Since the
predicted weight residuals tend to follow Laplace distributions with very low
variance, weight quantization can be applied more effectively,
producing more zero weights and enhancing the weight compression ratio. In
addition, we propose a new \textit{inter-layer loss} for eliminating
non-texture bits, which enables us to store only texture bits more effectively.
That is, the proposed loss regularizes the weights such that the collocated
weights in two adjacent layers have the same values. Finally, we
propose an ILWP with an inter-layer loss and quantization method. Our
comprehensive experiments show that the proposed method achieves a much higher
weight compression rate at the same accuracy level compared with the previous
quantization-based compression methods in deep neural networks.
Comment: 12 pages, 7 figures
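
The idea of predicting each layer's weights from the previous layer and quantizing only the residual can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that all layers are flattened to the same length, that a uniform quantizer with a hypothetical step size `step` is used, and that prediction is closed-loop (each layer is predicted from the reconstructed previous layer, as in differential video coding). The squared-difference form of `inter_layer_loss` is likewise an assumption; the abstract only states that collocated weights in adjacent layers are pushed toward the same values. All function names are illustrative.

```python
import random


def make_smooth_layers(num_layers=4, size=512, drift=0.01, seed=0):
    """Synthetic weights that vary smoothly across layers (SVWH-like toy data)."""
    rng = random.Random(seed)
    layers = [[rng.gauss(0.0, 0.1) for _ in range(size)]]
    for _ in range(num_layers - 1):
        layers.append([w + rng.gauss(0.0, drift) for w in layers[-1]])
    return layers


def quantize(values, step):
    """Uniform quantizer: map each value to an integer index."""
    return [round(v / step) for v in values]


def encode(layers, step):
    """Quantize the first layer directly, then quantize inter-layer residuals."""
    codes = [quantize(layers[0], step)]
    recon_prev = [q * step for q in codes[0]]
    for weights in layers[1:]:
        # Residual against the reconstructed previous layer (closed loop),
        # so quantization error does not accumulate across layers.
        residual = [w - p for w, p in zip(weights, recon_prev)]
        q = quantize(residual, step)
        codes.append(q)
        recon_prev = [p + qi * step for p, qi in zip(recon_prev, q)]
    return codes


def decode(codes, step):
    """Rebuild every layer by adding dequantized residuals to the previous layer."""
    recon = [[q * step for q in codes[0]]]
    for q in codes[1:]:
        recon.append([p + qi * step for p, qi in zip(recon[-1], q)])
    return recon


def zero_fraction(codes):
    """Fraction of quantized symbols that are exactly zero."""
    total = sum(len(c) for c in codes)
    return sum(c.count(0) for c in codes) / total


def inter_layer_loss(layers):
    """Assumed regularizer: mean squared difference of collocated weights."""
    loss = 0.0
    for prev, cur in zip(layers, layers[1:]):
        loss += sum((c - p) ** 2 for p, c in zip(prev, cur)) / len(cur)
    return loss


if __name__ == "__main__":
    layers = make_smooth_layers()
    step = 0.05
    codes = encode(layers, step)
    direct = [quantize(w, step) for w in layers[1:]]
    print("zero fraction, residual coding:", zero_fraction(codes[1:]))
    print("zero fraction, direct coding:  ", zero_fraction(direct))
    print("inter-layer loss:", inter_layer_loss(layers))
```

On this toy data the residual codes contain far more zeros than directly quantized weights, which is the property an entropy coder would exploit; adding a term like `inter_layer_loss` to the training objective would be expected to shrink the residuals further.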