Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss
Reducing the bit-widths of the activations and weights of deep networks makes
them more efficient to compute and to store in memory, which is crucial for
deploying them on resource-limited devices such as mobile phones. However,
decreasing bit-widths with quantization generally yields drastically degraded
accuracy. To tackle this problem, we propose to learn to quantize activations
and weights via a trainable quantizer that transforms and discretizes them.
Specifically, we parameterize the quantization intervals and obtain their
optimal values by directly minimizing the task loss of the network. This
quantization-interval-learning (QIL) allows the quantized networks to maintain
the accuracy of the full-precision (32-bit) networks with bit-widths as low as
4 bits, and minimizes the accuracy degradation under further bit-width reduction
(i.e., 3 and 2 bits). Moreover, our quantizer can be trained on a heterogeneous
dataset, and thus can be used to quantize pretrained networks without access to
their training data. We demonstrate the effectiveness of our trainable
quantizer on the ImageNet dataset with various network architectures, such as
ResNet-18, ResNet-34, and AlexNet, on which it outperforms existing methods and
achieves state-of-the-art accuracy.
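
The idea of a quantizer that first transforms values through a parameterized interval and then discretizes them can be illustrated with a minimal forward-pass sketch. This is not the authors' exact formulation: the function name `quantize_weights`, the parameters `center` and `half_width`, the linear transform, and the symmetric level count are all assumptions made for illustration. In actual training, the interval parameters would be updated by gradient descent on the task loss, which requires some way to pass gradients through the rounding step; that part is omitted here.

```python
import math


def quantize_weights(weights, center, half_width, bits):
    """Quantize weights with an interval [center - half_width, center + half_width].

    Hypothetical sketch: magnitudes below the interval are pruned to 0,
    magnitudes above it are clipped to 1, and magnitudes inside it are
    mapped linearly to [0, 1] before being rounded to a discrete level.
    """
    # Number of positive levels per sign (assumption: one bit encodes the sign).
    levels = 2 ** (bits - 1) - 1
    lo = center - half_width
    hi = center + half_width

    quantized = []
    for w in weights:
        mag = abs(w)
        if mag < lo:
            t = 0.0  # below the interval: pruned
        elif mag > hi:
            t = 1.0  # above the interval: clipped
        else:
            t = (mag - lo) / (hi - lo)  # inside: linear transform to [0, 1]

        # Discretize: round half up to the nearest of the available levels.
        q = math.floor(t * levels + 0.5) / levels
        quantized.append(0.0 if q == 0 else math.copysign(q, w))
    return quantized


if __name__ == "__main__":
    example = [0.1, -0.45, 0.7, -2.0]
    print(quantize_weights(example, center=0.5, half_width=0.25, bits=3))
```

Widening or shifting the interval changes which weights are pruned, clipped, or kept at intermediate levels, which is why treating `center` and `half_width` as trainable parameters lets the task loss decide the trade-off.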