Mirror Descent View for Neural Network Quantization
Quantizing large Neural Networks (NN) while maintaining performance is
highly desirable for resource-limited devices, as it reduces memory and time
complexity. Quantization is usually formulated as a constrained optimization problem and
optimized via a modified version of gradient descent. In this work, by
interpreting the continuous parameters (unconstrained) as the dual of the
quantized ones, we introduce a Mirror Descent (MD) framework for NN
quantization. Specifically, we provide conditions on the projections (i.e.,
the mappings from continuous parameters to quantized ones) that enable us to
derive valid mirror maps and, in turn, the respective MD updates. Furthermore, we
present a numerically stable implementation of MD that requires storing an
additional set of auxiliary variables (unconstrained), and show that it is
strikingly analogous to the Straight Through Estimator (STE) based method, which
is typically viewed as a "trick" to avoid the vanishing gradients issue. Our
experiments on CIFAR-10/100, TinyImageNet, and ImageNet classification datasets
with VGG-16, ResNet-18, and MobileNetV2 architectures show that our MD variants
obtain quantized networks with state-of-the-art performance. Code is available
at https://github.com/kartikgupta-at-anu/md-bnn.

Comment: This paper was accepted at AISTATS 2021.
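To make the update described above concrete, here is a minimal sketch (not the authors' released implementation; see the repository above for that) of the numerically stable MD variant, assuming a tanh-based soft projection for binary {-1, +1} quantization. The gradient callback loss_grad, the learning rate lr, and the temperature beta are illustrative placeholders, not names from the paper.

    import numpy as np

    def md_step(w_aux, loss_grad, lr=0.1, beta=1.0):
        # w_aux: auxiliary (unconstrained) variables stored across iterations.
        # loss_grad: callable returning dL/dw evaluated at the projected weights.
        # beta: temperature of the projection; tanh(beta*x) -> sign(x) as beta -> inf.
        w = np.tanh(beta * w_aux)          # project continuous (dual) variables to soft-quantized weights
        w_aux = w_aux - lr * loss_grad(w)  # gradient step taken directly in the auxiliary space
        return w_aux, np.tanh(beta * w_aux)

    # Toy usage: drive a single weight toward the minimizer of (w - 0.7)^2.
    w_aux = np.zeros(1)
    for _ in range(100):
        w_aux, w = md_step(w_aux, lambda w_proj: 2.0 * (w_proj - 0.7))
    print(np.sign(w))  # hard quantization at the end: [1.]

The STE analogy is visible in the update line: the gradient computed at the projected weights is applied directly to the stored auxiliary variables, just as STE applies the quantized-weight gradient to the continuous weights.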