Oscillation-free Quantization for Low-bit Vision Transformers

Abstract

Weight oscillation is an undesirable side effect of quantization-aware training, in which quantized weights frequently jump between two quantized levels, resulting in training instability and a sub-optimal final model. We discover that the learnable scaling factor, a widely-used de facto setting in quantization, aggravates weight oscillation. In this study, we investigate the connection between the learnable scaling factor and quantized weight oscillation and use ViT as a case driver to illustrate the findings and remedies. In addition, we find that the interdependence between the quantized weights of the query and key in a self-attention layer makes ViT vulnerable to oscillation. We therefore propose three techniques accordingly: statistical weight quantization (StatsQ) to improve quantization robustness compared to the prevalent learnable-scale-based method; confidence-guided annealing (CGA), which freezes high-confidence weights and calms the oscillating weights; and query-key reparameterization (QKR) to resolve the query-key intertwined oscillation and mitigate the resulting gradient misestimation. Extensive experiments demonstrate that these techniques successfully abate weight oscillation and consistently achieve substantial accuracy improvements on ImageNet. Specifically, our 2-bit DeiT-T/DeiT-S models outperform the previous state-of-the-art by 9.8% and 7.7%, respectively. Code and models are available at: https://github.com/nbasyl/OFQ.

Published in the Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA, PMLR 202, 2023.
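The abstract only names the proposed techniques; the authors' actual implementation is in the linked repository. As a rough illustration of the general idea behind statistics-based weight quantization, i.e. deriving the quantization scale from the weight distribution rather than learning it as a separate parameter, here is a minimal PyTorch sketch. The function name, the per-output-channel mean-absolute-value statistic, and the straight-through-estimator detail are assumptions made for illustration and may differ from the paper's StatsQ.

import torch

def stats_quantize_weight(w: torch.Tensor, n_bits: int = 2) -> torch.Tensor:
    # Number of positive levels in a symmetric uniform scheme,
    # e.g. n_bits = 2 gives the levels {-1, 0, 1}.
    qmax = 2 ** (n_bits - 1) - 1
    # Scale computed from the weights themselves (per output channel),
    # so it tracks the weight distribution instead of being a separately
    # learned parameter (illustrative choice of statistic).
    scale = w.abs().mean(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)
    # Fake quantization: round to the nearest level, then rescale.
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # Straight-through estimator: forward pass uses w_q,
    # gradients flow to the latent full-precision weights w.
    return w + (w_q - w).detach()

# Example usage during quantization-aware training:
# w = torch.randn(64, 64, requires_grad=True)
# w_q = stats_quantize_weight(w, n_bits=2)  # use w_q in the layer's forward pass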
