Outlier Suppression+: Accurate quantization of large language models by
  equivalent and optimal shifting and scaling

Gong, Ruihao; Guo, Jinyang; Li, Yuhang; Liu, Xianglong; Wei, Xiuying; Zhang, Xiangguo; Zhang, Yunchen

Authors: Ruihao Gong, Jinyang Guo, Yuhang Li, Xianglong Liu, Xiuying Wei, Xiangguo Zhang, Yunchen Zhang
Publication date: 18 April 2023

Abstract

Quantization of transformer language models faces significant challenges due to the existence of detrimental outliers in activations. We observe that these outliers are asymmetric and concentrated in specific channels. To address this issue, we propose the Outlier Suppression+ framework. First, we introduce channel-wise shifting and scaling operations to eliminate the asymmetry and scale down problematic channels. We demonstrate that these operations can be seamlessly migrated into subsequent modules while maintaining equivalence. Second, we quantitatively analyze the optimal values for shifting and scaling, taking into account both the asymmetric property and the quantization errors of weights in the next layer. Our lightweight framework incurs minimal performance degradation under static and standard post-training quantization settings. Comprehensive results across various tasks and models reveal that our approach achieves near-floating-point performance on both small models, such as BERT, and large language models (LLMs), including OPTs, BLOOM, and BLOOMZ, at 8-bit and 6-bit settings. Furthermore, we establish a new state of the art for 4-bit BERT.
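
The equivalence claim in the abstract can be illustrated with a small sketch. If activations are transformed channel-wise as x' = (x - z) / s, the following linear layer can absorb the inverse transform, since x W + b = x' (s * W) + (z W + b). The code below is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`shift_and_scale`, `migrate`), the toy data, and the choice of z (channel range midpoint) and s (channel half-range) are hypothetical placeholders; the paper derives optimal values, which are not reproduced here.

```python
# Illustrative sketch only: the shift/scale choices below are placeholders,
# not the optimal values derived in the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def linear(x, w, bias):
    """Compute x @ w + bias for a batch of row vectors."""
    return [[v + bj for v, bj in zip(row, bias)] for row in matmul(x, w)]

def shift_and_scale(x, shift, scale):
    """Apply the channel-wise transform (x - shift) / scale."""
    return [[(v - z) / s for v, z, s in zip(row, shift, scale)] for row in x]

def migrate(w, bias, shift, scale):
    """Fold the inverse transform into the next linear layer.

    Because x = x_new * scale + shift, it follows that
    x @ w + bias == x_new @ (scale-per-row * w) + (shift @ w + bias).
    """
    w_new = [[s * v for v in row] for row, s in zip(w, scale)]
    shift_out = matmul([shift], w)[0]
    bias_new = [b + so for b, so in zip(bias, shift_out)]
    return w_new, bias_new

# Toy activations: 3 tokens x 2 channels; channel 1 is large and asymmetric.
x = [[0.5, 40.0], [-0.25, 60.0], [0.75, 50.0]]
w = [[1.0, -2.0], [0.5, 0.25]]  # shape: in_channels x out_channels
bias = [0.1, -0.3]

# Assumed heuristics: shift = midpoint of each channel's range,
# scale = half of each channel's range.
cols = list(zip(*x))
shift = [(max(c) + min(c)) / 2 for c in cols]
scale = [(max(c) - min(c)) / 2 for c in cols]

x_new = shift_and_scale(x, shift, scale)
w_new, bias_new = migrate(w, bias, shift, scale)

# The layer output is unchanged up to floating-point rounding.
y_ref = linear(x, w, bias)
y_new = linear(x_new, w_new, bias_new)
max_diff = max(abs(a - b) for ra, rb in zip(y_ref, y_new) for a, b in zip(ra, rb))
```

With these placeholder choices the transformed activations are centered and of comparable magnitude across channels, which is what makes them easier to quantize; the cost is moved into the weights and bias of the next layer, which is why the abstract also accounts for weight quantization error when choosing the scale.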

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2304.09145

Last time updated on 22/04/2023