Neural network quantization aims to accelerate and shrink full-precision neural
network models by using low-bit approximations. Methods adopting the
quantization-aware training (QAT) paradigm have recently seen rapid growth,
but they are often conceptually complicated. This paper proposes a novel and
highly effective QAT method, quantized feature distillation (QFD). QFD first
trains a quantized (or binarized) representation as the teacher, then quantizes
the network using knowledge distillation (KD). Quantitative results show that QFD
is more flexible and effective (i.e., quantization friendly) than previous
quantization methods. QFD surpasses existing methods by a noticeable margin on
not only image classification but also object detection, while being much
simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO
detection and segmentation, which verifies its potential for real-world
deployment. To the best of our knowledge, this is the first time that vision
transformers have been quantized in object detection and image segmentation
tasks.

Comment: AAAI202
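The two-stage recipe the abstract describes (first train a model with quantized features as the teacher, then distill into a quantized student) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the uniform symmetric fake-quantizer and the MSE feature-distillation loss below are generic QAT building blocks assumed for the example.

```python
import numpy as np

def fake_quantize(x, bits=4):
    # Uniform symmetric fake quantization: snap values onto a low-bit
    # grid while keeping float storage, as is standard in QAT.
    # (Illustrative quantizer, not QFD's exact formulation.)
    levels = 2 ** (bits - 1) - 1          # e.g. 7 levels each side for 4 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / levels if max_abs > 0 else 1.0
    return np.clip(np.round(x / scale), -levels, levels) * scale

def feature_distillation_loss(student_feat, teacher_feat):
    # MSE between the student's features and the quantized teacher's
    # features -- the distillation signal used during student training.
    return float(np.mean((student_feat - teacher_feat) ** 2))

# Toy check: a feature map and its 4-bit fake-quantized counterpart.
feat = np.linspace(-1.0, 1.0, 8)
q_feat = fake_quantize(feat, bits=4)
loss = feature_distillation_loss(q_feat, feat)
```

In an actual QAT loop the fake-quantizer would sit inside the forward pass (with a straight-through estimator for gradients), and the distillation loss would be added to the task loss when training the student.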