Conventional knowledge distillation (KD) methods for object detection mainly
concentrate on homogeneous teacher-student detectors. However, the design of a
lightweight detector for deployment is often significantly different from a
high-capacity detector. Thus, we investigate KD among heterogeneous
teacher-student pairs for broader applicability. We observe that the core
difficulty of heterogeneous KD (hetero-KD) is the significant semantic gap
between the backbone features of heterogeneous detectors, which arises from
their different optimization procedures. Conventional homogeneous KD (homo-KD)
methods suffer from this gap and struggle to achieve satisfactory performance
when applied directly to hetero-KD. In this paper, we propose the
HEtero-Assists Distillation (HEAD)
framework, leveraging heterogeneous detection heads as assistants to guide the
optimization of the student detector and reduce this gap. In HEAD, the
assistant is an additional detection head, attached to the student backbone,
whose architecture is homogeneous to that of the teacher head. Hetero-KD is
thereby transformed into homo-KD, allowing efficient knowledge transfer from
the teacher to the student. Moreover, we extend HEAD into a Teacher-Free HEAD
(TF-HEAD) framework for cases in which a well-trained teacher detector is
unavailable. Our method achieves significant improvements over current
detection KD methods. For example,
on the MS-COCO dataset, TF-HEAD helps R18 RetinaNet achieve 33.9 mAP (+2.2),
while HEAD further pushes the limit to 36.2 mAP (+4.5).

Comment: ECCV 2022. Code: https://github.com/LutingWang/HEA
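
The following is a minimal PyTorch sketch of the assist mechanism described
above, not the authors' released implementation (see the repository linked
above). All module names (StudentBackbone, Head, HEADStudent) and the
KL-divergence distillation loss are illustrative assumptions; the point is
that an assistant head copying the teacher head's architecture is attached to
the student backbone, so distillation happens between two homogeneous heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StudentBackbone(nn.Module):
    """Stand-in for a lightweight backbone such as R18 (hypothetical)."""

    def __init__(self, out_channels: int = 256):
        super().__init__()
        self.conv = nn.Conv2d(3, out_channels, 3, stride=2, padding=1)

    def forward(self, x):
        return self.conv(x)


class Head(nn.Module):
    """Stand-in detection head; the assistant instantiates the same
    architecture as the teacher head."""

    def __init__(self, in_channels: int = 256, num_classes: int = 80):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_classes, 3, padding=1)

    def forward(self, feats):
        return self.cls(feats)


class HEADStudent(nn.Module):
    """Student detector with an extra assistant head on its backbone."""

    def __init__(self):
        super().__init__()
        self.backbone = StudentBackbone()
        self.student_head = Head()    # the head actually deployed
        self.assistant_head = Head()  # homogeneous to the teacher head

    def forward(self, images):
        feats = self.backbone(images)
        return self.student_head(feats), self.assistant_head(feats)


def homo_kd_loss(assistant_logits, teacher_logits, temperature: float = 2.0):
    """KL distillation between the two homogeneous heads; a common KD
    objective, not necessarily the paper's exact loss."""
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=1)
    log_p_assist = F.log_softmax(assistant_logits / temperature, dim=1)
    return F.kl_div(log_p_assist, p_teacher,
                    reduction="batchmean") * temperature ** 2


images = torch.randn(2, 3, 64, 64)
student = HEADStudent()
student_logits, assistant_logits = student(images)
# Stand-in for the frozen teacher's class predictions on the same images.
teacher_logits = Head()(StudentBackbone()(images))
loss = homo_kd_loss(assistant_logits, teacher_logits)
```

Since the assistant exists only to receive homogeneous supervision from the
teacher during training, it would presumably be discarded at deployment,
leaving the lightweight student detector unchanged at inference time.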