Knowledge Distillation Using Hierarchical Self-Supervision Augmented Distribution
Knowledge distillation (KD) is an effective framework that aims to transfer
meaningful information from a large teacher to a smaller student. Generally,
KD involves defining and transferring knowledge. Previous KD methods often
focus on mining various forms of knowledge, for example, feature maps and
refined information. However, such knowledge is derived from the primary
supervised task and is thus highly task-specific. Motivated by the recent
success of self-supervised representation learning, we propose an auxiliary
self-supervision augmented task to guide networks to learn more meaningful
features. We can therefore derive soft self-supervision augmented
distributions from this task as richer dark knowledge for KD. Unlike previous
knowledge, this distribution encodes joint knowledge from supervised and
self-supervised feature learning. Beyond knowledge exploration, we propose to
append several auxiliary branches at various hidden layers to fully take
advantage of hierarchical feature maps. Each auxiliary branch is guided to
learn the self-supervision augmented task and to distill this distribution from
teacher to student. Overall, we call our KD method Hierarchical
Self-Supervision Augmented Knowledge Distillation (HSSAKD). Experiments on
standard image classification show that both offline and online HSSAKD achieve
state-of-the-art performance in the field of KD. Transfer experiments on
object detection further verify that HSSAKD can guide the network to learn
better features. The code is available at https://github.com/winycg/HSAKD.
Comment: 15 pages, Accepted by IEEE Transactions on Neural Networks and
Learning Systems 202
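
For illustration, below is a minimal PyTorch sketch of the per-branch objective described in the abstract, assuming a rotation-based self-supervised task (four rotations crossed with the original class labels) and temperature-scaled KL divergence for distilling the soft augmented distributions; the function and variable names are hypothetical and are not taken from the released code.

import torch
import torch.nn.functional as F

def hssakd_branch_loss(student_logits, teacher_logits, joint_targets, T=3.0):
    # student_logits / teacher_logits: (B, num_classes * num_rotations) scores
    # produced by one auxiliary branch; joint_targets: joint class-and-rotation
    # labels of shape (B,). The rotation task and shapes are assumptions.
    # Supervised term: the branch learns the self-supervision augmented task.
    task_loss = F.cross_entropy(student_logits, joint_targets)
    # Distillation term: match the teacher's soft augmented distribution.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return task_loss + kd_loss

# Example: 100-way classification with 4 rotations -> 400 joint classes.
B, C, M = 8, 100, 4
student_out = torch.randn(B, C * M)
teacher_out = torch.randn(B, C * M)
labels = torch.randint(0, C * M, (B,))
loss = hssakd_branch_loss(student_out, teacher_out, labels)

In the full method, one such loss would be computed at every auxiliary branch attached to the hidden layers and summed, so that hierarchical feature maps are supervised and distilled jointly.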