Recent advances have demonstrated the strength of self-supervised pre-training
for improving representation learning on downstream tasks. Existing works
typically exploit self-supervised pre-trained models by fine-tuning them on
downstream tasks. However, fine-tuning does not apply when one needs to build a
customized model architecture different from the self-supervised model. In this
work, we formulate a new knowledge distillation framework to transfer the
knowledge of self-supervised pre-trained models to any student network via a
novel approach named Embedding Graph Alignment. Specifically, inspired by the
instance discrimination paradigm of self-supervised learning, we model
instance-instance relations as a graph in the feature embedding space and
distill the self-supervised teacher's knowledge to the student network by
aligning the teacher graph and the student graph. Our distillation scheme can
be flexibly applied to transfer the self-supervised knowledge to enhance
representation learning on various student networks. We demonstrate that our
model outperforms multiple representative knowledge distillation methods on
three benchmark datasets: CIFAR100, STL10, and TinyImageNet. Code is available
at https://github.com/yccm/EGA.

Comment: British Machine Vision Conference (BMVC 2022)
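To make the graph-alignment idea concrete, here is a minimal PyTorch sketch under stated assumptions: the abstract does not specify the exact objective, so the cosine-similarity edge matrices, the MSE alignment loss, and the names `embedding_graph` and `ega_loss` are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def embedding_graph(features: torch.Tensor) -> torch.Tensor:
    # Build an instance-relation graph over a mini-batch: nodes are the
    # L2-normalized embeddings, edges are pairwise cosine similarities.
    z = F.normalize(features, dim=1)  # (B, D) unit-norm embeddings
    return z @ z.t()                  # (B, B) edge (similarity) matrix

def ega_loss(teacher_feats: torch.Tensor,
             student_feats: torch.Tensor) -> torch.Tensor:
    # Illustrative alignment loss (an assumption, not the paper's exact
    # objective): match the student's edge matrix to the teacher's.
    # The teacher graph is detached so gradients only update the student.
    g_teacher = embedding_graph(teacher_feats).detach()
    g_student = embedding_graph(student_feats)
    return F.mse_loss(g_student, g_teacher)

# Usage sketch with dummy features; dimensions are hypothetical.
teacher_feats = torch.randn(32, 2048)  # e.g., self-supervised teacher features
student_feats = torch.randn(32, 512)   # e.g., a smaller custom student
loss = ega_loss(teacher_feats, student_feats)
```

Note that the alignment happens in the (B, B) instance-relation space rather than the embedding space itself, so the teacher and student may have different architectures and feature dimensions, consistent with the abstract's motivation of distilling to customized student networks.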