In recent years, deep neural networks have achieved remarkable accuracy in
computer vision tasks. Because inference time is a crucial factor, particularly in dense prediction tasks such as semantic segmentation, knowledge distillation
has emerged as a successful technique for improving the accuracy of lightweight
student networks. However, existing methods often neglect the information encoded in individual channels and in the relationships among different classes. To overcome these limitations, this paper
proposes a novel method called Inter-Class Similarity Distillation (ICSD) for
the purpose of knowledge distillation. The proposed method transfers high-order relations from the teacher network to the student network by independently computing an intra-class distribution for each class from the network outputs. Inter-class similarity matrices are then computed for distillation as the Kullback-Leibler (KL) divergence between the distributions of each pair of classes.
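To make the idea concrete, the following PyTorch-style sketch shows one way such inter-class similarity matrices could be formed: each class channel is normalized into a distribution over spatial locations, and the pairwise KL divergence between classes yields a C-by-C matrix that the student is trained to match. The function names, the softmax normalization over spatial locations, and the MSE matching criterion are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def inter_class_similarity(logits):
    # logits: (B, C, H, W) segmentation outputs; each class channel is
    # treated as a distribution over spatial locations (an assumption).
    b, c, h, w = logits.shape
    dist = F.softmax(logits.reshape(b, c, h * w), dim=2)  # (B, C, HW)
    log_dist = torch.log(dist + 1e-8)
    # KL(p_i || p_j) for every ordered pair of classes (i, j).
    kl = (dist.unsqueeze(2) * (log_dist.unsqueeze(2) - log_dist.unsqueeze(1))).sum(-1)
    return kl  # (B, C, C) inter-class similarity matrix

def icsd_loss(student_logits, teacher_logits):
    # Match the student's inter-class similarity matrix to the teacher's;
    # MSE is an illustrative choice of matching criterion.
    with torch.no_grad():
        target = inter_class_similarity(teacher_logits)
    return F.mse_loss(inter_class_similarity(student_logits), target)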
To further improve the effectiveness of the proposed method, an Adaptive Loss Weighting (ALW) training strategy is also proposed. Unlike existing methods, the ALW strategy gradually reduces the influence of the teacher network toward the end of the training process to account for errors in the teacher's predictions.
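As one concrete reading of ALW, the sketch below decays the distillation weight toward zero over training, so the student relies increasingly on ground-truth supervision in later epochs. The polynomial decay form and the parameter names are assumptions, since the abstract does not specify the exact weighting function.

def alw_weight(epoch, total_epochs, base_weight=1.0, power=2.0):
    # Weight on the distillation term; decays to zero by the final epoch.
    # The polynomial schedule is an illustrative assumption.
    return base_weight * (1.0 - epoch / total_epochs) ** power

# Hypothetical use inside a training loop:
# loss = ce_loss + alw_weight(epoch, num_epochs) * icsd_loss(s_logits, t_logits)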
Extensive experiments on two well-known semantic segmentation datasets, Cityscapes and Pascal VOC 2012, validate the effectiveness of the proposed method in terms of mIoU and pixel accuracy. The proposed method outperforms most existing knowledge distillation methods, as demonstrated by both quantitative and qualitative evaluations. Code is available at:
https://github.com/AmirMansurian/AICSD