In this paper we introduce InDistill, a model compression approach that
combines knowledge distillation and channel pruning in a unified framework for
the transfer of the critical information flow paths from a heavyweight teacher
to a lightweight student. In existing methods, this information is typically collapsed by an encoding stage applied prior to distillation. By contrast, InDistill applies a pruning operation to the teacher's intermediate layers, reducing their width to that of the corresponding student layers. In this way, we enforce architectural alignment, enabling the intermediate layers to be distilled directly without the need for an encoding stage (see the sketch below). Additionally, a curriculum learning-based training scheme is adopted that accounts for the distillation difficulty of each layer and the critical learning periods during which the information flow paths are formed. The proposed method surpasses
state-of-the-art performance on three standard benchmarks, i.e., CIFAR-10, CUB-200, and FashionMNIST, by 3.08%, 14.27%, and 1% mAP, respectively, as well as on more challenging evaluation settings, i.e., ImageNet and CIFAR-100, by 1.97% and 5.65% mAP, respectively.
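
To make the width-alignment idea concrete, the following is a minimal, illustrative PyTorch sketch, not the released InDistill implementation: it assumes a convolutional teacher/student pair, uses an L1-norm filter-ranking criterion (an assumption, not necessarily the paper's exact criterion) to prune a teacher layer down to the student's channel width, and then matches the intermediate feature maps directly with an MSE loss, with no learned encoder in between.

import torch
from torch import nn
import torch.nn.functional as F

def prune_to_student_width(conv: nn.Conv2d, student_width: int) -> nn.Conv2d:
    # Rank the teacher's filters by L1 norm (a common pruning criterion,
    # assumed here) and keep the `student_width` strongest ones.
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = norms.topk(student_width).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, student_width, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data.copy_(conv.weight.data[keep])
    if conv.bias is not None:
        pruned.bias.data.copy_(conv.bias.data[keep])
    return pruned

def intermediate_distill_loss(teacher_feat: torch.Tensor,
                              student_feat: torch.Tensor) -> torch.Tensor:
    # With matching widths, the student's feature map can be regressed
    # directly onto the (pruned) teacher's feature map.
    return F.mse_loss(student_feat, teacher_feat.detach())

# Example: a 64-channel teacher layer pruned to a 16-channel student width.
teacher_layer = nn.Conv2d(3, 64, kernel_size=3, padding=1)
student_layer = nn.Conv2d(3, 16, kernel_size=3, padding=1)
pruned_teacher_layer = prune_to_student_width(teacher_layer, 16)

x = torch.randn(8, 3, 32, 32)
loss = intermediate_distill_loss(pruned_teacher_layer(x), student_layer(x))

Because the pruned teacher layer has exactly the student's width, the feature maps can be compared element-wise, which is what removes the need for an encoding stage; the curriculum scheduling of which layers are distilled and when is omitted from this sketch.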