BlockDrop: Dynamic Inference Paths in Residual Networks
Very deep convolutional neural networks offer excellent recognition results,
yet their computational expense limits their impact for many real-world
applications. We introduce BlockDrop, an approach that learns to dynamically
choose which layers of a deep network to execute during inference so as to best
reduce total computation without degrading prediction accuracy. Exploiting the
robustness of Residual Networks (ResNets) to layer dropping, our framework
selects on-the-fly which residual blocks to evaluate for a given novel image.
In particular, given a pretrained ResNet, we train a policy network in an
associative reinforcement learning setting for the dual reward of utilizing a
minimal number of blocks while preserving recognition accuracy. We conduct
extensive experiments on CIFAR and ImageNet. The results provide strong
quantitative and qualitative evidence that these learned policies not only
accelerate inference but also encode meaningful visual information. Built upon
a ResNet-101 model, our method achieves a speedup of 20% on average, going as
high as 36% for some images, while maintaining the same 76.4% top-1 accuracy
on ImageNet.
Comment: CVPR 2018
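To make the mechanism concrete, below is a minimal PyTorch sketch of a BlockDrop-style training step: a lightweight policy network maps the input image to per-block keep-probabilities, Bernoulli samples select which residual blocks to execute, and a REINFORCE update optimizes the dual reward of using few blocks while predicting correctly. The names (PolicyNet, reinforce_step, gamma), the exact reward shape, and the assumption that every block preserves tensor shape are illustrative simplifications, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps an image to one keep-probability per residual block."""
    def __init__(self, num_blocks: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, num_blocks)

    def forward(self, x):
        return torch.sigmoid(self.head(self.features(x)))

def reinforce_step(policy, blocks, classifier, x, y, gamma=0.1):
    """One associative-RL update: sample per-image block selections, run the
    kept blocks, and reward correct predictions that use few blocks."""
    probs = policy(x)                                   # (B, K) keep-probabilities
    dist = torch.distributions.Bernoulli(probs)
    actions = dist.sample()                             # (B, K) in {0, 1}
    h = x
    for k, block in enumerate(blocks):
        keep = actions[:, k].view(-1, 1, 1, 1)
        # A dropped block becomes the identity, exploiting the ResNet skip
        # connection (assumes each block preserves tensor shape).
        h = keep * block(h) + (1.0 - keep) * h
    with torch.no_grad():
        correct = classifier(h).argmax(dim=1).eq(y)
    usage = actions.mean(dim=1)                         # fraction of blocks run
    reward = torch.where(correct, 1.0 - gamma * usage,
                         torch.full_like(usage, -gamma))
    # REINFORCE: raise the log-probability of high-reward block selections.
    loss = -(dist.log_prob(actions).sum(dim=1) * reward).mean()
    loss.backward()
    return loss.item()
```

At inference one would threshold the keep-probabilities and short-circuit the dropped blocks to realize the actual speedup; the sketch above still evaluates every block so the control flow stays simple.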
InDistill: Information flow-preserving knowledge distillation for model compression
In this paper we introduce InDistill, a model compression approach that
combines knowledge distillation and channel pruning in a unified framework for
the transfer of the critical information flow paths from a heavyweight teacher
to a lightweight student. Such information is typically collapsed in previous
methods due to an encoding stage prior to distillation. By contrast, InDistill
leverages a pruning operation applied to the teacher's intermediate layers,
reducing their width to that of the corresponding student layers. This enforces
architectural alignment, enabling the intermediate layers to be distilled
directly without the need for an encoding stage. Additionally, a
curriculum learning-based training scheme is adopted considering the
distillation difficulty of each layer and the critical learning periods in
which the information flow paths are created. The proposed method surpasses the
state of the art on three standard benchmarks, i.e., CIFAR-10, CUB-200, and
FashionMNIST, by 3.08%, 14.27%, and 1% mAP, respectively, as well as on more
challenging evaluation settings, i.e., ImageNet and CIFAR-100, by 1.97% and
5.65% mAP, respectively.
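As a rough sketch of the core idea, assuming paired teacher/student intermediate feature maps with the same spatial size but different widths, the snippet below prunes the teacher map down to the student's channel width and distills the layer directly with an MSE loss, with a toy layer-unlocking schedule standing in for the curriculum. The pruning criterion (mean activation magnitude) and all function names are assumptions for illustration rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def prune_to_student_width(t_feat: torch.Tensor, s_width: int) -> torch.Tensor:
    """Keep the s_width teacher channels with the largest mean activation
    magnitude so the teacher map matches the student layer's width."""
    scores = t_feat.abs().mean(dim=(0, 2, 3))   # one importance score per channel
    keep = scores.topk(s_width).indices
    return t_feat[:, keep]

def indistill_layer_loss(t_feat: torch.Tensor, s_feat: torch.Tensor) -> torch.Tensor:
    """Direct intermediate-layer distillation: no encoding stage, just an MSE
    between the pruned teacher map and the student map (same spatial size)."""
    t_pruned = prune_to_student_width(t_feat, s_feat.shape[1])
    return F.mse_loss(s_feat, t_pruned.detach())

def num_active_layers(epoch: int, epochs_per_layer: int, num_layers: int) -> int:
    """Toy curriculum: distill the shallowest layer first, then unlock one
    deeper layer every epochs_per_layer epochs."""
    return min(num_layers, epoch // epochs_per_layer + 1)
```

A training loop would then sum indistill_layer_loss over the first num_active_layers(...) teacher/student layer pairs and add the usual task loss on the student's logits.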
- …