465 research outputs found
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
Convolutional neural networks have been widely deployed in various
application scenarios. In order to extend the applications' boundaries to some
accuracy-crucial domains, researchers have been investigating approaches to
boost accuracy through either deeper or wider network structures, which brings
with them the exponential increment of the computational and storage cost,
delaying the responding time. In this paper, we propose a general training
framework named self distillation, which notably enhances the performance
(accuracy) of convolutional neural networks through shrinking the size of the
network rather than aggrandizing it. Different from traditional knowledge
distillation - a knowledge transformation methodology among networks, which
forces student neural networks to approximate the softmax layer outputs of
pre-trained teacher neural networks, the proposed self distillation framework
distills knowledge within network itself. The networks are firstly divided into
several sections. Then the knowledge in the deeper portion of the networks is
squeezed into the shallow ones. Experiments further prove the generalization of
the proposed self distillation framework: enhancement of accuracy at average
level is 2.65%, varying from 0.61% in ResNeXt as minimum to 4.07% in VGG19 as
maximum. In addition, it can also provide flexibility of depth-wise scalable
inference on resource-limited edge devices.Our codes will be released on github
soon.Comment: 10page
Regularizing Deep Networks by Modeling and Predicting Label Structure
We construct custom regularization functions for use in supervised training
of deep neural networks. Our technique is applicable when the ground-truth
labels themselves exhibit internal structure; we derive a regularizer by
learning an autoencoder over the set of annotations. Training thereby becomes a
two-phase procedure. The first phase models labels with an autoencoder. The
second phase trains the actual network of interest by attaching an auxiliary
branch that must predict output via a hidden layer of the autoencoder. After
training, we discard this auxiliary branch.
We experiment in the context of semantic segmentation, demonstrating this
regularization strategy leads to consistent accuracy boosts over baselines,
both when training from scratch, or in combination with ImageNet pretraining.
Gains are also consistent over different choices of convolutional network
architecture. As our regularizer is discarded after training, our method has
zero cost at test time; the performance improvements are essentially free. We
are simply able to learn better network weights by building an abstract model
of the label space, and then training the network to understand this
abstraction alongside the original task.Comment: to appear at CVPR 201
- …