The early layers of a deep neural net have the fewest parameters but account for
the most computation. In this extended abstract, we propose to train the
hidden layers only for a set portion of the training run, freezing them out
one-by-one and excluding them from the backward pass. Through experiments on
CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20%
wall-clock time during training with 3% loss in accuracy for DenseNets, a 20%
speedup without loss of accuracy for ResNets, and no improvement for VGG
networks. Our code is publicly available at
https://github.com/ajbrock/FreezeOut.
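
As an illustration of the freezing mechanism described above, the sketch below shows one way to stop updating layers one-by-one during training in PyTorch. It assumes a model whose layers are stored earliest-to-latest in `model.layers`; the `freeze_out_step` helper, the `t_0` parameter, and the linearly spaced per-layer stopping schedule are illustrative assumptions, not the released implementation or the paper's exact schedule.

```python
import torch.nn as nn

def freeze_out_step(model, iteration, total_iterations, t_0=0.5):
    """Freeze any layer whose scheduled stopping point has passed.

    Hypothetical helper: layer i stops training once the fraction of
    completed iterations exceeds its stopping fraction t_i. Setting
    requires_grad=False on a frozen prefix of layers means the backward
    pass stops at the earliest still-training layer, which is where the
    wall-clock savings come from.
    """
    num_layers = len(model.layers)
    progress = iteration / total_iterations
    for i, layer in enumerate(model.layers):
        # Linearly spaced stopping fractions between t_0 and 1.0;
        # earlier layers (small i) are frozen sooner. Illustrative only.
        t_i = t_0 + (1.0 - t_0) * i / max(num_layers - 1, 1)
        if progress >= t_i:
            for p in layer.parameters():
                p.requires_grad = False

# Example usage with a toy stack of layers (hypothetical model):
model = nn.Module()
model.layers = nn.ModuleList([nn.Linear(32, 32) for _ in range(4)])

total_iterations = 1000
for iteration in range(total_iterations):
    freeze_out_step(model, iteration, total_iterations)
    # ... forward pass, loss, backward(), and optimizer step go here ...
```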