2 research outputs found
Hot Swapping for Online Adaptation of Optimization Hyperparameters
We describe a general framework for online adaptation of optimization
hyperparameters by `hot swapping' their values during learning. We investigate
this approach in the context of adaptive learning rate selection using an
explore-exploit strategy from the multi-armed bandit literature. Experiments on
a benchmark neural network show that the hot swapping approach leads to
consistently better solutions compared to well-known alternatives such as
AdaDelta and stochastic gradient with exhaustive hyperparameter search.
Comment: Submission to ICLR 2015
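The abstract only sketches the explore-exploit idea, so here is a minimal, hypothetical illustration of bandit-based learning rate hot swapping: each candidate learning rate is treated as an arm, and an epsilon-greedy rule swaps a new value in between short training segments. All names (candidate_lrs, train_segment, epsilon) and the simulated reward are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# Hypothetical sketch of learning-rate hot swapping as a multi-armed bandit:
# each candidate learning rate is an arm; after every training segment the
# observed reward updates that arm's value, and an epsilon-greedy rule picks
# the learning rate to swap in for the next segment.

candidate_lrs = [1e-1, 1e-2, 1e-3, 1e-4]    # bandit arms
value = {lr: 0.0 for lr in candidate_lrs}   # running reward estimate per arm
count = {lr: 0 for lr in candidate_lrs}     # number of pulls per arm
epsilon = 0.1                               # exploration probability


def train_segment(lr):
    """Stand-in for running a block of SGD steps at learning rate `lr`.
    The reward here is simulated (peaking near lr = 1e-2) so the sketch runs
    end to end; a real implementation would return the observed decrease in
    training loss over the segment."""
    return -abs(math.log10(lr) + 2) + random.gauss(0.0, 0.1)


for segment in range(100):
    if random.random() < epsilon:           # explore: try a random arm
        lr = random.choice(candidate_lrs)
    else:                                   # exploit: best estimated arm so far
        lr = max(candidate_lrs, key=lambda a: value[a])

    reward = train_segment(lr)              # hot swap `lr` in for this segment

    count[lr] += 1                          # incremental mean update of the arm
    value[lr] += (reward - value[lr]) / count[lr]

print("selected learning rate:", max(candidate_lrs, key=lambda a: value[a]))
```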
Cyclical Learning Rates for Training Neural Networks
It is known that the learning rate is the most important hyper-parameter to
tune for training deep neural networks. This paper describes a new method for
setting the learning rate, named cyclical learning rates, which practically
eliminates the need to experimentally find the best values and schedule for the
global learning rates. Instead of monotonically decreasing the learning rate,
this method lets the learning rate cyclically vary between reasonable boundary
values. Training with cyclical learning rates instead of fixed values achieves
improved classification accuracy without a need to tune and often in fewer
iterations. This paper also describes a simple way to estimate "reasonable
bounds" -- linearly increasing the learning rate of the network for a few
epochs. In addition, cyclical learning rates are demonstrated on the CIFAR-10
and CIFAR-100 datasets with ResNets, Stochastic Depth networks, and DenseNets,
and the ImageNet dataset with the AlexNet and GoogLeNet architectures. These
are practical tools for everyone who trains neural networks.
Comment: Presented at WACV 2017; see https://github.com/bckenstler/CLR for instructions to implement CLR in Keras
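As a concrete reference for the cyclical schedule described above, the standard triangular policy (learning rate ramping linearly from a lower to an upper bound and back over a fixed step size) can be written in a few lines. The boundary values and step size below are illustrative placeholders, not recommended settings.

```python
import math

def triangular_clr(iteration, base_lr=1e-3, max_lr=6e-3, step_size=2000):
    """Triangular cyclical learning rate: ramps linearly from base_lr up to
    max_lr over `step_size` iterations, then back down, repeating each cycle.
    base_lr, max_lr, and step_size here are illustrative defaults; the paper
    suggests estimating the bounds with a short run of linearly increasing
    learning rates."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# Example: learning rate at a few points across the first two half-cycles.
for it in (0, 1000, 2000, 3000, 4000):
    print(it, triangular_clr(it))
```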