Improving Robustness with Adaptive Weight Decay
We propose adaptive weight decay, which automatically tunes the
hyper-parameter for weight decay during each training iteration. For
classification problems, we propose changing the value of the weight decay
hyper-parameter on the fly based on the strength of updates from the
classification loss (i.e., gradient of cross-entropy), and the regularization
loss (i.e., the L2 norm of the weights). We show that this simple
modification can result in large improvements in adversarial robustness -- an
area which suffers from robust overfitting -- without requiring extra data
across various datasets and architecture choices. For example, our
reformulation yields relative robustness improvements on both CIFAR-100 and
CIFAR-10 compared to the best-tuned hyper-parameters of traditional weight
decay, resulting in models whose performance is comparable to SOTA
robustness methods. In addition, this
method has other desirable properties, such as less sensitivity to the
learning rate and smaller weight norms; the latter contributes to robustness
to overfitting to label noise and to pruning.
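
The per-iteration tuning described above can be illustrated with a minimal sketch. The abstract does not state the exact update rule, so the ratio form used here (coefficient proportional to the norm of the cross-entropy gradient divided by the norm of the weights) is an assumption, and the names `adaptive_weight_decay_coeff`, `sgd_step`, and `awd_const` are hypothetical.

```python
import numpy as np

def adaptive_weight_decay_coeff(grad_ce, weights, awd_const=0.02, eps=1e-12):
    """Assumed rule: lambda_t = awd_const * ||grad_ce|| / ||weights||.

    grad_ce   -- gradient of the cross-entropy loss w.r.t. the weights
    weights   -- current weight vector
    awd_const -- hypothetical constant fixing the decay/gradient ratio
    """
    grad_norm = np.linalg.norm(grad_ce)
    weight_norm = np.linalg.norm(weights)
    return awd_const * grad_norm / (weight_norm + eps)

def sgd_step(weights, grad_ce, lr=0.1, awd_const=0.02):
    """One plain SGD step with the coefficient recomputed for this iteration."""
    lam = adaptive_weight_decay_coeff(grad_ce, weights, awd_const)
    # lam is treated as a constant here, so the gradient of
    # (lam / 2) * ||w||^2 with respect to w is lam * w.
    new_weights = weights - lr * (grad_ce + lam * weights)
    return new_weights, lam

# Toy usage: ||weights|| = 5 and ||grad_ce|| = 1.
weights = np.array([3.0, 4.0])
grad_ce = np.array([0.6, 0.8])
new_weights, lam = sgd_step(weights, grad_ce, lr=0.1, awd_const=0.5)
```

Under this assumed rule the norm of the decay term, `lam * ||weights||`, always equals `awd_const * ||grad_ce||`, so the regularization update keeps a fixed strength relative to the classification update instead of a fixed coefficient.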