To train neural networks faster, many efforts have been devoted to exploring better solution trajectories, but few to exploiting the existing solution trajectory. To exploit the trajectory of the (momentum) stochastic gradient descent (SGD(m)) method, we propose a novel method, SGD(m) with residuals (RSGD(m)), which improves both convergence and generalization. Our new method can also be
applied to other optimizers such as ASGD and Adam. We provide a theoretical analysis showing that RSGD achieves a smaller growth rate of the generalization error than SGD and the same theoretical convergence rate, while converging faster in practice.
Extensive deep learning experiments on image classification, language modeling, and graph convolutional neural networks show that the proposed algorithm converges faster than SGD(m)/Adam in the initial training stage and performs similarly to or better than SGD(m) at the end of training, with better generalization error.
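The abstract does not spell out the RSGD(m) update rule, so the sketch below only illustrates the two ingredients it refers to: a standard SGD-with-momentum step, plus one purely assumed way of reusing the existing solution trajectory through a running average of past iterates. The function names, the parameter alpha, and the averaging form are illustrative assumptions, not the paper's method.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One step of plain SGD with momentum (the SGD(m) baseline)."""
    v = beta * v + grad      # momentum buffer accumulates gradients
    w = w - lr * v           # parameter update along the buffer
    return w, v

def trajectory_averaged_step(w, w_avg, v, grad, lr=0.1, beta=0.9, alpha=0.1):
    """Hypothetical sketch of exploiting the trajectory: a running average
    of past iterates is maintained alongside the SGD(m) step.
    This is an assumed illustration, NOT the RSGD(m) update rule."""
    w, v = sgd_momentum_step(w, v, grad, lr, beta)
    w_avg = (1.0 - alpha) * w_avg + alpha * w   # average over the trajectory
    return w, w_avg, v

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([5.0, -3.0]); w_avg = w.copy(); v = np.zeros_like(w)
for _ in range(100):
    w, w_avg, v = trajectory_averaged_step(w, w_avg, v, grad=w)
print(w, w_avg)   # both approach the minimizer at the origin
```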