Adaptive Bound Optimization for Online Convex Optimization

Authors: H. Brendan McMahan, Matthew Streeter
Publication date: 01/01/2010

We introduce a new online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2-squared, and modify it only via a single time-dependent parameter. Our algorithm's regret bounds are worst-case optimal, and for certain realistic classes of loss functions they are much better than existing bounds. These bounds are problem-dependent, which means they can exploit the structure of the actual problem instance. Critically, however, our algorithm does not need to know this structure in advance. Rather, we prove competitive guarantees that show the algorithm provides a bound within a constant factor of the best possible bound (of a certain functional form) in hindsight.

Comment: Updates to match final COLT version
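The abstract above does not state the update rule, so the following is only an illustrative sketch of the general idea of adapting to observed losses, not the authors' algorithm. It assumes projected online gradient descent on a box, with a separate step size per coordinate that shrinks with the squared gradients seen so far on that coordinate. The function name `adaptive_ogd`, the `radius` parameter, and the box-shaped feasible set are all hypothetical choices made for the example.

```python
import math


def adaptive_ogd(grad_fns, dim, radius=1.0):
    """Projected online gradient descent with per-coordinate step sizes.

    Assumption for illustration: the feasible set is the box
    [-radius, radius]^dim and coordinate i uses the step size
    radius / sqrt(sum of squared gradients seen so far on coordinate i).

    grad_fns: one callable per round, mapping the current point
              (list of floats) to the gradient of that round's loss.
    Returns the list of points played, one per round.
    """
    x = [0.0] * dim
    sq_sums = [0.0] * dim  # running sum of squared gradients per coordinate
    played = []
    for grad in grad_fns:
        played.append(list(x))  # commit to x before seeing the loss
        g = grad(x)
        for i in range(dim):
            sq_sums[i] += g[i] ** 2
            if sq_sums[i] > 0.0:
                step = radius / math.sqrt(sq_sums[i])
                x[i] -= step * g[i]
                # project back onto the box
                x[i] = max(-radius, min(radius, x[i]))
    return played


# Usage: the same quadratic loss ||x - target||^2 repeated for 100 rounds.
target = [0.5, -0.25]
quadratic_grad = lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)]
points = adaptive_ogd([quadratic_grad] * 100, dim=2)
```

Coordinates that receive large gradients get small steps quickly, while coordinates with small or rare gradients keep larger steps; this is the kind of problem-dependent behavior a single global time-dependent parameter cannot express.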

arXiv.org e-Print Archive

CiteSeerX

Adaptive Normalized Risk-Averting Training For Deep Neural Networks

Authors: James Lo, Tim Oates, Zhiguang Wang
Publication date: 02/03/2016

This paper proposes a set of new error criteria and learning approaches, Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex optimization problem in training deep neural networks (DNNs). Theoretically, we demonstrate its effectiveness on global and local convexity lower-bounded by the standard $L_p$-norm error. By analyzing the gradient on the convexity index $\lambda$, we explain why learning $\lambda$ adaptively using gradient descent works. In practice, we show how this method improves training of deep neural networks to solve visual recognition tasks on the MNIST and CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard ConvNets + MSE/cross entropy. Performance on deep/shallow multilayer perceptrons and Denoised Auto-encoders is also explored. ANRAT can be combined with other quasi-Newton training methods, innovative network variants, regularization techniques and other specific tricks in DNNs. Other than unsupervised pretraining, it provides a new perspective to address the non-convex optimization problem in DNNs.

Comment: AAAI 2016, 0.39%~0.4% ER on MNIST with single 32-32-256-10 ConvNets, code available at https://github.com/cauchyturing/ANRA
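The abstract does not give the formula of the error criterion. As a rough illustration of how a convexity index can shape an error measure, the sketch below assumes a normalized log-mean-exponential form, (1/lam) * log(mean(exp(lam * e_i))), over per-sample errors e_i. The function name `normalized_risk_averting_error`, the parameter `lam`, and the sample errors are hypothetical, and the exact criterion in the paper may differ.

```python
import math


def normalized_risk_averting_error(errors, lam):
    """Assumed form: (1/lam) * log(mean(exp(lam * e))) over per-sample errors.

    Computed with the usual max-shift trick so large lam does not overflow.
    """
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    scaled = [lam * e for e in errors]
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return (shift + math.log(mean_exp)) / lam


# Usage: three per-sample errors, e.g. |y - f(x)|^p for some p.
sample_errors = [0.1, 0.2, 0.9]
near_mean = normalized_risk_averting_error(sample_errors, lam=1e-6)
near_max = normalized_risk_averting_error(sample_errors, lam=1000.0)
```

Under this assumed form, a small `lam` makes the criterion behave like the ordinary mean error, while a large `lam` pushes it toward the worst-case sample error. By Jensen's inequality the value always lies between the mean and the maximum of the errors, which is one way to read the abstract's claim of being lower-bounded by the standard error.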

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications