Adaptive Bound Optimization for Online Convex Optimization
We introduce a new online convex optimization algorithm that adaptively
chooses its regularization function based on the loss functions observed so
far. This is in contrast to previous algorithms that use a fixed regularization
function such as L2-squared, and modify it only via a single time-dependent
parameter. Our algorithm's regret bounds are worst-case optimal, and for
certain realistic classes of loss functions they are much better than existing
bounds. These bounds are problem-dependent, which means they can exploit the
structure of the actual problem instance. Critically, however, our algorithm
does not need to know this structure in advance. Rather, we prove competitive
guarantees that show the algorithm provides a bound within a constant factor of
the best possible bound (of a certain functional form) in hindsight. Comment: Updates to match final COLT version
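The idea of adapting the regularization to the observed losses can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a simple per-coordinate rule in which each coordinate's step size shrinks with the squared gradients seen so far in that coordinate, and it projects onto a box. The names `adaptive_ogd`, `gradient_fns`, `radius`, and `eps` are hypothetical.

```python
import math

def adaptive_ogd(gradient_fns, dim, radius=1.0, eps=1e-12):
    """Projected online gradient descent with per-coordinate step sizes.

    Illustrative assumption: the step for coordinate i is
    radius / sqrt(sum of squared past gradients in coordinate i),
    so coordinates with large observed gradients move more cautiously.
    """
    x = [0.0] * dim
    sq_sum = [0.0] * dim          # running sum of squared gradients
    iterates = []
    for grad_fn in gradient_fns:
        iterates.append(list(x))  # point played in this round
        g = grad_fn(x)            # gradient of this round's loss at x
        for i in range(dim):
            sq_sum[i] += g[i] ** 2
            step = radius / math.sqrt(sq_sum[i] + eps)
            x[i] -= step * g[i]
            # project back onto the box [-radius, radius]
            x[i] = max(-radius, min(radius, x[i]))
    return iterates, x

# Linear losses f_t(x) = g . x with a fixed gradient g = (1, 0)
rounds = [lambda x: [1.0, 0.0]] * 3
played, final = adaptive_ogd(rounds, dim=2)
```

No step size has to be tuned in advance here: the effective learning rate is set entirely by the gradients seen so far.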
Adaptive Normalized Risk-Averting Training For Deep Neural Networks
This paper proposes a set of new error criteria and learning approaches,
Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex
optimization problem in training deep neural networks (DNNs). Theoretically, we
demonstrate its effectiveness on global and local convexity lower-bounded by
the standard $L_p$-norm error. By analyzing the gradient on the convexity index
$\lambda$, we explain why learning $\lambda$ adaptively using
gradient descent works. In practice, we show how this method improves training
of deep neural networks to solve visual recognition tasks on the MNIST and
CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results
comparable or superior to those reported in recent literature on the same tasks
using standard ConvNets + MSE/cross entropy. Performance on deep/shallow
multilayer perceptrons and Denoised Auto-encoders is also explored. ANRAT can
be combined with other quasi-Newton training methods, innovative network
variants, regularization techniques and other specific tricks in DNNs. Other
than unsupervised pretraining, it provides a new perspective to address the
non-convex optimization problem in DNNs. Comment: AAAI 2016, 0.39%~0.4% ER on MNIST with single 32-32-256-10 ConvNets,
code available at https://github.com/cauchyturing/ANRA
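The lower-bound claim in this abstract can be checked numerically with a short sketch. The abstract does not give the criterion, so this assumes a risk-averting error of log-mean-exp form, (1/lam) * log(mean(exp(lam * e_i))), where `lam` stands in for the convexity index and `errors` are per-sample errors. The function name `risk_averting_error` is hypothetical, and the adaptive learning of `lam` is not modeled.

```python
import math

def risk_averting_error(errors, lam):
    """Assumed form: (1/lam) * log(mean(exp(lam * e_i))).

    Computed with the max-shift trick so that large lam * e_i
    values do not overflow math.exp.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    shift = max(lam * e for e in errors)
    total = sum(math.exp(lam * e - shift) for e in errors)
    return (shift + math.log(total / len(errors))) / lam

errors = [0.1, 0.4, 0.9]              # made-up per-sample errors
mean_error = sum(errors) / len(errors)
small = risk_averting_error(errors, 1e-3)   # close to the mean error
large = risk_averting_error(errors, 100.0)  # close to the largest error
```

By Jensen's inequality this quantity is never below the plain mean error for positive `lam`. It approaches the mean as `lam` goes to zero and the largest error as `lam` grows.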