Training Deep Networks without Learning Rates Through Coin Betting
Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process remains one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
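The coin-betting reduction can be sketched concretely. Below is a minimal 1-D illustration using the Krichevsky-Trofimov (KT) bettor, a standard instantiation of this idea; the toy objective, horizon, and all names are our assumptions for illustration, and the paper's algorithm for deep networks involves further machinery beyond this sketch.

```python
def kt_coin_betting_minimize(grad, T=5000, initial_wealth=1.0):
    """Minimize a 1-D convex function with subgradients in [-1, 1] via the
    coin-betting reduction: the iterate is a bet, a fraction of the current
    wealth, and no learning rate is ever set."""
    wealth = initial_wealth
    theta = 0.0          # running sum of "coin outcomes" -g_s
    w = 0.0              # current iterate (the current bet)
    total = 0.0
    for t in range(1, T + 1):
        g = grad(w)                    # subgradient, assumed in [-1, 1]
        wealth += -g * w               # betting gain/loss this round
        theta += -g
        w = theta / (t + 1) * wealth   # KT bettor: bet a signed fraction of wealth
        total += w
    return total / T                   # average iterate

# Toy problem (our choice): minimize |x - 2|, whose subgradient is sign(x - 2).
x_bar = kt_coin_betting_minimize(lambda w: 1.0 if w > 2 else -1.0)
```

By convexity, the average iterate's suboptimality is bounded by the average regret, so `x_bar` lands close to the minimizer 2 without any step-size tuning.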
Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations
We study algorithms for online linear optimization in Hilbert spaces,
focusing on the case where the player is unconstrained. We develop a novel
characterization of a large class of minimax algorithms, recovering, and even
improving, several previous results as immediate corollaries. Moreover, using
our tools, we develop an algorithm that provides a regret bound of
O(U √(T log(U √T log² T + 1))), where U is
the norm of an arbitrary comparator and both U and T are unknown to
the player. This bound is optimal up to √(log log T) terms. When T is
known, we derive an algorithm with an optimal regret bound (up to constant
factors). For both the known and unknown case, a Normal approximation to
the conditional value of the game proves to be the key analysis tool.
(Proceedings of the 27th Annual Conference on Learning Theory, COLT 2014)
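For intuition, the normal-approximation viewpoint leads to potential-based strategies: with a known horizon T, play the derivative of a potential such as eps · exp(theta²/(2T)) of the negative gradient sum theta. The sketch below follows that general family; the exact potential, eps = 1, and the toy problem are our assumptions, not necessarily the paper's construction.

```python
import math

def potential_learner_average(grad, T, eps=1.0):
    """Known-horizon, parameter-free online learner: the prediction is the
    derivative of the potential eps * exp(theta^2 / (2T)) with respect to
    theta, the negative sum of past gradients. No learning rate appears."""
    theta, w, total = 0.0, 0.0, 0.0
    for _ in range(T):
        g = grad(w)                    # subgradient, assumed in [-1, 1]
        theta -= g
        # d/dtheta of eps * exp(theta^2 / (2T)); theta stays moderate here,
        # so the exponential does not overflow in this toy run
        w = eps * (theta / T) * math.exp(theta * theta / (2.0 * T))
        total += w
    return total / T

# Toy check (our choice): minimize |x - 2| with the horizon known in advance.
avg_w = potential_learner_average(lambda w: 1.0 if w > 2 else -1.0, T=5000)
```

The exponential potential makes the bet grow rapidly until the gradients push back, so the iterates settle around the minimizer with no tuned parameters.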
Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning
Stochastic gradient descent algorithms for training linear and kernel
predictors are gaining more and more importance, thanks to their scalability.
While various methods have been proposed to speed up their convergence, the
model selection phase is often ignored. Indeed, theoretical works typically
assume prior knowledge of, for example, the norm of the optimal solution,
while in practice validation methods remain the only viable approach. In this
paper, we propose a new kernel-based stochastic gradient descent algorithm
that performs model selection while training, with no parameters to tune and
no form of cross-validation. The algorithm builds on recent advances in
online learning theory for unconstrained settings to estimate, over time, the
right regularization in a data-dependent way. Optimal rates of convergence
are proved under standard smoothness assumptions on the target function,
using the range space of the fractional integral operator associated with the
kernel.
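One way to see how unconstrained online learning removes tuning from kernel SGD is to run a coin-betting update directly in the RKHS. The sketch below is a simplified stand-in (a KT-style bettor with the absolute loss, a Gaussian kernel, and a synthetic sine-regression stream, all our choices for illustration), not the algorithm analyzed in the paper.

```python
import math
import random

def gaussian_kernel(a, b, width=1.0):
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))

def kernel_coin_betting(stream):
    """Parameter-free online kernel learning with the absolute loss: a
    KT coin-betting update carried out in the RKHS. The function norm
    (i.e., the regularization level) is estimated implicitly through the
    wealth; there is no learning rate and no regularization parameter."""
    points, signs = [], []     # support points x_s and signs c_s = -g_s
    wealth = 1.0
    losses = []
    for t, (x, y) in enumerate(stream, start=1):
        # current predictor: f_t(x) = (wealth / t) * sum_s c_s k(x_s, x)
        pred = (wealth / t) * sum(
            c * gaussian_kernel(xs, x) for c, xs in zip(signs, points))
        losses.append(abs(pred - y))
        g = 1.0 if pred > y else -1.0   # subgradient of |pred - y|
        wealth += -g * pred             # betting gain/loss this round
        points.append(x)
        signs.append(-g)
    return losses

# Synthetic stream (our choice): noiseless sine regression on [-3, 3].
random.seed(0)
stream = [(x, math.sin(x))
          for x in (random.uniform(-3, 3) for _ in range(2000))]
losses = kernel_coin_betting(stream)
late_loss = sum(losses[-200:]) / 200   # average loss on the last 200 rounds
```

Predicting 0 everywhere would incur an average absolute loss of about 0.66 on this stream; the late-round loss of the bettor is markedly lower, with nothing tuned and no validation set.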