26 research outputs found

    Training Deep Networks without Learning Rates Through Coin Betting

    Get PDF
    Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameters tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates nor we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning rate free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms

    Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

    Full text link
    We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. We develop a novel characterization of a large class of minimax algorithms, recovering, and even improving, several previous results as immediate corollaries. Moreover, using our tools, we develop an algorithm that provides a regret bound of O(UTlog(UTlog2T+1))\mathcal{O}\Big(U \sqrt{T \log(U \sqrt{T} \log^2 T +1)}\Big), where UU is the L2L_2 norm of an arbitrary comparator and both TT and UU are unknown to the player. This bound is optimal up to loglogT\sqrt{\log \log T} terms. When TT is known, we derive an algorithm with an optimal regret bound (up to constant factors). For both the known and unknown TT case, a Normal approximation to the conditional value of the game proves to be the key analysis tool.Comment: Proceedings of the 27th Annual Conference on Learning Theory (COLT 2014

    Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

    Full text link
    Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection phase is often ignored. In fact, in theoretical works most of the time assumptions are made, for example, on the prior knowledge of the norm of the optimal solution, while in the practical world validation methods remain the only viable approach. In this paper, we propose a new kernel-based stochastic gradient descent algorithm that performs model selection while training, with no parameters to tune, nor any form of cross-validation. The algorithm builds on recent advancement in online learning theory for unconstrained settings, to estimate over time the right regularization in a data-dependent way. Optimal rates of convergence are proved under standard smoothness assumptions on the target function, using the range space of the fractional integral operator associated with the kernel
    corecore