Training Deep Networks without Learning Rates Through Coin Betting
Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
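The reduction from stochastic optimization to coin betting can be illustrated with a toy one-dimensional example. The snippet below is a minimal sketch of a Krichevsky-Trofimov-style coin-betting loop, assuming subgradients bounded in [-1, 1]; it is not the paper's full algorithm for deep networks, and all names (coin_betting_minimize, grad, epsilon) are chosen here for illustration.

```python
# Minimal sketch: 1-D convex minimization via coin betting, no learning rate.
# Assumes subgradients lie in [-1, 1]; not the paper's deep-network algorithm.
import numpy as np

def coin_betting_minimize(grad, x0=0.0, epsilon=1.0, n_steps=1000):
    wealth = epsilon              # bettor's initial wealth
    c_sum = 0.0                   # running sum of coin outcomes (negative subgradients)
    iterates = []
    for t in range(1, n_steps + 1):
        x = x0 + (c_sum / t) * wealth   # bet a KT fraction of the current wealth
        g = grad(x)                     # observe a subgradient at the current bet
        c = -g                          # coin outcome for this round
        wealth += c * (x - x0)          # wealth gained or lost by this bet
        c_sum += c
        iterates.append(x)
    return np.mean(iterates)            # average iterate, as usual for convex problems

# Example: minimize f(x) = |x - 3| with no step-size tuning.
x_hat = coin_betting_minimize(lambda x: float(np.sign(x - 3.0)))
print(x_hat)  # approaches 3 as n_steps grows
```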
Parameter-free locally differentially private stochastic subgradient descent
https://arxiv.org/pdf/1911.09564.pdf (Published version)
A Modern Introduction to Online Learning
In this monograph, I introduce the basic concepts of Online Learning through
a modern view of Online Convex Optimization. Here, online learning refers to
the framework of regret minimization under worst-case assumptions. I present
first-order and second-order algorithms for online learning with convex losses,
in Euclidean and non-Euclidean settings. All the algorithms are clearly
presented as instantiations of Online Mirror Descent or
Follow-The-Regularized-Leader and their variants. Particular attention is given
to the issue of tuning the parameters of the algorithms and learning in
unbounded domains, through adaptive and parameter-free online learning
algorithms. Non-convex losses are dealt with through convex surrogate losses and
through randomization. The bandit setting is also briefly discussed, touching
on the problem of adversarial and stochastic multi-armed bandits. These notes
do not require prior knowledge of convex analysis and all the required
mathematical tools are rigorously explained. Moreover, all the proofs have been
carefully chosen to be as simple and as short as possible. Comment: Fixed more typos, added more history bits, added local norm bounds for OMD and FTRL
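As a concrete instance of the first-order algorithms the monograph covers, the sketch below implements plain online subgradient descent, i.e. Online Mirror Descent with the squared Euclidean norm as regularizer, on a bounded domain. The fixed step size eta and the function names are illustrative assumptions, not the monograph's notation; no adaptive or parameter-free tuning is attempted here.

```python
# Minimal sketch: online (sub)gradient descent, the Euclidean instance of
# Online Mirror Descent, with projection onto an L2 ball. Step size eta is a
# hand-picked assumption rather than an adaptive or parameter-free choice.
import numpy as np

def online_gradient_descent(loss_and_grad, dim, n_rounds, eta=0.1, radius=1.0):
    x = np.zeros(dim)
    total_loss = 0.0
    for t in range(n_rounds):
        loss, grad = loss_and_grad(t, x)   # loss_t revealed only after we play x
        total_loss += loss
        x = x - eta * grad                 # gradient (mirror) step
        norm = np.linalg.norm(x)
        if norm > radius:                  # projection keeps the domain bounded
            x *= radius / norm
    return x, total_loss

# Example: linear losses <z_t, x> with hypothetical data revealed round by round.
zs = np.random.default_rng(0).normal(size=(100, 5))
x_T, cum_loss = online_gradient_descent(
    lambda t, x: (float(zs[t] @ x), zs[t]), dim=5, n_rounds=100)
```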
CoinEM: Tuning-Free Particle-Based Variational Inference for Latent Variable Models
We introduce two new particle-based algorithms for learning latent variable
models via marginal maximum likelihood estimation, including one which is
entirely tuning-free. Our methods are based on the perspective of marginal
maximum likelihood estimation as an optimization problem: namely, as the
minimization of a free energy functional. One way to solve this problem is to
consider the discretization of a gradient flow associated with the free energy.
We study one such approach, which resembles an extension of the popular Stein
variational gradient descent algorithm. In particular, we establish a descent
lemma for this algorithm, which guarantees that the free energy decreases at
each iteration. This method, and any other obtained as the discretization of
the gradient flow, will necessarily depend on a learning rate which must be
carefully tuned by the practitioner in order to ensure convergence at a
suitable rate. With this in mind, we also propose another algorithm for
optimizing the free energy which is entirely learning rate free, based on coin
betting techniques from convex optimization. We validate the performance of our
algorithms across a broad range of numerical experiments, including several
high-dimensional settings. Our results are competitive with existing
particle-based methods, without the need for any hyperparameter tuning.
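To make the learning-rate dependence of the gradient-flow discretization concrete, the sketch below shows a generic SVGD-style particle update with an RBF kernel and a hand-picked step size for a fixed target log-density. This is standard SVGD, not the CoinEM algorithm, and all names and constants are illustrative.

```python
# Minimal sketch: one explicit-Euler SVGD update with an RBF kernel. The step
# size `step` must be tuned by hand; this is the dependence the coin-betting
# variant is designed to remove.
import numpy as np

def svgd_step(X, grad_log_p, step=0.1, bandwidth=1.0):
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]                             # x_i - x_j
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * bandwidth ** 2))
    attract = K @ grad_log_p(X)                                       # kernel-smoothed score term
    repulse = np.sum(diffs / bandwidth ** 2 * K[:, :, None], axis=1)  # sum_j grad_{x_j} k(x_j, x_i)
    return X + step * (attract + repulse) / n                         # Euler step: needs a learning rate

# Example: particles drifting toward a standard Gaussian target (illustrative).
X = np.random.default_rng(1).normal(loc=3.0, size=(50, 2))
for _ in range(200):
    X = svgd_step(X, grad_log_p=lambda X: -X)                         # grad log N(0, I) = -x
```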
Coin Sampling: Gradient-Based Bayesian Inference without Learning Rates
In recent years, particle-based variational inference (ParVI) methods such as
Stein variational gradient descent (SVGD) have grown in popularity as scalable
methods for Bayesian inference. Unfortunately, the properties of such methods
invariably depend on hyperparameters such as the learning rate, which must be
carefully tuned by the practitioner in order to ensure convergence to the
target measure at a suitable rate. In this paper, we introduce a suite of new
particle-based methods for scalable Bayesian inference based on coin betting,
which are entirely learning-rate free. We illustrate the performance of our
approach on a range of numerical examples, including several high-dimensional
models and datasets, demonstrating comparable performance to other ParVI
algorithms with no need to tune a learning rate. Comment: ICML 2023
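A rough picture of how the coin-betting idea replaces the step size in a particle update: each particle coordinate runs its own bettor, and the Euler step "X += step * drift" disappears. The sketch below is only illustrative (clipped drift, KT betting fraction, score-only drift with no kernel term) and is not the Coin SVGD algorithm from the paper.

```python
# Illustrative sketch (not the paper's Coin SVGD): a learning-rate-free particle
# update that replaces "X += step * drift" with per-coordinate coin betting.
# Assumes drift values are clipped to [-1, 1].
import numpy as np

def coin_betting_particles(drift_fn, X0, n_steps=500, epsilon=1.0):
    X = X0.copy()
    wealth = np.full_like(X0, epsilon)        # one bettor per particle coordinate
    c_sum = np.zeros_like(X0)
    for t in range(1, n_steps + 1):
        c = np.clip(drift_fn(X), -1.0, 1.0)   # coin outcomes: clipped drift at the current bets
        wealth += c * (X - X0)                # wealth gained or lost by the current bets
        c_sum += c
        X = X0 + (c_sum / (t + 1)) * wealth   # next bet: KT fraction of wealth, no step size
    return X

# Example: target N(0, I); drift is the score only (kernel term omitted for brevity).
X0 = np.random.default_rng(2).normal(loc=3.0, size=(50, 2))
X = coin_betting_particles(lambda X: -X, X0)
```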
Learning-Rate-Free Learning by D-Adaptation
The speed of gradient descent for convex Lipschitz functions is highly
dependent on the choice of learning rate. Setting the learning rate to achieve
the optimal convergence rate requires knowing the distance D from the initial
point to the solution set. In this work, we describe a single-loop method, with
no back-tracking or line searches, which does not require knowledge of D yet
asymptotically achieves the optimal rate of convergence for the complexity
class of convex Lipschitz functions. Our approach is the first parameter-free
method for this class without additional multiplicative log factors in the
convergence rate. We present extensive experiments for SGD and Adam variants of
our method, where the method automatically matches hand-tuned learning rates
across more than a dozen diverse machine learning problems, including
large-scale vision and language problems. Our method is practical and efficient,
and requires no additional function value or gradient evaluations per step. An
open-source implementation is available
(https://github.com/facebookresearch/dadaptation).
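The core idea can be sketched as dual-averaging-style descent whose scale is a running lower bound on the unknown distance D. The snippet below is a simplified illustration; in particular, the lower-bound estimate d_hat is an expository stand-in rather than the paper's exact estimator, for which see the repository linked above.

```python
# Simplified sketch of the D-Adaptation idea: dual-averaging steps scaled by a
# growing lower bound d on the distance D to the solution. The d_hat formula is
# an illustrative stand-in, not the exact estimator from the paper.
import numpy as np

def d_adapted_descent(grad, x0, n_steps=1000, d0=1e-6):
    x = x0.copy()
    d = d0                         # current lower bound on D (starts tiny)
    s = np.zeros_like(x0)          # running sum of gradients
    grad_sq_sum = 0.0
    weighted_grad_sq = 0.0
    for _ in range(n_steps):
        g = grad(x)
        s += g
        grad_sq_sum += float(g @ g)
        gamma = 1.0 / np.sqrt(grad_sq_sum + 1e-12)
        weighted_grad_sq += gamma * float(g @ g)
        # Illustrative lower bound on D built from dual-averaging quantities.
        d_hat = (gamma * float(s @ s) - weighted_grad_sq) / (2.0 * np.linalg.norm(s) + 1e-12)
        d = max(d, d_hat)          # the distance estimate only ever grows
        x = x0 - gamma * d * s     # dual-averaging step scaled by the D estimate
    return x

# Example: minimize f(x) = ||x - b||_1 starting far from the solution (illustrative).
b = np.full(10, 5.0)
x_star = d_adapted_descent(lambda x: np.sign(x - b), np.zeros(10))
```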