125,721 research outputs found
Parameter-free Mirror Descent
We develop a modified online mirror descent framework that is suitable for
building adaptive and parameter-free algorithms in unbounded domains. We
leverage this technique to develop the first unconstrained online linear
optimization algorithm achieving an optimal dynamic regret bound, and we
further demonstrate that natural strategies based on
Follow-the-Regularized-Leader are unable to achieve similar results. We also
apply our mirror descent framework to build new parameter-free implicit
updates, as well as a simplified and improved unconstrained scale-free
algorithm.

Comment: 52 pages. v3: published at COLT 2022 + fixed typos; v2: improved the algorithms in Sections 3, 5, and 6 (tighter regret, simpler updates and analysis), corrected minor technical details, and fixed typos.
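As background for what this paper modifies, here is a minimal sketch of vanilla online mirror descent with the Euclidean mirror map (for which the mirror step reduces to online gradient descent) on an unbounded domain. All names are illustrative, and the fixed step size eta is exactly the kind of parameter the paper's parameter-free framework removes; this is not the paper's modified update.

import numpy as np

def online_mirror_descent(grad_fn, dim, eta=0.1, T=100):
    """Vanilla OMD with the Euclidean mirror map psi(w) = 0.5 * ||w||^2.

    With this mirror map, the step
        w_{t+1} = argmin_w  <g_t, w> + D_psi(w, w_t) / eta
    has the closed form w_{t+1} = w_t - eta * g_t, i.e. online gradient
    descent on an unconstrained domain.  grad_fn(t, w) returns the
    gradient revealed at round t.
    """
    w = np.zeros(dim)              # usual starting point for unconstrained OLO
    iterates = []
    for t in range(T):
        iterates.append(w.copy())  # play w_t
        g = grad_fn(t, w)          # environment reveals g_t
        w = w - eta * g            # Euclidean mirror step
    return np.array(iterates)

# Toy run with linear losses <g_t, w>; regret is measured against a fixed
# comparator u chosen in hindsight.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gs = rng.normal(size=(100, 5))
    ws = online_mirror_descent(lambda t, w: gs[t], dim=5)
    u = -np.sign(gs.sum(axis=0))                 # some fixed comparator
    regret = float((gs * ws).sum() - (gs @ u).sum())
    print("regret vs. u:", regret)

Swapping the Euclidean mirror map for another regularizer changes the closed form (negative entropy on the probability simplex gives exponentiated gradient, for instance), which is what makes the mirror-descent template a natural starting point for the modified framework described above.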
Lipschitz and Comparator-Norm Adaptivity in Online Learning
We study Online Convex Optimization in the unbounded setting where neither
predictions nor gradients are constrained. The goal is to simultaneously adapt
to both the sequence of gradients and the comparator. We first develop
parameter-free and scale-free algorithms for a simplified setting with hints.
We present two versions: the first adapts to the squared norms of both
comparator and gradients separately using time per round, the second
adapts to their squared inner products (which measure variance only in the
comparator direction) in time per round. We then generalize two prior
reductions to the unbounded setting; one to not need hints, and a second to
deal with the range ratio problem (which already arises in prior work). We
discuss their optimality in light of prior and new lower bounds. We apply our
methods to obtain sharper regret bounds for scale-invariant online prediction
with linear models.

Comment: 30 pages, 1 figure.
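For orientation, "adapting to both the sequence of gradients and the comparator" refers to regret bounds that scale with the realized comparator norm and gradient norms rather than with worst-case prior bounds on either. A representative shape of such a guarantee in this literature, shown only to fix ideas and not as the paper's exact theorem, is

\[
  R_T(u) \;=\; \sum_{t=1}^{T} \langle g_t,\, w_t - u \rangle
  \;\le\; O\!\left( \|u\| \sqrt{\sum_{t=1}^{T} \|g_t\|^2 \, \log\bigl(1 + \|u\|\,T\bigr)} \right)
  \qquad \text{simultaneously for all comparators } u \in \mathbb{R}^d.
\]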
Training Deep Networks without Learning Rates Through Coin Betting
Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
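To make the coin-betting reduction concrete, here is a minimal sketch, assuming gradients clipped to [-1, 1] and using the standard Krichevsky-Trofimov bettor: each coordinate is a gambler whose current bet is the iterate, and no learning rate appears anywhere. This is the textbook reduction underlying the approach, not the paper's exact per-coordinate update for deep networks, and every name in it is illustrative.

import numpy as np

def kt_coin_betting(grad_fn, dim, T=2000, eps=1.0):
    """Learning-rate-free optimization via Krichevsky-Trofimov coin betting.

    Each coordinate is a gambler with initial wealth eps who, at round t,
    bets the fraction beta_t = (sum of past outcomes) / t of its wealth on
    the "coin" c_t = -g_t (gradients assumed to lie in [-1, 1]).  The bet
    itself is the iterate, so no step size is ever chosen.
    """
    wealth = np.full(dim, eps)     # per-coordinate gambler's wealth
    theta = np.zeros(dim)          # running sum of past coin outcomes
    avg = np.zeros(dim)            # averaged iterate (online-to-batch)
    for t in range(1, T + 1):
        beta = theta / t           # KT betting fraction
        w = beta * wealth          # current bet = current iterate w_t
        g = grad_fn(w)
        c = -np.clip(g, -1.0, 1.0) # coin outcome from the (clipped) gradient
        wealth = wealth + c * w    # wealth won or lost this round
        theta = theta + c
        avg += w
    return avg / T                 # averaged iterate approximates the minimizer

# Toy usage: minimize 0.5 * ||w - target||^2 with no tuning at all.
if __name__ == "__main__":
    target = np.array([3.0, -1.0])
    w_hat = kt_coin_betting(lambda w: w - target, dim=2)
    print(w_hat)                   # close to [3, -1]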
Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms
The implementation of a vast majority of machine learning (ML) algorithms
boils down to solving a numerical optimization problem. In this context,
Stochastic Gradient Descent (SGD) methods have long proven to provide good
results, both in terms of convergence and accuracy. Recently, several
parallelization approaches have been proposed in order to scale SGD to solve
very large ML problems. At their core, most of these approaches follow a
map-reduce scheme. This paper presents a novel parallel updating algorithm for
SGD, which utilizes the asynchronous single-sided communication paradigm.
Compared to existing methods, Asynchronous Parallel Stochastic Gradient Descent
(ASGD) provides faster (or at least equal) convergence, close-to-linear scaling,
and stable accuracy.
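As a rough illustration of the asynchronous-update idea (not the paper's implementation, which uses one-sided communication across cluster nodes rather than threads in one process), here is a minimal Hogwild-style sketch in which several threads apply SGD updates to a shared weight vector without locks; all names and the least-squares objective are made up for the example.

import numpy as np
import threading

def async_sgd(X, y, n_workers=4, epochs=5, lr=0.01):
    """Illustrative asynchronous SGD for least squares.

    Several worker threads update one shared weight vector without any
    locking: each reads the current weights, computes a gradient on its
    own sample, and writes the update back in place, possibly using
    slightly stale values.  This only sketches the asynchronous-update
    pattern, not the paper's single-sided communication scheme.
    """
    dim = X.shape[1]
    w = np.zeros(dim)                          # shared parameters, no lock

    def worker(indices):
        for _ in range(epochs):
            for i in indices:
                xi, yi = X[i], y[i]
                grad = (xi @ w - yi) * xi      # gradient of 0.5 * (xi.w - yi)^2
                w[:] -= lr * grad              # lock-free in-place update

    splits = np.array_split(np.arange(len(X)), n_workers)
    threads = [threading.Thread(target=worker, args=(s,)) for s in splits]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))
    true_w = rng.normal(size=10)
    y = X @ true_w + 0.01 * rng.normal(size=2000)
    print(np.linalg.norm(async_sgd(X, y) - true_w))   # should be small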
- …