125,721 research outputs found
Parameter-free Mirror Descent
We develop a modified online mirror descent framework that is suitable for
building adaptive and parameter-free algorithms in unbounded domains. We
leverage this technique to develop the first unconstrained online linear
optimization algorithm achieving an optimal dynamic regret bound, and we
further demonstrate that natural strategies based on
Follow-the-Regularized-Leader are unable to achieve similar results. We also
apply our mirror descent framework to build new parameter-free implicit
updates, as well as a simplified and improved unconstrained scale-free
algorithm.

Comment: 52 pages. v3: published at COLT 2022 + fixed typos; v2: improved the algorithms in Sections 3, 5, and 6 (tighter regret, simpler updates and analysis), corrected minor technical details, and fixed typos.
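As background for what this paper modifies, here is a minimal sketch of vanilla online mirror descent with the Euclidean mirror map (for which the mirror step reduces to online gradient descent) on an unbounded domain. All names are illustrative, and the fixed step size eta is exactly the kind of parameter the paper's parameter-free framework removes; this is not the paper's modified update.

import numpy as np

def online_mirror_descent(grad_fn, dim, eta=0.1, T=100):
    """Vanilla OMD with the Euclidean mirror map psi(w) = 0.5 * ||w||^2.

    With this mirror map, the step
        w_{t+1} = argmin_w  <g_t, w> + D_psi(w, w_t) / eta
    has the closed form w_{t+1} = w_t - eta * g_t, i.e. online gradient
    descent on an unconstrained domain.  grad_fn(t, w) returns the
    gradient revealed at round t.
    """
    w = np.zeros(dim)              # usual starting point for unconstrained OLO
    iterates = []
    for t in range(T):
        iterates.append(w.copy())  # play w_t
        g = grad_fn(t, w)          # environment reveals g_t
        w = w - eta * g            # Euclidean mirror step
    return np.array(iterates)

# Toy run with linear losses <g_t, w>; regret is measured against a fixed
# comparator u chosen in hindsight.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gs = rng.normal(size=(100, 5))
    ws = online_mirror_descent(lambda t, w: gs[t], dim=5)
    u = -np.sign(gs.sum(axis=0))                 # some fixed comparator
    regret = float((gs * ws).sum() - (gs @ u).sum())
    print("regret vs. u:", regret)

Swapping the Euclidean mirror map for another regularizer changes the closed form (negative entropy on the probability simplex gives exponentiated gradient, for instance), which is what makes the mirror-descent template a natural starting point for the modified framework described above.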
Lipschitz and Comparator-Norm Adaptivity in Online Learning
We study Online Convex Optimization in the unbounded setting where neither
predictions nor gradients are constrained. The goal is to simultaneously adapt
to both the sequence of gradients and the comparator. We first develop
parameter-free and scale-free algorithms for a simplified setting with hints.
We present two versions: the first adapts to the squared norms of both
comparator and gradients separately using time per round, the second
adapts to their squared inner products (which measure variance only in the
comparator direction) in time per round. We then generalize two prior
reductions to the unbounded setting; one to not need hints, and a second to
deal with the range ratio problem (which already arises in prior work). We
discuss their optimality in light of prior and new lower bounds. We apply our
methods to obtain sharper regret bounds for scale-invariant online prediction
with linear models.

Comment: 30 pages, 1 figure.
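For orientation, "adapting to both the sequence of gradients and the comparator" refers to regret bounds that scale with the realized comparator norm and gradient norms rather than with worst-case prior bounds on either. A representative shape of such a guarantee in this literature, shown only to fix ideas and not as the paper's exact theorem, is

\[
  R_T(u) \;=\; \sum_{t=1}^{T} \langle g_t,\, w_t - u \rangle
  \;\le\; O\!\left( \|u\| \sqrt{\sum_{t=1}^{T} \|g_t\|^2 \, \log\bigl(1 + \|u\|\,T\bigr)} \right)
  \qquad \text{simultaneously for all comparators } u \in \mathbb{R}^d.
\]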
Training Deep Networks without Learning Rates Through Coin Betting
Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
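To make the coin-betting reduction concrete, here is a minimal sketch, assuming gradients clipped to [-1, 1] and using the standard Krichevsky-Trofimov bettor: each coordinate is a gambler whose current bet is the iterate, and no learning rate appears anywhere. This is the textbook reduction underlying the approach, not the paper's exact per-coordinate update for deep networks, and every name in it is illustrative.

import numpy as np

def kt_coin_betting(grad_fn, dim, T=2000, eps=1.0):
    """Learning-rate-free optimization via Krichevsky-Trofimov coin betting.

    Each coordinate is a gambler with initial wealth eps who, at round t,
    bets the fraction beta_t = (sum of past outcomes) / t of its wealth on
    the "coin" c_t = -g_t (gradients assumed to lie in [-1, 1]).  The bet
    itself is the iterate, so no step size is ever chosen.
    """
    wealth = np.full(dim, eps)     # per-coordinate gambler's wealth
    theta = np.zeros(dim)          # running sum of past coin outcomes
    avg = np.zeros(dim)            # averaged iterate (online-to-batch)
    for t in range(1, T + 1):
        beta = theta / t           # KT betting fraction
        w = beta * wealth          # current bet = current iterate w_t
        g = grad_fn(w)
        c = -np.clip(g, -1.0, 1.0) # coin outcome from the (clipped) gradient
        wealth = wealth + c * w    # wealth won or lost this round
        theta = theta + c
        avg += w
    return avg / T                 # averaged iterate approximates the minimizer

# Toy usage: minimize 0.5 * ||w - target||^2 with no tuning at all.
if __name__ == "__main__":
    target = np.array([3.0, -1.0])
    w_hat = kt_coin_betting(lambda w: w - target, dim=2)
    print(w_hat)                   # close to [3, -1]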
Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms
The implementation of a vast majority of machine learning (ML) algorithms
boils down to solving a numerical optimization problem. In this context,
Stochastic Gradient Descent (SGD) methods have long proven to provide good
results, both in terms of convergence and accuracy. Recently, several
parallelization approaches have been proposed in order to scale SGD to solve
very large ML problems. At their core, most of these approaches follow a
map-reduce scheme. This paper presents a novel parallel updating algorithm for
SGD, which utilizes the asynchronous single-sided communication paradigm.
Compared to existing methods, Asynchronous Parallel Stochastic Gradient Descent
(ASGD) provides faster (or at least equal) convergence, close-to-linear scaling,
and stable accuracy.
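As a rough illustration of the asynchronous-update idea (not the paper's implementation, which uses one-sided communication across cluster nodes rather than threads in one process), here is a minimal Hogwild-style sketch in which several threads apply SGD updates to a shared weight vector without locks; all names and the least-squares objective are made up for the example.

import numpy as np
import threading

def async_sgd(X, y, n_workers=4, epochs=5, lr=0.01):
    """Illustrative asynchronous SGD for least squares.

    Several worker threads update one shared weight vector without any
    locking: each reads the current weights, computes a gradient on its
    own sample, and writes the update back in place, possibly using
    slightly stale values.  This only sketches the asynchronous-update
    pattern, not the paper's single-sided communication scheme.
    """
    dim = X.shape[1]
    w = np.zeros(dim)                          # shared parameters, no lock

    def worker(indices):
        for _ in range(epochs):
            for i in indices:
                xi, yi = X[i], y[i]
                grad = (xi @ w - yi) * xi      # gradient of 0.5 * (xi.w - yi)^2
                w[:] -= lr * grad              # lock-free in-place update

    splits = np.array_split(np.arange(len(X)), n_workers)
    threads = [threading.Thread(target=worker, args=(s,)) for s in splits]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))
    true_w = rng.normal(size=10)
    y = X @ true_w + 0.01 * rng.normal(size=2000)
    print(np.linalg.norm(async_sgd(X, y) - true_w))   # should be small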
- …