Lipschitz and Comparator-Norm Adaptivity in Online Learning
We study Online Convex Optimization in the unbounded setting where neither
predictions nor gradients are constrained. The goal is to simultaneously adapt
to both the sequence of gradients and the comparator. We first develop
parameter-free and scale-free algorithms for a simplified setting with hints.
We present two versions: the first adapts to the squared norms of both
comparator and gradients separately using O(d) time per round; the second
adapts to their squared inner products (which measure variance only in the
comparator direction) in O(d³) time per round. We then generalize two prior
reductions to the unbounded setting; one to not need hints, and a second to
deal with the range ratio problem (which already arises in prior work). We
discuss their optimality in light of prior and new lower bounds. We apply our
methods to obtain sharper regret bounds for scale-invariant online prediction
with linear models. Comment: 30 pages, 1 figure.
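(Aside: the hint-removal reduction mentioned above follows a generic pattern in this literature. The sketch below is an illustrative Python rendering of that pattern, not the paper's exact reduction; the tolerance eps and all names are mine.)

```python
import numpy as np

def hint_free_gradients(grads, eps=1e-12):
    """Generic hint-removal reduction (illustrative, not the paper's exact
    construction): the only hint valid before seeing g_t is the largest
    gradient norm observed so far, so g_t is clipped down to that scale
    before being passed, together with the hint, to an inner learner."""
    h = 0.0                                  # running max of past norms
    for g in grads:
        hint = max(h, eps)                   # hint in force at prediction time
        norm = np.linalg.norm(g)
        g_clipped = g * min(1.0, hint / max(norm, eps))
        h = max(h, norm)                     # update the running maximum
        yield g_clipped, hint

# toy usage: the clipped stream never violates the hint in force
for g_c, hint in hint_free_gradients([np.array([0.5]), np.array([3.0]), np.array([1.0])]):
    assert np.linalg.norm(g_c) <= hint + 1e-9
```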
Lipschitz Adaptivity with Multiple Learning Rates in Online Learning
We aim to design adaptive online learning algorithms that take advantage of
any special structure that might be present in the learning task at hand, with
as little manual tuning by the user as possible. A fundamental obstacle that
comes up in the design of such adaptive algorithms is to calibrate a so-called
step-size or learning rate hyperparameter depending on variance, gradient
norms, etc. A recent technique promises to overcome this difficulty by
maintaining multiple learning rates in parallel. This technique has been
applied in the MetaGrad algorithm for online convex optimization and the Squint
algorithm for prediction with expert advice. However, in both cases the user
still has to provide in advance a Lipschitz hyperparameter that bounds the norm
of the gradients. Although this hyperparameter is typically not available in
advance, tuning it correctly is crucial: if it is set too small, the methods
may fail completely; but if it is taken too large, performance deteriorates
significantly. In the present work we remove this Lipschitz hyperparameter by
designing new versions of MetaGrad and Squint that adapt to its optimal value
automatically. We achieve this by dynamically updating the set of active
learning rates. For MetaGrad, we further improve the computational efficiency
of handling constraints on the domain of prediction, and we remove the need to
specify the number of rounds in advance. Comment: 22 pages. To appear in COLT 2019.
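(A minimal sketch of the multiple-learning-rates device: the candidate learning rates live on an exponential grid anchored at the current Lipschitz estimate, so observing a larger gradient norm changes the active set. The spacing and constants below are illustrative, not the exact grid of MetaGrad or Squint.)

```python
import numpy as np

def eta_grid(lipschitz_estimate, horizon_estimate):
    """Exponential grid of candidate learning rates, spanning roughly
    1/(G*sqrt(T)) up to 1/G for the current Lipschitz estimate G.
    Recomputing it when a larger gradient norm is observed is the
    'dynamically updated set of active learning rates'; the exact
    spacing and constants here are illustrative."""
    G = max(lipschitz_estimate, 1e-12)
    T = max(horizon_estimate, 2)
    n = int(np.ceil(np.log2(np.sqrt(T)))) + 1   # O(log T) grid points
    return [1.0 / (G * 2.0**i) for i in range(n)]

# toy usage: a larger observed gradient norm shifts the active grid
print(eta_grid(lipschitz_estimate=1.0, horizon_estimate=10000))
print(eta_grid(lipschitz_estimate=8.0, horizon_estimate=10000))
```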
Improving Adaptive Online Learning Using Refined Discretization
We study unconstrained Online Linear Optimization with Lipschitz losses.
Motivated by the pursuit of instance optimality, we propose a new algorithm
that simultaneously achieves (i) the AdaGrad-style second order gradient
adaptivity; and (ii) the comparator norm adaptivity also known as "parameter
freeness" in the literature. In particular,
- our algorithm does not employ the impractical doubling trick, and does not
require an a priori estimate of the time-uniform Lipschitz constant;
- the associated regret bound has the optimal O(√V_T) dependence on
the gradient variance V_T, without the typical logarithmic multiplicative
factor;
- the leading constant in the regret bound is "almost" optimal.
Central to these results is a continuous time approach to online learning. We
first show that the aimed simultaneous adaptivity can be achieved fairly easily
in a continuous time analogue of the problem, where the environment is modeled
by an arbitrary continuous semimartingale. Then, our key innovation is a new
discretization argument that preserves such adaptivity in the discrete time
adversarial setting. This refines a non-gradient-adaptive discretization
argument from (Harvey et al., 2023), both algorithmically and analytically,
which could be of independent interest. Comment: ALT 2024.
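(The continuous-time template this line of work builds on can be made concrete with the classic potential-based construction: play the discrete derivative of a potential Φ(t, S) of the negative gradient sum S. The sketch below uses one standard illustrative potential and assumes |g_t| ≤ 1; the paper's refined discretization argument is not reproduced here.)

```python
import numpy as np

def potential(t, S, eps=1.0):
    # One classic parameter-free potential for 1D online linear
    # optimization with |g_t| <= 1; eps is the initial "wealth".
    return eps * np.exp(S**2 / (2.0 * t)) / np.sqrt(t)

def play(t, S):
    # Bet the central discrete derivative of the potential in S: the
    # discrete-time counterpart of the continuous-time (Ito) argument.
    return 0.5 * (potential(t, S + 1.0) - potential(t, S - 1.0))

# toy run against +-1 gradients
S, loss = 0.0, 0.0
for t, g in enumerate([1, -1, 1, 1, -1, 1, 1, 1], start=1):
    x = play(t, S)          # S holds -sum of past gradients
    loss += g * x
    S -= g
print("cumulative linear loss:", loss)
```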
Unconstrained Dynamic Regret via Sparse Coding
Motivated by the challenge of nonstationarity in sequential decision making,
we study Online Convex Optimization (OCO) under the coupling of two problem
structures: the domain is unbounded, and the comparator sequence
u_1, …, u_T is arbitrarily time-varying. As no algorithm can guarantee low
regret simultaneously against all comparator sequences, handling this setting
requires moving from minimax optimality to comparator adaptivity. That is,
sensible regret bounds should depend on certain complexity measures of the
comparator relative to one's prior knowledge.
This paper achieves a new type of these adaptive regret bounds via a sparse
coding framework. The complexity of the comparator is measured by its energy
and its sparsity on a user-specified dictionary, which offers considerable
versatility. Equipped with a wavelet dictionary for example, our framework
improves the state-of-the-art bound (Jacobsen & Cutkosky, 2022) by adapting to
both (i) the magnitude of the comparator average ||ū||, rather than the
maximum max_t ||u_t||; and (ii) the comparator variability Σ_t ||u_t − ū||,
rather than the uncentered sum Σ_t ||u_t||. Furthermore, our analysis is simpler due
to decoupling function approximation from regret minimization. Comment: Split the two results from the previous version. Expanded the results
on Haar wavelets. Improved writing.
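(The decoupling can be sketched as follows: expand the prediction sequence on a fixed dictionary and give each atom its own scalar online learner. In the illustrative Python below, the dictionary is the textbook orthonormal Haar basis; a simple AdaGrad-style scalar step stands in for the parameter-free 1D learners composed in the paper, and the toy loss is mine.)

```python
import numpy as np

def haar_dictionary(T):
    """Orthonormal Haar wavelet dictionary on {0, ..., T-1} (T a power of 2).
    Rows are the atoms phi_j; the comparator sequence is then measured by
    the energy and sparsity of its coefficients on these atoms."""
    if T == 1:
        return np.ones((1, 1))
    H = haar_dictionary(T // 2)
    top = np.kron(H, [1.0, 1.0])                   # coarse-scale atoms
    bottom = np.kron(np.eye(T // 2), [1.0, -1.0])  # finest-scale differences
    D = np.vstack([top, bottom])
    return D / np.linalg.norm(D, axis=1, keepdims=True)

T = 8
D = haar_dictionary(T)          # shape (T, T); column t holds all phi_j(t)
coef = np.zeros(T)              # one scalar online learner per atom
sq = np.full(T, 1e-8)           # per-atom sums of squared gradients

for t in range(T):
    x_t = coef @ D[:, t]                  # prediction: sum_j coef_j * phi_j(t)
    g_t = np.sign(x_t - np.sin(t))        # toy subgradient of |x_t - sin(t)|
    g_vec = g_t * D[:, t]                 # scalar gradient routed to each atom
    sq += g_vec**2
    coef -= g_vec / np.sqrt(sq)           # AdaGrad-style 1D step per atom
```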
Making SGD Parameter-Free
We develop an algorithm for parameter-free stochastic convex optimization
(SCO) whose rate of convergence is only a double-logarithmic factor larger than
the optimal rate for the corresponding known-parameter setting. In contrast,
the best previously known rates for parameter-free SCO are based on online
parameter-free regret bounds, which contain unavoidable excess logarithmic
terms compared to their known-parameter counterparts. Our algorithm is
conceptually simple, has high-probability guarantees, and is also partially
adaptive to unknown gradient norms, smoothness, and strong convexity. At the
heart of our results is a novel parameter-free certificate for SGD step size
choice, and a time-uniform concentration result that assumes no a priori bounds
on SGD iterates.
MetaGrad: Adaptation using Multiple Learning Rates in Online Learning
We provide a new adaptive method for online convex optimization, MetaGrad,
that is robust to general convex losses but achieves faster rates for a broad
class of special functions, including exp-concave and strongly convex
functions, but also various types of stochastic and non-stochastic functions
without any curvature. We prove this by drawing a connection to the Bernstein
condition, which is known to imply fast rates in offline statistical learning.
MetaGrad further adapts automatically to the size of the gradients. Its main
feature is that it simultaneously considers multiple learning rates, which are
weighted directly proportional to their empirical performance on the data using
a new meta-algorithm. We provide three versions of MetaGrad. The full matrix
version maintains a full covariance matrix and is applicable to learning tasks
for which we can afford update time quadratic in the dimension. The other two
versions provide speed-ups for high-dimensional learning tasks with an update
time that is linear in the dimension: one is based on sketching, the other on
running a separate copy of the basic algorithm per coordinate. We evaluate all
versions of MetaGrad on benchmark online classification and regression tasks,
on which they consistently outperform both online gradient descent and AdaGrad.
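(A 1D toy sketch of the core mechanism: each learning rate η runs its own iterate, and the master mixes them with weights exponential in the empirical performance η·r − η²·r². This is a simplification: the real MetaGrad slaves minimize exp-concave quadratic surrogates, and the grid and loss here are illustrative.)

```python
import numpy as np

etas = np.array([2.0**-i for i in range(1, 8)])   # grid of learning rates
slaves = np.zeros_like(etas)                      # one iterate per eta (1D toy)
logw = np.zeros_like(etas)                        # log-weights over the etas

def master_prediction():
    # Tilted mixture: learning rates with good empirical performance get
    # more mass; the extra factor eta mirrors the MetaGrad master.
    w = np.exp(logw - logw.max()) * etas
    return float(w @ slaves / w.sum())

for t in range(100):
    x = master_prediction()
    g = x - np.sin(0.1 * t)              # toy gradient (squared-loss target)
    r = g * (x - slaves)                 # per-slave instantaneous regret
    logw += etas * r - etas**2 * r**2    # Squint/MetaGrad-style potential
    slaves -= etas * g                   # simplified slave step (real slaves
                                         # minimize a quadratic surrogate)
print("final master prediction:", master_prediction())
```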
Parameter-free Mirror Descent
We develop a modified online mirror descent framework that is suitable for
building adaptive and parameter-free algorithms in unbounded domains. We
leverage this technique to develop the first unconstrained online linear
optimization algorithm achieving an optimal dynamic regret bound, and we
further demonstrate that natural strategies based on
Follow-the-Regularized-Leader are unable to achieve similar results. We also
apply our mirror descent framework to build new parameter-free implicit
updates, as well as a simplified and improved unconstrained scale-free
algorithm. Comment: 52 pages. v3: published at COLT 2022 + fixed typos; v2: improved the
algorithms in sections 3, 5, and 6 (tighter regret, simpler updates and
analysis), corrected minor technical details and fixed typos.
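(For contrast with the mirror-descent route, the classic coin-betting route to parameter-freeness fits in a few lines. Below is the standard Krichevsky–Trofimov bettor for unconstrained 1D online linear optimization with |g_t| ≤ 1; it is a textbook baseline, not this paper's algorithm.)

```python
def kt_bettor(grads):
    """Krichevsky-Trofimov coin betting for unconstrained 1D OLO with
    |g_t| <= 1: bet a fraction of the current wealth proportional to the
    running average of past negated gradients."""
    wealth, S = 1.0, 0.0          # initial wealth eps = 1; S = sum of -g_s
    for t, g in enumerate(grads, start=1):
        x = (S / t) * wealth      # KT betting fraction S/t times wealth
        yield x
        wealth -= g * x           # linear gain/loss of the bet
        S -= g

# toy usage on a biased "coin": iterates drift toward the profitable side
print(list(kt_bettor([-1, -1, 1, -1, -1, -1])))
```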
- …