Lipschitz and Comparator-Norm Adaptivity in Online Learning
We study Online Convex Optimization in the unbounded setting where neither
predictions nor gradients are constrained. The goal is to simultaneously adapt
to both the sequence of gradients and the comparator. We first develop
parameter-free and scale-free algorithms for a simplified setting with hints.
We present two versions: the first adapts to the squared norms of both
comparator and gradients separately using O(d) time per round; the second
adapts to their squared inner products (which measure variance only in the
comparator direction) in O(d³) time per round. We then generalize two prior
reductions to the unbounded setting; one to not need hints, and a second to
deal with the range ratio problem (which already arises in prior work). We
discuss their optimality in light of prior and new lower bounds. We apply our
methods to obtain sharper regret bounds for scale-invariant online prediction
with linear models. Comment: 30 pages, 1 figure.
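(Aside: the hint-removal reduction mentioned above follows a generic pattern in this literature. The sketch below is an illustrative Python rendering of that pattern, not the paper's exact reduction; the tolerance eps and all names are mine.)

```python
import numpy as np

def hint_free_gradients(grads, eps=1e-12):
    """Generic hint-removal reduction (illustrative, not the paper's exact
    construction): the only hint valid before seeing g_t is the largest
    gradient norm observed so far, so g_t is clipped down to that scale
    before being passed, together with the hint, to an inner learner."""
    h = 0.0                                  # running max of past norms
    for g in grads:
        hint = max(h, eps)                   # hint in force at prediction time
        norm = np.linalg.norm(g)
        g_clipped = g * min(1.0, hint / max(norm, eps))
        h = max(h, norm)                     # update the running maximum
        yield g_clipped, hint

# toy usage: the clipped stream never violates the hint in force
for g_c, hint in hint_free_gradients([np.array([0.5]), np.array([3.0]), np.array([1.0])]):
    assert np.linalg.norm(g_c) <= hint + 1e-9
```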
Lipschitz Adaptivity with Multiple Learning Rates in Online Learning
We aim to design adaptive online learning algorithms that take advantage of
any special structure that might be present in the learning task at hand, with
as little manual tuning by the user as possible. A fundamental obstacle that
comes up in the design of such adaptive algorithms is to calibrate a so-called
step-size or learning rate hyperparameter depending on variance, gradient
norms, etc. A recent technique promises to overcome this difficulty by
maintaining multiple learning rates in parallel. This technique has been
applied in the MetaGrad algorithm for online convex optimization and the Squint
algorithm for prediction with expert advice. However, in both cases the user
still has to provide in advance a Lipschitz hyperparameter that bounds the norm
of the gradients. Although this hyperparameter is typically not available in
advance, tuning it correctly is crucial: if it is set too small, the methods
may fail completely; but if it is taken too large, performance deteriorates
significantly. In the present work we remove this Lipschitz hyperparameter by
designing new versions of MetaGrad and Squint that adapt to its optimal value
automatically. We achieve this by dynamically updating the set of active
learning rates. For MetaGrad, we further improve the computational efficiency
of handling constraints on the domain of prediction, and we remove the need to
specify the number of rounds in advance. Comment: 22 pages. To appear in COLT 2019.
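(A minimal sketch of the multiple-learning-rates device: the candidate learning rates live on an exponential grid anchored at the current Lipschitz estimate, so observing a larger gradient norm changes the active set. The spacing and constants below are illustrative, not the exact grid of MetaGrad or Squint.)

```python
import numpy as np

def eta_grid(lipschitz_estimate, horizon_estimate):
    """Exponential grid of candidate learning rates, spanning roughly
    1/(G*sqrt(T)) up to 1/G for the current Lipschitz estimate G.
    Recomputing it when a larger gradient norm is observed is the
    'dynamically updated set of active learning rates'; the exact
    spacing and constants here are illustrative."""
    G = max(lipschitz_estimate, 1e-12)
    T = max(horizon_estimate, 2)
    n = int(np.ceil(np.log2(np.sqrt(T)))) + 1   # O(log T) grid points
    return [1.0 / (G * 2.0**i) for i in range(n)]

# toy usage: a larger observed gradient norm shifts the active grid
print(eta_grid(lipschitz_estimate=1.0, horizon_estimate=10000))
print(eta_grid(lipschitz_estimate=8.0, horizon_estimate=10000))
```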
Improving Adaptive Online Learning Using Refined Discretization
We study unconstrained Online Linear Optimization with Lipschitz losses.
Motivated by the pursuit of instance optimality, we propose a new algorithm
that simultaneously achieves (i) the AdaGrad-style second order gradient
adaptivity; and (ii) the comparator norm adaptivity also known as "parameter
freeness" in the literature. In particular,
- our algorithm does not employ the impractical doubling trick, and does not
require an a priori estimate of the time-uniform Lipschitz constant;
- the associated regret bound has the optimal O(√V_T) dependence on
the gradient variance V_T, without the typical logarithmic multiplicative
factor;
- the leading constant in the regret bound is "almost" optimal.
Central to these results is a continuous time approach to online learning. We
first show that the aimed simultaneous adaptivity can be achieved fairly easily
in a continuous time analogue of the problem, where the environment is modeled
by an arbitrary continuous semimartingale. Then, our key innovation is a new
discretization argument that preserves such adaptivity in the discrete time
adversarial setting. This refines a non-gradient-adaptive discretization
argument from (Harvey et al., 2023), both algorithmically and analytically,
which could be of independent interest. Comment: ALT 2024.
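(The continuous-time template this line of work builds on can be made concrete with the classic potential-based construction: play the discrete derivative of a potential Φ(t, S) of the negative gradient sum S. The sketch below uses one standard illustrative potential and assumes |g_t| ≤ 1; the paper's refined discretization argument is not reproduced here.)

```python
import numpy as np

def potential(t, S, eps=1.0):
    # One classic parameter-free potential for 1D online linear
    # optimization with |g_t| <= 1; eps is the initial "wealth".
    return eps * np.exp(S**2 / (2.0 * t)) / np.sqrt(t)

def play(t, S):
    # Bet the central discrete derivative of the potential in S: the
    # discrete-time counterpart of the continuous-time (Ito) argument.
    return 0.5 * (potential(t, S + 1.0) - potential(t, S - 1.0))

# toy run against +-1 gradients
S, loss = 0.0, 0.0
for t, g in enumerate([1, -1, 1, 1, -1, 1, 1, 1], start=1):
    x = play(t, S)          # S holds -sum of past gradients
    loss += g * x
    S -= g
print("cumulative linear loss:", loss)
```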
Unconstrained Dynamic Regret via Sparse Coding
Motivated by the challenge of nonstationarity in sequential decision making,
we study Online Convex Optimization (OCO) under the coupling of two problem
structures: the domain is unbounded, and the comparator sequence
u_1, …, u_T is arbitrarily time-varying. As no algorithm can guarantee low
regret simultaneously against all comparator sequences, handling this setting
requires moving from minimax optimality to comparator adaptivity. That is,
sensible regret bounds should depend on certain complexity measures of the
comparator relative to one's prior knowledge.
This paper achieves a new type of these adaptive regret bounds via a sparse
coding framework. The complexity of the comparator is measured by its energy
and its sparsity on a user-specified dictionary, which offers considerable
versatility. Equipped with a wavelet dictionary for example, our framework
improves the state-of-the-art bound (Jacobsen & Cutkosky, 2022) by adapting to
both (i) the magnitude of the comparator average ||ū||, rather than the
maximum max_t ||u_t||; and (ii) the comparator variability Σ_t ||u_t − ū||,
rather than the uncentered sum Σ_t ||u_t||. Furthermore, our analysis is simpler due
to decoupling function approximation from regret minimization. Comment: Split the two results from the previous version. Expanded the results
on Haar wavelets. Improved writing.
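(The decoupling can be sketched as follows: expand the prediction sequence on a fixed dictionary and give each atom its own scalar online learner. In the illustrative Python below, the dictionary is the textbook orthonormal Haar basis; a simple AdaGrad-style scalar step stands in for the parameter-free 1D learners composed in the paper, and the toy loss is mine.)

```python
import numpy as np

def haar_dictionary(T):
    """Orthonormal Haar wavelet dictionary on {0, ..., T-1} (T a power of 2).
    Rows are the atoms phi_j; the comparator sequence is then measured by
    the energy and sparsity of its coefficients on these atoms."""
    if T == 1:
        return np.ones((1, 1))
    H = haar_dictionary(T // 2)
    top = np.kron(H, [1.0, 1.0])                   # coarse-scale atoms
    bottom = np.kron(np.eye(T // 2), [1.0, -1.0])  # finest-scale differences
    D = np.vstack([top, bottom])
    return D / np.linalg.norm(D, axis=1, keepdims=True)

T = 8
D = haar_dictionary(T)          # shape (T, T); column t holds all phi_j(t)
coef = np.zeros(T)              # one scalar online learner per atom
sq = np.full(T, 1e-8)           # per-atom sums of squared gradients

for t in range(T):
    x_t = coef @ D[:, t]                  # prediction: sum_j coef_j * phi_j(t)
    g_t = np.sign(x_t - np.sin(t))        # toy subgradient of |x_t - sin(t)|
    g_vec = g_t * D[:, t]                 # scalar gradient routed to each atom
    sq += g_vec**2
    coef -= g_vec / np.sqrt(sq)           # AdaGrad-style 1D step per atom
```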
Making SGD Parameter-Free
We develop an algorithm for parameter-free stochastic convex optimization
(SCO) whose rate of convergence is only a double-logarithmic factor larger than
the optimal rate for the corresponding known-parameter setting. In contrast,
the best previously known rates for parameter-free SCO are based on online
parameter-free regret bounds, which contain unavoidable excess logarithmic
terms compared to their known-parameter counterparts. Our algorithm is
conceptually simple, has high-probability guarantees, and is also partially
adaptive to unknown gradient norms, smoothness, and strong convexity. At the
heart of our results is a novel parameter-free certificate for SGD step size
choice, and a time-uniform concentration result that assumes no a priori bounds
on SGD iterates.
MetaGrad: Adaptation using Multiple Learning Rates in Online Learning
We provide a new adaptive method for online convex optimization, MetaGrad,
that is robust to general convex losses but achieves faster rates for a broad
class of special functions, including exp-concave and strongly convex
functions, but also various types of stochastic and non-stochastic functions
without any curvature. We prove this by drawing a connection to the Bernstein
condition, which is known to imply fast rates in offline statistical learning.
MetaGrad further adapts automatically to the size of the gradients. Its main
feature is that it simultaneously considers multiple learning rates, which are
weighted directly proportional to their empirical performance on the data using
a new meta-algorithm. We provide three versions of MetaGrad. The full matrix
version maintains a full covariance matrix and is applicable to learning tasks
for which we can afford update time quadratic in the dimension. The other two
versions provide speed-ups for high-dimensional learning tasks with an update
time that is linear in the dimension: one is based on sketching, the other on
running a separate copy of the basic algorithm per coordinate. We evaluate all
versions of MetaGrad on benchmark online classification and regression tasks,
on which they consistently outperform both online gradient descent and AdaGrad.
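(A 1D toy sketch of the core mechanism: each learning rate η runs its own iterate, and the master mixes them with weights exponential in the empirical performance η·r − η²·r². This is a simplification: the real MetaGrad slaves minimize exp-concave quadratic surrogates, and the grid and loss here are illustrative.)

```python
import numpy as np

etas = np.array([2.0**-i for i in range(1, 8)])   # grid of learning rates
slaves = np.zeros_like(etas)                      # one iterate per eta (1D toy)
logw = np.zeros_like(etas)                        # log-weights over the etas

def master_prediction():
    # Tilted mixture: learning rates with good empirical performance get
    # more mass; the extra factor eta mirrors the MetaGrad master.
    w = np.exp(logw - logw.max()) * etas
    return float(w @ slaves / w.sum())

for t in range(100):
    x = master_prediction()
    g = x - np.sin(0.1 * t)              # toy gradient (squared-loss target)
    r = g * (x - slaves)                 # per-slave instantaneous regret
    logw += etas * r - etas**2 * r**2    # Squint/MetaGrad-style potential
    slaves -= etas * g                   # simplified slave step (real slaves
                                         # minimize a quadratic surrogate)
print("final master prediction:", master_prediction())
```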
Parameter-free Mirror Descent
We develop a modified online mirror descent framework that is suitable for
building adaptive and parameter-free algorithms in unbounded domains. We
leverage this technique to develop the first unconstrained online linear
optimization algorithm achieving an optimal dynamic regret bound, and we
further demonstrate that natural strategies based on
Follow-the-Regularized-Leader are unable to achieve similar results. We also
apply our mirror descent framework to build new parameter-free implicit
updates, as well as a simplified and improved unconstrained scale-free
algorithm. Comment: 52 pages. v3: published at COLT 2022 + fixed typos; v2: improved the
algorithms in sections 3, 5, and 6 (tighter regret, simpler updates and
analysis), corrected minor technical details and fixed typos.
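(For contrast with the mirror-descent route, the classic coin-betting route to parameter-freeness fits in a few lines. Below is the standard Krichevsky–Trofimov bettor for unconstrained 1D online linear optimization with |g_t| ≤ 1; it is a textbook baseline, not this paper's algorithm.)

```python
def kt_bettor(grads):
    """Krichevsky-Trofimov coin betting for unconstrained 1D OLO with
    |g_t| <= 1: bet a fraction of the current wealth proportional to the
    running average of past negated gradients."""
    wealth, S = 1.0, 0.0          # initial wealth eps = 1; S = sum of -g_s
    for t, g in enumerate(grads, start=1):
        x = (S / t) * wealth      # KT betting fraction S/t times wealth
        yield x
        wealth -= g * x           # linear gain/loss of the bet
        S -= g

# toy usage on a biased "coin": iterates drift toward the profitable side
print(list(kt_bettor([-1, -1, 1, -1, -1, -1])))
```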
- …