Stochastic convex optimization with bandit feedback
This paper addresses the problem of minimizing a convex, Lipschitz function
over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback
model. In this model, the algorithm is allowed to observe noisy realizations of
the function value at any query point $x \in \mathcal{X}$. The quantity of
interest is the regret of the algorithm, which is the sum of the function
values at the algorithm's query points minus the optimal function value. We
demonstrate a generalization of the ellipsoid algorithm that incurs
$\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at
least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of
the scaling with $T$.
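The regret notion in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the objective `f`, the noise level, and the uniformly random query strategy are all assumptions made for the example; only the feedback model (noisy function values at query points) and the regret definition come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # An example convex, Lipschitz objective on [-1, 1] (an assumption for
    # illustration): f(x) = |x - 0.3|, minimized at x* = 0.3.
    return np.abs(x - 0.3).sum()

def noisy_value(x, sigma=0.1):
    # Stochastic bandit feedback: the algorithm only observes a noisy
    # realization of f at its query point x.
    return f(x) + sigma * rng.standard_normal()

# Regret over T rounds: the sum of the (true) function values at the
# algorithm's query points minus T times the optimal function value.
# Here the "algorithm" is a trivial uniform-random strategy.
T = 1000
x_star = np.array([0.3])
queries = rng.uniform(-1, 1, size=(T, 1))
regret = sum(f(x) for x in queries) - T * f(x_star)
```

A non-trivial algorithm such as the paper's ellipsoid-based method would choose `queries` adaptively from the observed `noisy_value` feedback so that `regret` grows only as $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ rather than linearly in $T$.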
Federated Online and Bandit Convex Optimization
We study the problems of distributed online and bandit convex optimization
against an adaptive adversary. We aim to minimize the average regret on $M$
machines working in parallel over $T$ rounds with intermittent
communications. Assuming the underlying cost functions are convex and can be
generated adaptively, our results show that collaboration is not beneficial
when the machines have access to the first-order gradient information at the
queried points. This is in contrast to the case for stochastic functions, where
each machine samples the cost functions from a fixed distribution. Furthermore,
we delve into the more challenging setting of federated online optimization
with bandit (zeroth-order) feedback, where the machines can only access values
of the cost functions at the queried points. The key finding here is
identifying the high-dimensional regime where collaboration is beneficial and
may even lead to a linear speedup in the number of machines. We further
illustrate our findings through federated adversarial linear bandits by
developing novel distributed single and two-point feedback algorithms. Our work
is the first attempt towards a systematic understanding of federated online
optimization with limited feedback, and it attains tight regret bounds in the
intermittent communication setting for both first and zeroth-order feedback.
Our results thus bridge the gap between the stochastic and adaptive settings in
federated online optimization.
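The "two-point feedback" mentioned above refers to a standard zeroth-order trick: estimating a gradient from two function evaluations along a random direction. The sketch below shows the classic two-point estimator; the quadratic test function, step size `delta`, and sample count are assumptions for the example, and this is not a reconstruction of the paper's distributed algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def two_point_gradient(f, x, delta=1e-3):
    """Two-point zeroth-order gradient estimate.

    Queries f at x + delta*u and x - delta*u for a uniformly random unit
    direction u, then rescales; the estimate is unbiased for the gradient
    of a smoothed version of f.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)           # uniform direction on the unit sphere
    return (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Toy usage: for a smooth f, averaging many estimates recovers the gradient.
f = lambda x: 0.5 * np.dot(x, x)     # true gradient of f at x is x itself
x = np.array([1.0, -2.0, 0.5])
est = np.mean([two_point_gradient(f, x) for _ in range(20000)], axis=0)
```

In a federated bandit setting, each machine would form such estimates from its own function evaluations and feed them to a first-order update, communicating only intermittently; single-point variants use one evaluation per round at the cost of higher variance.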