Stochastic convex optimization with bandit feedback
This paper addresses the problem of minimizing a convex, Lipschitz function
over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback
model. In this model, the algorithm is allowed to observe noisy realizations of
the function value at any query point $x \in \mathcal{X}$. The quantity of
interest is the regret of the algorithm, which is the sum of the function
values at the algorithm's query points minus the optimal function value. We
demonstrate a generalization of the ellipsoid algorithm that incurs
$\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at
least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of
the scaling with $T$.
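The regret notion in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the objective `f`, the noise level, and the uniformly random query strategy are all assumptions made for the example; only the feedback model (noisy function values at query points) and the regret definition come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # An example convex, Lipschitz objective on [-1, 1] (an assumption for
    # illustration): f(x) = |x - 0.3|, minimized at x* = 0.3.
    return np.abs(x - 0.3).sum()

def noisy_value(x, sigma=0.1):
    # Stochastic bandit feedback: the algorithm only observes a noisy
    # realization of f at its query point x.
    return f(x) + sigma * rng.standard_normal()

# Regret over T rounds: the sum of the (true) function values at the
# algorithm's query points minus T times the optimal function value.
# Here the "algorithm" is a trivial uniform-random strategy.
T = 1000
x_star = np.array([0.3])
queries = rng.uniform(-1, 1, size=(T, 1))
regret = sum(f(x) for x in queries) - T * f(x_star)
```

A non-trivial algorithm such as the paper's ellipsoid-based method would choose `queries` adaptively from the observed `noisy_value` feedback so that `regret` grows only as $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ rather than linearly in $T$.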
Federated Online and Bandit Convex Optimization
We study the problems of distributed online and bandit convex optimization
against an adaptive adversary. We aim to minimize the average regret on $M$
machines working in parallel over $T$ rounds with intermittent
communications. Assuming the underlying cost functions are convex and can be
generated adaptively, our results show that collaboration is not beneficial
when the machines have access to the first-order gradient information at the
queried points. This is in contrast to the case for stochastic functions, where
each machine samples the cost functions from a fixed distribution. Furthermore,
we delve into the more challenging setting of federated online optimization
with bandit (zeroth-order) feedback, where the machines can only access values
of the cost functions at the queried points. The key finding here is
identifying the high-dimensional regime where collaboration is beneficial and
may even lead to a linear speedup in the number of machines. We further
illustrate our findings through federated adversarial linear bandits by
developing novel distributed single and two-point feedback algorithms. Our work
is the first attempt towards a systematic understanding of federated online
optimization with limited feedback, and it attains tight regret bounds in the
intermittent communication setting for both first and zeroth-order feedback.
Our results thus bridge the gap between the stochastic and adaptive settings in
federated online optimization.
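The "two-point feedback" mentioned above refers to a standard zeroth-order trick: estimating a gradient from two function evaluations along a random direction. The sketch below shows the classic two-point estimator; the quadratic test function, step size `delta`, and sample count are assumptions for the example, and this is not a reconstruction of the paper's distributed algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def two_point_gradient(f, x, delta=1e-3):
    """Two-point zeroth-order gradient estimate.

    Queries f at x + delta*u and x - delta*u for a uniformly random unit
    direction u, then rescales; the estimate is unbiased for the gradient
    of a smoothed version of f.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)           # uniform direction on the unit sphere
    return (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Toy usage: for a smooth f, averaging many estimates recovers the gradient.
f = lambda x: 0.5 * np.dot(x, x)     # true gradient of f at x is x itself
x = np.array([1.0, -2.0, 0.5])
est = np.mean([two_point_gradient(f, x) for _ in range(20000)], axis=0)
```

In a federated bandit setting, each machine would form such estimates from its own function evaluations and feed them to a first-order update, communicating only intermittently; single-point variants use one evaluation per round at the cost of higher variance.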