Search CORE

2,296 research outputs found

Stochastic convex optimization with bandit feedback

Author: Agarwal Alekh
Foster Dean P.
Hsu Daniel
Kakade Sham M.
Rakhlin Alexander
Publication venue
Publication date: 01/01/1107
Field of study

This paper addresses the problem of minimizing a convex, Lipschitz function

f

over a convex, compact set \xset under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value

f(x)

at any query point x \in \xset. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs \otil(\poly(d)\sqrt{T}) regret. Since any algorithm has regret at least

\Omega(\sqrt{T})

on this problem, our algorithm is optimal in terms of the scaling with

T

arXiv.org e-Print Archive

CiteSeerX