Stochastic continuum armed bandit problem of few linear parameters in high dimensions
We consider a stochastic continuum armed bandit problem where the arms are
indexed by the $\ell_2$ ball $B_d(1+\nu)$ of radius $1+\nu$ in
$\mathbb{R}^d$. The reward functions $r : B_d(1+\nu) \rightarrow \mathbb{R}$
are considered to intrinsically depend on $k \ll d$ unknown linear parameters,
so that $r(\mathbf{x}) = g(\mathbf{A}\mathbf{x})$, where $\mathbf{A}$ is a full
rank $k \times d$ matrix. Assuming the mean reward function to be smooth, we
make use of results from the low-rank matrix recovery literature and derive an
efficient randomized algorithm which achieves a regret bound of
$O(C(k,d)\, n^{\frac{1+k}{2+k}} (\log n)^{\frac{1}{2+k}})$ with high probability. Here
$C(k,d)$ is at most polynomial in $d$ and $k$, and $n$ is the number of rounds
or the sampling budget, which is assumed to be known beforehand.
Comment: Changes from previous version: (a) Corrected typos throughout. (b) In
the earlier version, regret was defined as a conditional expectation (and hence
bounded w.h.p.); this is now changed to an expectation, resulting in minor
changes in statements of Lemma 1, Theorems 1,2 and Corollary 1. See Remark 1.
(c) Added Remark 3, and corrected statement of Proposition