Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLM-TSL
and GLM-FPL. GLM-TSL samples a generalized linear model (GLM) from the Laplace
approximation to the posterior distribution. GLM-FPL fits a GLM to a randomly
perturbed history of past rewards. We prove bounds on the n-round regret of GLM-TSL and GLM-FPL, where d is the number of features and K is the number of arms. The regret bound of GLM-TSL improves upon prior work, and the regret bound of GLM-FPL is the first of its kind. We
apply both GLM-TSL and GLM-FPL to logistic and neural network bandits, and show
that they perform well empirically. In more complex models, GLM-FPL is
significantly faster. Our results showcase the role of randomization, beyond
sampling from the posterior, in exploration.
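To make the perturbed-history idea behind GLM-FPL concrete, here is a minimal sketch of one round for a logistic (Bernoulli) bandit. The function names, the gradient-descent GLM fitter, and the Gaussian perturbation scale `a` are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_glm(X, y, n_iters=200, lr=0.5, reg=1.0):
    """Fit a logistic GLM by regularized gradient descent (illustrative)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) + reg * theta
        theta -= lr * grad / len(y)
    return theta

def glm_fpl_round(arms, X_hist, y_hist, a=0.5, rng=np.random):
    """One round of GLM-FPL (sketch): perturb the past rewards with i.i.d.
    Gaussian noise and act greedily w.r.t. the GLM fitted to that
    perturbed history."""
    if len(y_hist) == 0:
        return int(rng.randint(len(arms)))       # no data yet: explore
    y_pert = np.asarray(y_hist, dtype=float) + a * rng.randn(len(y_hist))
    theta = fit_glm(np.asarray(X_hist), y_pert)  # fit to perturbed history
    return int(np.argmax(arms @ theta))          # greedy arm under theta
```

The randomness comes entirely from the reward perturbations, so no posterior is ever sampled; refitting the GLM each round is the main cost.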
Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret in
stochastic linear bandits. The key idea is to build a perturbed history, which
mixes the history of observed rewards with a pseudo-history of randomly
generated i.i.d. pseudo-rewards. Our algorithm, perturbed-history exploration
in a linear bandit (LinPHE), estimates a linear model from its perturbed
history and pulls the arm with the highest value under that model. We prove a
gap-free bound on the expected n-round regret of LinPHE, where d is the number of features. Our analysis relies on novel
concentration and anti-concentration bounds on the weighted sum of Bernoulli
random variables. To show the generality of our design, we extend LinPHE to a
logistic reward model. We evaluate both algorithms empirically and show that
they are practical.
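One round of LinPHE can be sketched as follows. The function name, the Bernoulli(1/2) pseudo-rewards, the number of pseudo-rewards `a` per observation, and the ridge regularizer `reg` are assumptions made for illustration:

```python
import numpy as np

def linphe_round(arms, X_hist, y_hist, a=1, reg=1.0, rng=np.random):
    """One round of LinPHE (sketch). Each observed reward in [0, 1] is mixed
    with `a` i.i.d. Bernoulli(1/2) pseudo-rewards on the same feature vector;
    the arm with the highest value under the resulting ridge estimate is
    pulled."""
    d = arms.shape[1]
    if len(y_hist) == 0:
        return int(rng.randint(len(arms)))        # no data yet: explore
    X = np.asarray(X_hist)
    y = np.asarray(y_hist, dtype=float)
    # Perturbed history: each real reward plus `a` pseudo-rewards.
    z = y + rng.binomial(a, 0.5, size=len(y))     # summed pseudo-rewards
    G = (a + 1) * X.T @ X + reg * np.eye(d)       # regularized Gram matrix
    theta = np.linalg.solve(G, X.T @ z)           # perturbed ridge estimate
    return int(np.argmax(arms @ theta))
```

Because the estimate is a closed-form least-squares solve, each round costs one d x d linear system, with exploration driven solely by the random pseudo-rewards.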
- …