A Note on Information-Directed Sampling and Thompson Sampling
This note introduces three Bayesian-style multi-armed bandit algorithms:
Information-Directed Sampling, Thompson Sampling, and Generalized Thompson
Sampling. The goal is to give an intuitive explanation of these three
algorithms and their regret bounds, and to provide some derivations that are
omitted from the original papers.
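
For orientation, a minimal sketch of standard Thompson Sampling on a Bernoulli bandit is given below. It is not taken from the note itself: the Beta(1, 1) prior, the function name `thompson_sampling`, and the simulated `true_means` are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling; return pull counts per arm.

    true_means is a hypothetical list of Bernoulli reward probabilities
    used only to simulate the environment.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1.0] * n_arms  # Beta posterior parameter: 1 + observed successes
    beta = [1.0] * n_arms   # Beta posterior parameter: 1 + observed failures
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one sample per arm from its current posterior.
        samples = [rng.betavariate(alpha[k], beta[k]) for k in range(n_arms)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda k: samples[k])
        # Simulate a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.9], horizon=2000))
```

Posterior sampling does the exploring here: an arm with few observations has a wide posterior and is still occasionally sampled as the best, while well-estimated poor arms are played less and less often.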
Taming Non-stationary Bandits: A Bayesian Approach
We consider the multi-armed bandit problem in non-stationary environments.
Based on the Bayesian method, we propose a variant of Thompson Sampling which
can be used in both rested and restless bandit scenarios. Applying discounting
to the parameters of the prior distribution, we describe a way to systematically
reduce the effect of past observations. Further, we derive the exact expression
for the probability of picking sub-optimal arms. By increasing the exploitative
value of Bayes' samples, we also provide an optimistic version of the
algorithm. Extensive empirical analysis is conducted under various scenarios to
validate the utility of the proposed algorithms. A comparison study with various
state-of-the-art algorithms is also included.
Comment: Submitted to NIPS 201
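
The discounting idea in this abstract can be sketched as follows. This is one plausible reading rather than the authors' exact algorithm: the Bernoulli reward model, the discount factor `gamma`, the Beta(1, 1) prior added at sampling time, and the helper names `discounted_update` and `choose_arm` are all assumptions.

```python
import random


def discounted_update(successes, failures, arm, reward, gamma):
    """Shrink every arm's counts by gamma, then credit the played arm.

    Discounting all arms (not only the played one) lets old evidence
    fade, so posteriors widen again when an arm has not been observed
    recently. gamma in (0, 1] is an assumed tuning parameter; gamma = 1
    recovers the usual undiscounted update.
    """
    new_s = [gamma * s for s in successes]
    new_f = [gamma * f for f in failures]
    new_s[arm] += reward
    new_f[arm] += 1 - reward
    return new_s, new_f


def choose_arm(successes, failures, rng):
    """Sample each arm from Beta(S + 1, F + 1) and return the argmax."""
    samples = [
        rng.betavariate(s + 1.0, f + 1.0)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=lambda k: samples[k])


if __name__ == "__main__":
    rng = random.Random(0)
    s, f = [0.0, 0.0], [0.0, 0.0]
    arm = choose_arm(s, f, rng)
    s, f = discounted_update(s, f, arm, reward=1, gamma=0.9)
    print(arm, s, f)
```

With `gamma` below 1 the effective number of remembered observations stays bounded, which is what allows the sampler to keep tracking reward distributions that drift over time.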