Motivated by practical applications, chiefly clinical trials, we study the
regret achievable for stochastic bandits under the constraint that the employed
policy must split trials into a small number of batches. We propose a simple
policy and show that a very small number of batches suffices to achieve regret
bounds close to minimax optimal. As a byproduct, we derive optimal policies with low
switching cost for stochastic bandits.

Comment: Published at http://dx.doi.org/10.1214/15-AOS1381 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).