Bandits with heavy tail

Authors: Sébastien Bubeck, Nicolò Cesa-Bianchi, Gábor Lugosi
Publication date: 01/01/2012

The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper we examine the bandit problem under the weaker assumption that the distributions have moments of order 1+\epsilon, for some \epsilon \in (0,1]. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. In order to achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator. We also derive matching lower bounds that also show that the best achievable regret deteriorates when \epsilon < 1.
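As a rough illustration of one estimator named in the abstract, the following is a minimal sketch of a generic median-of-means estimator. The function name `median_of_means` and the parameter `num_blocks` are hypothetical, and the block count used by the paper's strategies is not stated in this listing, so it is left to the caller here.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Generic median-of-means sketch (not the paper's exact procedure).

    Splits the samples into `num_blocks` consecutive blocks of equal size,
    averages each block, and returns the median of the block averages.
    Trailing samples that do not fill a whole block are ignored.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# A single extreme reward shifts the plain mean but not the block median.
rewards = [1, 1, 1, 1, 1, 1000]
plain_mean = sum(rewards) / len(rewards)
robust_mean = median_of_means(rewards, 3)
```

Taking the median over block averages limits the influence of any one heavy-tailed sample to a single block, which is the intuition behind using such estimators in place of the empirical mean.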

arXiv.org e-Print Archive

CiteSeerX

AIR Universita degli studi di Milano

Simple regret for infinitely many armed bandits

Authors: Alexandra Carpentier, Michal Valko
Publication date: 18/05/2015

We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous algorithms for this setting were designed for minimizing the cumulative regret of the learner. In this paper, we propose an algorithm aiming at minimizing the simple regret. As in the cumulative regret setting of infinitely many armed bandits, the rate of the simple regret will depend on a parameter \beta characterizing the distribution of the near-optimal arms. We prove that depending on \beta, our algorithm is minimax optimal either up to a multiplicative constant or up to a \log(n) factor. We also provide extensions to several important cases: when \beta is unknown, in a natural setting where the near-optimal arms have a small variance, and in the case of unknown time horizon. Comment: in 32nd International Conference on Machine Learning (ICML 2015)
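For orientation, the quantities mentioned in the abstract can be written out as follows. This is a sketch under assumptions: the symbols \mu^*, \hat{k}_n, and \mathcal{L} do not appear in this listing, and the second relation is a commonly used way to formalize "the distribution of the near-optimal arms", which may differ in detail from the paper's exact condition.

```latex
% Simple regret after n samples: gap between the best mean and the mean
% of the arm \hat{k}_n recommended by the learner (notation assumed).
r_n = \mu^* - \mu_{\hat{k}_n}

% Assumed form of the condition on near-optimal arms: a mean \mu drawn
% from the arm reservoir \mathcal{L} is \varepsilon-optimal with
% probability of order \varepsilon^{\beta}.
\mathbb{P}_{\mu \sim \mathcal{L}}\left( \mu^* - \mu \le \varepsilon \right) \approx \varepsilon^{\beta}
```

Under this reading, a smaller \beta means near-optimal arms are more plentiful in the reservoir.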

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Finding a most biased coin with fewest flips

Authors: Karthekeyan Chandrasekaran, Richard Karp
Publication date: 07/09/2013

We study the problem of learning a most biased coin among a set of coins by tossing the coins adaptively. The goal is to minimize the number of tosses until we identify a coin i* whose posterior probability of being most biased is at least 1-\delta for a given \delta. Under a particular probabilistic model, we give an optimal algorithm, i.e., an algorithm that minimizes the expected number of future tosses. The problem is closely related to finding the best arm in the multi-armed bandit problem using adaptive strategies. Our algorithm employs an optimal adaptive strategy -- a strategy that performs the best possible action at each step after observing the outcomes of all previous coin tosses. Consequently, our algorithm is also optimal for any starting history of outcomes. To our knowledge, this is the first algorithm that employs an optimal adaptive strategy under a Bayesian setting for this problem. Our proof of optimality employs tools from the field of Markov games.

arXiv.org e-Print Archive

CiteSeerX