Bandits with heavy tail
The stochastic multi-armed bandit problem is well understood when the reward
distributions are sub-Gaussian. In this paper we examine the bandit problem
under the weaker assumption that the distributions have moments of order
1+\epsilon, for some \epsilon \in (0,1]. Surprisingly, moments of order 2
(i.e., finite variance) are sufficient to obtain regret bounds of the same
order as under sub-Gaussian reward distributions. In order to achieve such
regret, we define sampling strategies based on refined estimators of the mean
such as the truncated empirical mean, Catoni's M-estimator, and the
median-of-means estimator. We also derive matching lower bounds, which show
that the best achievable regret deteriorates when \epsilon < 1.
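As a rough illustration of one of the estimators named in this abstract, the following is a minimal sketch of a median-of-means estimator. It is not the paper's sampling strategy: the function name `median_of_means`, the `num_blocks` parameter, and the choice to drop leftover samples are assumptions made for the example.

```python
import random
import statistics


def median_of_means(samples, num_blocks):
    """Estimate a mean by taking the median of per-block averages.

    Hypothetical helper for illustration: the samples are split into
    num_blocks consecutive blocks of equal size, and any leftover
    samples at the end are ignored.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    # Heavy-tailed rewards: Pareto with shape 2.5 (an arbitrary choice).
    random.seed(0)
    rewards = [random.paretovariate(2.5) for _ in range(1000)]
    print("empirical mean :", statistics.fmean(rewards))
    print("median of means:", median_of_means(rewards, num_blocks=10))
```

A single extreme sample can shift the plain empirical mean arbitrarily far, but it only corrupts one block average here, which the median then ignores. How many blocks to use is a tuning choice that this sketch leaves to the caller.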
Simple regret for infinitely many armed bandits
We consider a stochastic bandit problem with infinitely many arms. In this
setting, the learner has no chance of trying all the arms even once and has to
dedicate its limited number of samples only to a certain number of arms. All
previous algorithms for this setting were designed for minimizing the
cumulative regret of the learner. In this paper, we propose an algorithm aiming
at minimizing the simple regret. As in the cumulative regret setting of
infinitely many armed bandits, the rate of the simple regret will depend on a
parameter \beta characterizing the distribution of the near-optimal arms. We
prove that depending on \beta, our algorithm is minimax optimal either up to
a multiplicative constant or up to a \log(n) factor. We also provide
extensions to several important cases: when \beta is unknown, in a natural
setting where the near-optimal arms have a small variance, and in the case of
unknown time horizon. Comment: in 32nd International Conference on Machine Learning (ICML 2015)
Finding a most biased coin with fewest flips
We study the problem of learning a most biased coin among a set of coins by
tossing the coins adaptively. The goal is to minimize the number of tosses
until we identify a coin i* whose posterior probability of being most biased is
at least 1-delta for a given delta. Under a particular probabilistic model, we
give an optimal algorithm, i.e., an algorithm that minimizes the expected
number of future tosses. The problem is closely related to finding the best arm
in the multi-armed bandit problem using adaptive strategies. Our algorithm
employs an optimal adaptive strategy -- a strategy that performs the best
possible action at each step after observing the outcomes of all previous coin
tosses. Consequently, our algorithm is also optimal for any starting history of
outcomes. To our knowledge, this is the first algorithm that employs an optimal
adaptive strategy under a Bayesian setting for this problem. Our proof of
optimality employs tools from the field of Markov games.
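The 1-delta stopping condition in this abstract can be illustrated with a small Bayes-rule sketch. The abstract does not specify its probabilistic model, so everything below is an assumption: each coin is taken to be one of two types, "heavy" with heads probability `p_heavy` or "light" with heads probability `p_light`, with prior probability `prior_heavy` of being heavy. The code only computes a single coin's posterior and does not implement the paper's optimal adaptive strategy.

```python
def posterior_heavy(heads, tails, prior_heavy, p_heavy, p_light):
    """Posterior probability that one coin is of the heavy type.

    Assumed two-type model (not taken from the abstract): the coin is
    heavy with prior probability prior_heavy, and its tosses are
    independent given its type.
    """
    weight_heavy = prior_heavy * p_heavy ** heads * (1 - p_heavy) ** tails
    weight_light = (1 - prior_heavy) * p_light ** heads * (1 - p_light) ** tails
    return weight_heavy / (weight_heavy + weight_light)


def confident_enough(posterior, delta):
    """Stopping check: is the posterior at least 1 - delta?"""
    return posterior >= 1 - delta


if __name__ == "__main__":
    # Hypothetical numbers: 10 heads and 2 tails observed on one coin.
    post = posterior_heavy(10, 2, prior_heavy=0.5, p_heavy=0.6, p_light=0.4)
    print("posterior heavy:", post)
    print("stop at delta=0.05?", confident_enough(post, delta=0.05))
```

Under this assumed model, an adaptive procedure would keep tossing until some coin passes the check. Which coin to toss next is the part the paper's strategy addresses and this sketch does not.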