Simple regret for infinitely many armed bandits
We consider a stochastic bandit problem with infinitely many arms. In this
setting, the learner has no chance of trying all the arms even once and has to
dedicate its limited number of samples only to a certain number of arms. All
previous algorithms for this setting were designed for minimizing the
cumulative regret of the learner. In this paper, we propose an algorithm aiming
at minimizing the simple regret. As in the cumulative regret setting of
infinitely many armed bandits, the rate of the simple regret will depend on a
parameter $\beta$ characterizing the distribution of the near-optimal arms. We
prove that depending on $\beta$, our algorithm is minimax optimal either up to
a multiplicative constant or up to a $\log(n)$ factor. We also provide
extensions to several important cases: when $\beta$ is unknown, in a natural
setting where the near-optimal arms have a small variance, and in the case of
unknown time horizon.
Comment: in 32nd International Conference on Machine Learning (ICML 2015).
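The basic recipe the abstract alludes to, i.e., sub-sampling a finite set of arms from the infinite reservoir and spending the budget only on those, can be sketched as follows. This is an illustrative simplification, not the authors' algorithm; the names `reservoir`, `budget`, and `num_arms` are placeholders, and a uniform split of the budget is assumed.

```python
import random

def simple_regret_uniform(reservoir, budget, num_arms):
    """Sketch: draw `num_arms` arms from an infinite reservoir, split the
    sampling budget evenly among them, and recommend the empirically best
    arm. `reservoir()` returns a fresh arm; an arm is a zero-argument
    callable returning a stochastic reward in [0, 1]."""
    arms = [reservoir() for _ in range(num_arms)]
    pulls = budget // num_arms
    means = [sum(arm() for _ in range(pulls)) / pulls for arm in arms]
    best = max(range(num_arms), key=lambda i: means[i])
    return arms[best], means[best]

# Hypothetical reservoir: each arm is Bernoulli with a uniform random mean.
random.seed(0)
def make_arm():
    p = random.random()
    return lambda: 1.0 if random.random() < p else 0.0

arm, est = simple_regret_uniform(make_arm, budget=2000, num_arms=20)
```

How many arms to sub-sample relative to the budget is exactly where the parameter characterizing the density of near-optimal arms enters the analysis.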
A simple dynamic bandit algorithm for hyper-parameter tuning
Hyper-parameter tuning is a major part of modern machine learning systems. The tuning itself can be seen as a sequential resource allocation problem. As such, methods for multi-armed bandits have already been applied. In this paper, we view hyper-parameter optimization as an instance of best-arm identification in infinitely many-armed bandits. We propose D-TTTS, a new adaptive algorithm inspired by Thompson sampling, which dynamically balances between refining the estimate of the quality of hyper-parameter configurations previously explored and adding new hyper-parameter configurations to the pool of candidates. The algorithm is easy to implement and shows competitive performance compared to state-of-the-art algorithms for hyper-parameter tuning.
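The explore/refine balance described above can be sketched as a dynamic Thompson-sampling loop. This is a minimal illustration of the general idea, not the D-TTTS rule from the paper: the fixed probability `new_prob` of adding a fresh configuration, and the Beta/Bernoulli reward model, are assumptions for the sketch.

```python
import random

def dynamic_thompson(sample_config, evaluate, rounds, new_prob=0.5, seed=1):
    """Sketch of dynamic Thompson sampling for hyper-parameter search:
    with probability `new_prob` add a fresh configuration to the pool;
    otherwise Thompson-sample among pooled configurations (Beta
    posteriors over Bernoulli rewards) and refine the chosen one.
    `sample_config()` draws a configuration; `evaluate(c)` returns 0/1."""
    rng = random.Random(seed)
    pool = []  # entries: [config, alpha, beta] (Beta posterior parameters)
    for _ in range(rounds):
        if not pool or rng.random() < new_prob:
            pool.append([sample_config(), 1, 1])
        # Thompson step: sample a mean from each posterior, play the argmax.
        idx = max(range(len(pool)),
                  key=lambda i: rng.betavariate(pool[i][1], pool[i][2]))
        r = evaluate(pool[idx][0])
        pool[idx][1] += r
        pool[idx][2] += 1 - r
    # Recommend the configuration with the highest posterior mean.
    best = max(pool, key=lambda e: e[1] / (e[1] + e[2]))
    return best[0]

# Toy usage: a "configuration" is a scalar in [0, 1], and evaluating it
# returns a Bernoulli reward with that success probability.
rng2 = random.Random(7)
best_cfg = dynamic_thompson(lambda: rng2.random(),
                            lambda c: int(rng2.random() < c), rounds=200)
```

The key design point the abstract highlights is that the pool of candidate configurations grows over time, so the algorithm never commits to a fixed finite set of arms up front.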
Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory
In this paper, we propose a constant word (RAM model) algorithm for regret
minimisation for both finite and infinite Stochastic Multi-Armed Bandit (MAB)
instances. Most of the existing regret minimisation algorithms need to remember
the statistics of all the arms they encounter. This may become a problem for
the cases where the number of available words of memory is limited. Designing
an efficient regret minimisation algorithm that uses a constant number of words
has long been of interest to the community. Some early attempts consider the
number of arms to be infinite, and require the reward distribution of the arms
to belong to some particular family. Recently, for finitely many-armed bandits
an explore-then-commit based algorithm~\citep{Liau+PSY:2018} appears to avoid
such assumptions. However, due to the underlying PAC-based elimination, their
method incurs a high regret. We present a conceptually simple and efficient
algorithm that needs to remember the statistics of at most $M$ arms, and for any
$K$-armed finite bandit instance it enjoys an upper bound on regret. We extend it to achieve sub-linear
\textit{quantile-regret}~\citep{RoyChaudhuri+K:2018} and empirically verify the
efficiency of our algorithm via experiments.
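The constant-memory idea of keeping statistics for only a bounded number of arms can be illustrated with a simple streaming explore-then-commit variant. This sketch is not the algorithm analysed in the paper; the per-arm budget `pulls_per_arm` and the Bernoulli arm model are assumptions, and it keeps only a single champion's statistics rather than a pool of $M$ arms.

```python
import random

def bounded_memory_etc(arm_probs, horizon, pulls_per_arm=50, seed=0):
    """Sketch: explore-then-commit using O(1) words of arm statistics.
    Stream over the arms once, retaining only the current champion's
    index and empirical mean, then commit the remaining budget to the
    champion. `arm_probs` are Bernoulli success probabilities."""
    rng = random.Random(seed)
    champ_idx, champ_mean = -1, -1.0
    spent = 0
    for i, p in enumerate(arm_probs):
        # Estimate this arm's mean, then discard its statistics.
        mean = sum(rng.random() < p for _ in range(pulls_per_arm)) / pulls_per_arm
        spent += pulls_per_arm
        if mean > champ_mean:
            champ_idx, champ_mean = i, mean
    # Commit: spend the rest of the horizon on the champion.
    reward = sum(rng.random() < arm_probs[champ_idx]
                 for _ in range(max(0, horizon - spent)))
    return champ_idx, reward

idx, reward = bounded_memory_etc([0.05, 0.95, 0.2], horizon=1000)
```

As the abstract notes, this explore-then-commit style of elimination is exactly what can inflate regret, which motivates the paper's more careful constant-word design.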