Simple regret for infinitely many armed bandits

Authors: Carpentier Alexandra; Valko Michal
Publication date: 18/05/2015

We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples to only a certain number of arms. All previous algorithms for this setting were designed to minimize the cumulative regret of the learner. In this paper, we propose an algorithm that aims to minimize the simple regret. As in the cumulative regret setting of infinitely many armed bandits, the rate of the simple regret depends on a parameter $\beta$ characterizing the distribution of the near-optimal arms. We prove that, depending on $\beta$, our algorithm is minimax optimal either up to a multiplicative constant or up to a $\log(n)$ factor. We also provide extensions to several important cases: when $\beta$ is unknown, in a natural setting where the near-optimal arms have a small variance, and in the case of an unknown time horizon.

Comment: in 32nd International Conference on Machine Learning (ICML 2015)
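To make the notion of simple regret concrete, here is a minimal simulation sketch. It is not the algorithm proposed in the paper: it assumes Bernoulli arms whose means are drawn from a Uniform(0, 1) reservoir, pulls a fixed subset of arms round-robin, and recommends the empirical best. The function name `simple_regret_uniform` and all parameters are hypothetical.

```python
import random


def simple_regret_uniform(n_samples, n_arms, seed=0):
    """Simulate one run and return the simple regret of the recommended arm.

    Assumptions (not from the paper): Bernoulli rewards, arm means drawn
    from a Uniform(0, 1) reservoir, uniform round-robin allocation.
    """
    if n_arms < 1 or n_samples < n_arms:
        raise ValueError("need at least one pull per selected arm")
    rng = random.Random(seed)
    # Draw a finite subset of arms from the (infinite) reservoir.
    means = [rng.random() for _ in range(n_arms)]
    pulls = [0] * n_arms
    wins = [0] * n_arms
    for t in range(n_samples):
        i = t % n_arms  # round-robin over the selected arms
        pulls[i] += 1
        wins[i] += 1 if rng.random() < means[i] else 0
    # Recommend the arm with the highest empirical mean.
    best = max(range(n_arms), key=lambda i: wins[i] / pulls[i])
    # The supremum of the reservoir means is 1.0 for Uniform(0, 1).
    return 1.0 - means[best]
```

With a fixed budget, choosing `n_arms` trades off the chance of drawing a near-optimal arm against the number of pulls available to identify it, which is the tension the abstract describes.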

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

A simple dynamic bandit algorithm for hyper-parameter tuning

Authors: Kaufmann Emilie; Shang Xuedong; Valko Michal
Publication venue: HAL CCSD
Publication date: 14/06/2019

Hyper-parameter tuning is a major part of modern machine learning systems. The tuning itself can be seen as a sequential resource allocation problem. As such, methods for multi-armed bandits have already been applied to it. In this paper, we view hyper-parameter optimization as an instance of best-arm identification in infinitely many-armed bandits. We propose D-TTTS, a new adaptive algorithm inspired by Thompson sampling, which dynamically balances between refining the estimate of the quality of hyper-parameter configurations previously explored and adding new hyper-parameter configurations to the pool of candidates. The algorithm is easy to implement and shows competitive performance compared to state-of-the-art algorithms for hyper-parameter tuning.
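The balance between refining known configurations and adding new ones can be illustrated with a rough sketch. This is not D-TTTS itself: it uses plain Thompson sampling with Beta posteriors over binary outcomes, plus one draw from the prior that stands in for a not-yet-sampled configuration. The names `dynamic_thompson`, `evaluate`, and `sample_config` are hypothetical.

```python
import random


def dynamic_thompson(evaluate, sample_config, budget, seed=0):
    """Thompson-style loop that may grow its pool of configurations.

    evaluate(config, rng) -> bool   : one noisy success/failure outcome
    sample_config(rng)    -> config : draws a fresh configuration
    Returns (recommended config, size of the pool).
    """
    rng = random.Random(seed)
    configs, succ, fail = [], [], []
    for _ in range(budget):
        # One posterior draw per known configuration (Beta(1, 1) prior).
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(succ, fail)]
        # One prior draw representing an unseen configuration.
        fresh = rng.betavariate(1, 1)
        if not draws or fresh > max(draws):
            # The unseen configuration looks most promising: add it.
            configs.append(sample_config(rng))
            succ.append(0)
            fail.append(0)
            i = len(configs) - 1
        else:
            # Otherwise refine the most promising known configuration.
            i = max(range(len(draws)), key=draws.__getitem__)
        if evaluate(configs[i], rng):
            succ[i] += 1
        else:
            fail[i] += 1
    # Recommend the configuration with the highest posterior mean.
    best = max(
        range(len(configs)),
        key=lambda j: (1 + succ[j]) / (2 + succ[j] + fail[j]),
    )
    return configs[best], len(configs)
```

For example, `dynamic_thompson(lambda x, rng: rng.random() < 1 - (x - 0.3) ** 2, lambda rng: rng.random(), budget=200)` treats a scalar in [0, 1] as the configuration and a made-up quadratic as its success probability.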

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

Authors: Chaudhuri Arghya Roy; Kalyanakrishnan Shivaram
Publication date: 24/01/2019

In this paper, we propose a constant-word (RAM model) algorithm for regret minimisation for both finite and infinite Stochastic Multi-Armed Bandit (MAB) instances. Most of the existing regret minimisation algorithms need to remember the statistics of all the arms they encounter. This may become a problem in cases where the number of available words of memory is limited. Designing an efficient regret minimisation algorithm that uses a constant number of words has long been of interest to the community. Some early attempts consider the number of arms to be infinite, and require the reward distribution of the arms to belong to some particular family. Recently, for finitely many-armed bandits, an explore-then-commit based algorithm \citep{Liau+PSY:2018} seems to escape such an assumption. However, due to the underlying PAC-based elimination, their method incurs a high regret. We present a conceptually simple and efficient algorithm that needs to remember statistics of at most $M$ arms, and for any $K$-armed finite bandit instance it enjoys an $O(KM + K^{1.5}\sqrt{T\log(T/MK)}/M)$ upper bound on regret. We extend it to achieve sub-linear \textit{quantile-regret} \citep{RoyChaudhuri+K:2018} and empirically verify the efficiency of our algorithm via experiments.
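As a reading aid, the stated bound can be specialised to one memory size. This is a sketch under two assumptions that the abstract does not confirm: that the logarithm's argument is $T/(MK)$, and that the bound still applies when $M = K$.

```latex
% Assumption: T/MK is read as T/(MK); M = K is taken as an admissible value.
\[
  O\!\left(KM + \frac{K^{1.5}}{M}\sqrt{T\log\frac{T}{MK}}\right)
  \;\xrightarrow{\;M = K\;}\;
  O\!\left(K^{2} + \sqrt{K\,T\log\frac{T}{K^{2}}}\right)
\]
```

Under the same reading, and for $T > MK$, the first term grows with $M$ while the second shrinks, so a larger arm memory trades a higher additive cost for a smaller $T$-dependent term.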

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications