878 research outputs found

    Simple regret for infinitely many armed bandits

    Get PDF
    We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous algorithms for this setting were designed for minimizing the cumulative regret of the learner. In this paper, we propose an algorithm aiming at minimizing the simple regret. As in the cumulative regret setting of infinitely many armed bandits, the rate of the simple regret will depend on a parameter β\beta characterizing the distribution of the near-optimal arms. We prove that depending on β\beta, our algorithm is minimax optimal either up to a multiplicative constant or up to a log(n)\log(n) factor. We also provide extensions to several important cases: when β\beta is unknown, in a natural setting where the near-optimal arms have a small variance, and in the case of unknown time horizon.Comment: in 32th International Conference on Machine Learning (ICML 2015

    A simple dynamic bandit algorithm for hyper-parameter tuning

    Get PDF
    International audienceHyper-parameter tuning is a major part of modern machine learning systems. The tuning itself can be seen as a sequential resource allocation problem. As such, methods for multi-armed bandits have been already applied. In this paper, we view hyper-parameter optimization as an instance of best-arm identification in infinitely many-armed bandits. We propose D-TTTS, a new adaptive algorithm inspired by Thompson sampling, which dynamically balances between refining the estimate of the quality of hyper-parameter configurations previously explored and adding new hyper-parameter configurations to the pool of candidates. The algorithm is easy to implement and shows competitive performance compared to state-of-the-art algorithms for hyper-parameter tuning

    Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

    Full text link
    In this paper, we propose a constant word (RAM model) algorithm for regret minimisation for both finite and infinite Stochastic Multi-Armed Bandit (MAB) instances. Most of the existing regret minimisation algorithms need to remember the statistics of all the arms they encounter. This may become a problem for the cases where the number of available words of memory is limited. Designing an efficient regret minimisation algorithm that uses a constant number of words has long been interesting to the community. Some early attempts consider the number of arms to be infinite, and require the reward distribution of the arms to belong to some particular family. Recently, for finitely many-armed bandits an explore-then-commit based algorithm~\citep{Liau+PSY:2018} seems to escape such assumption. However, due to the underlying PAC-based elimination their method incurs a high regret. We present a conceptually simple, and efficient algorithm that needs to remember statistics of at most MM arms, and for any KK-armed finite bandit instance it enjoys a O(KM+K1.5Tlog(T/MK)/M)O(KM +K^{1.5}\sqrt{T\log (T/MK)}/M) upper-bound on regret. We extend it to achieve sub-linear \textit{quantile-regret}~\citep{RoyChaudhuri+K:2018} and empirically verify the efficiency of our algorithm via experiments
    corecore