Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

Authors: Galichet Nicolas, Sebag Michèle, Teytaud Olivier
Publication date: 13/11/2013

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, which targets the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness compared with UCB. The analysis is supported by extensive experimental validation of MIN and MARAB against UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.

Comment: 16 pages

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server
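
The risk-aware arm selection described in the MARAB abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the empirical CVaR is taken as the mean of the worst ceil(alpha * n) observed rewards, and the hypothetical `risk_aware_ucb` loop adds a UCB-style bonus `c * sqrt(log t / n)` to it. The exact MARAB index, the exploration constant `c`, and the Gaussian arms in the usage block are all assumptions. With alpha close to 0 the CVaR estimate reduces to the smallest observed reward, mirroring how MARAB tends toward the MIN algorithm.

```python
import math
import random


def empirical_cvar(samples, alpha):
    """Empirical CVaR at level alpha: mean of the worst ceil(alpha * n) rewards.

    With alpha near 0 only the single worst sample is kept, so the estimate
    is the minimum observed reward (an empirical proxy for the essential infimum).
    """
    if not samples:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(alpha * len(samples)))
    worst = sorted(samples)[:k]
    return sum(worst) / k


def risk_aware_ucb(arms, horizon, alpha, c=1.0):
    """Hypothetical UCB-style loop that ranks arms by empirical CVaR.

    `arms` is a list of zero-argument callables returning a reward, and `c`
    is an assumed exploration constant. This is an illustrative index,
    not the exact MARAB index from the paper. Returns pull counts per arm.
    """
    history = [[arm()] for arm in arms]  # pull every arm once to initialize
    for t in range(len(arms), horizon):
        scores = [
            empirical_cvar(h, alpha) + c * math.sqrt(math.log(t + 1) / len(h))
            for h in history
        ]
        best = max(range(len(arms)), key=lambda i: scores[i])
        history[best].append(arms[best]())
    return [len(h) for h in history]


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed arms: a risky arm with a higher mean and a safe low-variance arm.
    risky = lambda: rng.gauss(0.6, 1.0)
    safe = lambda: rng.gauss(0.5, 0.05)
    print(risk_aware_ucb([risky, safe], horizon=2000, alpha=0.1))
```

With these assumed arms the safe arm should receive most of the pulls: the risky arm has the higher mean, but its lower tail drags its CVaR well below that of the safe arm, which is the behavior a risk-averse criterion is meant to produce.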

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

Authors: Bogunovic Ilijia, Cevher Volkan, Scarlett Jonathan
Publication date: 31/05/2017

In this paper, we consider the problem of sequentially optimizing a black-box function $f$ based on noisy samples and bandit feedback. We assume that $f$ is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points. For the isotropic squared-exponential kernel in $d$ dimensions, we find that an average simple regret of $\epsilon$ requires $T = \Omega\big(\frac{1}{\epsilon^2} (\log\frac{1}{\epsilon})^{d/2}\big)$, and the average cumulative regret is at least $\Omega\big( \sqrt{T(\log T)^{d/2}} \big)$, thus matching existing upper bounds up to the replacement of $d/2$ by $2d+O(1)$ in both cases. For the Matérn-$\nu$ kernel, we give analogous bounds of the form $\Omega\big( (\frac{1}{\epsilon})^{2+d/\nu}\big)$ and $\Omega\big( T^{\frac{\nu + d}{2\nu + d}} \big)$, and discuss the resulting gaps to the existing upper bounds.

Comment: Appearing in COLT 2017. This version corrects a few minor mistakes in Table I, which summarizes the new and existing regret bounds.

arXiv.org e-Print Archive
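
The lower-bound expressions in this abstract can be evaluated numerically to compare how they scale. The helper functions below are hypothetical: they restate only the formulas given above, drop every constant and lower-order factor hidden by $\Omega(\cdot)$, and are not taken from the paper. One sanity check they make easy is that as $\nu \to \infty$ the Matérn cumulative-regret exponent $\frac{\nu+d}{2\nu+d}$ approaches $1/2$, consistent with the $\sqrt{T}$ scaling (up to log factors) of the squared-exponential bound, which is the limiting case of the Matérn family.

```python
import math


def matern_cumulative_exponent(nu, d):
    """Exponent e in the Matérn-nu cumulative-regret lower bound Omega(T**e)."""
    return (nu + d) / (2 * nu + d)


def matern_simple_regret_exponent(nu, d):
    """Exponent p in the Matérn-nu lower bound on rounds, Omega((1/eps)**p)."""
    return 2 + d / nu


def se_simple_regret_rounds(eps, d):
    """SE-kernel lower bound on rounds T for average simple regret eps (constants dropped)."""
    return (1 / eps**2) * math.log(1 / eps) ** (d / 2)


def se_cumulative_regret(T, d):
    """SE-kernel lower bound on average cumulative regret after T rounds (constants dropped)."""
    return math.sqrt(T * math.log(T) ** (d / 2))


if __name__ == "__main__":
    d = 2
    for nu in (0.5, 1.5, 2.5, 1e6):
        # Larger nu (smoother functions) pushes the cumulative exponent toward 1/2.
        print(nu, matern_cumulative_exponent(nu, d), matern_simple_regret_exponent(nu, d))
    print(se_simple_regret_rounds(0.01, d), se_cumulative_regret(10_000, d))
```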