
    Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

    Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. To limit the exploration of risky arms, MARAB uses each arm's conditional value at risk (CVaR) as its quality measure. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, which targets the arm with the maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB against UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
    Comment: 16 pages
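To make the arm-quality idea concrete, the following is a minimal Python sketch of an empirical conditional value at risk (CVaR) over observed rewards. It is an illustration only, not the MARAB algorithm from the paper: the function names `empirical_cvar` and `pick_arm` are hypothetical, the estimator (mean of the worst `ceil(alpha * n)` rewards) is one common empirical form assumed here, and the exploration term that a bandit algorithm would add is omitted.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (lower tail).

    Assumed estimator for illustration; alpha is the risk level in (0, 1].
    """
    if not rewards:
        raise ValueError("need at least one reward")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # Number of tail samples to average; always at least one.
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def pick_arm(history, alpha):
    """Greedy choice of the arm with the highest empirical CVaR.

    `history` maps an arm name to its list of observed rewards.
    No exploration bonus is included (a simplification).
    """
    return max(history, key=lambda arm: empirical_cvar(history[arm], alpha))


if __name__ == "__main__":
    history = {
        "safe": [0.4, 0.5, 0.45, 0.5],
        "risky": [0.0, 1.0, 1.0, 1.0],
    }
    # Small alpha looks only at the worst outcomes; alpha = 1 is the plain mean.
    print(pick_arm(history, alpha=0.25))
    print(pick_arm(history, alpha=1.0))
```

With a small risk level the score is driven by the worst observed reward, which mirrors the abstract's remark that the arm quality approaches the minimal value as the risk level goes to 0. With `alpha = 1` the score reduces to the ordinary empirical mean.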

    Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

    In this paper, we consider the problem of sequentially optimizing a black-box function $f$ based on noisy samples and bandit feedback. We assume that $f$ is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points. For the isotropic squared-exponential kernel in $d$ dimensions, we find that an average simple regret of $\epsilon$ requires $T = \Omega\big(\frac{1}{\epsilon^2} (\log\frac{1}{\epsilon})^{d/2}\big)$, and the average cumulative regret is at least $\Omega\big( \sqrt{T(\log T)^{d/2}} \big)$, thus matching existing upper bounds up to the replacement of $d/2$ by $2d+O(1)$ in both cases. For the Matérn-$\nu$ kernel, we give analogous bounds of the form $\Omega\big( (\frac{1}{\epsilon})^{2+d/\nu}\big)$ and $\Omega\big( T^{\frac{\nu + d}{2\nu + d}} \big)$, and discuss the resulting gaps to the existing upper bounds.
    Comment: Appearing in COLT 2017. This version corrects a few minor mistakes in Table I, which summarizes the new and existing regret bounds.
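The two regret notions described in words above can be written out as follows. This is a sketch in standard bandit-optimization notation, which is assumed here rather than taken from the abstract: $D$ denotes the domain, $x_t$ the point queried in round $t$, and $x^{(T)}$ the single point reported after $T$ rounds.

```latex
% Simple regret: suboptimality of the single point reported after T rounds.
r\big(x^{(T)}\big) = \max_{x \in D} f(x) - f\big(x^{(T)}\big)

% Cumulative regret: sum of the per-round regrets over the T chosen points.
R_T = \sum_{t=1}^{T} \Big( \max_{x \in D} f(x) - f(x_t) \Big)
```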