Thompson Sampling for Bandits with Clustered Arms
We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit we give upper bounds on the expected cumulative regret showing how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
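As a hedged illustration of the multi-level idea, here is a minimal sketch of one round of a two-level Thompson sampling scheme under assumed Beta-Bernoulli posteriors; the scheme and all names are illustrative, not necessarily the authors' exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def clustered_thompson_step(alpha, beta, clusters):
    """One round of a two-level Thompson sampling sketch.

    alpha, beta: Beta posterior parameters, one entry per arm.
    clusters: list of index arrays, one per cluster.
    Returns the index of the arm to pull.
    """
    # Level 1: draw a posterior sample per arm and score each cluster
    # by the best sample it contains.
    samples = rng.beta(alpha, beta)
    best_cluster = max(range(len(clusters)),
                       key=lambda c: samples[clusters[c]].max())
    # Level 2: a fresh Thompson draw restricted to the chosen cluster.
    idx = np.asarray(clusters[best_cluster])
    return int(idx[np.argmax(rng.beta(alpha[idx], beta[idx]))])
```

After pulling arm `a` and observing a Bernoulli reward `r`, the usual posterior update is `alpha[a] += r; beta[a] += 1 - r`.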
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
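Since the analyzed algorithm itself is simple, a self-contained Bernoulli Thompson sampling simulation fits in a few lines (the simulation harness and parameter choices below are our own, illustrative additions):

```python
import numpy as np

def thompson_bernoulli(means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: sample each arm's posterior,
    pull the argmax, update with the observed 0/1 reward."""
    rng = np.random.default_rng(seed)
    k = len(means)
    alpha = np.ones(k)  # Beta(1, 1) uniform priors
    beta = np.ones(k)
    regret = 0.0
    best = max(means)
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))
        reward = float(rng.random() < means[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += best - means[arm]
    return regret

# The cumulative regret should grow roughly logarithmically in the horizon.
print(thompson_bernoulli([0.5, 0.6], horizon=10_000))
```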
Bounded regret in stochastic multi-armed bandits
We study the stochastic multi-armed bandit problem when one knows the value $\mu^{(\star)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $\Delta$. We propose a new randomized policy that attains a regret {\em uniformly bounded over time} in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $\Delta$, and bounded regret of order $1/\Delta$ is not possible if one only knows $\mu^{(\star)}$.
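The abstract does not spell the policy out, so the following is only a hedged, elimination-style stand-in showing why knowing $\mu^{(\star)}$ and a gap lower bound can cap exploration after finitely many pulls; it is not the authors' (randomized) policy, and `pull` and all names are our assumptions:

```python
import numpy as np

def known_value_policy(pull, k, mu_star, delta, horizon):
    """Hedged sketch, NOT the paper's policy: knowing mu_star and a lower
    bound delta on the smallest positive gap, permanently drop an arm once
    its empirical mean sits confidently below mu_star - delta / 2, so each
    suboptimal arm is pulled only finitely often with high probability.
    pull(arm) returns a reward in [0, 1]."""
    sums = np.zeros(k)
    counts = np.zeros(k, dtype=int)
    active = list(range(k))
    for _ in range(horizon):
        # Round-robin over surviving arms; commit once one arm remains.
        arm = min(active, key=lambda a: counts[a])
        sums[arm] += pull(arm)
        counts[arm] += 1
        if len(active) > 1:
            keep = []
            for a in active:
                n = counts[a]
                # Confidence radius shrinking like sqrt(log n / n).
                radius = np.sqrt(np.log(max(n, 2)) / n) if n else np.inf
                if sums[a] / max(n, 1) + radius >= mu_star - delta / 2:
                    keep.append(a)
            active = keep or active  # never empty the active set
    return counts
```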
Achieving Fairness in the Stochastic Multi-armed Bandit Problem
We study an interesting variant of the stochastic multi-armed bandit problem, called the Fair-SMAB problem, where each arm is required to be pulled for at least a given fraction of the total available rounds. We investigate the interplay between learning and fairness in terms of a pre-specified vector $r$ denoting the fractions of guaranteed pulls. We define a fairness-aware regret, called $r$-Regret, that takes into account the above fairness constraints and naturally extends the conventional notion of regret. Our primary contribution is characterizing a class of Fair-SMAB algorithms by two parameters: the unfairness tolerance and the learning algorithm used as a black-box. We provide a fairness guarantee for this class that holds uniformly over time, irrespective of the choice of the learning algorithm. In particular, when the learning algorithm is UCB1, we show that our algorithm achieves $O(\ln T)$ $r$-Regret. Finally, we evaluate the cost of fairness in terms of the conventional notion of regret.
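A hedged sketch of one round of such a meta-policy; the quota rule and all names are our assumptions rather than the paper's exact construction:

```python
import numpy as np

def fair_smab_step(t, counts, r, tolerance, learner_choice):
    """If some arm has fallen more than `tolerance` pulls behind its
    guaranteed share r[a] * t, pull the most deficient arm; otherwise
    defer to the black-box learner (e.g., the arm proposed by UCB1)."""
    deficits = r * t - counts
    worst = int(np.argmax(deficits))
    if deficits[worst] > tolerance:
        return worst           # fairness overrides learning this round
    return learner_choice      # quotas satisfied; let the learner act
```

The tolerance parameter caps how far any arm may fall behind its guaranteed fraction before fairness overrides the learner, which is what lets the fairness guarantee hold uniformly over time regardless of the learner plugged in.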
Bandits with heavy tail
The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper we examine the bandit problem under the weaker assumption that the distributions have moments of order $1+\epsilon$, for some $\epsilon \in (0, 1]$. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. In order to achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator. We also derive matching lower bounds that show that the best achievable regret deteriorates when $\epsilon < 1$.
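As a hedged illustration of one of these refined estimators, here is the median-of-means (the demo distribution and block count are our choices); note how it stabilizes the estimate under a heavy-tailed reward with finite mean but infinite variance:

```python
import numpy as np

def median_of_means(x, num_blocks):
    """Split the sample into blocks, average each block, and return the
    median of the block means; far more resistant to heavy tails than
    the plain empirical mean."""
    blocks = np.array_split(np.asarray(x, dtype=float), num_blocks)
    return float(np.median([b.mean() for b in blocks]))

# Pareto(1.5) rewards: mean 3, infinite variance (so epsilon < 1 here).
rng = np.random.default_rng(0)
sample = rng.pareto(1.5, size=1_000) + 1.0
print(sample.mean(), median_of_means(sample, num_blocks=15))
```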