Search CORE

173 research outputs found

Corrupt Bandits for Preserving Local Privacy

Author: Gajane Pratik
Kaufmann Emilie
Urvoy Tanguy
Publication venue
Publication date: 02/11/2017
Field of study

We study a variant of the stochastic multi-armed bandit (MAB) problem in which the rewards are corrupted. In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on the observation of transformation of these rewards through a stochastic corruption process with known parameters. We provide a lower bound on the expected regret of any bandit algorithm in this corrupted setting. We devise a frequentist algorithm, KLUCB-CF, and a Bayesian algorithm, TS-CF and give upper bounds on their regret. We also provide the appropriate corruption parameters to guarantee a desired level of local privacy and analyze how this impacts the regret. Finally, we present some experimental results that confirm our analysis

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits

Author: Chambaz Antoine
Kaufmann Emilie
Luedtke Alexander
Publication venue
Publication date: 01/09/2019
Field of study

We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show these algorithms are asymptotically optimal, both in rateand leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Optimal Best Arm Identification with Fixed Confidence

Author: Garivier Aurélien
Kaufmann Emilie
Publication venue: HAL CCSD
Publication date: 01/06/2016
Field of study

International audienceWe give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the `Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping rule named after Chernoff, for which we give a new analysis

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL Descartes

HAL-INSA Toulouse

Hal-Diderot

Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals

Author: Kaufmann Emilie
Koolen Wouter
Publication venue
Publication date: 28/11/2018
Field of study

This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model. The deviations are measured using the Kullback-Leibler divergence in a given one-dimensional exponential family, and may take into account several arms at a time. They are obtained by constructing for each arm a mixture martingale based on a hierarchical prior, and by multiplying those martingales. Our deviation inequalities allow us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms

arXiv.org e-Print Archive

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Thompson sampling for one-dimensional exponential family bandits

Author: Kaufmann Emilie
Korda Nathaniel
Munos Rémi
Publication venue: HAL CCSD
Publication date: 01/01/2013
Field of study

International audienceThompson Sampling has been demonstrated in many complex bandit models, however the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for 1-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for Kullback-Leibler divergence and Fisher information (and thus Jeffreys prior) available in an exponential family. This allow us to give a finite time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families

HAL - Lille 3

INRIA a CCSD electronic archive server

On the Complexity of A/B Testing

Author: Cappé Olivier
Garivier Aurélien
Kaufmann Emilie
Publication venue: HAL CCSD
Publication date: 13/06/2014
Field of study

A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing that improve over the results currently available both in the fixed-confidence (or delta-PAC) and fixed-budget settings. When the distribution of the outcomes are Gaussian, we prove that the complexity of the fixed-confidence and fixed-budget settings are equivalent, and that uniform sampling of both alternatives is optimal only in the case of equal variances. In the common variance case, we also provide a stopping rule that terminates faster than existing fixed-confidence algorithms. In the case of Bernoulli distributions, we show that the complexity of fixed-budget setting is smaller than that of fixed-confidence setting and that uniform sampling of both alternatives -though not optimal- is advisable in practice when combined with an appropriate stopping criterion

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse