175 research outputs found
Corrupt Bandits for Preserving Local Privacy
We study a variant of the stochastic multi-armed bandit (MAB) problem in
which the rewards are corrupted. In this framework, motivated by privacy
preservation in online recommender systems, the goal is to maximize the sum of
the (unobserved) rewards, based on the observation of transformation of these
rewards through a stochastic corruption process with known parameters. We
provide a lower bound on the expected regret of any bandit algorithm in this
corrupted setting. We devise a frequentist algorithm, KLUCB-CF, and a Bayesian
algorithm, TS-CF and give upper bounds on their regret. We also provide the
appropriate corruption parameters to guarantee a desired level of local privacy
and analyze how this impacts the regret. Finally, we present some experimental
results that confirm our analysis
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits
We study a generalization of the multi-armed bandit problem with multiple
plays where there is a cost associated with pulling each arm and the agent has
a budget at each time that dictates how much she can expect to spend. We derive
an asymptotic regret lower bound for any uniformly efficient algorithm in our
setting. We then study a variant of Thompson sampling for Bernoulli rewards and
a variant of KL-UCB for both single-parameter exponential families and bounded,
finitely supported rewards. We show these algorithms are asymptotically
optimal, both in rateand leading problem-dependent constants, including in the
thick margin setting where multiple arms fall on the decision boundary
Optimal Best Arm Identification with Fixed Confidence
International audienceWe give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the `Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping rule named after Chernoff, for which we give a new analysis
Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals
This paper presents new deviation inequalities that are valid uniformly in
time under adaptive sampling in a multi-armed bandit model. The deviations are
measured using the Kullback-Leibler divergence in a given one-dimensional
exponential family, and may take into account several arms at a time. They are
obtained by constructing for each arm a mixture martingale based on a
hierarchical prior, and by multiplying those martingales. Our deviation
inequalities allow us to analyze stopping rules based on generalized likelihood
ratios for a large class of sequential identification problems, and to
construct tight confidence intervals for some functions of the means of the
arms
Thompson sampling for one-dimensional exponential family bandits
International audienceThompson Sampling has been demonstrated in many complex bandit models, however the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for 1-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for Kullback-Leibler divergence and Fisher information (and thus Jeffreys prior) available in an exponential family. This allow us to give a finite time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families
On the Complexity of A/B Testing
A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing that improve over the results currently available both in the fixed-confidence (or delta-PAC) and fixed-budget settings. When the distribution of the outcomes are Gaussian, we prove that the complexity of the fixed-confidence and fixed-budget settings are equivalent, and that uniform sampling of both alternatives is optimal only in the case of equal variances. In the common variance case, we also provide a stopping rule that terminates faster than existing fixed-confidence algorithms. In the case of Bernoulli distributions, we show that the complexity of fixed-budget setting is smaller than that of fixed-confidence setting and that uniform sampling of both alternatives -though not optimal- is advisable in practice when combined with an appropriate stopping criterion
- …