
Algorithms for Differentially Private Multi-Armed Bandits

Authors: Dimitrakakis Christos, Tossou Aristide
Publication date: 27/11/2015

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem arises in applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is tied to individual rewards. Our major contribution is to show that there exist $(\epsilon, \delta)$-differentially private variants of Upper Confidence Bound algorithms with optimal regret $O(\epsilon^{-1} + \log T)$. Owing to a novel interval-based mechanism, this is a significant improvement over previous results, which only achieve poly-log regret $O(\epsilon^{-2} \log^{2} T)$. We also substantially improve the bounds of a previous family of algorithms that use a continual release mechanism. Experiments clearly validate our theoretical bounds.
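To make the idea of a lazily refreshed private estimate concrete, here is a minimal Python sketch of a UCB variant that acts on Laplace-noised arm means, assuming rewards in [0, 1]. It is not the interval-based mechanism from the paper: the class `LazyPrivateUCB`, the power-of-two refresh schedule, and the exploration bonus are hypothetical choices for illustration. In this sketch each reward enters exactly one Laplace release of sensitivity 1, which is the usual starting point for an $\epsilon$-DP argument via parallel composition, but no privacy proof or regret bound is claimed for it.

```python
import math
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class LazyPrivateUCB:
    """Hypothetical sketch: UCB on Laplace-noised sums that are refreshed only
    when an arm's pull count reaches a power of two. Rewards assumed in [0, 1]."""

    def __init__(self, n_arms: int, epsilon: float, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.counts = [0] * n_arms       # pulls per arm
        self.pending = [0.0] * n_arms    # raw rewards since the last release (never released directly)
        self.noisy_sum = [0.0] * n_arms  # sum of all released noisy interval sums
        self.covered = [0] * n_arms      # number of rewards covered by releases

    def select(self, t: int) -> int:
        for arm, n in enumerate(self.covered):
            if n == 0:
                return arm  # pull each arm once before using indices

        def index(arm: int) -> float:
            n = self.covered[arm]
            mean = self.noisy_sum[arm] / n
            # Hoeffding-style bonus plus a heuristic allowance for the Laplace noise.
            return mean + math.sqrt(2.0 * math.log(t) / n) + math.log(t) / (self.epsilon * n)

        return max(range(len(self.covered)), key=index)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.pending[arm] += reward
        n = self.counts[arm]
        if n & (n - 1) == 0:  # power of two: release the rewards of this interval once
            self.noisy_sum[arm] += self.pending[arm] + laplace(1.0 / self.epsilon, self.rng)
            self.covered[arm] = n
            self.pending[arm] = 0.0


def simulate(means, horizon: int, epsilon: float, seed: int = 0):
    """Run the sketch on Bernoulli arms and return the pull count of each arm."""
    env = random.Random(seed + 1)
    agent = LazyPrivateUCB(len(means), epsilon, seed)
    pulls = [0] * len(means)
    for t in range(1, horizon + 1):
        arm = agent.select(t)
        reward = 1.0 if env.random() < means[arm] else 0.0
        agent.update(arm, reward)
        pulls[arm] += 1
    return pulls


pulls = simulate([0.2, 0.5, 0.8], horizon=5000, epsilon=1.0)
print(pulls)
```

Refreshing only when the pull count doubles keeps the number of noisy releases per arm logarithmic in its pull count, at the cost of acting on stale estimates between refreshes.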

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL Descartes

Chalmers Research

Chalmers Publication Library

Hal-Diderot

Association for the Advancement of Artificial Intelligence: AAAI Publications

Corrupt Bandits for Preserving Local Privacy

Authors: Gajane Pratik, Kaufmann Emilie, Urvoy Tanguy
Publication date: 02/11/2017

We study a variant of the stochastic multi-armed bandit (MAB) problem in which the rewards are corrupted. In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards based only on observations of those rewards after they pass through a stochastic corruption process with known parameters. We provide a lower bound on the expected regret of any bandit algorithm in this corrupted setting. We devise a frequentist algorithm, KLUCB-CF, and a Bayesian algorithm, TS-CF, and give upper bounds on their regret. We also provide the corruption parameters that guarantee a desired level of local privacy and analyze how this choice impacts the regret. Finally, we present experimental results that confirm our analysis.
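For Bernoulli rewards, a standard way to obtain local privacy through a corruption process with known parameters is randomized response: report the true reward with probability $p$ and its complement otherwise. With $p = e^{\epsilon}/(1+e^{\epsilon})$ the likelihood ratio $p/(1-p)$ equals $e^{\epsilon}$, which is the $\epsilon$-local-DP condition, and the corrupted observation $Y$ satisfies $\mathbb{E}[Y] = (2p-1)\mu + (1-p)$, so the hidden mean $\mu$ can be recovered. The Python sketch below assumes this mechanism (the abstract does not say which corruption the paper uses); the names `keep_probability`, `corrupt`, `debiased_mean`, and `ucb_index` are hypothetical, and the index is a simpler Hoeffding-style stand-in for KLUCB-CF, not that algorithm.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Randomized response: report the true bit with p = e^eps / (1 + e^eps).

    Then p / (1 - p) = e^eps, the eps-local-DP condition for one Bernoulli reward."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def corrupt(reward: int, keep: float, rng: random.Random) -> int:
    """Report the hidden 0/1 reward with probability `keep`, otherwise flip it."""
    return reward if rng.random() < keep else 1 - reward


def debiased_mean(observations, keep: float) -> float:
    """Invert E[Y] = (2*keep - 1) * mu + (1 - keep) to estimate the hidden mean mu."""
    mean_y = sum(observations) / len(observations)
    return (mean_y - (1.0 - keep)) / (2.0 * keep - 1.0)


def ucb_index(observations, keep: float, t: int) -> float:
    """Hoeffding upper bound on E[Y] mapped to the mu scale (needs keep > 1/2)."""
    n = len(observations)
    upper_y = sum(observations) / n + math.sqrt(2.0 * math.log(t) / n)
    return (upper_y - (1.0 - keep)) / (2.0 * keep - 1.0)


rng = random.Random(0)
keep = keep_probability(math.log(3.0))  # eps = ln 3 gives keep = 3/4
hidden = [1 if rng.random() < 0.3 else 0 for _ in range(20000)]
observed = [corrupt(x, keep, rng) for x in hidden]
print(keep, debiased_mean(observed, keep), ucb_index(observed, keep, t=20000))
```

As $\epsilon \to 0$, $p \to 1/2$ and the factor $1/(2p-1)$ in both estimators grows without bound, which is one way to see why stronger local privacy costs regret.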

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Best-Arm Identification for Quantile Bandits with Privacy

Authors: Kalogerias Dionysios S., Nikolakakis Konstantinos E., Sarwate Anand D., Sheffet Or
Publication date: 11/06/2020

We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, where the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private) successive elimination algorithm for strictly optimal best-arm identification, show that it is $\delta$-PAC, and characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors. Both the upper and lower complexity bounds depend on a special definition of the associated suboptimality gap, designed specifically for the quantile bandit problem; as we show, best-arm identification is impossible when this gap approaches zero. Second, motivated by applications where the rewards are private, we provide a differentially private successive elimination algorithm whose sample complexity is finite even for distributions with infinite support size, and we characterize its sample complexity as well. Our algorithms require no prior knowledge of the suboptimality gap or of other statistical information about the bandit problem at hand. Comment: 24 pages, 4 figures
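The non-private half of this approach can be illustrated with a generic successive elimination rule for quantiles. The Python sketch below is a stand-in under stated assumptions, not the paper's algorithm: it builds a confidence interval for each arm's $\tau$-quantile from the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality with a union bound over arms and rounds, and it ignores both the paper's special gap definition and its private variant. The function names and the example arm distributions are hypothetical.

```python
import bisect
import math
import random


def empirical_quantile(sorted_xs, level: float) -> float:
    """Smallest sample x with empirical CDF(x) >= level; -inf/+inf outside (0, 1]."""
    if level <= 0.0:
        return -math.inf
    if level > 1.0:
        return math.inf
    k = max(math.ceil(level * len(sorted_xs)) - 1, 0)
    return sorted_xs[k]


def quantile_successive_elimination(arms, tau, delta, max_rounds, rng):
    """Sample every active arm once per round; drop arms whose upper quantile
    bound falls below the best lower bound. Returns (arm, rounds) or (None, max_rounds)."""
    active = list(range(len(arms)))
    samples = {a: [] for a in active}  # kept sorted via bisect.insort
    for n in range(1, max_rounds + 1):
        for a in active:
            bisect.insort(samples[a], arms[a](rng))
        # DKW: P(sup|F_n - F| > r) <= 2 exp(-2 n r^2). This radius makes the failure
        # probability per arm and round delta / (2 K n^2), which sums to below delta.
        radius = math.sqrt(math.log(4.0 * len(arms) * n * n / delta) / (2.0 * n))
        bounds = {
            a: (empirical_quantile(samples[a], tau - radius),
                empirical_quantile(samples[a], tau + radius))
            for a in active
        }
        best_lower = max(lower for lower, _ in bounds.values())
        active = [a for a in active if bounds[a][1] >= best_lower]
        if len(active) == 1:
            return active[0], n
    return None, max_rounds


arms = [
    lambda r: r.gauss(0.0, 1.0),  # 0.9-quantile about 1.28
    lambda r: r.gauss(0.5, 1.0),  # highest mean, 0.9-quantile about 1.78
    lambda r: r.gauss(0.0, 3.0),  # 0.9-quantile about 3.84
]
best, rounds = quantile_successive_elimination(
    arms, tau=0.9, delta=0.01, max_rounds=5000, rng=random.Random(0))
print(best, rounds)
```

At $\tau = 0.9$ the widest arm has the highest quantile even though arm 1 has the highest mean, which is exactly the situation in which quantile and mean criteria disagree.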

arXiv.org e-Print Archive

Federated Linear Contextual Bandits with User-level Differential Privacy

Authors: Hajzinia Meisam, Huang Ruiquan, Melis Luca, Shen Milan, Yang Jing, Zhang Huanyu
Publication date: 08/06/2023

This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting. We then formally introduce user-level central DP (CDP) and local DP (LDP) in the federated bandits framework and investigate the fundamental trade-offs between the learning regret and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed ROBIN and show that it is near-optimal in terms of the number of clients $M$ and the privacy budget $\varepsilon$ by deriving nearly matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level $(\varepsilon,\delta)$-LDP must suffer a regret blow-up factor of at least $\min\{1/\varepsilon, M\}$ or $\min\{1/\sqrt{\varepsilon}, \sqrt{M}\}$, depending on the conditions. Comment: Accepted by ICML 202
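Central user-level DP in a federated setting is commonly enforced by having the server release only noisy aggregates in which each client's total contribution is bounded. The abstract does not describe how the proposed CDP algorithm does this, so the Python sketch below shows only a generic building block under stated assumptions: each client summarizes all of its data into one vector (for a linear model this could be, hypothetically, a sum of context-reward products), the server clips each vector to L2 norm at most `bound`, sums the results, and adds Gaussian noise with the classic calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$, stated for $0 < \varepsilon < 1$. Neighboring datasets are assumed to differ by adding or removing one client, so $\Delta$ equals `bound`. The function names are hypothetical.

```python
import math
import random


def clip_l2(vec, bound: float):
    """Scale `vec` so that its Euclidean norm is at most `bound`."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= bound:
        return list(vec)
    return [v * bound / norm for v in vec]


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classic Gaussian-mechanism noise scale, stated for 0 < epsilon < 1."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def private_aggregate(client_vectors, bound: float, epsilon: float, delta: float,
                      rng: random.Random):
    """Sum one clipped vector per client and add Gaussian noise to each coordinate.

    Adding or removing one client changes the clipped sum by at most `bound`
    in L2 norm, so that is the sensitivity passed to gaussian_sigma."""
    dim = len(client_vectors[0])
    total = [0.0] * dim
    for vec in client_vectors:
        for i, v in enumerate(clip_l2(vec, bound)):
            total[i] += v
    sigma = gaussian_sigma(bound, epsilon, delta)
    return [x + rng.gauss(0.0, sigma) for x in total]


rng = random.Random(0)
clients = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(50)]  # M = 50 clients
noisy_sum = private_aggregate(clients, bound=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print(clip_l2([3.0, 4.0], 1.0), len(noisy_sum))
```

Under a local model each client would instead add its own noise before sending, so the noise in the sum would have standard deviation roughly $\sqrt{M}$ times larger than the single server-side draw used here.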

arXiv.org e-Print Archive