80 research outputs found
Algorithms for Differentially Private Multi-Armed Bandits
We present differentially private algorithms for the stochastic Multi-Armed
Bandit (MAB) problem. This is a problem for applications such as adaptive
clinical trials, experiment design, and user-targeted advertising where private
information is connected to individual rewards. Our major contribution is to
show that there exist differentially private variants of
Upper Confidence Bound algorithms which have optimal regret, . This is a significant improvement over previous results, which only
achieve poly-log regret , because of our use of a
novel interval-based mechanism. We also substantially improve the bounds of
previous family of algorithms which use a continual release mechanism.
Experiments clearly validate our theoretical bounds
Corrupt Bandits for Preserving Local Privacy
We study a variant of the stochastic multi-armed bandit (MAB) problem in
which the rewards are corrupted. In this framework, motivated by privacy
preservation in online recommender systems, the goal is to maximize the sum of
the (unobserved) rewards, based on the observation of transformation of these
rewards through a stochastic corruption process with known parameters. We
provide a lower bound on the expected regret of any bandit algorithm in this
corrupted setting. We devise a frequentist algorithm, KLUCB-CF, and a Bayesian
algorithm, TS-CF and give upper bounds on their regret. We also provide the
appropriate corruption parameters to guarantee a desired level of local privacy
and analyze how this impacts the regret. Finally, we present some experimental
results that confirm our analysis
Best-Arm Identification for Quantile Bandits with Privacy
We study the best-arm identification problem in multi-armed bandits with
stochastic, potentially private rewards, when the goal is to identify the arm
with the highest quantile at a fixed, prescribed level. First, we propose a
(non-private) successive elimination algorithm for strictly optimal best-arm
identification, we show that our algorithm is -PAC and we characterize
its sample complexity. Further, we provide a lower bound on the expected number
of pulls, showing that the proposed algorithm is essentially optimal up to
logarithmic factors. Both upper and lower complexity bounds depend on a special
definition of the associated suboptimality gap, designed in particular for the
quantile bandit problem, as we show when the gap approaches zero, best-arm
identification is impossible. Second, motivated by applications where the
rewards are private, we provide a differentially private successive elimination
algorithm whose sample complexity is finite even for distributions with
infinite support-size, and we characterize its sample complexity as well. Our
algorithms do not require prior knowledge of either the suboptimality gap or
other statistical information related to the bandit problem at hand.Comment: 24 pages, 4 figure
Federated Linear Contextual Bandits with User-level Differential Privacy
This paper studies federated linear contextual bandits under the notion of
user-level differential privacy (DP). We first introduce a unified federated
bandits framework that can accommodate various definitions of DP in the
sequential decision-making setting. We then formally introduce user-level
central DP (CDP) and local DP (LDP) in the federated bandits framework, and
investigate the fundamental trade-offs between the learning regrets and the
corresponding DP guarantees in a federated linear contextual bandits model. For
CDP, we propose a federated algorithm termed as \robin and show that it is
near-optimal in terms of the number of clients and the privacy budget
by deriving nearly-matching upper and lower regret bounds when
user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating
that learning under user-level -LDP must suffer a regret
blow-up factor at least { or
} under different conditions.Comment: Accepted by ICML 202
- …