This paper studies federated linear contextual bandits under the notion of
user-level differential privacy (DP). We first introduce a unified federated
bandits framework that can accommodate various definitions of DP in the
sequential decision-making setting. We then formally introduce user-level
central DP (CDP) and local DP (LDP) in the federated bandits framework, and
investigate the fundamental trade-offs between the learning regrets and the
corresponding DP guarantees in a federated linear contextual bandits model. For
CDP, we propose a federated algorithm termed as \robin and show that it is
near-optimal in terms of the number of clients M and the privacy budget
ε by deriving nearly-matching upper and lower regret bounds when
user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating
that learning under user-level (ε,δ)-LDP must suffer a regret
blow-up factor at least {min{1/ε,M} or
min{1/ε,M}} under different conditions.Comment: Accepted by ICML 202