
    Algorithms for Differentially Private Multi-Armed Bandits

    We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem is relevant to applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our main contribution is to show that there exist $(\epsilon, \delta)$-differentially private variants of Upper Confidence Bound algorithms which achieve optimal regret, $O(\epsilon^{-1} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret $O(\epsilon^{-2} \log^{2} T)$, and it stems from our use of a novel interval-based mechanism. We also substantially improve the bounds of the previous family of algorithms, which use a continual release mechanism. Experiments clearly validate our theoretical bounds.
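
    The abstract does not spell out the interval-based mechanism, so the following is only a minimal illustrative sketch of one common way to make UCB differentially private: perturbing each arm's empirical reward sum with Laplace noise of scale 1/epsilon before computing the index. The names dp_ucb and pull_arm and the parameters n_arms, horizon, and epsilon are invented for the example and are not taken from the paper.

    import math
    import numpy as np

    def dp_ucb(pull_arm, n_arms, horizon, epsilon=1.0, rng=None):
        """Illustrative Laplace-noise DP-UCB sketch (not the paper's
        interval-based mechanism)."""
        rng = np.random.default_rng() if rng is None else rng
        counts = np.zeros(n_arms)   # number of pulls per arm
        sums = np.zeros(n_arms)     # cumulative rewards per arm (the private data)
        for t in range(1, horizon + 1):
            if t <= n_arms:
                arm = t - 1         # initialisation: pull every arm once
            else:
                # Laplace noise of scale 1/epsilon masks each arm's reward sum
                noisy_means = (sums + rng.laplace(0.0, 1.0 / epsilon, n_arms)) / counts
                bonus = np.sqrt(2.0 * math.log(t) / counts)
                arm = int(np.argmax(noisy_means + bonus))
            reward = pull_arm(arm)  # assumed to return a reward in [0, 1]
            counts[arm] += 1
            sums[arm] += reward
        return counts

    For instance, dp_ucb(lambda a: float(np.random.binomial(1, [0.3, 0.5, 0.7][a])), n_arms=3, horizon=10000) runs the sketch on three simulated Bernoulli arms. Drawing fresh noise at every round is wasteful in terms of regret, which is the kind of inefficiency that interval-based and continual-release mechanisms are designed to avoid.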

    Corrupt Bandits for Preserving Local Privacy

    We study a variant of the stochastic multi-armed bandit (MAB) problem in which the rewards are corrupted. In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on observing a transformation of these rewards through a stochastic corruption process with known parameters. We provide a lower bound on the expected regret of any bandit algorithm in this corrupted setting. We devise a frequentist algorithm, KLUCB-CF, and a Bayesian algorithm, TS-CF, and give upper bounds on their regret. We also provide the appropriate corruption parameters to guarantee a desired level of local privacy and analyze how this impacts the regret. Finally, we present experimental results that confirm our analysis.
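
    As a rough illustration of how a known corruption process can provide local privacy, the sketch below applies randomized response to Bernoulli rewards and then de-biases the observed mean. This is a generic mechanism, not the KLUCB-CF or TS-CF algorithms themselves, and the names corrupt_reward and debias_mean are invented for the example.

    import math
    import random

    def corrupt_reward(reward, epsilon):
        # Randomised response: keep the binary reward with probability
        # e^eps / (1 + e^eps), flip it otherwise; this choice yields
        # epsilon-local differential privacy for the individual reward.
        keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        return reward if random.random() < keep_prob else 1 - reward

    def debias_mean(observed_mean, epsilon):
        # With keep probability p, E[observed] = p * mu + (1 - p) * (1 - mu),
        # so an unbiased estimate of the true mean mu is obtained by
        # inverting that affine map.
        p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        return (observed_mean - (1.0 - p)) / (2.0 * p - 1.0)

    The de-biasing step only works because the corruption parameters are known, which is why the paper's setting assumes a corruption process with known parameters; a corruption-aware index policy can then build its estimates on the de-biased means, paying a regret penalty that grows as the corruption becomes more aggressive (smaller epsilon).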