Collaborative Learning of Stochastic Bandits over a Social Network
We consider a collaborative online learning paradigm, wherein a group of
agents connected through a social network are engaged in playing a stochastic
multi-armed bandit game. Each time an agent takes an action, the corresponding
reward is instantaneously observed by the agent, as well as its neighbours in
the social network. We perform a regret analysis of various policies in this
collaborative learning setting. A key finding of this paper is that natural
extensions of widely studied single-agent learning policies to the network
setting need not perform well in terms of regret. In particular, we identify a
class of non-altruistic and individually consistent policies, and argue by
deriving regret lower bounds that they are liable to suffer a large regret in
the networked setting. We also show that the learning performance can be
substantially improved if the agents exploit the structure of the network, and
develop a simple learning algorithm based on dominating sets of the network.
Specifically, we first consider a star network, which is a common motif in
hierarchical social networks, and show analytically that the hub agent can be
used as an information sink to expedite learning and improve the overall
regret. We also derive network-wide regret bounds for the algorithm applied to
general networks. We conduct numerical experiments on a variety of networks to
corroborate our analytical results.
Comment: 14 pages, 6 figures
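The networked setting above can be illustrated with a small simulation. This is a minimal sketch, not the authors' algorithm. It assumes Bernoulli rewards, has every agent run plain UCB1 on all observations it receives (its own pulls plus its neighbours' pulls), and computes a dominating set with the standard greedy heuristic. The function names `greedy_dominating_set` and `run_shared_ucb` and the adjacency-dict graph format are hypothetical.

```python
import math
import random


def greedy_dominating_set(adj):
    """Greedy approximation of a minimum dominating set.

    `adj` maps each node to the set of its neighbours (undirected graph).
    Each round picks the node whose closed neighbourhood covers the most
    nodes that are not yet dominated.
    """
    undominated = set(adj)
    dom = set()
    while undominated:
        best = max(adj, key=lambda v: len(({v} | adj[v]) & undominated))
        dom.add(best)
        undominated -= {best} | adj[best]
    return dom


def run_shared_ucb(adj, means, horizon, seed=0):
    """Every agent plays UCB1 each round; a reward is seen by the agent that
    earned it and by all of its neighbours. Returns the network-wide
    pseudo-regret summed over agents and rounds (hypothetical helper)."""
    rng = random.Random(seed)
    k = len(means)
    counts = {v: [0] * k for v in adj}    # observations per arm, per agent
    sums = {v: [0.0] * k for v in adj}    # summed observed rewards, per agent
    best_mean = max(means)
    regret = 0.0
    for _ in range(horizon):
        plays = {}
        for v in adj:                      # all agents choose before any update
            n, s = counts[v], sums[v]
            untried = [a for a in range(k) if n[a] == 0]
            if untried:
                plays[v] = untried[0]      # observe every arm at least once
            else:
                total = sum(n)
                plays[v] = max(
                    range(k),
                    key=lambda a: s[a] / n[a] + math.sqrt(2 * math.log(total) / n[a]),
                )
        for v, a in plays.items():
            reward = 1.0 if rng.random() < means[a] else 0.0   # Bernoulli draw
            regret += best_mean - means[a]
            for u in {v} | adj[v]:         # the agent and its neighbours observe it
                counts[u][a] += 1
                sums[u][a] += reward
    return regret


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(greedy_dominating_set(star))  # {0}
print(run_shared_ucb(star, [0.3, 0.7], horizon=500))
```

On a star network the greedy heuristic returns only the hub, and the hub is also the one agent that receives every leaf's reward, which is the sense in which it can act as an information sink.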
Corrupt Bandits for Preserving Local Privacy
We study a variant of the stochastic multi-armed bandit (MAB) problem in
which the rewards are corrupted. In this framework, motivated by privacy
preservation in online recommender systems, the goal is to maximize the sum of
the (unobserved) rewards, based on the observation of a transformation of these
rewards through a stochastic corruption process with known parameters. We
provide a lower bound on the expected regret of any bandit algorithm in this
corrupted setting. We devise a frequentist algorithm, KLUCB-CF, and a Bayesian
algorithm, TS-CF, and give upper bounds on their regret. We also provide the
appropriate corruption parameters to guarantee a desired level of local privacy
and analyze how this impacts the regret. Finally, we present some experimental
results that confirm our analysis.
Online Clustering of Bandits
We introduce a novel algorithmic approach to content recommendation based on
adaptive clustering of exploration-exploitation ("bandit") strategies. We
provide a sharp regret analysis of this algorithm in a standard stochastic
noise setting, demonstrate its scalability properties, and prove its
effectiveness on a number of artificial and real-world datasets. Our
experiments show a significant increase in prediction performance over
state-of-the-art methods for bandit problems.
Comment: In E. Xing and T. Jebara (Eds.), Proceedings of the 31st International
Conference on Machine Learning, Journal of Machine Learning Research Workshop
and Conference Proceedings, Vol.32 (JMLR W&CP-32), Beijing, China, Jun.
21-26, 2014 (ICML 2014), Submitted by Shuai Li
(https://sites.google.com/site/shuailidotsli
- …
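One way to realize the adaptive clustering described in the abstract above is to start from a graph over users, delete an edge whenever the two users' estimated preference vectors differ by more than their combined confidence widths, and treat the connected components that remain as clusters. The sketch shows only that clustering step, under assumptions: per-user estimates are given as coordinate tuples, and the width function `confidence_width` with its constant `alpha` is a hypothetical choice, not the paper's exact rule.

```python
import math
from collections import deque


def confidence_width(count, alpha=1.0):
    # Hypothetical width that shrinks roughly like sqrt(log T / T) as data accrue.
    return alpha * math.sqrt((1 + math.log(1 + count)) / (1 + count))


def prune_and_cluster(estimates, counts, edges, alpha=1.0):
    """Drop edges between users whose estimates are far apart, then return the
    surviving edges and the clusters (connected components) they induce."""
    kept = set()
    for i, j in edges:
        gap = math.dist(estimates[i], estimates[j])
        if gap <= confidence_width(counts[i], alpha) + confidence_width(counts[j], alpha):
            kept.add((i, j))
    adj = {u: set() for u in estimates}
    for i, j in kept:
        adj[i].add(j)
        adj[j].add(i)
    clusters, seen = [], set()
    for start in estimates:                # breadth-first search per component
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            component.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        clusters.append(sorted(component))
    return kept, clusters
```

Passing the returned `kept` edges back in on the next round means edges are only ever removed, so clusters can split as estimates sharpen but never re-merge; with few observations the widths are large and all users start in one cluster.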