Online Clustering of Bandits
We introduce a novel algorithmic approach to content recommendation based on
adaptive clustering of exploration-exploitation ("bandit") strategies. We
provide a sharp regret analysis of this algorithm in a standard stochastic
noise setting, demonstrate its scalability properties, and prove its
effectiveness on a number of artificial and real-world datasets. Our
experiments show a significant increase in prediction performance over
state-of-the-art methods for bandit problems.

Comment: In E. Xing and T. Jebara (Eds.), Proceedings of the 31st International
Conference on Machine Learning, Journal of Machine Learning Research Workshop
and Conference Proceedings, Vol. 32 (JMLR W&CP-32), Beijing, China, Jun.
21-26, 2014 (ICML 2014). Submitted by Shuai Li
(https://sites.google.com/site/shuailidotsli)
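The adaptive-clustering idea in the abstract above can be sketched in a few lines. The following is a minimal, illustrative Python sketch, not the paper's exact algorithm: the class name, the edge-deletion threshold, and the pooling rule are our own simplifications. Each user keeps a ridge-regression estimate of their preference vector; users whose estimates stay close remain connected in a graph, a user's cluster is their neighborhood, and arm selection uses an upper-confidence bound on the cluster's pooled statistics.

```python
import numpy as np

class ClusteredLinearBandits:
    """Illustrative CLUB-style sketch: per-user linear estimators,
    a user graph pruned as estimates drift apart, and UCB selection
    on pooled cluster statistics. Parameters are hypothetical."""

    def __init__(self, n_users, dim, alpha=1.0, gap=0.5):
        self.d = dim
        self.alpha = alpha  # exploration strength
        self.gap = gap      # edge-deletion threshold (illustrative choice)
        self.M = [np.eye(dim) for _ in range(n_users)]    # per-user Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_users)]  # per-user reward sums
        # start with every user in one cluster (fully connected graph)
        self.adj = np.ones((n_users, n_users), dtype=bool)

    def _theta(self, i):
        return np.linalg.solve(self.M[i], self.b[i])

    def cluster_of(self, i):
        return [j for j in range(len(self.M)) if self.adj[i, j]]

    def select(self, user, arms):
        """Pick the arm maximizing a UCB from the user's cluster statistics."""
        idxs = self.cluster_of(user)
        M = sum(self.M[j] for j in idxs) - (len(idxs) - 1) * np.eye(self.d)
        b = sum(self.b[j] for j in idxs)
        theta = np.linalg.solve(M, b)
        Minv = np.linalg.inv(M)
        ucb = [x @ theta + self.alpha * np.sqrt(x @ Minv @ x) for x in arms]
        return int(np.argmax(ucb))

    def update(self, user, x, reward):
        self.M[user] += np.outer(x, x)
        self.b[user] += reward * x
        # drop edges to users whose estimates have drifted too far apart
        ti = self._theta(user)
        for j in range(len(self.M)):
            if j != user and self.adj[user, j]:
                if np.linalg.norm(ti - self._theta(j)) > self.gap:
                    self.adj[user, j] = self.adj[j, user] = False
```

The graph starts fully connected and only loses edges, so early rounds share statistics aggressively and clusters refine as evidence accumulates.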
Online Clustering of Bandits with Misspecified User Models
The contextual linear bandit is an important online learning problem where
given arm features, a learning agent selects an arm at each round to maximize
the cumulative rewards in the long run. A line of works, called the clustering
of bandits (CB), utilize the collaborative effect over user preferences and
have shown significant improvements over classic linear bandit algorithms.
However, existing CB algorithms require well-specified linear user models and
can fail when this critical assumption does not hold. Whether robust CB
algorithms can be designed for more practical scenarios with misspecified user
models remains an open problem. In this paper, we are the first to present the
important problem of clustering of bandits with misspecified user models
(CBMUM), where the expected rewards in user models can be perturbed away from
perfect linear models. We devise two robust CB algorithms, RCLUMB and RSCLUMB
(representing the learned clustering structure with dynamic graph and sets,
respectively), that can accommodate the inaccurate user preference estimations
and erroneous clustering caused by model misspecifications. We prove regret
upper bounds for our
algorithms under milder assumptions than previous CB works (notably, we move
past a restrictive technical assumption on the distribution of the arms); these bounds
match the lower bound asymptotically up to logarithmic factors, and also
match the state-of-the-art results in several degenerate cases. The techniques
in proving the regret caused by misclustering users are quite general and may
be of independent interest. Experiments on both synthetic and real-world data
show that our algorithms outperform previous ones.
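The misspecified-user-model setting described above can be illustrated concretely. This is a minimal sketch under our own assumptions (the deviation function, the parameter names, and the misspecification level are hypothetical, not taken from the paper): the expected reward is a linear function of the arm feature plus an unknown deviation bounded by a level eps.

```python
import numpy as np

# Sketch of a misspecified linear user model: the expected reward is
# <theta, x> plus an unknown bounded deviation. The deviation here is
# an arbitrary bounded function, chosen only for illustration.
rng = np.random.default_rng(0)
d = 5
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)  # user preference vector, ||theta|| = 1
eps = 0.05                      # misspecification level (illustrative)

def expected_reward(x):
    # perturbed away from the perfect linear model by at most eps
    deviation = eps * np.tanh(x.sum())  # any function bounded by eps works
    return float(theta @ x) + deviation

x = rng.normal(size=d)
x /= np.linalg.norm(x)
gap = abs(expected_reward(x) - float(theta @ x))
assert gap <= eps  # deviation stays within the misspecification budget
```

A well-specified CB algorithm fits theta as if gap were zero, which is what makes both the preference estimates and the induced clustering unreliable when eps is not negligible.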
The art of clustering bandits.
Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithms could lead to a dramatic performance increase. For instance, we may want to serve content to a group of users by taking advantage of an underlying network of social relationships among them. The purpose of this thesis is to introduce novel and principled algorithmic approaches to the solution of such networked bandit problems. Starting from a global (Laplacian-based) strategy which allocates a bandit algorithm to each network node (user), and allows it to "share" signals (contexts and payoffs) with the neighboring nodes, our goal is to derive and experimentally test more scalable approaches based on different ways of clustering the graph nodes. More importantly, we shall investigate the case when the graph structure is not given ahead of time, and has to be inferred based on past user behavior. A general difficulty arising in such practical scenarios is that data sequences are typically nonstationary, implying that traditional statistical inference methods should be used cautiously, possibly replacing them with more robust nonstochastic (e.g., game-theoretic) inference methods.
In this thesis, we will first introduce the centralized clustering bandits. Then, we propose the corresponding solution in the decentralized scenario. After that, we explain the generic collaborative clustering bandits. Finally, we extend and showcase the state-of-the-art clustering bandits that we developed in the quantification problem.
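The Laplacian-based strategy mentioned above can be sketched as follows. This is an illustrative simplification under our own assumptions, not the thesis's exact algorithm: one linear estimator per network node, with the estimates smoothed toward graph neighbors so that connected users share signal.

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A for an adjacency matrix."""
    deg = np.diag(adj.sum(axis=1))
    return deg - adj

def smoothed_estimates(M_list, b_list, adj, lam=0.5):
    """Per-node ridge estimates, then one step of Laplacian smoothing:
    solving (I + lam * L) X = Theta shrinks differences across edges,
    pulling each node's estimate toward its neighbors'. The smoothing
    strength lam is an illustrative choice."""
    thetas = np.array([np.linalg.solve(M, b) for M, b in zip(M_list, b_list)])
    L = laplacian(adj)
    n = len(M_list)
    return np.linalg.solve(np.eye(n) + lam * L, thetas)
```

Two connected nodes end up with estimates strictly closer together than their raw per-node fits, which is the sharing effect the thesis exploits before moving to the more scalable cluster-based variants.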