
634 research outputs found

Dynamic Clustering of Contextual Multi-Armed Bandits

Author: Auer P.
Cesa-Bianchi N.
Chapelle O.
Gentile C.
Maillard O.-A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2014

Crossref

Institutional Knowledge at Singapore Management University

Learning Contextual Bandits in a Non-stationary Environment

Author: Auer P.
Cesa-Bianchi Nicolò
Chu Wei
Gentile Claudio
Peter Auer
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/05/2018

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems and in many other important real-world problems, such as display advertising. However, such algorithms usually assume a stationary reward distribution, which rarely holds in practice because users' preferences change over time. As a result, a recommender system built on this assumption performs consistently suboptimally. In this paper, we consider the situation where the underlying reward distribution remains unchanged over (possibly short) epochs and shifts at unknown time instants. Accordingly, we propose a contextual bandit algorithm that detects possible changes of environment based on its reward estimation confidence and updates its arm selection strategy in response. A rigorous regret upper-bound analysis of the proposed algorithm demonstrates its learning effectiveness in such a non-trivial environment. Extensive empirical evaluations on both synthetic and real-world recommendation datasets confirm its practical utility in a changing environment.
Comment: 10 pages, 13 figures, to appear at ACM Special Interest Group on Information Retrieval (SIGIR) 201
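The abstract above describes detecting environment changes from the model's own reward-estimation confidence. As a rough illustration only, the following Python sketch shows one way that idea could look for a single arm: a ridge-regression reward model that discards its history when recent rewards fall outside its confidence interval too often. It is not the paper's algorithm. The class name `ResettingLinUCB` and the parameters `alpha`, `window`, and `max_miss_rate` are hypothetical choices made for this example.

```python
from collections import deque


class ResettingLinUCB:
    """Hypothetical single-arm ridge-regression reward model that resets
    when recent rewards repeatedly fall outside its confidence interval."""

    def __init__(self, dim, alpha=1.0, window=20, max_miss_rate=0.5):
        self.dim = dim
        self.alpha = alpha                  # scales the confidence width (assumed)
        self.window = window                # number of recent rewards to inspect
        self.max_miss_rate = max_miss_rate  # tolerated fraction of "misses"
        self.resets = 0
        self._reset_model()

    def _reset_model(self):
        # a_inv is the inverse of (I + sum of x x^T); b is the sum of reward * x.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(self.dim)]
                      for i in range(self.dim)]
        self.b = [0.0] * self.dim
        self.misses = deque(maxlen=self.window)

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(self.dim)) for row in self.a_inv]

    def predict(self, x):
        """Return (estimated reward, confidence width) for context x."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = self.alpha * sum(xi * v for xi, v in zip(x, a_inv_x)) ** 0.5
        return mean, width

    def update(self, x, reward):
        # Record whether the observed reward lies outside the confidence interval.
        mean, width = self.predict(x)
        self.misses.append(abs(reward - mean) > width)
        window_full = len(self.misses) == self.window
        if window_full and sum(self.misses) / self.window > self.max_miss_rate:
            # Too many surprises: assume the environment changed and start over.
            self.resets += 1
            self._reset_model()
        # Sherman-Morrison rank-one update of the inverse, then update b.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]
```

In a full bandit, one such model per arm would feed `mean + width` into arm selection. The reset rule here is deliberately crude: it throws away all history, whereas a practical system might keep several candidate models alive.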

arXiv.org e-Print Archive

Crossref

Online Clustering of Bandits

Author: Gentile Claudio
Li Shuai
Zappella Giovanni
Publication date: 01/01/2014

We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation ("bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems.
Comment: In E. Xing and T. Jebara (Eds.), Proceedings of 31st International Conference on Machine Learning, Journal of Machine Learning Research Workshop and Conference Proceedings, Vol. 32 (JMLR W&CP-32), Beijing, China, Jun. 21-26, 2014 (ICML 2014), Submitted by Shuai Li (https://sites.google.com/site/shuailidotsli
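The abstract above does not spell out how the adaptive clustering works. As a hedged illustration, the Python sketch below shows one plausible reading: users are linked when their estimated preference vectors are closer than the sum of their confidence radii, and clusters are the connected components of that graph. The function name `cluster_users` and the inputs `estimates` and `radii` are assumptions made for this example, not names taken from the paper.

```python
from itertools import combinations


def cluster_users(estimates, radii):
    """Group users whose estimated preference vectors cannot be told apart,
    i.e. whose distance is at most the sum of their confidence radii.

    estimates: dict mapping user id -> list of floats (hypothetical input)
    radii:     dict mapping user id -> confidence radius (hypothetical input)
    """
    users = list(estimates)
    neighbours = {u: set() for u in users}
    for u, v in combinations(users, 2):
        gap = sum((a - b) ** 2 for a, b in zip(estimates[u], estimates[v])) ** 0.5
        if gap <= radii[u] + radii[v]:
            neighbours[u].add(v)
            neighbours[v].add(u)

    # Clusters are the connected components of the remaining graph.
    clusters, seen = [], set()
    for u in users:
        if u in seen:
            continue
        stack, component = [u], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


example_estimates = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.0]}
example_radii = {"a": 0.2, "b": 0.2, "c": 0.2}
example_clusters = cluster_users(example_estimates, example_radii)
```

As the radii shrink with more observations, links between users with genuinely different preferences drop away and the clusters refine. Users in one cluster could then share a single bandit model, which is the source of the sample-efficiency gain such an approach aims for.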

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università dell'Insubria