Thompson Sampling for Bandits with Clustered Arms
We propose algorithms based on a multi-level Thompson sampling scheme for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit, we give upper bounds on the expected cumulative regret showing how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
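A minimal sketch of the two-level scheme this abstract describes, assuming Bernoulli rewards with Beta posteriors at both levels (the paper's exact algorithm and priors may differ):

```python
import random

class ClusteredThompsonSampling:
    """Two-level Thompson sampling sketch for Bernoulli bandits with
    clustered arms: sample a cluster from cluster-level posteriors first,
    then sample an arm within the chosen cluster."""

    def __init__(self, clusters):
        # clusters: list of lists of arm ids, e.g. [[0, 1], [2, 3]]
        self.clusters = clusters
        self.cluster_stats = [[1, 1] for _ in clusters]           # [alpha, beta]
        self.arm_stats = {a: [1, 1] for c in clusters for a in c}

    def select_arm(self):
        # Level 1: Thompson-sample a score per cluster.
        c = max(range(len(self.clusters)),
                key=lambda i: random.betavariate(*self.cluster_stats[i]))
        # Level 2: Thompson-sample within the chosen cluster.
        arm = max(self.clusters[c],
                  key=lambda a: random.betavariate(*self.arm_stats[a]))
        return c, arm

    def update(self, c, arm, reward):
        # A Bernoulli reward in {0, 1} updates both levels' posteriors.
        self.cluster_stats[c][0 if reward else 1] += 1
        self.arm_stats[arm][0 if reward else 1] += 1
```

Compared to flat Thompson sampling over all arms, each round only samples one Beta draw per cluster plus one per arm in the chosen cluster, which is where the computational saving comes from.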
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Contextual bandit algorithms have become popular for online recommendation
systems such as Digg, Yahoo! Buzz, and news recommendation in general.
\emph{Offline} evaluation of the effectiveness of new algorithms in these
applications is critical for protecting online user experiences but very
challenging due to their "partial-label" nature. Common practice is to create a
simulator that mimics the online environment for the problem at hand and
then run an algorithm against this simulator. However, creating the simulator
itself is often difficult, and modeling bias is usually introduced unavoidably.
In this paper, we introduce a \emph{replay} methodology for contextual bandit
algorithm evaluation. Different from simulator-based approaches, our method is
completely data-driven and very easy to adapt to different applications. More
importantly, our method can provide provably unbiased evaluations. Our
empirical results on a large-scale news article recommendation dataset
collected from Yahoo! Front Page conform well with our theoretical results.
Furthermore, comparisons between our offline replay and online bucket
evaluation of several contextual bandit algorithms show the accuracy and
effectiveness of our offline evaluation method.
Comment: 10 pages, 7 figures; revised from the published version at the WSDM 2011 conference
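The core of the replay methodology can be sketched in a few lines, assuming logged events were collected under a uniformly random logging policy (the condition under which the estimate is unbiased); the function name and event format here are illustrative, not the paper's:

```python
def replay_evaluate(policy, logged_events):
    """Replay-evaluation sketch for a contextual bandit policy.

    logged_events: iterable of (context, logged_arm, reward) triples
    collected under uniformly random arm selection. An event counts only
    when the candidate policy picks the same arm the logger did; the mean
    reward over matched events estimates online performance."""
    total_reward, matches = 0.0, 0
    history = []                       # matched events the policy may learn from
    for context, logged_arm, reward in logged_events:
        chosen = policy(context, history)
        if chosen == logged_arm:       # otherwise the event is discarded
            total_reward += reward
            matches += 1
            history.append((context, chosen, reward))
    return total_reward / matches if matches else 0.0
```

Because no simulator is built, the only modeling assumption is on how the log was collected, which is why the approach adapts easily across applications.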
Enhancing Evolutionary Conversion Rate Optimization via Multi-armed Bandit Algorithms
Conversion rate optimization means designing web interfaces such that more
visitors perform a desired action (such as register or purchase) on the site.
One promising approach, implemented in Sentient Ascend, is to optimize the
design using evolutionary algorithms, evaluating each candidate design online
with actual visitors. Because such evaluations are costly and noisy, several
challenges emerge: How can available visitor traffic be used most efficiently?
How can good solutions be identified most reliably? How can a high conversion
rate be maintained during optimization? This paper proposes a new technique to
address these issues. Traffic is allocated to candidate solutions using a
multi-armed bandit algorithm, using more traffic on those evaluations that are
most useful. In a best-arm identification mode, the best candidate can be
identified reliably at the end of evolution, and in a campaign mode, the
overall conversion rate can be optimized throughout the entire evolution
process. Multi-armed bandit algorithms thus improve performance and reliability
of machine discovery in noisy real-world environments.
Comment: The Thirty-First Innovative Applications of Artificial Intelligence Conference
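The traffic-allocation idea can be illustrated with a generic bandit allocator; this sketch uses UCB1 rather than whatever algorithm Sentient Ascend actually employs, and the true conversion rates appear only to simulate visitor outcomes:

```python
import math
import random

def allocate_traffic_ucb(candidate_rates, n_visitors):
    """Sketch: UCB1 allocation of visitor traffic among candidate designs.

    candidate_rates are the (in practice unknown) true conversion rates,
    used here only to simulate whether each visitor converts; the
    allocator itself observes only the binary conversion outcomes.
    Returns per-candidate visit counts and the empirically best index."""
    k = len(candidate_rates)
    counts = [0] * k
    conversions = [0] * k
    for t in range(1, n_visitors + 1):
        if t <= k:                      # visit each design once to initialize
            i = t - 1
        else:
            i = max(range(k),
                    key=lambda j: conversions[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        counts[i] += 1
        conversions[i] += random.random() < candidate_rates[i]
    best = max(range(k), key=lambda j: conversions[j] / counts[j])
    return counts, best
```

Spending more visitors on promising candidates is what keeps the overall conversion rate high during optimization, while the accumulated counts make the final best-candidate identification more reliable.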
Context Attentive Bandits: Contextual Bandit with Restricted Context
We consider a novel formulation of the multi-armed bandit model, which we
call the contextual bandit with restricted context, where only a limited number
of features can be accessed by the learner at every iteration. This novel
formulation is motivated by different online problems arising in clinical
trials, recommender systems and attention modeling. Herein, we adapt the
standard multi-armed bandit algorithm known as Thompson Sampling to take
advantage of our restricted context setting, and propose two novel algorithms,
called Thompson Sampling with Restricted Context (TSRC) and Windows
Thompson Sampling with Restricted Context (WTSRC), for handling stationary and
nonstationary environments, respectively. Our empirical results demonstrate the
advantages of the proposed approaches on several real-life datasets.
Comment: IJCAI 201
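The feature-selection level of the restricted-context idea can be sketched as follows; this is a loose illustration with assumed per-feature Beta posteriors, not the paper's exact TSRC algorithm:

```python
import random

class RestrictedContextSelector:
    """Sketch of the feature-selection level for a restricted-context
    bandit: each feature keeps a Beta posterior over its usefulness, and
    each round we Thompson-sample all features but observe only the top-k,
    matching the limit on how many features the learner may access."""

    def __init__(self, n_features, k):
        self.k = k
        self.stats = [[1, 1] for _ in range(n_features)]  # [alpha, beta]

    def select_features(self):
        samples = [random.betavariate(a, b) for a, b in self.stats]
        return sorted(range(len(samples)),
                      key=samples.__getitem__, reverse=True)[:self.k]

    def update(self, features, reward):
        # Credit every observed feature with the round's Bernoulli reward.
        for f in features:
            self.stats[f][0 if reward else 1] += 1
```

A full agent would pair this with a standard contextual bandit (e.g. Thompson sampling over arms) run on the selected features only; the windowed WTSRC variant would additionally discard statistics older than a fixed window to track nonstationarity.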