Multiple-Play Bandits in the Position-Based Model
Sequentially learning to place items in multi-position displays or lists is a
task that can be cast into the multiple-play semi-bandit setting. A major
concern in this context, however, is that the system cannot always tell whether
the user feedback for each item is actually exploitable: much of the content
may simply have been ignored by the user. The present work proposes to exploit
available information regarding the display position bias under the so-called
Position-based click model (PBM). We first discuss how this model differs from
the Cascade model and its variants considered in several recent works on
multiple-play bandits. We then provide a novel regret lower bound for this
model as well as computationally efficient algorithms that display good
empirical and theoretical performance.
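The Position-Based Model above can be sketched as a simple generative process: each display slot l has an examination probability kappa_l, and an examined item a is clicked with probability theta_a. This is an illustrative sketch, not code from the paper; all names are hypothetical.

```python
import random

def pbm_click(position_bias, attractiveness, ranking):
    """Simulate one user session under the Position-Based Model (PBM).

    The item in slot l is examined with probability position_bias[l]
    (kappa_l) and, if examined, clicked with probability
    attractiveness[item] (theta_a). Returns one 0/1 click per slot.
    Variable names are illustrative, not from the paper.
    """
    clicks = []
    for kappa, item in zip(position_bias, ranking):
        examined = random.random() < kappa
        clicked = examined and random.random() < attractiveness[item]
        clicks.append(int(clicked))
    return clicks
```

Note that a zero click is ambiguous under this model: the user may have disliked the item or never examined its position, which is exactly the exploitability concern the abstract raises.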
Copeland Dueling Bandits
A version of the dueling bandit problem is addressed in which a Condorcet
winner may not exist. Two algorithms are proposed that instead seek to minimize
regret with respect to the Copeland winner, which, unlike the Condorcet winner,
is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed
for small numbers of arms, while the second, Scalable Copeland Bandits (SCB),
works better for large-scale problems. We provide theoretical results bounding
the regret accumulated by CCB and SCB, both substantially improving existing
results. Such existing results either offer bounds of the form O(K log T) but
require restrictive assumptions, or offer bounds of the form O(K^2 log T)
without requiring such assumptions. Our results offer the best of both worlds:
O(K log T) bounds without restrictive assumptions.
Comment: 33 pages, 8 figures
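The Copeland winner mentioned above can be computed directly from a pairwise preference matrix: it is any arm that beats the largest number of other arms. A minimal sketch, not tied to the CCB or SCB algorithms themselves:

```python
def copeland_winners(pref):
    """Return the Copeland winner(s) of a pairwise preference matrix.

    pref[i][j] is the probability that arm i beats arm j in a duel;
    arm i "beats" arm j when pref[i][j] > 0.5. A Copeland winner
    maximizes the number of arms it beats and, unlike a Condorcet
    winner, always exists (possibly as a tie).
    """
    k = len(pref)
    scores = [sum(1 for j in range(k) if j != i and pref[i][j] > 0.5)
              for i in range(k)]
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]
```

For cyclic preferences (arm 0 beats 1, 1 beats 2, 2 beats 0) no Condorcet winner exists, yet every arm is a Copeland winner with score 1, which is why regret against the Copeland winner remains well defined.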
Factored Bandits
We introduce the factored bandits model, which is a framework for learning
with limited (bandit) feedback, where actions can be decomposed into a
Cartesian product of atomic actions. Factored bandits incorporate rank-1
bandits as a special case, but significantly relax the assumptions on the form
of the reward function. We provide an anytime algorithm for stochastic
factored bandits, with upper and lower regret bounds for the problem that
match up to constants. Furthermore, we show that with a slight modification
the proposed algorithm can be applied to utility-based dueling bandits. We
obtain an improvement in the additive terms of the regret bound compared to
state-of-the-art algorithms (the additive terms dominate up to time horizons
that are exponential in the number of arms).
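The factored action space described above, and the rank-1 special case it generalizes, can be illustrated in a few lines. Here a joint action is one atomic action per factor, and the rank-1 reward is the product u_a * v_b; the names u and v are hypothetical, and general factored bandits relax this product form.

```python
import itertools

def rank1_reward(action, u, v):
    """Rank-1 reward, a special case of factored bandits: the expected
    reward of joint action (a, b) is u[a] * v[b]. Illustrative only."""
    a, b = action
    return u[a] * v[b]

# Two factors with 2 and 3 atomic actions; the joint action set is
# their Cartesian product (2 * 3 = 6 joint actions).
factors = [range(2), range(3)]
joint = list(itertools.product(*factors))

u, v = [0.2, 0.9], [0.1, 0.5, 0.8]
best = max(joint, key=lambda act: rank1_reward(act, u, v))
```

The point of the factored structure is that feedback about one atomic action is shared across all joint actions containing it, rather than learning each of the (exponentially many) joint actions independently.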