Efficient Bandit Algorithms for Online Multiclass Prediction

Authors: Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari
Publication venue: ScholarlyCommons
Publication date: 01/01/2008

This paper introduces the Banditron, a variant of the Perceptron [Rosenblatt, 1958], for the multiclass bandit setting. The multiclass bandit setting models a wide range of practical supervised learning applications in which the learner receives only partial feedback (referred to as bandit feedback, in the spirit of multi-armed bandit models) with respect to the true label; for example, in many web applications users often provide only positive click feedback, which does not necessarily fully disclose a true label. The Banditron can learn in a multiclass classification setting from bandit feedback that reveals only whether the algorithm's prediction was correct, without necessarily revealing the true label. We provide (relative) mistake bounds which show that the Banditron enjoys favorable performance, and our experiments demonstrate the practicality of the algorithm. Furthermore, this paper pays close attention to the important special case in which the data is linearly separable, a problem which has been exhaustively studied in the full information setting yet is novel in the bandit setting.

CiteSeerX

Crossref

ScholarlyCommons@Penn
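
The abstract of the Banditron entry above does not spell out the update rule. As a minimal sketch, assuming the commonly described form of the algorithm (one weight vector per class, an exploration rate, and an importance-weighted Perceptron update), one round might look like the following. The function name `banditron_step`, the parameter `gamma`, and the list-of-lists weight layout are illustrative assumptions and do not come from this listing.

```python
import random


def banditron_step(W, x, true_label, gamma, rng):
    """One round of a Banditron-style update (a sketch); mutates W in place.

    W is a list of k weight vectors, one per class; x is a feature list.
    true_label is used ONLY to compute the one-bit feedback, never directly
    in the update. Returns (played_label, was_correct).
    """
    k = len(W)
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]
    y_hat = max(range(k), key=scores.__getitem__)  # greedy prediction

    # Assumed exploration scheme: keep 1 - gamma of the mass on y_hat and
    # spread gamma uniformly over all k labels.
    probs = [gamma / k + (1.0 - gamma) * (r == y_hat) for r in range(k)]
    y_played = rng.choices(range(k), weights=probs)[0]

    # Bandit feedback: a single bit saying whether the played label was right.
    correct = (y_played == true_label)

    # Demote the greedy label; promote the played label only when it was
    # confirmed correct, scaled by the inverse of its sampling probability.
    for j, x_j in enumerate(x):
        W[y_hat][j] -= x_j
        if correct:
            W[y_played][j] += x_j / probs[y_played]
    return y_played, correct
```

With `gamma = 0` the sketch never explores, so a wrong greedy prediction can only be demoted and never replaced by a confirmed label. A small positive `gamma` is what lets the learner occasionally discover the correct label from the one-bit feedback.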

A Contextual Bandit Bake-off

Authors: Alekh Agarwal, Alberto Bietti, John Langford
Publication date: 24/01/2020

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to empirically evaluate contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning. We find that a recent method (Foster et al., 2018) using optimism under uncertainty works best overall. A surprisingly close second is a simple greedy baseline that only explores implicitly through the diversity of contexts, followed by a variant of Online Cover (Agarwal et al., 2014) which tends to be more conservative but robust to problem specification by design. Along the way, we also evaluate various components of contextual bandit algorithm design such as loss estimators. Overall, this is a thorough study and review of contextual bandit methodology.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
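
The bake-off abstract above mentions evaluating contextual bandit algorithms on supervised learning datasets and comparing loss estimators, but it does not give the protocol. As a rough sketch, assuming a 0/1 loss, epsilon-greedy exploration, and the standard inverse propensity scoring (IPS) estimator, a labeled example can be turned into one round of bandit feedback as follows. The function names and the `epsilon` parameter are hypothetical and are not taken from the paper.

```python
import random


def simulate_bandit_round(policy_action, true_label, num_actions, epsilon, rng):
    """Turn one labeled example into one round of bandit feedback (a sketch).

    Returns (action, loss, propensity). Only the loss of the chosen action
    is revealed; the losses of the other actions stay hidden.
    """
    # Assumed exploration: epsilon-greedy around the policy's preferred action.
    if rng.random() < epsilon:
        action = rng.randrange(num_actions)
    else:
        action = policy_action
    # Probability with which this action was chosen under epsilon-greedy.
    propensity = epsilon / num_actions + (1.0 - epsilon) * (action == policy_action)
    loss = 0.0 if action == true_label else 1.0  # assumed 0/1 loss
    return action, loss, propensity


def ips_loss_estimate(logged, target_actions):
    """IPS estimate of the average loss a target policy would have incurred.

    logged is a list of (action, loss, propensity) tuples; target_actions
    holds the target policy's choice for each logged round.
    """
    total = 0.0
    for (action, loss, propensity), target in zip(logged, target_actions):
        if action == target:
            # Reweight observed losses by the inverse sampling probability.
            total += loss / propensity
    return total / len(logged)
```

Setting `epsilon = 0` corresponds to a purely greedy learner of the kind the abstract describes as exploring only through the diversity of contexts. In that case IPS is only informative for target policies that agree with the logged actions.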