1,547 research outputs found
A Contextual Bandit Bake-off
Contextual bandit algorithms are essential for solving many real-world
interactive machine learning problems. Despite multiple recent successes on
statistically and computationally efficient methods, the practical behavior of
these algorithms is still poorly understood. We leverage the availability of
large numbers of supervised learning datasets to empirically evaluate
contextual bandit algorithms, focusing on practical methods that learn by
relying on optimization oracles from supervised learning. We find that a recent
method (Foster et al., 2018) using optimism under uncertainty works the best
overall. A surprisingly close second is a simple greedy baseline that only
explores implicitly through the diversity of contexts, followed by a variant of
Online Cover (Agarwal et al., 2014) which tends to be more conservative but
robust to problem specification by design. Along the way, we also evaluate
various components of contextual bandit algorithm design such as loss
estimators. Overall, this is a thorough study and review of contextual bandit
methodology
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Contextual bandit algorithms have become popular for online recommendation
systems such as Digg, Yahoo! Buzz, and news recommendation in general.
\emph{Offline} evaluation of the effectiveness of new algorithms in these
applications is critical for protecting online user experiences but very
challenging due to their "partial-label" nature. Common practice is to create a
simulator which simulates the online environment for the problem at hand and
then run an algorithm against this simulator. However, creating simulator
itself is often difficult and modeling bias is usually unavoidably introduced.
In this paper, we introduce a \emph{replay} methodology for contextual bandit
algorithm evaluation. Different from simulator-based approaches, our method is
completely data-driven and very easy to adapt to different applications. More
importantly, our method can provide provably unbiased evaluations. Our
empirical results on a large-scale news article recommendation dataset
collected from Yahoo! Front Page conform well with our theoretical results.
Furthermore, comparisons between our offline replay and online bucket
evaluation of several contextual bandit algorithms show accuracy and
effectiveness of our offline evaluation method.Comment: 10 pages, 7 figures, revised from the published version at the WSDM
2011 conferenc
Context Attentive Bandits: Contextual Bandit with Restricted Context
We consider a novel formulation of the multi-armed bandit model, which we
call the contextual bandit with restricted context, where only a limited number
of features can be accessed by the learner at every iteration. This novel
formulation is motivated by different online problems arising in clinical
trials, recommender systems and attention modeling. Herein, we adapt the
standard multi-armed bandit algorithm known as Thompson Sampling to take
advantage of our restricted context setting, and propose two novel algorithms,
called the Thompson Sampling with Restricted Context(TSRC) and the Windows
Thompson Sampling with Restricted Context(WTSRC), for handling stationary and
nonstationary environments, respectively. Our empirical results demonstrate
advantages of the proposed approaches on several real-life datasetsComment: IJCAI 201
- …