810 research outputs found
Balanced Linear Contextual Bandits
Contextual bandit algorithms are sensitive to the estimation method of the
outcome model as well as the exploration method used, particularly in the
presence of rich heterogeneity or complex outcome models, which can lead to
difficult estimation problems along the path of learning. We develop algorithms
for contextual bandits with linear payoffs that integrate balancing methods
from the causal inference literature in their estimation to make it less prone
to problems of estimation bias. We provide the first regret bound analyses for
linear contextual bandits with balancing and show that our algorithms match the
state of the art theoretical guarantees. We demonstrate the strong practical
advantage of balanced contextual bandits on a large number of supervised
learning datasets and on a synthetic example that simulates model
misspecification and prejudice in the initial training data.Comment: AAAI 2019 Oral Presentation. arXiv admin note: substantial text
overlap with arXiv:1711.0707
A Contextual Bandit Bake-off
Contextual bandit algorithms are essential for solving many real-world
interactive machine learning problems. Despite multiple recent successes on
statistically and computationally efficient methods, the practical behavior of
these algorithms is still poorly understood. We leverage the availability of
large numbers of supervised learning datasets to empirically evaluate
contextual bandit algorithms, focusing on practical methods that learn by
relying on optimization oracles from supervised learning. We find that a recent
method (Foster et al., 2018) using optimism under uncertainty works the best
overall. A surprisingly close second is a simple greedy baseline that only
explores implicitly through the diversity of contexts, followed by a variant of
Online Cover (Agarwal et al., 2014) which tends to be more conservative but
robust to problem specification by design. Along the way, we also evaluate
various components of contextual bandit algorithm design such as loss
estimators. Overall, this is a thorough study and review of contextual bandit
methodology
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Contextual bandit algorithms have become popular for online recommendation
systems such as Digg, Yahoo! Buzz, and news recommendation in general.
\emph{Offline} evaluation of the effectiveness of new algorithms in these
applications is critical for protecting online user experiences but very
challenging due to their "partial-label" nature. Common practice is to create a
simulator which simulates the online environment for the problem at hand and
then run an algorithm against this simulator. However, creating simulator
itself is often difficult and modeling bias is usually unavoidably introduced.
In this paper, we introduce a \emph{replay} methodology for contextual bandit
algorithm evaluation. Different from simulator-based approaches, our method is
completely data-driven and very easy to adapt to different applications. More
importantly, our method can provide provably unbiased evaluations. Our
empirical results on a large-scale news article recommendation dataset
collected from Yahoo! Front Page conform well with our theoretical results.
Furthermore, comparisons between our offline replay and online bucket
evaluation of several contextual bandit algorithms show accuracy and
effectiveness of our offline evaluation method.Comment: 10 pages, 7 figures, revised from the published version at the WSDM
2011 conferenc
- …