A Contextual Bandit Bake-off
Contextual bandit algorithms are essential for solving many real-world
interactive machine learning problems. Despite multiple recent successes on
statistically and computationally efficient methods, the practical behavior of
these algorithms is still poorly understood. We leverage the availability of
large numbers of supervised learning datasets to empirically evaluate
contextual bandit algorithms, focusing on practical methods that learn by
relying on optimization oracles from supervised learning. We find that a recent
method (Foster et al., 2018) using optimism under uncertainty works the best
overall. A surprisingly close second is a simple greedy baseline that only
explores implicitly through the diversity of contexts, followed by a variant of
Online Cover (Agarwal et al., 2014) which tends to be more conservative but
robust to problem specification by design. Along the way, we also evaluate
various components of contextual bandit algorithm design such as loss
estimators. Overall, this is a thorough study and review of contextual bandit
methodology.
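
The greedy baseline mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the linear reward environment, the per-action ridge regression, and the function name `run_greedy` are all assumptions made for illustration. The policy always plays the action with the highest predicted reward and never explores explicitly; any exploration comes from the variety of contexts it sees.

```python
import numpy as np

def run_greedy(n_rounds=2000, n_actions=3, dim=5, seed=0):
    """Simulate a purely greedy contextual bandit (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    # Assumed environment: expected reward is linear in the context.
    true_w = rng.normal(size=(n_actions, dim))
    # Per-action ridge regression statistics: A = X^T X + I, b = X^T r.
    A = np.stack([np.eye(dim) for _ in range(n_actions)])
    b = np.zeros((n_actions, dim))
    total_reward = 0.0
    total_best = 0.0
    for _ in range(n_rounds):
        x = rng.normal(size=dim)  # fresh context each round
        w_hat = np.stack([np.linalg.solve(A[a], b[a]) for a in range(n_actions)])
        action = int(np.argmax(w_hat @ x))  # greedy choice, no exploration bonus
        reward = true_w[action] @ x + 0.1 * rng.normal()
        # Only the chosen action's reward is observed (bandit feedback).
        A[action] += np.outer(x, x)
        b[action] += reward * x
        total_reward += reward
        total_best += np.max(true_w @ x)  # noiseless reward of the best action
    return total_reward / n_rounds, total_best / n_rounds

avg_reward, avg_best = run_greedy()
print(f"greedy average reward: {avg_reward:.3f}, best possible: {avg_best:.3f}")
```

In this toy setting the contexts are diverse enough that each action ends up being tried, which is the implicit exploration the abstract refers to. With less varied contexts a greedy policy can lock onto a suboptimal action.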
Axiomatic Interpretability for Multiclass Additive Models
Generalized additive models (GAMs) are favored in many regression and binary
classification problems because they are able to fit complex, nonlinear
functions while still remaining interpretable. In the first part of this paper,
we generalize a state-of-the-art GAM learning algorithm based on boosted trees
to the multiclass setting, and show that this multiclass algorithm outperforms
existing GAM learning algorithms and sometimes matches the performance of full
complexity models such as gradient boosted trees.
In the second part, we turn our attention to the interpretability of GAMs in
the multiclass setting. Surprisingly, the natural interpretability of GAMs
breaks down when there are more than two classes. Naive interpretation of
multiclass GAMs can lead to false conclusions. Inspired by binary GAMs, we
identify two axioms that any additive model must satisfy in order to not be
visually misleading. We then develop a technique called Additive
Post-Processing for Interpretability (API), that provably transforms a
pre-trained additive model to satisfy the interpretability axioms without
sacrificing accuracy. The technique works not just on models trained with our
learning algorithm, but on any multiclass additive model, including multiclass
linear and logistic regression. We demonstrate the effectiveness of API on a
12-class infant mortality dataset.
Comment: KDD 201
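
One way to see why per-class curves of a multiclass additive model can mislead is the shift invariance of softmax. This is offered as an illustration under assumptions, not as the paper's argument or its API procedure: the three shape functions and the shift `g` below are made up. Adding the same function of a feature to every class's logit leaves all predicted probabilities unchanged, yet it can flip the apparent trend of an individual class curve.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

x = np.linspace(0.0, 1.0, 5)
# Hypothetical shape functions of one feature for 3 classes, shape (5, 3).
f = np.stack([np.sin(3 * x), x ** 2, -x], axis=1)
# Arbitrary function of the same feature, added to every class's logit.
g = 10.0 * x
f_shifted = f + g[:, None]

# Predictions are identical for the two sets of shape functions.
same_predictions = np.allclose(softmax(f), softmax(f_shifted))
# Yet the third class's curve is decreasing in one and increasing in the other.
original_decreasing = bool(np.all(np.diff(f[:, 2]) < 0))
shifted_increasing = bool(np.all(np.diff(f_shifted[:, 2]) > 0))
print(same_predictions, original_decreasing, shifted_increasing)
```

Two models that predict identically can therefore show opposite trends for the same class, which is the kind of ambiguity a post-processing step would need to resolve.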
Generalization Bounds in the Predict-then-Optimize Framework
The predict-then-optimize framework is fundamental in many practical
settings: predict the unknown parameters of an optimization problem, and then
solve the problem using the predicted values of the parameters. A natural loss
function in this setting is the cost of the decisions induced
by the predicted parameters, rather than the prediction error of the
parameters. This loss function was recently introduced in Elmachtoub and Grigas
(2017) and referred to as the Smart Predict-then-Optimize (SPO) loss. In this
work, we seek to provide bounds on how well the performance of a prediction
model fit on training data generalizes out-of-sample, in the context of the SPO
loss. Since the SPO loss is non-convex and non-Lipschitz, standard results for
deriving generalization bounds do not apply.
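
The distinction between decision cost and prediction error can be sketched for a polytope given by an explicit list of extreme points. This is a minimal illustration, assuming a linear objective that is minimized over the vertices. The function name `spo_loss` and the example numbers are hypothetical, and ties among optimal vertices are broken by taking the first one, which simplifies the formal definition.

```python
import numpy as np

def spo_loss(c_pred, c_true, vertices):
    """Excess true cost of the decision that is optimal for the predicted costs."""
    # Decision induced by the prediction: the vertex minimizing predicted cost.
    decision = vertices[np.argmin(vertices @ c_pred)]
    # Best achievable true cost over the same vertices.
    best_cost = np.min(vertices @ c_true)
    return float(c_true @ decision - best_cost)

# Feasible region: the unit simplex in R^3, whose vertices are the unit vectors.
vertices = np.eye(3)
c_true = np.array([1.0, 2.0, 3.0])

# Small prediction error, but it changes which vertex looks cheapest.
close_but_wrong = np.array([2.0, 1.5, 3.0])
# Large prediction error, yet the cheapest vertex is still the right one.
far_but_right = np.array([0.5, 2.5, 10.0])

for c_pred in (close_but_wrong, far_but_right):
    squared_error = float(np.sum((c_pred - c_true) ** 2))
    print(squared_error, spo_loss(c_pred, c_true, vertices))
```

The first prediction has the smaller squared error but the larger decision loss. The loss also jumps when a small change in the prediction switches the selected vertex, which is consistent with the abstract's remark that the loss is not Lipschitz.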
We first derive bounds based on the Natarajan dimension that, in the case of
a polyhedral feasible region, scale at most logarithmically in the number of
extreme points, but, in the case of a general convex feasible region, have
linear dependence on the decision dimension. By exploiting the structure of the
SPO loss function and a key property of the feasible region, which we denote as
the strength property, we can dramatically improve the dependence on the
decision and feature dimensions. Our approach and analysis rely on placing a
margin around problematic predictions that do not yield unique optimal
solutions, and then providing generalization bounds in the context of a
modified margin SPO loss function that is Lipschitz continuous. Finally, we
characterize the strength property and show that the modified SPO loss can be
computed efficiently for both strongly convex bodies and polytopes with an
explicit extreme point representation.
Comment: Preliminary version in NeurIPS 201