Scale-sensitive Psi-dimensions: the Capacity Measures for Classifiers Taking Values in R^Q
Bounds on the risk play a crucial role in statistical learning theory. They
usually involve, as the capacity measure of the model under study, the VC
dimension or one of its extensions. In classification, such "VC dimensions"
exist for models taking values in {0, 1}, {1, ..., Q}, and R. We introduce the
generalizations appropriate for the missing case, that of models with values in
R^Q. This
provides us with a new guaranteed risk for M-SVMs which appears superior to the
existing one.
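To fix ideas, guaranteed risks of this kind typically take the following form, up to constants and logarithmic factors (notation ours, as an illustration rather than the paper's exact statement): with probability at least $1 - \delta$ over an $n$-sample,

$$
R(f) \;\le\; R_n(f) \;+\; \sqrt{\frac{C(\mathcal{F}) + \ln(1/\delta)}{n}},
$$

where $R(f)$ is the risk of the classifier $f$, $R_n(f)$ its empirical risk, and $C(\mathcal{F})$ the capacity of the model $\mathcal{F}$: a VC dimension for $\{0, 1\}$-valued models, and a scale-sensitive $\Psi$-dimension in the $\mathbb{R}^Q$ case considered here.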
Generalization Bounds in the Predict-then-Optimize Framework
The predict-then-optimize framework is fundamental in many practical
settings: predict the unknown parameters of an optimization problem, and then
solve the problem using the predicted values of the parameters. A natural loss
function in this environment is to consider the cost of the decisions induced
by the predicted parameters, in contrast to the prediction error of the
parameters. This loss function was recently introduced in Elmachtoub and Grigas
(2017) and referred to as the Smart Predict-then-Optimize (SPO) loss. In this
work, we seek to provide bounds on how well the performance of a prediction
model fit on training data generalizes out-of-sample, in the context of the SPO
loss. Since the SPO loss is non-convex and non-Lipschitz, standard results for
deriving generalization bounds do not apply.
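Concretely, writing $S$ for the feasible region and $w^*(c) \in \arg\min_{w \in S} c^\top w$ for an optimal decision under cost vector $c$, the SPO loss of a prediction $\hat{c}$ against the realized cost $c$ is

$$
\ell_{\mathrm{SPO}}(\hat{c}, c) \;=\; c^\top w^*(\hat{c}) \;-\; c^\top w^*(c),
$$

the excess true cost of the decision induced by the prediction (restated here from Elmachtoub and Grigas (2017) for context). The difficulty is that $w^*(\cdot)$ is piecewise constant and jumps at predictions with non-unique optima, which is what makes the loss non-convex and non-Lipschitz.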
We first derive bounds based on the Natarajan dimension that, in the case of
a polyhedral feasible region, scale at most logarithmically in the number of
extreme points, but, in the case of a general convex feasible region, have
linear dependence on the decision dimension. By exploiting the structure of the
SPO loss function and a key property of the feasible region, which we denote as
the strength property, we can dramatically improve the dependence on the
decision and feature dimensions. Our approach and analysis rely on placing a
margin around problematic predictions that do not yield unique optimal
solutions, and then providing generalization bounds in the context of a
modified margin SPO loss function that is Lipschitz continuous. Finally, we
characterize the strength property and show that the modified SPO loss can be
computed efficiently for both strongly convex bodies and polytopes with an
explicit extreme point representation.

Comment: Preliminary version in NeurIPS 2019.
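As a minimal sketch (not the authors' code), the unmodified SPO loss for a polyhedron $\{w : Aw \le b,\; w \ge 0\}$ can be evaluated with an off-the-shelf LP solver, with ties at non-unique optima broken arbitrarily by the solver; these are exactly the problematic predictions that the margin construction above is designed to handle:

```python
# Sketch of the SPO loss on a polyhedral feasible region {w : A w <= b, w >= 0}.
# Illustrative only; tie-breaking at non-unique optima is left to the solver.
import numpy as np
from scipy.optimize import linprog

def spo_loss(c_pred, c_true, A, b):
    """Excess true cost of the decision induced by the predicted cost vector."""
    w_pred = linprog(c_pred, A_ub=A, b_ub=b).x  # w*(c_pred): decision from the prediction
    w_star = linprog(c_true, A_ub=A, b_ub=b).x  # w*(c_true): best decision in hindsight
    return float(c_true @ (w_pred - w_star))    # nonnegative by optimality of w_star

# Toy region: the unit box [0, 1]^2 (w >= 0 is linprog's default bound).
# Predicting the wrong sign on the first coordinate costs exactly 1.
A, b = np.eye(2), np.ones(2)
print(spo_loss(np.array([1.0, -1.0]), np.array([-1.0, -1.0]), A, b))  # 1.0
```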
Generalization Error in Deep Learning
Deep learning models have lately shown great performance in various fields
such as computer vision, speech recognition, speech translation, and natural
language processing. However, despite their state-of-the-art performance, the
source of their generalization ability remains generally unclear.
Thus, an important question is what makes deep neural networks able to
generalize well from the training set to new data. In this article, we provide
an overview of the existing theory and bounds for the characterization of the
generalization error of deep neural networks, combining both classical and more
recent theoretical and empirical results.
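As one example of the classical results such an overview covers, for a loss taking values in $[0, 1]$ the standard Rademacher-complexity bound states that, with probability at least $1 - \delta$, uniformly over the class $\mathcal{F}$,

$$
R(f) \;\le\; \widehat{R}_n(f) \;+\; 2\,\mathfrak{R}_n(\mathcal{F}) \;+\; \sqrt{\frac{\ln(1/\delta)}{2n}},
$$

where $\mathfrak{R}_n(\mathcal{F})$ is the Rademacher complexity of the associated loss class; a central question in this literature is why bounds of this form are often vacuous for heavily overparameterized networks that nevertheless generalize well.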