On the proliferation of support vectors in high dimensions
The support vector machine (SVM) is a well-established classification method
whose name refers to the particular training examples, called support vectors,
that determine the maximum margin separating hyperplane. The SVM classifier is
known to enjoy good generalization properties when the number of support
vectors is small compared to the number of training examples. However, recent
research has shown that in sufficiently high-dimensional linear classification
problems, the SVM can generalize well despite a proliferation of support
vectors where all training examples are support vectors. In this paper, we
identify new deterministic equivalences for this phenomenon of support vector
proliferation, and use them to (1) substantially broaden the conditions under
which the phenomenon occurs in high-dimensional settings, and (2) prove a
nearly matching converse result.
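The proliferation phenomenon described above is easy to observe numerically. The following is an illustrative sketch (not from the paper; all dimensions and the hard-margin-via-large-C approximation are assumptions for demonstration): with far more features than examples, a linear SVM typically ends up with every training point as a support vector.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 50, 5000  # many more features than training examples
X = rng.standard_normal((n, d))
y = rng.choice([-1, 1], size=n)

# A very large C approximates the hard-margin linear SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(f"{len(clf.support_)} of {n} training points are support vectors")
```

In this regime the data is linearly separable with high probability, so the classifier fits the training set exactly even while nearly all (often all) examples lie on the margin.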
The Complexity of Infinite-Horizon General-Sum Stochastic Games
We study the complexity of computing stationary Nash equilibrium (NE) in n-player infinite-horizon general-sum stochastic games. We focus on the problem of computing NE in such stochastic games when each player is restricted to choosing a stationary policy and rewards are discounted. First, we prove that computing such NE is in PPAD (in addition to clearly being PPAD-hard). Second, we consider turn-based specializations of such games where at each state there is at most a single player that can take actions, and show that these (seemingly simpler) games remain PPAD-hard. Third, we show that under further structural assumptions on the rewards, computing NE in such turn-based games is possible in polynomial time. Towards achieving these results, we establish structural facts about stochastic games of broader utility, including monotonicity of utilities under single-state single-action changes and reductions to settings where each player controls a single state.
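A small piece of the setup above can be made concrete. Once a stationary joint policy is fixed, each player's discounted utility satisfies the linear fixed-point equation v = r + γPv, so utilities can be evaluated by solving a linear system. The sketch below is a hedged illustration of that evaluation step only (the game, rewards, and discount factor are invented for the example; this is not the paper's equilibrium-computation algorithm).

```python
import numpy as np

gamma = 0.9  # discount factor (assumed for illustration)

# Transition matrix over 3 states induced by a fixed stationary joint policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])

# Per-state rewards for two players under that same joint policy.
r = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])

# Solve v = r + gamma * P v, i.e. (I - gamma*P) v = r; one column per player.
v = np.linalg.solve(np.eye(3) - gamma * P, r)
print(v)
```

Since γ < 1 and P is stochastic, (I − γP) is invertible, so this evaluation is always well defined; the hardness results in the abstract concern finding an equilibrium profile, not evaluating a fixed one.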
Towards Last-layer Retraining for Group Robustness with Fewer Annotations
Empirical risk minimization (ERM) of neural networks is prone to
over-reliance on spurious correlations and poor generalization on minority
groups. The recent deep feature reweighting (DFR) technique achieves
state-of-the-art group robustness via simple last-layer retraining, but it
requires held-out group and class annotations to construct a group-balanced
reweighting dataset. In this work, we examine this impractical requirement and
find that last-layer retraining can be surprisingly effective with no group
annotations (other than for model selection) and only a handful of class
annotations. We first show that last-layer retraining can greatly improve
worst-group accuracy even when the reweighting dataset has only a small
proportion of worst-group data. This implies a "free lunch" where holding out a
subset of training data to retrain the last layer can substantially outperform
ERM on the entire dataset with no additional data or annotations. To further
improve group robustness, we introduce a lightweight method called selective
last-layer finetuning (SELF), which constructs the reweighting dataset using
misclassifications or disagreements. Our empirical and theoretical results
present the first evidence that model disagreement upsamples worst-group data,
enabling SELF to nearly match DFR on four well-established benchmarks across
vision and language tasks with no group annotations and less than 3% of the
held-out class annotations. Our code is available at
https://github.com/tmlabonte/last-layer-retraining
Comment: NeurIPS 202
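The core mechanic of last-layer retraining is simple to sketch: freeze the feature extractor and refit only the final linear layer on a held-out set. The toy code below is an assumed illustration of that mechanic, not the authors' implementation (the random-projection "backbone", data, and labels are stand-ins; DFR would additionally balance the held-out set by group, which is omitted here).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 16))  # frozen "backbone" weights (stand-in)

def features(x):
    # Stand-in for a frozen pretrained feature extractor.
    return np.tanh(x @ W)

# Held-out data used only to refit the last layer.
X_hold = rng.standard_normal((200, 10))
y_hold = (X_hold[:, 0] > 0).astype(int)  # synthetic binary labels

# Retrain only the linear head on frozen features.
head = LogisticRegression(max_iter=1000).fit(features(X_hold), y_hold)
acc = head.score(features(X_hold), y_hold)
print("held-out accuracy:", acc)
```

The point of the abstract is that the composition of this held-out set matters far less than previously assumed: even a small fraction of worst-group data, or a set chosen by disagreement, can suffice.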
Harmless interpolation of noisy data in regression
A continuing mystery in understanding the empirical success of deep neural
networks has been in their ability to achieve zero training error and yet
generalize well, even when the training data is noisy and there are more
parameters than data points. We investigate this "overparametrization"
phenomenon in the classical underdetermined linear regression problem, where all
solutions that minimize training error interpolate the data, including noise.
We give a bound on how well such interpolative solutions can generalize to
fresh test data, and show that this bound generically decays to zero with the
number of extra features, thus characterizing an explicit benefit of
overparameterization. For appropriately sparse linear models, we provide a
hybrid interpolating scheme (combining classical sparse recovery schemes with
harmless noise-fitting) to achieve generalization error close to the bound on
interpolative solutions.
Comment: 17 pages, presented at ITA in San Diego in Feb 201
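The "harmless" part of harmless interpolation can be seen in a stripped-down experiment. The sketch below is an assumed toy setup (not the paper's construction): the labels are pure noise, the minimum-norm solution fits them exactly, and the resulting harm on fresh test data shrinks as extra features are added.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 40, 0.5
errs = []
for d in [60, 200, 2000]:
    X = rng.standard_normal((n, d))
    y = sigma * rng.standard_normal(n)      # labels are pure noise
    w_hat = np.linalg.pinv(X) @ y           # minimum-norm interpolating solution
    assert np.allclose(X @ w_hat, y)        # training error is exactly zero
    X_test = rng.standard_normal((500, d))
    errs.append(np.mean((X_test @ w_hat) ** 2))  # test MSE caused by fitted noise
print([round(e, 3) for e in errs])          # shrinks as d grows
```

Intuitively, with more directions available, the min-norm interpolator spreads the fitted noise thinly across many features, so each fresh test point feels very little of it.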
Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation
The growing literature on "benign overfitting" in overparameterized models
has been mostly restricted to regression or binary classification settings;
however, most success stories of modern machine learning have been recorded in
multiclass settings. Motivated by this discrepancy, we study benign overfitting
in multiclass linear classification. Specifically, we consider the following
popular training algorithms on separable data: (i) empirical risk minimization
(ERM) with cross-entropy loss, which converges to the multiclass support vector
machine (SVM) solution; (ii) ERM with least-squares loss, which converges to
the min-norm interpolating (MNI) solution; and, (iii) the one-vs-all SVM
classifier. First, we provide a simple sufficient condition under which all
three algorithms lead to classifiers that interpolate the training data and
have equal accuracy. When the data is generated from Gaussian mixtures or a
multinomial logistic model, this condition holds under high enough effective
overparameterization. Second, we derive novel error bounds on the accuracy of
the MNI classifier, thereby showing that all three training algorithms lead to
benign overfitting under sufficient overparameterization. Ultimately, our
analysis shows that good generalization is possible for SVM solutions beyond
the realm in which typical margin-based bounds apply.
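One of the three algorithms above, the min-norm interpolating (MNI) classifier, has a closed form: least-squares regression of one-hot labels gives the minimum-norm weights satisfying XW = Y. The sketch below (assumed dimensions, random synthetic data; illustrative only) shows that in the overparameterized regime this solution interpolates the one-hot labels exactly, and therefore classifies every training point correctly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 30, 600, 3                 # heavily overparameterized: d >> n
X = rng.standard_normal((n, d))
y = rng.integers(0, k, size=n)
Y = np.eye(k)[y]                     # one-hot label matrix, shape (n, k)

W = np.linalg.pinv(X) @ Y            # MNI solution: min-norm W with X @ W = Y
assert np.allclose(X @ W, Y)         # interpolates the one-hot labels exactly
pred = np.argmax(X @ W, axis=1)
print("train accuracy:", (pred == y).mean())
```

The abstract's sufficient condition says that, with enough effective overparameterization, the multiclass SVM and one-vs-all SVM coincide with this same interpolating solution, which is why all three algorithms achieve equal accuracy there.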