Benign Overfitting in Linear Regression
The phenomenon of benign overfitting is one of the key mysteries uncovered by
deep learning methodology: deep neural networks seem to predict well, even with
a perfect fit to noisy training data. Motivated by this phenomenon, we consider
when a perfect fit to training data in linear regression is compatible with
accurate prediction. We give a characterization of linear regression problems
for which the minimum norm interpolating prediction rule has near-optimal
prediction accuracy. The characterization is in terms of two notions of the
effective rank of the data covariance. It shows that overparameterization is
essential for benign overfitting in this setting: the number of directions in
parameter space that are unimportant for prediction must significantly exceed
the sample size. By studying examples of data covariance properties that this
characterization shows are required for benign overfitting, we find an
important role for finite-dimensional data: the accuracy of the minimum norm
interpolating prediction rule approaches the best possible accuracy for a much
narrower range of properties of the data distribution when the data lies in an
infinite-dimensional space than when it lies in a finite-dimensional space
whose dimension grows faster than the sample size.
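The minimum norm interpolating rule described above can be sketched in a few lines; the sizes, sparse signal, and noise level below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Overparameterized regime: dimension d exceeds sample size n, so many
# weight vectors fit the data exactly; the pseudoinverse picks the one
# with minimum norm.
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[0] = 1.0                                  # sparse ground-truth signal
y = X @ w_true + 0.1 * rng.standard_normal(n)    # noisy labels

w_mn = np.linalg.pinv(X) @ y                     # min-norm interpolator: X @ w_mn == y
```

Perturbing `w_mn` along the null space of `X` yields the other interpolators, all of which have larger norm.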
Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation
The growing literature on "benign overfitting" in overparameterized models
has been mostly restricted to regression or binary classification settings;
however, most success stories of modern machine learning have been recorded in
multiclass settings. Motivated by this discrepancy, we study benign overfitting
in multiclass linear classification. Specifically, we consider the following
popular training algorithms on separable data: (i) empirical risk minimization
(ERM) with cross-entropy loss, which converges to the multiclass support vector
machine (SVM) solution; (ii) ERM with least-squares loss, which converges to
the min-norm interpolating (MNI) solution; and, (iii) the one-vs-all SVM
classifier. First, we provide a simple sufficient condition under which all
three algorithms lead to classifiers that interpolate the training data and
have equal accuracy. When the data is generated from Gaussian mixtures or a
multinomial logistic model, this condition holds under high enough effective
overparameterization. Second, we derive novel error bounds on the accuracy of
the MNI classifier, thereby showing that all three training algorithms lead to
benign overfitting under sufficient overparameterization. Ultimately, our
analysis shows that good generalization is possible for SVM solutions beyond
the realm in which typical margin-based bounds apply.
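As a hedged numpy sketch of the MNI solution from algorithm (ii) — the mixture means, sizes, and seed are illustrative assumptions — least squares on one-hot labels solved with the pseudoinverse interpolates the training data and classifies by argmax:

```python
import numpy as np

# Gaussian-mixture data in the heavily overparameterized regime (d >> n).
rng = np.random.default_rng(1)
n, d, k = 30, 200, 3
means = 3.0 * rng.standard_normal((k, d))        # one mean per class
labels = rng.integers(0, k, size=n)
X = means[labels] + rng.standard_normal((n, d))
Y = np.eye(k)[labels]                            # one-hot targets

W = np.linalg.pinv(X) @ Y                        # min-norm interpolating (MNI) solution
pred = np.argmax(X @ W, axis=1)                  # multiclass decision rule
```

Because `X` has full row rank almost surely, `X @ W` reproduces the one-hot targets exactly, so the classifier interpolates the training labels.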
Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning
Meta-learning has arisen as a successful method for improving training
performance by training over many similar tasks, especially with deep neural
networks (DNNs). However, the theoretical understanding of when and why
overparameterized models such as DNNs can generalize well in meta-learning is
still limited. As an initial step towards addressing this challenge, this paper
studies the generalization performance of overfitted meta-learning under a
linear regression model with Gaussian features. In contrast to a few recent
studies along the same line, our framework allows the number of model
parameters to be arbitrarily larger than the number of features in the ground
truth signal, and hence naturally captures the overparameterized regime in
practical deep meta-learning. We show that the overfitted min-norm solution of
model-agnostic meta-learning (MAML) can be beneficial, which is similar to the
recent remarkable findings on "benign overfitting" and "double descent"
phenomena in classical (single-task) linear regression.
However, due to features unique to meta-learning, such as the task-specific
gradient-descent inner training step and the diversity/fluctuation of the
ground-truth signals across training tasks, we find new and interesting
properties that do not arise in single-task linear regression. We first provide
a reasonably tight, high-probability upper bound on the generalization error,
in which certain terms decrease as the number of features increases. Our analysis
suggests that benign overfitting is more significant and easier to observe when
the noise and the diversity/fluctuation of the ground truth of each training
task are large. In this regime, we show that the overfitted min-norm solution
can achieve an even lower generalization error than the underparameterized
solution.
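The task-specific gradient-descent inner step of linear MAML mentioned above can be sketched as follows; the learning rate, task sizes, and noise level are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# One MAML inner step: each task adapts the meta-parameter with a single
# gradient-descent step on its own data before the adapted model is evaluated.
rng = np.random.default_rng(2)
d, n_task, lr = 50, 10, 0.01
w_meta = np.zeros(d)                             # meta-initialization

def inner_adapt(w, X, y):
    grad = X.T @ (X @ w - y) / len(y)            # gradient of the squared loss
    return w - lr * grad

# Each training task draws its own ground-truth signal (task diversity).
w_task = rng.standard_normal(d)
X = rng.standard_normal((n_task, d))
y = X @ w_task + 0.1 * rng.standard_normal(n_task)
w_adapted = inner_adapt(w_meta, X, y)
```

With a small enough step size, the adapted parameter has strictly lower training loss on the task than the meta-initialization.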
Feature and Variable Selection in Classification
The amount of information in the form of features and variables available to
machine learning algorithms is ever increasing. This can lead to classifiers
that are prone to overfitting in high dimensions; moreover, high-dimensional
models do not lend themselves to interpretable results, and the CPU and memory
resources needed to run on high-dimensional datasets severely limit the
applicability of such approaches. Variable and feature selection aim to remedy
this by finding a subset of features that best captures the available information. In
this paper we present the general methodology and highlight some specific
approaches.
Comment: Part of a master seminar in document analysis held by Marcus Eichenberger-Liwick
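One of the simplest specific approaches, a correlation-based filter, can be sketched as follows (the synthetic data and the choice of score are illustrative assumptions, not from the paper):

```python
import numpy as np

# Filter method: score each feature by its absolute correlation with the
# label and keep the top-k subset.
rng = np.random.default_rng(3)
n, d, k = 100, 500, 10
y = rng.integers(0, 2, size=n)                   # binary class label
X = rng.standard_normal((n, d))
X[:, :5] += 2.0 * y[:, None]                     # only the first 5 features carry signal

scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
top_k = np.argsort(scores)[::-1][:k]             # indices of the selected features
```

Filter scores are cheap to compute but ignore feature interactions; wrapper and embedded methods trade more computation for that.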
Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models
Studies on benign overfitting provide insights for the success of
overparameterized deep learning models. In this work, we examine whether
overfitting is truly benign in real-world classification tasks. We start with
the observation that a ResNet model overfits benignly on CIFAR-10 but not
benignly on ImageNet. To understand why benign overfitting fails in the
ImageNet experiment, we theoretically analyze benign overfitting under a more
restrictive setup where the number of parameters is not significantly larger
than the number of data points. Under this mild overparameterization setup, our
analysis identifies a phase change: unlike in the previous heavy
overparameterization settings, benign overfitting can now fail in the presence
of label noise. Our analysis explains our empirical observations, and is
validated by a set of control experiments with ResNets. Our work highlights the
importance of understanding implicit bias in underfitting regimes as a future
direction.
Comment: Published as a conference paper at ICLR 202
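The overfitting-to-label-noise behavior discussed above can be illustrated with a toy linear sketch (sizes, flip rate, and seed are assumptions; the paper's experiments use ResNets):

```python
import numpy as np

# With d > n, the min-norm linear model interpolates even flipped labels,
# i.e. it fits the label noise exactly.
rng = np.random.default_rng(4)
n, d, flip_rate = 40, 60, 0.2                    # mild overparameterization
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = np.sign(X @ w_true)                          # clean +/-1 labels
flip = rng.random(n) < flip_rate
y_noisy = np.where(flip, -y, y)                  # inject label noise

w_hat = np.linalg.pinv(X) @ y_noisy              # interpolates the noisy labels
```

The model memorizes the flipped labels; whether this memorization is harmless is exactly the question of the phase change described above.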
Skin lesion classification from dermoscopic images using deep learning techniques
The recent emergence of deep learning methods for medical image analysis has enabled the development of intelligent medical imaging-based diagnosis systems that can assist the human expert in making better decisions about a patient’s health. In this paper we focus on the problem of skin lesion classification, particularly early melanoma detection, and present a deep-learning based approach to solve the problem of classifying a dermoscopic image containing a skin lesion as malignant or benign. The proposed solution is built around the VGGNet convolutional neural network architecture and uses the transfer learning paradigm. Experimental results are encouraging: on the ISIC Archive dataset, the proposed method achieves a sensitivity value of 78.66%, which is significantly higher than the current state of the art on that dataset.
Postprint (author's final draft)
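As a side note on the reported metric, sensitivity is the true-positive rate for the malignant class; a toy computation (the labels below are made up, not ISIC data):

```python
import numpy as np

# Sensitivity = TP / (TP + FN): the fraction of truly malignant lesions
# that the classifier actually flags as malignant.
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])      # 1 = malignant, 0 = benign
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 1])

tp = np.sum((y_pred == 1) & (y_true == 1))       # correctly flagged malignant
fn = np.sum((y_pred == 0) & (y_true == 1))       # malignant cases missed
sensitivity = tp / (tp + fn)
```

High sensitivity is the priority in melanoma screening, since a missed malignant lesion is far costlier than a false alarm.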