The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime
Authors
Andrea Montanari
Feng Ruan
Youngtak Sohn
Jun Yan
Publication date
17 July 2020
Publisher
arXiv
Abstract
Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that have zero training error (with linearly separable data). Despite their simplicity, their high-dimensional behavior is not yet completely understood. We assume to be given i.i.d. data $(y_i,{\boldsymbol x}_i)$, $i\le n$, with ${\boldsymbol x}_i\sim{\sf N}(0,{\boldsymbol \Sigma})$ a $p$-dimensional feature vector, and $y_i\in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle{\boldsymbol\theta}_*,{\boldsymbol x}_i\rangle$. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to\psi$, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when $n,p$ are of the order of a few hundred. We explore several choices for $({\boldsymbol\theta}_*,{\boldsymbol\Sigma})$, and show that the resulting generalization curve (test error as a function of the overparametrization $\psi=p/n$) is qualitatively different depending on this choice. In particular, we consider a specific structure of $({\boldsymbol\theta}_*,{\boldsymbol\Sigma})$ that captures the behavior of nonlinear random feature models or, equivalently, two-layer neural networks with random first-layer weights. In this case, we aim at classifying data $(y_i,{\boldsymbol x}_i)$ with ${\boldsymbol x}_i\in{\mathbb R}^d$, but we do so by first embedding them in a $p$-dimensional feature space via ${\boldsymbol x}_i\mapsto\sigma({\boldsymbol W}{\boldsymbol x}_i)$ and then finding a max-margin classifier in this space. We derive exact formulas in the proportional asymptotics $p,n,d\to\infty$ with $p/d\to\psi_1$, $n/d\to\psi_2$, and observe that the test error is minimized in the highly overparametrized regime $\psi_1\gg 0$.
Comment: 73 pages; 12 PDF figures. (Added formulas for wide asymptotics, and distribution of the coordinates of the estimator.)
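For intuition, here is a minimal simulation sketch of the first setting described in the abstract. This is not the authors' code: ${\boldsymbol\Sigma}={\boldsymbol I}_p$, a logistic link for the label distribution, and the choice of sample sizes and signal strength are all assumptions made here for concreteness, and the max-margin classifier is approximated by a hard-margin linear SVM (scikit-learn's LinearSVC with a very large C).

```python
# A minimal simulation sketch of the linear setting above -- not the authors'
# code. Assumptions made here for concreteness: Sigma = I_p, a logistic link
# for the label distribution, and a hard-margin linear SVM (LinearSVC with
# very large C) as a stand-in for the exact max-margin classifier.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, n_test = 300, 10_000  # training / test sample sizes (our choice)

def sample(theta, m):
    """Draw m pairs (x, y): x ~ N(0, I_p), P(y = +1 | x) = sigmoid(<theta, x>)."""
    X = rng.standard_normal((m, theta.shape[0]))
    prob = 1.0 / (1.0 + np.exp(-X @ theta))
    return X, np.where(rng.random(m) < prob, 1, -1)

for psi in [0.5, 1.0, 2.0, 4.0, 8.0]:  # overparametrization psi = p / n
    p = int(psi * n)
    theta = rng.standard_normal(p)
    theta *= 2.0 / np.linalg.norm(theta)  # fix the signal strength ||theta||
    X, y = sample(theta, n)
    X_test, y_test = sample(theta, n_test)
    # A very large C pushes the soft-margin SVM toward the max-margin
    # solution whenever the training data are linearly separable.
    clf = LinearSVC(C=1e6, fit_intercept=False, max_iter=100_000).fit(X, y)
    test_error = np.mean(clf.predict(X_test) != y_test)
    print(f"psi = p/n = {psi:4.1f}   test error ~ {test_error:.3f}")
```

Sweeping $\psi=p/n$ this way produces an empirical generalization curve of the kind the paper characterizes exactly in the proportional limit.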
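The random-features setting can be sketched the same way: raw data in ${\mathbb R}^d$ is first embedded via ${\boldsymbol x}_i\mapsto\sigma({\boldsymbol W}{\boldsymbol x}_i)$ with a random first-layer weight matrix ${\boldsymbol W}$, then a max-margin classifier is fit in the embedded space. The activation $\sigma=\tanh$, the $1/\sqrt{d}$ weight scaling, and the dimensions below are again our own choices, not the paper's.

```python
# A sketch of the random-features setting -- again not the authors' code.
# Raw data x_i in R^d is embedded into R^p via x_i -> sigma(W x_i) with a
# random first-layer weight matrix W (a two-layer network with random first
# layer), then a max-margin classifier is fit in the embedded space.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
d, n, n_test = 100, 300, 10_000  # so psi_2 = n/d = 3 (our choice)
theta = rng.standard_normal(d)
theta *= 2.0 / np.linalg.norm(theta)

def sample(m):
    """Labels follow a logistic model in <theta, x>, as in the first sketch."""
    X = rng.standard_normal((m, d))
    prob = 1.0 / (1.0 + np.exp(-X @ theta))
    return X, np.where(rng.random(m) < prob, 1, -1)

X, y = sample(n)
X_test, y_test = sample(n_test)

for psi1 in [0.5, 1.0, 2.0, 5.0, 10.0]:  # psi_1 = p / d
    p = int(psi1 * d)
    W = rng.standard_normal((p, d)) / np.sqrt(d)  # random first-layer weights
    Z, Z_test = np.tanh(X @ W.T), np.tanh(X_test @ W.T)  # x -> sigma(W x)
    clf = LinearSVC(C=1e6, fit_intercept=False, max_iter=100_000).fit(Z, y)
    test_error = np.mean(clf.predict(Z_test) != y_test)
    print(f"psi_1 = p/d = {psi1:5.1f}   test error ~ {test_error:.3f}")
```

Increasing $\psi_1=p/d$ at fixed $\psi_2=n/d$ probes the highly overparametrized regime $\psi_1\gg 0$, where the paper observes the test error is minimized.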
Available Versions
arXiv.org e-Print Archive
oai:arXiv.org:1911.01544
Last updated on 22/07/2020