Spectrally-normalized margin bounds for neural networks
This paper presents a margin-based multiclass generalization bound for neural
networks that scales with their margin-normalized "spectral complexity": their
Lipschitz constant, meaning the product of the spectral norms of the weight
matrices, times a certain correction factor. This bound is empirically
investigated for a standard AlexNet network trained with SGD on the MNIST and
CIFAR-10 datasets, with both original and random labels; the bound, the
Lipschitz constants, and the excess risks are all in direct correlation,
suggesting both that SGD selects predictors whose complexity scales with the
difficulty of the learning task and that the presented bound is sensitive to
this complexity.
Comment: Comparison to arXiv v1: the 1-norm in the main bound is refined to a
(2,1)-group-norm. Comparison to the NIPS camera-ready: a typo is fixed.
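As a rough illustration (not the authors' code), here is a NumPy sketch of the
spectral-complexity term for a chain of dense layers with weight matrices
W_1, ..., W_L, taking the reference matrices M_i to be zero and dropping the
bound's constants and logarithmic factors; the correction term uses the
(2,1)-group-norm refinement noted in the comment above:

    import numpy as np

    def spectral_complexity(weights, refs=None):
        """Spectral complexity: Lipschitz constant times a group-norm correction.

        weights: list of 2-D weight matrices W_1, ..., W_L.
        refs: optional reference matrices M_i (zeros here; an illustrative choice).
        """
        if refs is None:
            refs = [np.zeros_like(W) for W in weights]
        # Lipschitz constant: product of the layers' spectral norms
        # (largest singular values).
        spec = [np.linalg.norm(W, ord=2) for W in weights]
        lipschitz = float(np.prod(spec))
        # Per-layer (2,1)-group-norm of (W_i - M_i)^T, i.e. the sum of the
        # Euclidean norms of the rows of W_i - M_i, scaled by the spectral norm.
        ratios = [np.linalg.norm(W - M, axis=1).sum() / s
                  for W, M, s in zip(weights, refs, spec)]
        correction = sum(r ** (2.0 / 3.0) for r in ratios) ** 1.5
        return lipschitz * correction

Dividing this quantity by the classification margin gives the margin-normalized
complexity that the abstract correlates with excess risk.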
Orthogonal Statistical Learning
We provide non-asymptotic excess risk guarantees for statistical learning in
a setting where the population risk with respect to which we evaluate the
target parameter depends on an unknown nuisance parameter that must be
estimated from data. We analyze a two-stage sample splitting meta-algorithm
that takes as input two arbitrary estimation algorithms: one for the target
parameter and one for the nuisance parameter. We show that if the population
risk satisfies a condition called Neyman orthogonality, the impact of the
nuisance estimation error on the excess risk bound achieved by the
meta-algorithm is of second order. Our theorem is agnostic to the particular
algorithms used for the target and nuisance and only makes an assumption on
their individual performance. This enables the use of a plethora of existing
results from statistical learning and machine learning to give new guarantees
for learning with a nuisance component. Moreover, by focusing on excess risk
rather than parameter estimation, we can give guarantees under weaker
assumptions than in previous works and accommodate settings in which the target
parameter belongs to a complex nonparametric class. We provide conditions on
the metric entropy of the nuisance and target classes such that oracle rates
(rates of the same order as if we knew the nuisance parameter) are achieved.
We also derive new rates for specific estimation algorithms such as
variance-penalized empirical risk minimization, neural network estimation, and
sparse high-dimensional linear model estimation. We highlight the applicability
of our results in four settings of central importance: 1) heterogeneous
treatment effect estimation, 2) offline policy optimization, 3) domain
adaptation, and 4) learning with missing data.
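As a rough illustration, here is a schematic sketch of the two-stage
sample-splitting meta-algorithm described above, assuming hypothetical
estimator interfaces nuisance_algo(fold) -> g_hat and
target_algo(fold, g_hat) -> theta_hat; neither interface is from the paper,
and any off-the-shelf learners with these signatures could be plugged in:

    import numpy as np

    def two_stage_meta(data, nuisance_algo, target_algo, seed=None):
        """Sample-splitting meta-estimator (illustrative interfaces only)."""
        # Split the sample into two disjoint folds.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(data))
        half = len(data) // 2
        fold_g = [data[i] for i in idx[:half]]
        fold_t = [data[i] for i in idx[half:]]
        # Stage 1: estimate the unknown nuisance parameter on the first fold.
        g_hat = nuisance_algo(fold_g)
        # Stage 2: estimate the target parameter on the held-out fold by
        # minimizing the empirical risk with g_hat plugged in for the nuisance.
        theta_hat = target_algo(fold_t, g_hat)
        return theta_hat

Under Neyman orthogonality, the first-order effect of the error in g_hat on
the excess risk of theta_hat vanishes, which is why the two stages can use
arbitrary, independently analyzed algorithms.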