Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin
For linear classifiers, the relationship between (normalized) output margin
and generalization is captured in a clear and simple bound -- a large output
margin implies good generalization. Unfortunately, for deep models, this
relationship is less clear: existing analyses of the output margin give
complicated bounds which sometimes depend exponentially on depth. In this work,
we propose to instead analyze a new notion of margin, which we call the
"all-layer margin." Our analysis reveals that the all-layer margin has a clear
and direct relationship with generalization for deep models. This enables the
following concrete applications of the all-layer margin: 1) by analyzing the
all-layer margin, we obtain tighter generalization bounds for neural nets which
depend on Jacobian and hidden layer norms and remove the exponential dependency
on depth; 2) our neural net results easily translate to the adversarially robust
setting, giving the first direct analysis of robust test error for deep
networks, and 3) we present a theoretically inspired training algorithm for
increasing the all-layer margin. Our algorithm improves both clean and
adversarially robust test performance over strong baselines in practice.
Comment: Code for all-layer margin optimization is available at the following
link: https://github.com/cwein3/all-layer-margin-opt. Version 4: Re-organized
proofs for more clarity.
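
A minimal sketch of the layerwise-perturbation idea behind training for a
larger all-layer margin, assuming a toy two-layer PyTorch network; the
perturbation scheme, step size, and dimensions are illustrative stand-ins,
not the authors' algorithm (the linked repository has the real one):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNet(nn.Module):
    def __init__(self, d_in=784, d_hidden=256, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, n_classes)

    def forward(self, x, deltas=None):
        # deltas: optional per-layer perturbations, scaled by the norm of
        # each layer's input, echoing the all-layer margin definition.
        h = x if deltas is None else x + deltas[0] * x.norm(dim=1, keepdim=True)
        h = F.relu(self.fc1(h))
        h = h if deltas is None else h + deltas[1] * h.norm(dim=1, keepdim=True)
        return self.fc2(h)

def perturbed_loss(model, x, y, steps=1, eta=0.1):
    # Inner maximization: take gradient-ascent steps on per-layer
    # perturbations (PGD-style), then return the loss at the perturbed
    # activations. Shapes are hard-coded to match the toy net above.
    deltas = [torch.zeros(x.size(0), 784, requires_grad=True),
              torch.zeros(x.size(0), 256, requires_grad=True)]
    for _ in range(steps):
        loss = F.cross_entropy(model(x, deltas), y)
        grads = torch.autograd.grad(loss, deltas)
        with torch.no_grad():
            for d, g in zip(deltas, grads):
                d += eta * g / (g.norm(dim=1, keepdim=True) + 1e-12)
    return F.cross_entropy(model(x, deltas), y)

A hypothetical training step would then minimize this perturbed loss:

model = TwoLayerNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = perturbed_loss(model, x, y)
opt.zero_grad(); loss.backward(); opt.step()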
A Group-Theoretic Framework for Data Augmentation
Data augmentation is a widely used trick when training deep neural networks:
in addition to the original data, properly transformed data are also added to
the training set. However, to the best of our knowledge, a clear mathematical
framework to explain the performance benefits of data augmentation is not
available. In this paper, we develop such a theoretical framework. We show data
augmentation is equivalent to an averaging operation over the orbits of a
certain group that keeps the data distribution approximately invariant. We
prove that it leads to variance reduction. We study empirical risk
minimization, and the examples of exponential families, linear regression, and
certain two-layer neural networks. We also discuss how data augmentation could
be used in problems with symmetry where other approaches are prevalent, such as
in cryo-electron microscopy (cryo-EM).
Comment: To appear in Journal of Machine Learning Research.
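
A minimal numerical sketch of the orbit-averaging view, assuming the
invariance group is cyclic shifts acting on Gaussian vectors; the plug-in
estimator below is an illustrative stand-in, not an example from the paper:

import numpy as np

rng = np.random.default_rng(0)

def orbit_average(f, x):
    # Average an estimator f over the orbit of x under cyclic shifts,
    # i.e. the group Z_d acting by np.roll. This mirrors augmenting the
    # data with every transformed copy.
    d = len(x)
    return np.mean([f(np.roll(x, k)) for k in range(d)])

d, n_trials = 8, 2000
f = lambda x: x[0] ** 2          # a non-invariant plug-in estimator
plain, averaged = [], []
for _ in range(n_trials):
    x = rng.normal(size=d)       # distribution invariant under shifts
    plain.append(f(x))
    averaged.append(orbit_average(f, x))

print(np.var(plain), np.var(averaged))

Both estimators have the same expectation, but the orbit-averaged one
prints a visibly smaller variance, matching the variance-reduction claim
in the abstract.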