Local Convolutions Cause an Implicit Bias towards High Frequency Adversarial Examples
Adversarial attacks remain a significant challenge for neural networks.
Recent work has shown that adversarial perturbations typically contain
high-frequency features, but the root cause of this phenomenon remains unknown.
Inspired by theoretical work on linear full-width convolutional models, we
hypothesize that the local (i.e. bounded-width) convolutional operations
commonly used in current neural networks are implicitly biased to learn high
frequency features, and that this is one of the root causes of high frequency
adversarial examples. To test this hypothesis, we analyzed the impact of
different choices of linear and nonlinear architectures on the implicit bias of
the learned features and the adversarial perturbations, in both spatial and
frequency domains. We find that the high-frequency adversarial perturbations
are critically dependent on the convolution operation because the
spatially-limited nature of local convolutions induces an implicit bias towards
high frequency features. The explanation for the latter involves the Fourier
Uncertainty Principle: a spatially-limited (local in the space domain) filter
cannot also be frequency-limited (local in the frequency domain). Furthermore,
using larger convolution kernel sizes or avoiding convolutions altogether (e.g. by using
a Vision Transformer architecture) significantly reduces this high frequency
bias, but not the overall susceptibility to attacks. Looking forward, our work
strongly suggests that understanding and controlling the implicit bias of
architectures will be essential for achieving adversarial robustness.
Comment: 20 pages, 11 figures, 12 tables
An analytic theory of shallow networks dynamics for hinge loss classification
Neural networks have been shown to perform incredibly well in classification
tasks over structured high-dimensional datasets. However, the learning dynamics
of such networks is still poorly understood. In this paper we study in detail
the training dynamics of a simple type of neural network: a single hidden layer
trained to perform a classification task. We show that in a suitable mean-field
limit this case maps to a single-node learning problem with a time-dependent
dataset determined self-consistently from the average nodes population. We
specialize our theory to the prototypical case of a linearly separable dataset
and a linear hinge loss, for which the dynamics can be solved explicitly. This
allows us to address, in a simple setting, several phenomena appearing in modern
networks, such as the slowing down of training dynamics, the crossover between rich and
lazy learning, and overfitting. Finally, we assess the limitations of mean-field
theory by studying the case of a large but finite number of nodes and of training
samples.
Comment: 16 pages, 6 figures
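As a rough numerical companion to the setting described above, the sketch below (assumed width, 1/N output scaling, learning rate, and a frozen output layer; not the paper's derivation) trains the first layer of a single-hidden-layer ReLU network with hinge loss on a linearly separable dataset and records the loss over time:

```python
# Minimal sketch: gradient descent on a one-hidden-layer ReLU network with
# hinge loss, linearly separable data. Width N, 1/N scaling, learning rate,
# and training only the first layer are illustrative simplifications.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable data: label = sign of the first coordinate.
P, d = 200, 2
X = rng.normal(size=(P, d))
y = np.sign(X[:, 0])

# Hidden layer of N ReLU units with fixed +/-1 output weights.
N = 512
W = rng.normal(size=(N, d))
a = rng.choice([-1.0, 1.0], size=N)

def forward(X):
    pre = X @ W.T                              # (P, N) pre-activations
    return (np.maximum(pre, 0) @ a) / N, pre   # mean-field 1/N output scaling

lr, steps = 0.5, 2001
for t in range(steps):
    f, pre = forward(X)
    margin = y * f
    active = (margin < 1).astype(float)        # examples still inside the hinge
    relu_grad = (pre > 0).astype(float)        # (P, N)
    # Gradient of (1/P) sum_mu max(0, 1 - y_mu f_mu) with respect to W.
    coeff = -(active * y)[:, None] * relu_grad * a[None, :] / (N * P)
    W -= lr * (coeff.T @ X)                    # (N, d) update
    if t % 500 == 0:
        print(f"step {t:4d}  hinge loss {np.mean(np.maximum(0.0, 1.0 - margin)):.4f}")
```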