Search CORE

75 research outputs found

Understanding the extreme vulnerability of image classifiers to adversarial examples

Author: Tanay Thomas
Publication venue: UCL (University College London)
Publication date: 28/10/2020
Field of study

State-of-the-art deep networks for image classification are vulnerable to adversarial examples—misclassified images which are obtained by applying imperceptible non-random perturbations to correctly classified test images. This vulnerability is somewhat paradoxical: how can these models perform so well, if they are so sensitive to small perturbations of their inputs? Two early but influential explanations focused on the high non-linearity of deep networks, and on the high-dimensionality of image space. We review these explanations and highlight their limitations, before introducing a new perspective according to which adversarial examples exist when the classification boundary lies close to the manifold of normal data. We present a detailed mathematical analysis of the new perspective in binary linear classification, where the adversarial vulnerability of a classifier can be reduced to the deviation angle between its weight vector and the weight vector of the nearest centroid classifier. This analysis leads us to identify two types of adversarial examples: those affecting optimal classifiers, which are limited by a fundamental robustness/accuracy trade-off, and those affecting sub-optimal classifiers, resulting from imperfect training procedures or overfitting. We then show that L2 regularization plays an important role in practice, by acting as a balancing mechanism between two objectives: the minimization of the error and the maximization of the adversarial distance over the training set. We finally generalize our considerations to deep neural networks, reinterpreting in particular weight decay and adversarial training as belonging to a same family of output regularizers. If designing models that are robust to small image perturbations remains challenging, we show in the last Chapter of this thesis that state-of-the-art networks can easily be made more vulnerable. Reversing the problem in this way exposes new attack scenarios and, crucially, helps improve our understanding of the adversarial example phenomenon by emphasizing the role played by low variance directions

UCL Discovery