Adversarial Spheres
State of the art computer vision models have been shown to be vulnerable to
small adversarial perturbations of the input. In other words, most images in
the data distribution are both correctly classified by the model and are very
close to a visually similar misclassified image. Despite substantial research
interest, the cause of the phenomenon is still poorly understood and remains
unsolved. We hypothesize that this counterintuitive behavior is a naturally
occurring result of the high dimensional geometry of the data manifold. As a
first step towards exploring this hypothesis, we study a simple synthetic
dataset of classifying between two concentric high dimensional spheres. For
this dataset we show a fundamental tradeoff between the amount of test error
and the average distance to nearest error. In particular, we prove that any
model which misclassifies a small constant fraction of a sphere will be
vulnerable to adversarial perturbations of size $O(1/\sqrt{d})$. Surprisingly,
when we train several different architectures on this dataset, all of their
error sets naturally approach this theoretical bound. As a result of the
theory, the vulnerability of neural networks to small adversarial perturbations
is a logical consequence of the amount of test error observed. We hope that our
theoretical analysis of this very simple case will point the way forward to
explore how the geometry of complex real-world data sets leads to adversarial
examples.
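As a rough illustration of the set-up described above, the sketch below (not the authors' code) samples the two-concentric-spheres dataset and prints the $1/\sqrt{d}$ scale at which the theorem predicts nearby errors; the radii 1.0 and 1.3 and the dimension $d = 500$ are assumptions of this sketch, with $d$ denoting the input dimension.

```python
# Minimal sketch, assuming radii 1.0/1.3 and d = 500 (not taken from the paper text).
import numpy as np

def sample_spheres(n, d, r_inner=1.0, r_outer=1.3, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)    # uniform direction on the unit sphere
    y = rng.integers(0, 2, size=n)                    # 0 = inner sphere, 1 = outer sphere
    return x * np.where(y == 1, r_outer, r_inner)[:, None], y

X, y = sample_spheres(n=10_000, d=500)
# Scale at which the theorem predicts adversarial perturbations exist.
print("1/sqrt(d) =", 1 / np.sqrt(X.shape[1]))
```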
Bayesian Adversarial Spheres: Bayesian Inference and Adversarial Examples in a Noiseless Setting
Modern deep neural network models suffer from adversarial examples, i.e.
confidently misclassified points in the input space. It has been shown that
Bayesian neural networks are a promising approach for detecting adversarial
points, but careful analysis is problematic due to the complexity of these
models. Recently Gilmer et al. (2018) introduced adversarial spheres, a toy
set-up that simplifies both practical and theoretical analysis of the problem.
In this work, we use the adversarial sphere set-up to understand the properties
of approximate Bayesian inference methods for a linear model in a noiseless
setting. We compare predictions of Bayesian and non-Bayesian methods,
showcasing the advantages of the former, although revealing open challenges for
deep learning applications.

Comment: To appear in the third workshop on Bayesian Deep Learning (NeurIPS 2018), Montreal, Canada.
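To make the Bayesian-versus-point-estimate comparison above concrete, here is a hedged sketch of exact Bayesian linear regression with a Gaussian prior, whose predictive variance grows for query points unlike the training data; the prior scale and the (large) noise precision are assumptions of this sketch, not the paper's exact noiseless construction.

```python
# Sketch only: conjugate Bayesian linear regression vs. a point estimate.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 50
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true                                  # noiseless targets

alpha, beta = 1.0, 1e3                          # assumed prior precision and noise precision
A = alpha * np.eye(d) + beta * X.T @ X          # posterior precision
mean = beta * np.linalg.solve(A, X.T @ y)       # posterior mean (close to the point estimate)
cov = np.linalg.inv(A)                          # posterior covariance

x_star = rng.standard_normal(d)                 # query point off the training distribution
pred_mean = x_star @ mean
pred_var = x_star @ cov @ x_star + 1 / beta     # Bayesian predictive variance flags uncertainty
print(pred_mean, pred_var)
```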
Sufficient Conditions for Idealised Models to Have No Adversarial Examples: a Theoretical and Empirical Study with Bayesian Neural Networks
We prove, under two sufficient conditions, that idealised models can have no
adversarial examples. We discuss which idealised models satisfy our conditions,
and show that idealised Bayesian neural networks (BNNs) satisfy these. We
continue by studying near-idealised BNNs using HMC inference, demonstrating the
theoretical ideas in practice. We experiment with HMC on synthetic data derived
from MNIST for which we know the ground-truth image density, showing that
near-perfect epistemic uncertainty correlates with density under the image manifold,
and that adversarial images lie off the manifold in our setting. This suggests
why MC dropout, which can be seen as performing approximate inference, has been
observed to be an effective defence against adversarial examples in practice.
We highlight failure cases of non-idealised BNNs relying on dropout, suggesting
a new attack for dropout models and a new defence as well. Lastly, we
demonstrate the defence on a cats-vs-dogs image classification task with a
VGG13 variant.
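The MC-dropout defence mentioned above amounts to sampling the dropout mask at test time and flagging inputs where the predictive distribution is uncertain. The sketch below illustrates that probe; the model and inputs are hypothetical placeholders, and using predictive entropy as the flag is our choice rather than the paper's exact criterion.

```python
# Sketch of an MC-dropout uncertainty probe (assumed PyTorch model with dropout layers).
import torch

def mc_dropout_predict(model, x, n_samples=50):
    model.train()  # keep dropout active at test time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    mean = probs.mean(dim=0)
    # Predictive entropy of the averaged prediction; high values flag inputs
    # (e.g. off-manifold adversarial images) on which the posterior disagrees.
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean, entropy
```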
Excessive Invariance Causes Adversarial Vulnerability
Despite their impressive performance, deep neural networks exhibit striking
failures on out-of-distribution inputs. One core idea of adversarial example
research is to reveal neural network errors under such distribution shifts. We
decompose these errors into two complementary sources: sensitivity and
invariance. We show deep networks are not only too sensitive to task-irrelevant
changes of their input, as is well-known from epsilon-adversarial examples, but
are also too invariant to a wide range of task-relevant changes, thus making
vast regions in input space vulnerable to adversarial attacks. We show such
excessive invariance occurs across various tasks and architecture types. On
MNIST and ImageNet one can manipulate the class-specific content of almost any
image without changing the hidden activations. We identify an insufficiency of
the standard cross-entropy loss as a reason for these failures. Further, we
extend this objective based on an information-theoretic analysis so it
encourages the model to consider all task-dependent features in its decision.
This provides the first approach tailored explicitly to overcome excessive
invariance and the resulting vulnerabilities.
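The claim that class-specific content can change while hidden activations stay fixed can be probed with a much simpler construction than the paper's invertible-network approach: start from an image of a different class and match the reference image's features by gradient descent. The sketch below is that simplification only; `feature_extractor` is a hypothetical PyTorch module returning a penultimate-layer representation.

```python
# Not the paper's construction: a gradient-based feature-matching sketch of
# excessive invariance, producing an input that looks like x_other's class
# while sharing x_ref's hidden activations.
import torch

def match_activations(feature_extractor, x_ref, x_other, steps=200, lr=0.05):
    target = feature_extractor(x_ref).detach()
    x = x_other.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(feature_extractor(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)        # stay in the valid image range
    return x.detach()
```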
Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness
Adversarial examples are malicious inputs crafted to cause a model to
misclassify them. Their most common instantiation, "perturbation-based"
adversarial examples introduce changes to the input that leave its true label
unchanged, yet result in a different model prediction. Conversely,
"invariance-based" adversarial examples insert changes to the input that leave
the model's prediction unaffected despite the underlying input's label having
changed.
In this paper, we demonstrate that robustness to perturbation-based
adversarial examples is not only insufficient for general robustness, but
worse, it can also increase vulnerability of the model to invariance-based
adversarial examples. In addition to analytical constructions, we empirically
study vision classifiers with state-of-the-art robustness to perturbation-based
adversaries constrained by an $\ell_\infty$ norm. We mount attacks that exploit
excessive model invariance in directions relevant to the task, which are able
to find adversarial examples within the $\ell_\infty$ ball. In fact, we find that
classifiers trained to be $\ell_\infty$-norm robust are more vulnerable to
invariance-based adversarial examples than their undefended counterparts.
Excessive invariance is not limited to models trained to be robust to
perturbation-based $\ell_\infty$-norm adversaries. In fact, we argue that the term
adversarial example is used to capture a series of model limitations, some of
which may not have been discovered yet. Accordingly, we call for a set of
precise definitions that taxonomize and address each of these shortcomings in
learning.

Comment: Accepted at the ICLR 2019 SafeML Workshop.
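A crude way to picture an invariance-based adversarial example within a norm ball is to move a source image as far as the budget allows toward an image of a different class: if the budget suffices to change the oracle label while the classifier's prediction stays fixed, the result is invariance-based. The sketch below is this simplification only, not the authors' full attack, and the variable names are ours.

```python
# Simplified sketch of an invariance-style perturbation inside an l-infinity ball.
import numpy as np

def invariance_blend(x_src, x_tgt, eps):
    # Move x_src toward x_tgt, but project the move onto the l-infinity ball of radius eps.
    delta = np.clip(x_tgt - x_src, -eps, eps)
    return np.clip(x_src + delta, 0.0, 1.0)
```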
Generalized No Free Lunch Theorem for Adversarial Robustness
This manuscript presents some new impossibility results on adversarial
robustness in machine learning, a very important yet largely open problem. We
show that if, conditioned on a class label, the data distribution satisfies the
Talagrand transportation-cost inequality (for example, this condition is
satisfied if the conditional distribution has a density which is log-concave,
or is the uniform measure on a compact Riemannian manifold with positive Ricci
curvature), then any classifier can be adversarially fooled with high probability
once the perturbations are slightly greater than the natural noise level in the
problem. We call this result The Strong "No Free Lunch" Theorem as some recent
results (Tsipras et al. 2018, Fawzi et al. 2018, etc.) on the subject can be
immediately recovered as very particular cases. Our theoretical bounds are
demonstrated on both simulated and real data (MNIST). We conclude the
manuscript with some speculation on possible future research directions.
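For reference, the transportation-cost inequality invoked above has a standard form, and the usual blow-up argument turns it into Gaussian-type concentration of error sets. The sketch below states both; the constants and notation are ours and need not match the manuscript's.

```latex
% Talagrand T2(c) inequality for the class-conditional distribution \mu:
% for every probability measure \nu,
\[
  W_2(\nu,\mu) \;\le\; \sqrt{2c\,\mathrm{KL}(\nu\,\|\,\mu)} .
\]
% By Marton's blow-up argument, if a classifier's error set E has mass
% \mu(E) \ge \epsilon_0 > 0, then its \epsilon-neighbourhood E^\epsilon satisfies
\[
  \mu(E^{\epsilon}) \;\ge\; 1 - \exp\!\left(
    -\frac{\bigl(\epsilon - \sqrt{2c\log(1/\epsilon_0)}\bigr)^{2}}{2c}\right)
  \qquad \text{for } \epsilon \ge \sqrt{2c\log(1/\epsilon_0)},
\]
% i.e. once the perturbation budget exceeds a threshold of order \sqrt{c},
% almost every correctly classified point has an error within distance \epsilon.
```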
A Spectral View of Adversarially Robust Features
Given the apparent difficulty of learning models that are robust to
adversarial perturbations, we propose tackling the simpler problem of
developing adversarially robust features. Specifically, given a dataset and
metric of interest, the goal is to return a function (or multiple functions)
that 1) is robust to adversarial perturbations, and 2) has significant
variation across the datapoints. We establish strong connections between
adversarially robust features and a natural spectral property of the geometry
of the dataset and metric of interest. This connection can be leveraged to
provide both robust features, and a lower bound on the robustness of any
function that has significant variance across the dataset. Finally, we provide
empirical evidence that the adversarially robust features given by this
spectral approach can be fruitfully leveraged to learn a robust (and accurate)
model.

Comment: To appear at NIPS 2018.
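A minimal sketch of the spectral idea above: build a neighbourhood graph over the dataset under the metric of interest and use a low eigenvector of its Laplacian as a candidate robust feature, since it varies across the data but changes slowly along graph edges. The distance threshold and the use of the unnormalized Laplacian are assumptions of this sketch.

```python
# Sketch: a Laplacian eigenvector as an adversarially robust feature.
import numpy as np

def spectral_feature(X, tau):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    W = (D <= tau).astype(float)                                  # neighbourhood adjacency
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                                # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1]   # second eigenvector: significant variation, smooth on the graph
```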
DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality
We present a learning-based method to infer plausible high dynamic range
(HDR), omnidirectional illumination given an unconstrained, low dynamic range
(LDR) image from a mobile phone camera with a limited field of view (FOV). For
training data, we collect videos of various reflective spheres placed within
the camera's FOV, leaving most of the background unoccluded, leveraging that
materials with diverse reflectance functions reveal different lighting cues in
a single exposure. We train a deep neural network to regress from the LDR
background image to HDR lighting by matching the LDR ground truth sphere images
to those rendered with the predicted illumination using image-based relighting,
which is differentiable. Our inference runs at interactive frame rates on a
mobile device, enabling realistic rendering of virtual objects into real scenes
for mobile mixed reality. Training on automatically exposed and white-balanced
videos, we improve the realism of rendered objects compared to state-of-the-art
methods for both indoor and outdoor scenes.
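The differentiable image-based-relighting loss described above can be pictured as a linear combination of pre-rendered one-light-at-a-time basis images of the reference sphere, compared to the LDR ground truth after tonemapping. The sketch below is a guess at that structure rather than the paper's implementation; the tensor shapes and tonemapping curve are assumptions.

```python
# Hedged sketch of an image-based-relighting loss, differentiable in the
# predicted HDR lighting coefficients.
import torch

def relighting_loss(pred_light, basis_images, gt_sphere_ldr):
    # pred_light:    (num_lights,) predicted HDR intensities
    # basis_images:  (num_lights, 3, H, W) sphere rendered under each light direction
    # gt_sphere_ldr: (3, H, W) ground-truth LDR sphere crop
    rendered_hdr = torch.einsum('l,lchw->chw', pred_light, basis_images)
    tonemapped = rendered_hdr.clamp(0, 1) ** (1 / 2.2)   # assumed simple gamma tonemap
    return torch.nn.functional.l1_loss(tonemapped, gt_sphere_ldr)
```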
Predicting Adversarial Examples with High Confidence
It has been suggested that adversarial examples cause deep learning models to
make incorrect predictions with high confidence. In this work, we take the
opposite stance: an overly confident model is more likely to be vulnerable to
adversarial examples. This work is one of the most proactive approaches taken
to date, as we link robustness with non-calibrated model confidence on noisy
images, providing a data-augmentation-free path forward. The adversarial
examples phenomenon is most easily explained by the trend of increasing
non-regularized model capacity, while the diversity and number of samples in
common datasets has remained flat. Test accuracy has incorrectly been
associated with true generalization performance, ignoring that training and
test splits are often extremely similar in terms of the overall representation
space. The transferability property of adversarial examples was previously used
as evidence against overfitting arguments, a perceived random effect, but
overfitting is not always random.

Comment: Under review by the International Conference on Machine Learning (ICML).
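One way to probe the link between overconfidence and vulnerability argued above is to track a model's mean max-softmax confidence as input noise grows: a well-calibrated model should become less confident, while an overconfident one stays near 1.0. This probe is our construction, not the paper's protocol, and the model and batch are hypothetical placeholders.

```python
# Confidence-under-noise probe (sketch).
import torch

def mean_confidence(model, x, noise_std=0.0):
    with torch.no_grad():
        noisy = (x + noise_std * torch.randn_like(x)).clamp(0, 1)
        probs = torch.softmax(model(noisy), dim=-1)
    return probs.max(dim=-1).values.mean().item()

# e.g. [mean_confidence(model, batch, s) for s in (0.0, 0.1, 0.3, 0.5)]
```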
On the Geometry of Adversarial Examples
Adversarial examples are a pervasive phenomenon of machine learning models
where seemingly imperceptible perturbations to the input lead to
misclassifications for otherwise statistically accurate models. We propose a
geometric framework, drawing on tools from the manifold reconstruction
literature, to analyze the high-dimensional geometry of adversarial examples.
In particular, we highlight the importance of codimension: for low-dimensional
data manifolds embedded in high-dimensional space there are many directions off
the manifold in which to construct adversarial examples. Adversarial examples
are a natural consequence of learning a decision boundary that classifies the
low-dimensional data manifold well, but classifies points near the manifold
incorrectly. Using our geometric framework we prove (1) a tradeoff between
robustness under different norms, (2) that adversarial training in balls around
the data is sample inefficient, and (3) sufficient sampling conditions under
which nearest neighbor classifiers and ball-based adversarial training are
robust.

Comment: Improvements to clarity and presentation over the initial submission.
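The codimension point above has a simple numerical illustration: for a k-dimensional tangent space inside R^d, almost all of a random perturbation direction points off the manifold when d is much larger than k. The toy computation below is ours, not the paper's experiment; the dimensions are assumptions.

```python
# Toy illustration of codimension: the off-manifold component of a random
# unit direction has norm close to sqrt(1 - k/d).
import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 10
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))   # orthonormal tangent basis
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
off_manifold = np.linalg.norm(v - basis @ (basis.T @ v))
print(off_manifold)   # close to 1 when d >> k
```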