Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably
robust against norm-bounded adversarial perturbations on the training data. For
previously unseen examples, the approach is guaranteed to detect all
adversarial examples, though it may flag some non-adversarial examples as well.
The basic idea is to consider a convex outer approximation of the set of
activations reachable through a norm-bounded perturbation, and we develop a
robust optimization procedure that minimizes the worst case loss over this
outer region (via a linear program). Crucially, we show that the dual problem
to this linear program can be represented itself as a deep network similar to
the backpropagation network, leading to very efficient optimization approaches
that produce guaranteed bounds on the robust loss. The end result is that by
executing a few more forward and backward passes through a slightly modified
version of the original network (though possibly with much larger batch sizes),
we can learn a classifier that is provably robust to any norm-bounded
adversarial attack. We illustrate the approach on a number of tasks to train
classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a
convolutional classifier that provably has less than 5.8% test error for any
adversarial attack with bounded ℓ∞ norm less than ε = 0.1),
and code for all experiments in the paper is available at
https://github.com/locuslab/convex_adversarial.Comment: ICML final version
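The training recipe the abstract describes ultimately reduces to minimizing a cross-entropy loss evaluated on worst-case logits derived from certified per-class bounds. Below is a minimal sketch of only that last step, assuming the bounds (here `logit_lb`, `logit_ub`) have already been produced by some bounding procedure such as the paper's LP dual network; the function name and tensor shapes are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def robust_cross_entropy(logit_lb, logit_ub, labels):
    """Cross-entropy on the worst-case logits implied by certified per-class bounds.

    logit_lb, logit_ub: (batch, classes) lower/upper bounds on the logits over
    the perturbation set (assumed supplied by a bounding procedure).
    labels: (batch,) true classes.
    """
    worst = logit_ub.clone()
    # The adversary can push every wrong-class logit up to its upper bound and
    # the true-class logit down to its lower bound.
    idx = labels.unsqueeze(1)
    worst.scatter_(1, idx, logit_lb.gather(1, idx))
    return F.cross_entropy(worst, labels)

# Illustrative usage with random bounds (lb <= ub by construction):
lb = torch.randn(8, 10)
ub = lb + torch.rand(8, 10)
y = torch.randint(0, 10, (8,))
loss = robust_cross_entropy(lb, ub, y)  # an upper bound on the worst-case loss
```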
Better the Devil you Know: An Analysis of Evasion Attacks using Out-of-Distribution Adversarial Examples
A large body of recent work has investigated the phenomenon of evasion
attacks using adversarial examples for deep learning systems, where the
addition of norm-bounded perturbations to the test inputs leads to incorrect
output classification. Previous work has investigated this phenomenon in
closed-world systems where training and test inputs follow a pre-specified
distribution. However, real-world implementations of deep learning
applications, such as autonomous driving and content classification are likely
to operate in the open-world environment. In this paper, we demonstrate the
success of open-world evasion attacks, where adversarial examples are generated
from out-of-distribution inputs (OOD adversarial examples). In our study, we
use 11 state-of-the-art neural network models trained on 3 image datasets of
varying complexity. We first demonstrate that state-of-the-art detectors for
out-of-distribution data are not robust against OOD adversarial examples. We
then consider 5 known defenses for adversarial examples, including
state-of-the-art robust training methods, and show that against these defenses,
OOD adversarial examples can achieve up to 4× higher target success
rates compared to adversarial examples generated from in-distribution data. We
also take a quantitative look at how open-world evasion attacks may affect
real-world systems. Finally, we present the first steps towards a robust
open-world machine learning system.Comment: 18 pages, 5 figures, 9 tables
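The abstract does not spell out the attack procedure, but a common way to realize OOD adversarial examples is to start from an out-of-distribution image and perturb it toward a chosen in-distribution label. The sketch below uses plain targeted ℓ∞ PGD as a stand-in; `model`, `eps`, `alpha`, and `steps` are illustrative assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def targeted_pgd_from_ood(model, x_ood, target, eps=8/255, alpha=2/255, steps=20):
    """Targeted L-infinity PGD starting from an out-of-distribution input.

    model: classifier returning logits; x_ood: (batch, C, H, W) in [0, 1];
    target: (batch,) desired in-distribution labels.
    """
    x_adv = x_ood.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()               # step toward the target class
            x_adv = x_ood + (x_adv - x_ood).clamp(-eps, eps)  # project back into the eps-ball
            x_adv = x_adv.clamp(0, 1)                         # stay a valid image
    return x_adv.detach()
```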
Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes
We propose a novel method for computing exact pointwise robustness of deep
neural networks for all convex norms. Our algorithm, GeoCert, finds
the largest ball centered at an input point, within which the
output class of a given neural network with ReLU nonlinearities remains
unchanged. We relate the problem of computing pointwise robustness of these
networks to that of computing the maximum norm ball with a fixed center that
can be contained in a non-convex polytope. This is a challenging problem in
general; however, we show that there exists an efficient algorithm to compute
this for polyhedral complexes. Further, we show that piecewise linear neural
networks partition the input space into a polyhedral complex. Our algorithm has
the ability to almost immediately output a nontrivial lower bound to the
pointwise robustness which is iteratively improved until it ultimately becomes
tight. We empirically show that our approach generates distance lower bounds
that are tighter compared to prior work, under moderate time constraints.Comment: Code can be found here:
https://github.com/revbucket/geometric-certificate
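As a small illustration of the geometric primitive involved: the radius of the largest ℓ2 ball centered at a feasible point inside a single polytope {x : Ax ≤ b} is just the minimum distance from the center to the facet hyperplanes. GeoCert's contribution is handling the union of such polytopes (the polyhedral complex a ReLU network induces); the hedged numpy sketch below covers only the single-polytope case.

```python
import numpy as np

def largest_l2_ball_radius(A, b, x0):
    """Radius of the largest l2 ball centered at x0 inside the polytope {x : A x <= b}."""
    slack = b - A @ x0                              # nonnegative iff x0 is feasible
    if np.any(slack < 0):
        raise ValueError("x0 lies outside the polytope")
    # Distance from x0 to facet i is (b_i - A_i x0) / ||A_i||_2; take the minimum.
    return float(np.min(slack / np.linalg.norm(A, axis=1)))

# Illustrative usage: the unit box around the origin, centered at x0 = (0.2, 0).
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.ones(4)
print(largest_l2_ball_radius(A, b, np.array([0.2, 0.0])))  # -> 0.8
```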
The LogBarrier adversarial attack: making effective use of decision boundary information
Adversarial attacks for image classification are small perturbations to
images that are designed to cause misclassification by a model. Adversarial
attacks formally correspond to an optimization problem: find a minimum norm
image perturbation, constrained to cause misclassification. A number of
effective attacks have been developed. However, to date, no gradient-based
attacks have used best practices from the optimization literature to solve this
constrained minimization problem. We design a new untargeted attack, based on
these best practices, using the established logarithmic barrier method. On
average, our attack distance is similar to or better than that of all state-of-the-art
attacks on benchmark datasets (MNIST, CIFAR10, ImageNet-1K). In addition, our
method performs significantly better on the most challenging images, those
which normally require larger perturbations for misclassification. We employ
the LogBarrier attack on several adversarially defended models, and show that
it adversarially perturbs all images more efficiently than other attacks: the
distance needed to perturb all images is significantly smaller with the
LogBarrier attack than with other state-of-the-art attacks.Comment: 12 pages, 4 figures, 6 tables
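To make the log-barrier idea concrete, the sketch below minimizes the squared perturbation norm plus a logarithmic barrier on the misclassification margin, annealing the barrier weight over iterations. The feasible-start heuristic, the hyperparameters, and the function name are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def logbarrier_attack(model, x, y, steps=200, lr=0.01, mu=1.0, decay=0.95):
    """Untargeted log-barrier attack sketch (assumed setup, not the paper's code)."""
    # Barrier methods need a feasible (already misclassified) starting point;
    # here we use a large random perturbation as a crude stand-in.
    x_adv = (x + 0.5 * torch.randn_like(x)).clamp(0, 1).detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        logits = model(x_adv)
        onehot = F.one_hot(y, logits.size(1)).bool()
        others = logits.masked_fill(onehot, float("-inf"))
        # margin > 0 means the point is currently misclassified, so the barrier is defined.
        margin = others.max(dim=1).values - logits.gather(1, y.unsqueeze(1)).squeeze(1)
        loss = ((x_adv - x) ** 2).sum() - mu * torch.log(margin.clamp_min(1e-6)).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_adv.clamp_(0, 1)      # keep a valid image
        mu *= decay                  # anneal the barrier weight
    return x_adv.detach()
```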
Semidefinite relaxations for certifying robustness to adversarial examples
Despite their impressive performance on diverse tasks, neural networks fail
catastrophically in the presence of adversarial inputs---imperceptibly but
adversarially perturbed versions of natural inputs. We have witnessed an arms
race between defenders who attempt to train robust networks and attackers who
try to construct adversarial examples. One promise of ending the arms race is
developing certified defenses, ones which are provably robust against all
attackers in some family. These certified defenses are based on convex
relaxations which construct an upper bound on the worst case loss over all
attackers in the family. Previous relaxations are loose on networks that are
not trained against the respective relaxation. In this paper, we propose a new
semidefinite relaxation for certifying robustness that applies to arbitrary
ReLU networks. We show that our proposed relaxation is tighter than previous
relaxations and produces meaningful robustness guarantees on three different
"foreign networks" whose training objectives are agnostic to our proposed
relaxation.Comment: To appear at NIPS 2018
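For orientation, SDP certificates of this kind typically start from a quadratic reformulation of the ReLU constraint and then relax the resulting rank-one lifting to a positive semidefinite matrix. The following is a schematic of that general recipe, not the paper's exact program.

```latex
% ReLU as elementwise quadratic constraints on pre-activation \hat z and activation z:
z \ge \hat z, \qquad z \ge 0, \qquad z \odot (z - \hat z) = 0.
% Lift v = (1, x, z) and relax the rank-one matrix v v^\top to a PSD variable P:
P \succeq 0, \qquad P_{11} = 1,
% so the quadratic constraints become linear in P, and maximizing the (linear)
% adversary's objective over this relaxation gives an SDP upper bound on the
% true worst-case loss.
```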
Cost-Sensitive Robustness against Adversarial Examples
Several recent works have developed methods for training classifiers that are
certifiably robust against norm-bounded adversarial perturbations. These
methods assume that all the adversarial transformations are equally important,
which is seldom the case in real-world applications. We advocate for
cost-sensitive robustness as the criteria for measuring the classifier's
performance for tasks where some adversarial transformations are more important
than others. We encode the potential harm of each adversarial transformation in
a cost matrix, and propose a general objective function to adapt the robust
training method of Wong & Kolter (2018) to optimize for cost-sensitive
robustness. Our experiments on simple MNIST and CIFAR10 models with a variety
of cost matrices show that the proposed approach can produce models with
substantially reduced cost-sensitive robust error, while maintaining
classification accuracy.Comment: ICLR final version
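A minimal sketch of the cost-weighting idea: given certified lower bounds on the true-class logit margins (assumed supplied by a certification procedure such as Wong & Kolter's) and an application-specific cost matrix, weight a hinge on those margins by how harmful each class flip would be. Names and shapes here are illustrative, not the paper's objective verbatim.

```python
import torch

def cost_sensitive_robust_loss(margin_lb, labels, cost):
    """Cost-weighted robust surrogate (sketch).

    margin_lb[i, j]: certified lower bound on logit(y_i) - logit(j) over the
    perturbation set; a negative entry means class j may be adversarially reachable.
    cost[y, j]: application-specific harm of an adversarial y -> j flip.
    """
    weights = cost[labels]                       # (batch, classes) row of the cost matrix
    per_pair = weights * torch.relu(-margin_lb)  # penalize only violated (negative) margins
    return per_pair.sum(dim=1).mean()
```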
Universal Lipschitz Approximation in Bounded Depth Neural Networks
Adversarial attacks against machine learning models are a rather hefty
obstacle to our increasing reliance on these models. Due to this, provably
robust (certified) machine learning models are a major topic of interest.
Lipschitz continuous models present a promising approach to solving this
problem. By leveraging the expressive power of a variant of neural networks
which maintain low Lipschitz constants, we prove that three layer neural
networks using the FullSort activation function are Universal Lipschitz
function Approximators (ULAs). This both explains experimental results and
paves the way for the creation of better certified models going forward. We
conclude by presenting experimental results that suggest that ULAs are not just
a novelty, but a competitive approach to providing certified classifiers, and
use these results to motivate several potential topics for further research.
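The FullSort activation itself is simple to state: sort the coordinates of each pre-activation vector. Sorting is a (data-dependent) permutation of coordinates and hence 1-Lipschitz, which is the property Lipschitz-constrained networks rely on. A minimal PyTorch module, written here as an assumption of the standard definition:

```python
import torch

class FullSort(torch.nn.Module):
    """FullSort activation: sort the coordinates of each pre-activation vector."""
    def forward(self, x):
        return torch.sort(x, dim=-1).values

# Illustrative usage:
act = FullSort()
print(act(torch.tensor([[3.0, -1.0, 2.0]])))  # -> tensor([[-1., 2., 3.]])
```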
Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
A rapidly growing area of work has studied the existence of adversarial
examples, datapoints which have been perturbed to fool a classifier, but the
vast majority of these works have focused primarily on threat models defined by
norm-bounded perturbations. In this paper, we propose a new threat
model for adversarial attacks based on the Wasserstein distance. In the image
classification setting, such distances measure the cost of moving pixel mass,
which naturally cover "standard" image manipulations such as scaling, rotation,
translation, and distortion (and can potentially be applied to other settings
as well). To generate Wasserstein adversarial examples, we develop a procedure
for projecting onto the Wasserstein ball, based upon a modified version of the
Sinkhorn iteration. The resulting algorithm can successfully attack image
classification models, bringing traditional CIFAR10 models down to 3% accuracy
within a Wasserstein ball with radius 0.1 (i.e., moving 10% of the image mass 1
pixel), and we demonstrate that PGD-based adversarial training can improve this
adversarial accuracy to 76%. In total, this work opens up a new direction of
study in adversarial robustness, more formally considering convex metrics that
accurately capture the invariances that we typically believe should exist in
classifiers. Code for all experiments in the paper is available at
https://github.com/locuslab/projected_sinkhorn
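For reference, the basic entropy-regularized Sinkhorn iteration between two histograms alternates two diagonal scaling updates; the paper's projected Sinkhorn for the Wasserstein ball is a modified variant of this scheme, so the sketch below shows only the standard iteration with illustrative parameters.

```python
import numpy as np

def sinkhorn_plan(a, b, C, reg=0.1, iters=200):
    """Standard entropy-regularized Sinkhorn iteration (sketch).

    a, b: source/target histograms summing to one; C: cost matrix.
    Returns an approximate optimal transport plan.
    """
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)      # rescale rows to match marginal a
        v = b / (K.T @ u)    # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]

# Illustrative usage on two tiny histograms:
a = np.array([0.5, 0.5]); b = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
plan = sinkhorn_plan(a, b, C)
print(plan.sum(axis=1), plan.sum(axis=0))  # approximately a and b
```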
Towards Stable and Efficient Training of Verifiably Robust Neural Networks
Training neural networks with verifiable robustness guarantees is
challenging. Several existing approaches utilize linear relaxation based neural
network output bounds under perturbation, but they can slow down training by a
factor of hundreds depending on the underlying network architectures.
Meanwhile, interval bound propagation (IBP) based training is efficient and
significantly outperforms linear relaxation based methods on many tasks, yet it
may suffer from stability issues since the bounds are much looser especially at
the beginning of training. In this paper, we propose a new certified
adversarial training method, CROWN-IBP, by combining the fast IBP bounds in a
forward bounding pass and a tight linear relaxation based bound, CROWN, in a
backward bounding pass. CROWN-IBP is computationally efficient and consistently
outperforms IBP baselines on training verifiably robust neural networks. We
conduct large scale experiments on MNIST and CIFAR datasets, and outperform all
previous linear relaxation and bound propagation based certified defenses in
robustness. Notably, we achieve 7.02% verified test error on
MNIST at ε = 0.3, and 66.94% on CIFAR-10 with ε = 8/255. Code is
available at https://github.com/deepmind/interval-bound-propagation
(TensorFlow) and https://github.com/huanzhang12/CROWN-IBP (PyTorch).
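The forward bounding pass in IBP (the fast half of CROWN-IBP) simply propagates intervals through each layer; CROWN's backward linear-relaxation pass is more involved and is not sketched here. A minimal numpy sketch of the interval step, with illustrative names:

```python
import numpy as np

def ibp_affine(l, u, W, b):
    """Propagate an elementwise interval [l, u] through an affine layer y = W x + b."""
    c, r = (l + u) / 2, (u - l) / 2          # interval center and radius
    yc, yr = W @ c + b, np.abs(W) @ r
    return yc - yr, yc + yr

def ibp_relu(l, u):
    """ReLU is monotone, so the interval is simply clipped at zero."""
    return np.maximum(l, 0), np.maximum(u, 0)

# Illustrative usage: bounds for one hidden layer under an l-infinity eps-ball input set.
x, eps = np.array([0.2, -0.1]), 0.1
W1, b1 = np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)
l, u = ibp_relu(*ibp_affine(x - eps, x + eps, W1, b1))
```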
Encryption Inspired Adversarial Defense for Visual Classification
Conventional adversarial defenses reduce classification accuracy whether or
not a model is under attack. Moreover, most image-processing-based defenses
are defeated due to the problem of obfuscated gradients. In this paper, we
propose a new adversarial defense which is a defensive transform for both
training and test images inspired by perceptual image encryption methods. The
proposed method utilizes a block-wise pixel shuffling method with a secret key.
The experiments are carried out on both adaptive and non-adaptive maximum-norm
bounded white-box attacks while considering obfuscated gradients. The results
show that the proposed defense achieves high accuracy (91.55%) on clean images
and (89.66%) on adversarial examples with a noise distance of 8/255 on the CIFAR-10
dataset. Thus, the proposed defense outperforms state-of-the-art adversarial
defenses including latent adversarial training, adversarial training and
thermometer encoding.Comment: To appear at the 27th IEEE International Conference on Image
Processing (ICIP 2020).
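As a rough illustration of the defensive transform described, the sketch below shuffles the pixels inside each block of an image using a permutation seeded by a secret key. The block size, per-channel handling, and reuse of one permutation for every block are assumptions of this sketch, not the paper's exact scheme.

```python
import numpy as np

def blockwise_shuffle(img, key, block=4):
    """Shuffle pixels within each block of an image using a key-seeded permutation.

    img: (H, W, C) array with H and W divisible by `block`.
    """
    rng = np.random.default_rng(key)          # the secret key seeds the permutation
    perm = rng.permutation(block * block)
    h, w, c = img.shape
    x = img.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4).reshape(h // block, w // block, block * block, c)
    x = x[:, :, perm, :]                      # permute the pixels inside every block
    x = x.reshape(h // block, w // block, block, block, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)
```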
- …