32 research outputs found
Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the L0 Norm
Deployment of deep neural networks (DNNs) in safety- or security-critical
systems requires provable guarantees on their correct behaviour. A common
requirement is robustness to adversarial perturbations in a neighbourhood
around an input. In this paper we focus on the L0 norm and aim to compute,
for a trained DNN and an input, the maximal radius of a safe L0 norm ball around
the input within which there are no adversarial examples. Then we define global
robustness as an expectation of the maximal safe radius over a test data set.
We first show that the problem is NP-hard, and then propose an approximate
approach to iteratively compute lower and upper bounds on the network's
robustness. The approach is \emph{anytime}, i.e., it returns intermediate
bounds and robustness estimates that are gradually, but strictly, improved as
the computation proceeds; \emph{tensor-based}, i.e., the computation is
conducted over a set of inputs simultaneously, instead of one by one, to enable
efficient GPU computation; and has \emph{provable guarantees}, i.e., both the
bounds and the robustness estimates can converge to their optimal values.
Finally, we demonstrate the utility of the proposed approach in practice to
compute tight bounds by applying and adapting the anytime algorithm to a set of
challenging problems, including global robustness evaluation, competitive
attacks, test case generation for DNNs, and local robustness evaluation on
large-scale ImageNet DNNs. We release the code of all case studies via GitHub.
Comment: 42 Pages, Github: https://github.com/TrustAI/L0-TR
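
To make the central quantity of this abstract concrete, the hedged Python sketch below computes the maximal safe L0 radius of a single input by brute force: the largest k such that no change to at most k coordinates flips the prediction. It is not the authors' anytime algorithm; its exponential enumeration mirrors the NP-hardness result, and restricting replacement values to a small set is an assumption. Averaging this radius over a test set gives the global robustness defined above.

```python
# Hedged illustration, not the paper's algorithm: brute-force maximal safe L0 radius.
from itertools import combinations, product
import numpy as np

def maximal_safe_l0_radius(predict, x, label, values=(0.0, 1.0)):
    """Largest k such that no change to at most k coordinates alters the label."""
    n = x.size
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            for vals in product(values, repeat=k):
                x_adv = x.copy()
                x_adv[list(idx)] = vals
                if predict(x_adv) != label:
                    return k - 1        # an adversarial example exists at L0 distance k
    return n                            # no adversarial example exists at all

# Toy usage: a thresholded linear "classifier" over a 3-pixel input.
predict = lambda z: int(z.sum() > 1.5)
x = np.array([1.0, 1.0, 0.0])
print(maximal_safe_l0_radius(predict, x, predict(x)))   # -> 0: one pixel change suffices
```
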
Formal methods and software engineering for DL. Security, safety and productivity for DL systems development
Deep Learning (DL) techniques are now widespread and being integrated into
many important systems. Their classification and recognition abilities ensure
their relevance for multiple application domains. As machine-learning systems
that rely on training rather than explicit algorithm programming, they offer a
high degree of productivity. But they can be vulnerable to attacks, and the verification of
their correctness is only just emerging as a scientific and engineering
possibility. This paper is a major update of a previously-published survey,
attempting to cover all recent publications in this area. It also covers an
even more recent trend, namely the design of domain-specific languages for
producing and training neural nets.
Comment: Submitted to IEEE-CCECE201
Learning to Defense by Learning to Attack
Adversarial training is a principled approach for training robust neural
networks. From an optimization perspective, adversarial training is solving a
bilevel optimization problem (a general form of minimax approaches): the leader
problem aims at learning a robust classifier, while the follower problem tries
to generate adversarial samples. Unfortunately, such a bilevel problem is very
challenging to solve due to its highly complicated structure. This work
proposes a new adversarial training method based on a generic learning-to-learn
(L2L) framework. Specifically, instead of applying hand-designed algorithms for
the follower problem, we learn an optimizer, which is parametrized by a
convolutional neural network. Meanwhile, a robust classifier is learned to
defend against the adversarial attacks generated by the learned optimizer. Our
experiments over CIFAR datasets demonstrate that L2L improves upon existing
methods in both robust accuracy and computational efficiency. Moreover, the L2L
framework can be extended to other popular bilevel problems in machine
learning.
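
As a concrete reading of the learning-to-learn idea, the hedged PyTorch sketch below replaces a hand-designed inner attacker (such as PGD) with a small convolutional network that maps input gradients to bounded perturbations: the attacker (follower) is trained to increase the classifier's loss while the classifier (leader) is trained to decrease it. The module architecture, the tanh projection, and all hyperparameters are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedAttacker(nn.Module):
    """Maps the loss gradient w.r.t. the input to a bounded perturbation."""
    def __init__(self, channels=3, eps=8 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, grad):
        return self.eps * torch.tanh(self.net(grad))   # keeps ||delta||_inf <= eps

def l2l_training_step(classifier, attacker, opt_cls, opt_att, x, y):
    # Follower input: gradient of the clean loss w.r.t. the input.
    x_req = x.clone().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(classifier(x_req), y), x_req)[0]

    # Attacker (follower) ascends the classifier loss on the perturbed input.
    adv_loss = F.cross_entropy(classifier(x + attacker(grad)), y)
    opt_att.zero_grad()
    (-adv_loss).backward()
    opt_att.step()

    # Classifier (leader) descends the loss on freshly generated adversarial examples.
    with torch.no_grad():
        delta = attacker(grad)
    cls_loss = F.cross_entropy(classifier(x + delta), y)
    opt_cls.zero_grad()
    cls_loss.backward()
    opt_cls.step()
    return cls_loss.item()
```
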
Semidefinite relaxations for certifying robustness to adversarial examples
Despite their impressive performance on diverse tasks, neural networks fail
catastrophically in the presence of adversarial inputs---imperceptibly but
adversarially perturbed versions of natural inputs. We have witnessed an arms
race between defenders who attempt to train robust networks and attackers who
try to construct adversarial examples. One promising way to end the arms race is
developing certified defenses, ones which are provably robust against all
attackers in some family. These certified defenses are based on convex
relaxations which construct an upper bound on the worst case loss over all
attackers in the family. Previous relaxations are loose on networks that are
not trained against the respective relaxation. In this paper, we propose a new
semidefinite relaxation for certifying robustness that applies to arbitrary
ReLU networks. We show that our proposed relaxation is tighter than previous
relaxations and produces meaningful robustness guarantees on three different
"foreign networks" whose training objectives are agnostic to our proposed
relaxation.
Comment: To appear at NIPS 201
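
The sketch below illustrates the general recipe of such a semidefinite relaxation on a one-hidden-layer ReLU network using cvxpy: the exact ReLU constraints z >= 0, z >= Wx + b and z * (z - Wx - b) = 0 are rewritten over a moment matrix of (1, x, z), whose rank-one requirement is relaxed to positive semidefiniteness. This is a hedged illustration of the technique, not the paper's formulation or code; the function name and toy data are made up.

```python
import cvxpy as cp
import numpy as np

def sdp_upper_bound(W, b, c, x0, eps):
    """Upper bound on max_{||x - x0||_inf <= eps} c^T relu(W x + b) via an SDP."""
    m, d = W.shape
    n = 1 + d + m                       # moment matrix over (1, x, z)
    P = cp.Variable((n, n), symmetric=True)
    x, z = P[0, 1:1 + d], P[0, 1 + d:]
    X = P[1:1 + d, 1:1 + d]             # relaxes x x^T
    Z = P[1 + d:, 1 + d:]               # relaxes z z^T
    XZ = P[1:1 + d, 1 + d:]             # relaxes x z^T
    l, u = x0 - eps, x0 + eps
    constraints = [
        P >> 0, P[0, 0] == 1,
        z >= 0, z >= W @ x + b,
        # z_i * (z_i - (W x + b)_i) = 0, written with the lifted variables:
        cp.diag(Z) == cp.diag(W @ XZ) + cp.multiply(b, z),
        # box constraint l <= x <= u, tightened quadratically: x.^2 <= (l+u).x - l.u
        cp.diag(X) <= cp.multiply(l + u, x) - l * u,
    ]
    prob = cp.Problem(cp.Maximize(c @ z), constraints)
    prob.solve(solver=cp.SCS)
    return prob.value

# Toy usage on a tiny random network; c is a linear objective over the hidden units.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
c, x0 = np.array([1.0, -1.0, 0.0, 0.0]), rng.normal(size=3)
print(sdp_upper_bound(W, b, c, x0, eps=0.1))
```
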
DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks
Deep neural networks have become widely used, obtaining remarkable results in
domains such as computer vision, speech recognition, natural language
processing, audio recognition, social network filtering, machine translation,
and bio-informatics, where they have produced results comparable to human
experts. However, these networks can be easily fooled by adversarial
perturbations: minimal changes to correctly-classified inputs that cause the
network to misclassify them. This phenomenon represents a concern for both
safety and security, but it is currently unclear how to measure a network's
robustness against such perturbations. Existing techniques are limited to
checking robustness around a few individual input points, providing only very
limited guarantees. We propose a novel approach for automatically identifying
safe regions of the input space, within which the network is robust against
adversarial perturbations. The approach is data-guided, relying on clustering
to identify well-defined geometric regions as candidate safe regions. We then
utilize verification techniques to confirm that these regions are safe or to
provide counter-examples showing that they are not safe. We also introduce the
notion of targeted robustness which, for a given target label and region,
ensures that an NN does not map any input in the region to the target label. We
evaluated our technique on the MNIST dataset and on a neural network
implementation of a controller for the next-generation Airborne Collision
Avoidance System for unmanned aircraft (ACAS Xu). For these networks, our
approach identified multiple regions which were completely safe as well as some
which were only safe for specific labels. It also discovered several
adversarial perturbations of interest.
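
A hedged sketch of the data-guided pipeline described above: cluster correctly classified inputs per label to obtain candidate regions (centroid plus radius), then hand each region to a verifier. The check_region function below is only a random-sampling stand-in for the sound verification step performed by a neural-network verifier in the paper, and all names and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_safe_regions(inputs, labels, clusters_per_label=5):
    """Cluster (correctly classified) inputs per label into (label, center, radius) regions."""
    regions = []
    for lbl in np.unique(labels):
        pts = inputs[labels == lbl]
        km = KMeans(n_clusters=min(clusters_per_label, len(pts)), n_init=10).fit(pts)
        for c in range(km.n_clusters):
            members = pts[km.labels_ == c]
            center = members.mean(axis=0)
            radius = np.linalg.norm(members - center, axis=1).max()
            regions.append((lbl, center, radius))
    return regions

def check_region(predict, label, center, radius, samples=1000):
    """Stand-in for a verifier call: random sampling inside the ball (NOT sound)."""
    rng = np.random.default_rng(0)
    dirs = rng.normal(size=(samples, center.size))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    pts = center + dirs * rng.uniform(0.0, radius, size=(samples, 1))
    preds = np.array([predict(p) for p in pts])
    bad = pts[preds != label]
    return (len(bad) == 0), (bad[0] if len(bad) else None)   # (safe?, counter-example)
```

Targeted robustness for a target label t would replace the test `preds != label` with `preds == t` inside the region check.
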
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
In this paper we establish rigorous benchmarks for image classifier
robustness. Our first benchmark, ImageNet-C, standardizes and expands the
corruption robustness topic, while showing which classifiers are preferable in
safety-critical applications. Then we propose a new dataset called ImageNet-P
which enables researchers to benchmark a classifier's robustness to common
perturbations. Unlike recent robustness research, this benchmark evaluates
performance on common corruptions and perturbations not worst-case adversarial
perturbations. We find that there are negligible changes in relative corruption
robustness from AlexNet classifiers to ResNet classifiers. Afterward we
discover ways to enhance corruption and perturbation robustness. We even find
that a bypassed adversarial defense provides substantial common perturbation
robustness. Together our benchmarks may aid future work toward networks that
robustly generalize.
Comment: ICLR 2019 camera-ready; datasets available at
https://github.com/hendrycks/robustness ; this article supersedes
arXiv:1807.0169
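
The headline aggregate reported with ImageNet-C is the mean Corruption Error (mCE): per-corruption top-1 errors are summed over the five severities and normalized by a baseline classifier (AlexNet) before averaging over corruptions. The minimal sketch below re-implements that aggregation on made-up numbers; real error rates would come from evaluating models on the released datasets.

```python
import numpy as np

def corruption_error(errors_model, errors_baseline):
    """errors_*: dict corruption -> list of top-1 error rates for severities 1..5."""
    return {c: sum(errors_model[c]) / sum(errors_baseline[c]) for c in errors_model}

def mean_corruption_error(errors_model, errors_baseline):
    ce = corruption_error(errors_model, errors_baseline)
    return float(np.mean(list(ce.values())))

# Toy usage with two made-up corruptions and five severities each.
baseline = {"gaussian_noise": [0.4, 0.5, 0.6, 0.7, 0.8],
            "fog":            [0.3, 0.4, 0.5, 0.6, 0.7]}
model =    {"gaussian_noise": [0.3, 0.4, 0.5, 0.6, 0.7],
            "fog":            [0.2, 0.3, 0.4, 0.5, 0.6]}
print(mean_corruption_error(model, baseline))   # < 1.0 means more robust than the baseline
```
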
Identify Susceptible Locations in Medical Records via Adversarial Attacks on Deep Predictive Models
The surging availability of electronic health records (EHR) leads to increased
research interest in medical predictive modeling. Recently, many deep-learning-based
predictive models have also been developed for EHR data and have demonstrated
impressive performance. However, a series of recent studies showed
demonstrated impressive performance. However, a series of recent studies showed
that these deep models are not safe: they suffer from certain vulnerabilities.
In short, a well-trained deep network can be extremely sensitive to inputs with
negligible changes. These inputs are referred to as adversarial examples. In
the context of medical informatics, such attacks could alter the result of a
high performance deep predictive model by slightly perturbing a patient's
medical records. Such instability not only reflects the weakness of deep
architectures but, more importantly, offers guidance on detecting susceptible
parts of the inputs. In this paper, we propose an efficient and effective framework
that learns a time-preferential minimum attack targeting the LSTM model with
EHR inputs, and we leverage this attack strategy to screen medical records of
patients and identify susceptible events and measurements. The efficient
screening procedure can assist decision makers in paying extra attention to the
locations that can cause severe consequences if not measured correctly. We
conduct extensive empirical studies on a real-world urgent care cohort and
demonstrate the effectiveness of the proposed screening approach.
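
In the same spirit, the hedged PyTorch sketch below searches for a small perturbation of an EHR time series that degrades an LSTM classifier's prediction, using an L1 penalty weighted over time so that perturbations concentrate at preferred time steps; the entries of the resulting perturbation with the largest magnitude flag candidate susceptible measurements. The loss, the weighting, and the hyperparameters are illustrative assumptions rather than the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def time_preferential_attack(model, x, y_true, time_weights, lam=1.0, steps=200, lr=0.01):
    """x: (T, F) record; y_true: scalar long tensor; time_weights: (T,) penalty per step."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model((x + delta).unsqueeze(0))                     # (1, num_classes)
        misclass = -F.cross_entropy(logits, y_true.unsqueeze(0))     # ascend the model's loss
        sparsity = (time_weights.unsqueeze(1) * delta.abs()).sum()   # time-weighted L1 penalty
        (misclass + lam * sparsity).backward()
        opt.step()
        opt.zero_grad()
    magnitude = delta.detach().abs()    # large entries = candidate susceptible locations
    return delta.detach(), magnitude
```
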
Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability
We explore the concept of co-design in the context of neural network
verification. Specifically, we aim to train deep neural networks that not only
are robust to adversarial perturbations but also whose robustness can be
verified more easily. To this end, we identify two properties of network models
- weight sparsity and so-called ReLU stability - that turn out to significantly
impact the complexity of the corresponding verification task. We demonstrate
that improving weight sparsity alone already enables us to turn computationally
intractable verification problems into tractable ones. Then, improving ReLU
stability leads to an additional 4-13x speedup in verification times. An
important feature of our methodology is its "universality," in the sense that
it can be used with a broad range of training procedures and verification
approaches.
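
Both properties can be encouraged directly through regularization. The hedged sketch below adds an L1 penalty on the weights (for sparsity) and a ReLU-stability term of the form -tanh(1 + l*u) on pre-activation interval bounds, which becomes negative exactly when a unit's sign is fixed over the perturbation region. The interval propagation handles only fully connected layers and the coefficients are illustrative, so the details necessarily differ from the paper.

```python
import torch
import torch.nn as nn

def preactivation_bounds(layers, lb, ub):
    """Interval-propagate [lb, ub] through alternating Linear/ReLU layers;
    return the pre-activation bounds of every Linear layer."""
    bounds = []
    for layer in layers:
        if isinstance(layer, nn.Linear):
            mid, rad = (ub + lb) / 2, (ub - lb) / 2
            mid = layer(mid)
            rad = rad @ layer.weight.abs().t()
            lb, ub = mid - rad, mid + rad
            bounds.append((lb, ub))
        else:                                   # assume ReLU
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return bounds

def regularized_loss(layers, x, y, eps, l1=1e-4, rs=1e-3):
    out = x
    for layer in layers:
        out = layer(out)
    loss = nn.functional.cross_entropy(out, y)
    # weight sparsity: plain L1 penalty on all Linear weights
    loss = loss + l1 * sum(l.weight.abs().sum() for l in layers if isinstance(l, nn.Linear))
    # ReLU stability: -tanh(1 + l*u) rewards units whose sign is fixed on [x-eps, x+eps]
    for lb, ub in preactivation_bounds(layers, x - eps, x + eps):
        loss = loss + rs * (-torch.tanh(1 + lb * ub)).sum()
    return loss
```
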
Evaluating Robustness of Neural Networks with Mixed Integer Programming
Neural networks have demonstrated considerable success on a wide variety of
real-world problems. However, networks trained only to optimize for training
accuracy can often be fooled by adversarial examples - slightly perturbed
inputs that are misclassified with high confidence. Verification of networks
enables us to gauge their vulnerability to such adversarial examples. We
formulate verification of piecewise-linear neural networks as a mixed integer
program. On a representative task of finding minimum adversarial distortions,
our verifier is two to three orders of magnitude quicker than the
state-of-the-art. We achieve this computational speedup via tight formulations
for non-linearities, as well as a novel presolve algorithm that makes full use
of all information available. The computational speedup allows us to verify
properties on convolutional networks with an order of magnitude more ReLUs than
networks previously verified by any complete verifier. In particular, we
determine for the first time the exact adversarial accuracy of an MNIST
classifier to perturbations with bounded l∞ norm ε = 0.1: for
this classifier, we find an adversarial example for 4.38% of samples, and a
certificate of robustness (to perturbations with bounded l∞ norm) for the
remainder. Across all robust training procedures and network architectures
considered, we are able to certify more samples than the state-of-the-art and
find more adversarial examples than a strong first-order attack.
Comment: Accepted as a conference paper at ICLR 201
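
At the core of such a formulation is the mixed-integer encoding of each ReLU, which is exact once pre-activation bounds are known; the speedups reported above come from tighter formulations and presolve on top of encodings of this kind. The sketch below is a hedged, minimal illustration using PuLP with the bundled CBC solver, not the authors' verifier, and the toy network is made up.

```python
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, value

def add_relu(prob, x, l, u, name):
    """Encode y = max(0, x), with l <= x <= u, via a binary indicator a."""
    y = LpVariable(f"y_{name}", lowBound=0)
    a = LpVariable(f"a_{name}", cat=LpBinary)
    prob += y >= x                    # y is at least the pre-activation
    prob += y <= x - l * (1 - a)      # a = 0 relaxes toward y = 0
    prob += y <= u * a                # a = 0 forces y = 0; a = 1 allows y up to u
    return y

# Toy usage: maximum of relu(2*x - 1) for x in [-1, 1].
prob = LpProblem("relu_range", LpMaximize)
x = LpVariable("x", lowBound=-1, upBound=1)
pre = 2 * x - 1                       # pre-activation, bounded in [-3, 1]
y = add_relu(prob, pre, l=-3, u=1, name="h1")
prob += y                             # objective: maximize the ReLU output
prob.solve()
print(value(y))                       # expect 1.0, attained at x = 1
```
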
Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably
robust against norm-bounded adversarial perturbations on the training data. For
previously unseen examples, the approach is guaranteed to detect all
adversarial examples, though it may flag some non-adversarial examples as well.
The basic idea is to consider a convex outer approximation of the set of
activations reachable through a norm-bounded perturbation, and we develop a
robust optimization procedure that minimizes the worst case loss over this
outer region (via a linear program). Crucially, we show that the dual problem
to this linear program can be represented itself as a deep network similar to
the backpropagation network, leading to very efficient optimization approaches
that produce guaranteed bounds on the robust loss. The end result is that by
executing a few more forward and backward passes through a slightly modified
version of the original network (though possibly with much larger batch sizes),
we can learn a classifier that is provably robust to any norm-bounded
adversarial attack. We illustrate the approach on a number of tasks to train
classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a
convolutional classifier that provably has less than 5.8% test error for any
adversarial attack with bounded l∞ norm less than ε = 0.1),
and code for all experiments in the paper is available at
https://github.com/locuslab/convex_adversarial.
Comment: ICML final version
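
Concretely, for a one-hidden-layer ReLU network the convex outer adversarial polytope can be built by replacing each ReLU with its triangle-shaped linear relaxation and bounding the objective with a linear program. The hedged numpy/scipy sketch below does this in the primal; the paper instead bounds the LP through its dual, which takes the form of a backprop-like pass and scales to real networks. The interval pre-activation bounds and all names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def certified_lower_bound(W1, b1, W2, b2, c, x0, eps):
    """Lower bound on c^T f(x) over ||x - x0||_inf <= eps, f(x) = W2 relu(W1 x + b1) + b2."""
    m, d = W1.shape
    # interval bounds on the pre-activations (a simple way to seed the relaxation)
    centre = W1 @ x0 + b1
    radius = eps * np.abs(W1).sum(axis=1)
    l, u = centre - radius, centre + radius
    # secant (upper) line of the ReLU hull through (l, relu(l)) and (u, relu(u))
    slope = (np.maximum(u, 0) - np.maximum(l, 0)) / np.maximum(u - l, 1e-12)
    intercept = np.maximum(l, 0) - slope * l

    # LP variables: v = [x (d), yhat (m), z (m)]
    n = d + 2 * m
    def block(start, size):
        B = np.zeros((size, n)); B[np.arange(size), start + np.arange(size)] = 1.0; return B
    X, Y, Z = block(0, d), block(d, m), block(d + m, m)

    A_eq, b_eq = np.hstack([-W1, np.eye(m), np.zeros((m, m))]), b1      # yhat = W1 x + b1
    A_ub = np.vstack([-Z,                                   # z >= 0
                      Y - Z,                                # z >= yhat
                      Z - slope[:, None] * Y])              # z <= slope*yhat + intercept
    b_ub = np.concatenate([np.zeros(m), np.zeros(m), intercept])
    bounds = [(x0[i] - eps, x0[i] + eps) for i in range(d)] + [(None, None)] * (2 * m)

    obj = np.concatenate([np.zeros(d), np.zeros(m), W2.T @ c])          # minimize c^T W2 z
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.fun + c @ b2

# Toy usage: certify the margin between two outputs of a small random network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x0, c = rng.normal(size=3), np.array([1.0, -1.0])
print(certified_lower_bound(W1, b1, W2, b2, c, x0, eps=0.05))
```
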