32 research outputs found
Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the L0 Norm
Deployment of deep neural networks (DNNs) in safety- or security-critical
systems requires provable guarantees on their correct behaviour. A common
requirement is robustness to adversarial perturbations in a neighbourhood
around an input. In this paper we focus on the L0 norm and aim to compute,
for a trained DNN and an input, the maximal radius of a safe L0 norm ball around
the input within which there are no adversarial examples. Then we define global
robustness as an expectation of the maximal safe radius over a test data set.
We first show that the problem is NP-hard, and then propose an approximate
approach to iteratively compute lower and upper bounds on the network's
robustness. The approach is \emph{anytime}, i.e., it returns intermediate
bounds and robustness estimates that are gradually, but strictly, improved as
the computation proceeds; \emph{tensor-based}, i.e., the computation is
conducted over a set of inputs simultaneously, instead of one by one, to enable
efficient GPU computation; and has \emph{provable guarantees}, i.e., both the
bounds and the robustness estimates can converge to their optimal values.
Finally, we demonstrate the utility of the proposed approach in practice to
compute tight bounds by applying and adapting the anytime algorithm to a set of
challenging problems, including global robustness evaluation, competitive
attacks, test case generation for DNNs, and local robustness evaluation on
large-scale ImageNet DNNs. We release the code of all case studies via GitHub.
Comment: 42 Pages, Github: https://github.com/TrustAI/L0-TR
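
To make the central quantity of this abstract concrete, the hedged Python sketch below computes the maximal safe L0 radius of a single input by brute force: the largest k such that no change to at most k coordinates flips the prediction. It is not the authors' anytime algorithm; its exponential enumeration mirrors the NP-hardness result, and restricting replacement values to a small set is an assumption. Averaging this radius over a test set gives the global robustness defined above.

```python
# Hedged illustration, not the paper's algorithm: brute-force maximal safe L0 radius.
from itertools import combinations, product
import numpy as np

def maximal_safe_l0_radius(predict, x, label, values=(0.0, 1.0)):
    """Largest k such that no change to at most k coordinates alters the label."""
    n = x.size
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            for vals in product(values, repeat=k):
                x_adv = x.copy()
                x_adv[list(idx)] = vals
                if predict(x_adv) != label:
                    return k - 1        # an adversarial example exists at L0 distance k
    return n                            # no adversarial example exists at all

# Toy usage: a thresholded linear "classifier" over a 3-pixel input.
predict = lambda z: int(z.sum() > 1.5)
x = np.array([1.0, 1.0, 0.0])
print(maximal_safe_l0_radius(predict, x, predict(x)))   # -> 0: one pixel change suffices
```
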
Formal methods and software engineering for DL. Security, safety and productivity for DL systems development
Deep Learning (DL) techniques are now widespread and being integrated into
many important systems. Their classification and recognition abilities ensure
their relevance for multiple application domains. As machine-learning systems
that rely on training rather than explicit algorithm programming, they offer a
high degree of productivity. But they can be vulnerable to attacks, and the verification of
their correctness is only just emerging as a scientific and engineering
possibility. This paper is a major update of a previously-published survey,
attempting to cover all recent publications in this area. It also covers an
even more recent trend, namely the design of domain-specific languages for
producing and training neural nets.
Comment: Submitted to IEEE-CCECE201
Learning to Defense by Learning to Attack
Adversarial training is a principled approach for training robust neural
networks. From an optimization perspective, adversarial training is solving a
bilevel optimization problem (a general form of minimax approaches): the leader
problem aims at learning a robust classifier, while the follower problem tries
to generate adversarial samples. Unfortunately, such a bilevel problem is very
challenging to solve due to its highly complicated structure. This work
proposes a new adversarial training method based on a generic learning-to-learn
(L2L) framework. Specifically, instead of applying hand-designed algorithms for
the follower problem, we learn an optimizer, which is parametrized by a
convolutional neural network. Meanwhile, a robust classifier is learned to
defend against the adversarial attacks generated by the learned optimizer. Our
experiments over CIFAR datasets demonstrate that L2L improves upon existing
methods in both robust accuracy and computational efficiency. Moreover, the L2L
framework can be extended to other popular bilevel problems in machine
learning.
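
As a concrete reading of the learning-to-learn idea, the hedged PyTorch sketch below replaces a hand-designed inner attacker (such as PGD) with a small convolutional network that maps input gradients to bounded perturbations: the attacker (follower) is trained to increase the classifier's loss while the classifier (leader) is trained to decrease it. The module architecture, the tanh projection, and all hyperparameters are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedAttacker(nn.Module):
    """Maps the loss gradient w.r.t. the input to a bounded perturbation."""
    def __init__(self, channels=3, eps=8 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, grad):
        return self.eps * torch.tanh(self.net(grad))   # keeps ||delta||_inf <= eps

def l2l_training_step(classifier, attacker, opt_cls, opt_att, x, y):
    # Follower input: gradient of the clean loss w.r.t. the input.
    x_req = x.clone().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(classifier(x_req), y), x_req)[0]

    # Attacker (follower) ascends the classifier loss on the perturbed input.
    adv_loss = F.cross_entropy(classifier(x + attacker(grad)), y)
    opt_att.zero_grad()
    (-adv_loss).backward()
    opt_att.step()

    # Classifier (leader) descends the loss on freshly generated adversarial examples.
    with torch.no_grad():
        delta = attacker(grad)
    cls_loss = F.cross_entropy(classifier(x + delta), y)
    opt_cls.zero_grad()
    cls_loss.backward()
    opt_cls.step()
    return cls_loss.item()
```
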
Semidefinite relaxations for certifying robustness to adversarial examples
Despite their impressive performance on diverse tasks, neural networks fail
catastrophically in the presence of adversarial inputs---imperceptibly but
adversarially perturbed versions of natural inputs. We have witnessed an arms
race between defenders who attempt to train robust networks and attackers who
try to construct adversarial examples. One promising way to end the arms race is
developing certified defenses, ones which are provably robust against all
attackers in some family. These certified defenses are based on convex
relaxations which construct an upper bound on the worst case loss over all
attackers in the family. Previous relaxations are loose on networks that are
not trained against the respective relaxation. In this paper, we propose a new
semidefinite relaxation for certifying robustness that applies to arbitrary
ReLU networks. We show that our proposed relaxation is tighter than previous
relaxations and produces meaningful robustness guarantees on three different
"foreign networks" whose training objectives are agnostic to our proposed
relaxation.
Comment: To appear at NIPS 201
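
The sketch below illustrates the general recipe of such a semidefinite relaxation on a one-hidden-layer ReLU network using cvxpy: the exact ReLU constraints z >= 0, z >= Wx + b and z * (z - Wx - b) = 0 are rewritten over a moment matrix of (1, x, z), whose rank-one requirement is relaxed to positive semidefiniteness. This is a hedged illustration of the technique, not the paper's formulation or code; the function name and toy data are made up.

```python
import cvxpy as cp
import numpy as np

def sdp_upper_bound(W, b, c, x0, eps):
    """Upper bound on max_{||x - x0||_inf <= eps} c^T relu(W x + b) via an SDP."""
    m, d = W.shape
    n = 1 + d + m                       # moment matrix over (1, x, z)
    P = cp.Variable((n, n), symmetric=True)
    x, z = P[0, 1:1 + d], P[0, 1 + d:]
    X = P[1:1 + d, 1:1 + d]             # relaxes x x^T
    Z = P[1 + d:, 1 + d:]               # relaxes z z^T
    XZ = P[1:1 + d, 1 + d:]             # relaxes x z^T
    l, u = x0 - eps, x0 + eps
    constraints = [
        P >> 0, P[0, 0] == 1,
        z >= 0, z >= W @ x + b,
        # z_i * (z_i - (W x + b)_i) = 0, written with the lifted variables:
        cp.diag(Z) == cp.diag(W @ XZ) + cp.multiply(b, z),
        # box constraint l <= x <= u, tightened quadratically: x.^2 <= (l+u).x - l.u
        cp.diag(X) <= cp.multiply(l + u, x) - l * u,
    ]
    prob = cp.Problem(cp.Maximize(c @ z), constraints)
    prob.solve(solver=cp.SCS)
    return prob.value

# Toy usage on a tiny random network; c is a linear objective over the hidden units.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
c, x0 = np.array([1.0, -1.0, 0.0, 0.0]), rng.normal(size=3)
print(sdp_upper_bound(W, b, c, x0, eps=0.1))
```
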
DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks
Deep neural networks have become widely used, obtaining remarkable results in
domains such as computer vision, speech recognition, natural language
processing, audio recognition, social network filtering, machine translation,
and bio-informatics, where they have produced results comparable to human
experts. However, these networks can be easily fooled by adversarial
perturbations: minimal changes to correctly-classified inputs that cause the
network to misclassify them. This phenomenon represents a concern for both
safety and security, but it is currently unclear how to measure a network's
robustness against such perturbations. Existing techniques are limited to
checking robustness around a few individual input points, providing only very
limited guarantees. We propose a novel approach for automatically identifying
safe regions of the input space, within which the network is robust against
adversarial perturbations. The approach is data-guided, relying on clustering
to identify well-defined geometric regions as candidate safe regions. We then
utilize verification techniques to confirm that these regions are safe or to
provide counter-examples showing that they are not safe. We also introduce the
notion of targeted robustness which, for a given target label and region,
ensures that an NN does not map any input in the region to the target label. We
evaluated our technique on the MNIST dataset and on a neural network
implementation of a controller for the next-generation Airborne Collision
Avoidance System for unmanned aircraft (ACAS Xu). For these networks, our
approach identified multiple regions which were completely safe as well as some
which were only safe for specific labels. It also discovered several
adversarial perturbations of interest.
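
A hedged sketch of the data-guided pipeline described above: cluster correctly classified inputs per label to obtain candidate regions (centroid plus radius), then hand each region to a verifier. The check_region function below is only a random-sampling stand-in for the sound verification step performed by a neural-network verifier in the paper, and all names and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_safe_regions(inputs, labels, clusters_per_label=5):
    """Cluster (correctly classified) inputs per label into (label, center, radius) regions."""
    regions = []
    for lbl in np.unique(labels):
        pts = inputs[labels == lbl]
        km = KMeans(n_clusters=min(clusters_per_label, len(pts)), n_init=10).fit(pts)
        for c in range(km.n_clusters):
            members = pts[km.labels_ == c]
            center = members.mean(axis=0)
            radius = np.linalg.norm(members - center, axis=1).max()
            regions.append((lbl, center, radius))
    return regions

def check_region(predict, label, center, radius, samples=1000):
    """Stand-in for a verifier call: random sampling inside the ball (NOT sound)."""
    rng = np.random.default_rng(0)
    dirs = rng.normal(size=(samples, center.size))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    pts = center + dirs * rng.uniform(0.0, radius, size=(samples, 1))
    preds = np.array([predict(p) for p in pts])
    bad = pts[preds != label]
    return (len(bad) == 0), (bad[0] if len(bad) else None)   # (safe?, counter-example)
```

Targeted robustness for a target label t would replace the test `preds != label` with `preds == t` inside the region check.
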
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
In this paper we establish rigorous benchmarks for image classifier
robustness. Our first benchmark, ImageNet-C, standardizes and expands the
corruption robustness topic, while showing which classifiers are preferable in
safety-critical applications. Then we propose a new dataset called ImageNet-P
which enables researchers to benchmark a classifier's robustness to common
perturbations. Unlike recent robustness research, this benchmark evaluates
performance on common corruptions and perturbations not worst-case adversarial
perturbations. We find that there are negligible changes in relative corruption
robustness from AlexNet classifiers to ResNet classifiers. Afterward we
discover ways to enhance corruption and perturbation robustness. We even find
that a bypassed adversarial defense provides substantial common perturbation
robustness. Together our benchmarks may aid future work toward networks that
robustly generalize.
Comment: ICLR 2019 camera-ready; datasets available at
https://github.com/hendrycks/robustness ; this article supersedes
arXiv:1807.0169
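
The headline aggregate reported with ImageNet-C is the mean Corruption Error (mCE): per-corruption top-1 errors are summed over the five severities and normalized by a baseline classifier (AlexNet) before averaging over corruptions. The minimal sketch below re-implements that aggregation on made-up numbers; real error rates would come from evaluating models on the released datasets.

```python
import numpy as np

def corruption_error(errors_model, errors_baseline):
    """errors_*: dict corruption -> list of top-1 error rates for severities 1..5."""
    return {c: sum(errors_model[c]) / sum(errors_baseline[c]) for c in errors_model}

def mean_corruption_error(errors_model, errors_baseline):
    ce = corruption_error(errors_model, errors_baseline)
    return float(np.mean(list(ce.values())))

# Toy usage with two made-up corruptions and five severities each.
baseline = {"gaussian_noise": [0.4, 0.5, 0.6, 0.7, 0.8],
            "fog":            [0.3, 0.4, 0.5, 0.6, 0.7]}
model =    {"gaussian_noise": [0.3, 0.4, 0.5, 0.6, 0.7],
            "fog":            [0.2, 0.3, 0.4, 0.5, 0.6]}
print(mean_corruption_error(model, baseline))   # < 1.0 means more robust than the baseline
```
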
Identify Susceptible Locations in Medical Records via Adversarial Attacks on Deep Predictive Models
The surging availability of electronic health records (EHR) leads to increased
research interest in medical predictive modeling. Recently, many deep-learning-based
predictive models have also been developed for EHR data and have demonstrated
impressive performance. However, a series of recent studies showed
demonstrated impressive performance. However, a series of recent studies showed
that these deep models are not safe: they suffer from certain vulnerabilities.
In short, a well-trained deep network can be extremely sensitive to inputs with
negligible changes. These inputs are referred to as adversarial examples. In
the context of medical informatics, such attacks could alter the result of a
high performance deep predictive model by slightly perturbing a patient's
medical records. Such instability not only reflects the weakness of deep
architectures but, more importantly, offers guidance on detecting susceptible
parts of the inputs. In this paper, we propose an efficient and effective framework
that learns a time-preferential minimum attack targeting the LSTM model with
EHR inputs, and we leverage this attack strategy to screen medical records of
patients and identify susceptible events and measurements. The efficient
screening procedure can assist decision makers in paying extra attention to the
locations that can cause severe consequences if not measured correctly. We
conduct extensive empirical studies on a real-world urgent care cohort and
demonstrate the effectiveness of the proposed screening approach.
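
In the same spirit, the hedged PyTorch sketch below searches for a small perturbation of an EHR time series that degrades an LSTM classifier's prediction, using an L1 penalty weighted over time so that perturbations concentrate at preferred time steps; the entries of the resulting perturbation with the largest magnitude flag candidate susceptible measurements. The loss, the weighting, and the hyperparameters are illustrative assumptions rather than the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def time_preferential_attack(model, x, y_true, time_weights, lam=1.0, steps=200, lr=0.01):
    """x: (T, F) record; y_true: scalar long tensor; time_weights: (T,) penalty per step."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model((x + delta).unsqueeze(0))                     # (1, num_classes)
        misclass = -F.cross_entropy(logits, y_true.unsqueeze(0))     # ascend the model's loss
        sparsity = (time_weights.unsqueeze(1) * delta.abs()).sum()   # time-weighted L1 penalty
        (misclass + lam * sparsity).backward()
        opt.step()
        opt.zero_grad()
    magnitude = delta.detach().abs()    # large entries = candidate susceptible locations
    return delta.detach(), magnitude
```
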
Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability
We explore the concept of co-design in the context of neural network
verification. Specifically, we aim to train deep neural networks that not only
are robust to adversarial perturbations but also whose robustness can be
verified more easily. To this end, we identify two properties of network models
- weight sparsity and so-called ReLU stability - that turn out to significantly
impact the complexity of the corresponding verification task. We demonstrate
that improving weight sparsity alone already enables us to turn computationally
intractable verification problems into tractable ones. Then, improving ReLU
stability leads to an additional 4-13x speedup in verification times. An
important feature of our methodology is its "universality," in the sense that
it can be used with a broad range of training procedures and verification
approaches.
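
Both properties can be encouraged directly through regularization. The hedged sketch below adds an L1 penalty on the weights (for sparsity) and a ReLU-stability term of the form -tanh(1 + l*u) on pre-activation interval bounds, which becomes negative exactly when a unit's sign is fixed over the perturbation region. The interval propagation handles only fully connected layers and the coefficients are illustrative, so the details necessarily differ from the paper.

```python
import torch
import torch.nn as nn

def preactivation_bounds(layers, lb, ub):
    """Interval-propagate [lb, ub] through alternating Linear/ReLU layers;
    return the pre-activation bounds of every Linear layer."""
    bounds = []
    for layer in layers:
        if isinstance(layer, nn.Linear):
            mid, rad = (ub + lb) / 2, (ub - lb) / 2
            mid = layer(mid)
            rad = rad @ layer.weight.abs().t()
            lb, ub = mid - rad, mid + rad
            bounds.append((lb, ub))
        else:                                   # assume ReLU
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return bounds

def regularized_loss(layers, x, y, eps, l1=1e-4, rs=1e-3):
    out = x
    for layer in layers:
        out = layer(out)
    loss = nn.functional.cross_entropy(out, y)
    # weight sparsity: plain L1 penalty on all Linear weights
    loss = loss + l1 * sum(l.weight.abs().sum() for l in layers if isinstance(l, nn.Linear))
    # ReLU stability: -tanh(1 + l*u) rewards units whose sign is fixed on [x-eps, x+eps]
    for lb, ub in preactivation_bounds(layers, x - eps, x + eps):
        loss = loss + rs * (-torch.tanh(1 + lb * ub)).sum()
    return loss
```
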
Evaluating Robustness of Neural Networks with Mixed Integer Programming
Neural networks have demonstrated considerable success on a wide variety of
real-world problems. However, networks trained only to optimize for training
accuracy can often be fooled by adversarial examples - slightly perturbed
inputs that are misclassified with high confidence. Verification of networks
enables us to gauge their vulnerability to such adversarial examples. We
formulate verification of piecewise-linear neural networks as a mixed integer
program. On a representative task of finding minimum adversarial distortions,
our verifier is two to three orders of magnitude quicker than the
state-of-the-art. We achieve this computational speedup via tight formulations
for non-linearities, as well as a novel presolve algorithm that makes full use
of all information available. The computational speedup allows us to verify
properties on convolutional networks with an order of magnitude more ReLUs than
networks previously verified by any complete verifier. In particular, we
determine for the first time the exact adversarial accuracy of an MNIST
classifier to perturbations with bounded l∞ norm ε = 0.1: for
this classifier, we find an adversarial example for 4.38% of samples, and a
certificate of robustness (to perturbations with bounded l∞ norm) for the
remainder. Across all robust training procedures and network architectures
considered, we are able to certify more samples than the state-of-the-art and
find more adversarial examples than a strong first-order attack.
Comment: Accepted as a conference paper at ICLR 201
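
At the core of such a formulation is the mixed-integer encoding of each ReLU, which is exact once pre-activation bounds are known; the speedups reported above come from tighter formulations and presolve on top of encodings of this kind. The sketch below is a hedged, minimal illustration using PuLP with the bundled CBC solver, not the authors' verifier, and the toy network is made up.

```python
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, value

def add_relu(prob, x, l, u, name):
    """Encode y = max(0, x), with l <= x <= u, via a binary indicator a."""
    y = LpVariable(f"y_{name}", lowBound=0)
    a = LpVariable(f"a_{name}", cat=LpBinary)
    prob += y >= x                    # y is at least the pre-activation
    prob += y <= x - l * (1 - a)      # a = 0 relaxes toward y = 0
    prob += y <= u * a                # a = 0 forces y = 0; a = 1 allows y up to u
    return y

# Toy usage: maximum of relu(2*x - 1) for x in [-1, 1].
prob = LpProblem("relu_range", LpMaximize)
x = LpVariable("x", lowBound=-1, upBound=1)
pre = 2 * x - 1                       # pre-activation, bounded in [-3, 1]
y = add_relu(prob, pre, l=-3, u=1, name="h1")
prob += y                             # objective: maximize the ReLU output
prob.solve()
print(value(y))                       # expect 1.0, attained at x = 1
```
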
Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably
robust against norm-bounded adversarial perturbations on the training data. For
previously unseen examples, the approach is guaranteed to detect all
adversarial examples, though it may flag some non-adversarial examples as well.
The basic idea is to consider a convex outer approximation of the set of
activations reachable through a norm-bounded perturbation, and we develop a
robust optimization procedure that minimizes the worst case loss over this
outer region (via a linear program). Crucially, we show that the dual problem
to this linear program can be represented itself as a deep network similar to
the backpropagation network, leading to very efficient optimization approaches
that produce guaranteed bounds on the robust loss. The end result is that by
executing a few more forward and backward passes through a slightly modified
version of the original network (though possibly with much larger batch sizes),
we can learn a classifier that is provably robust to any norm-bounded
adversarial attack. We illustrate the approach on a number of tasks to train
classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a
convolutional classifier that provably has less than 5.8% test error for any
adversarial attack with bounded l∞ norm less than ε = 0.1),
and code for all experiments in the paper is available at
https://github.com/locuslab/convex_adversarial.
Comment: ICML final version
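
Concretely, for a one-hidden-layer ReLU network the convex outer adversarial polytope can be built by replacing each ReLU with its triangle-shaped linear relaxation and bounding the objective with a linear program. The hedged numpy/scipy sketch below does this in the primal; the paper instead bounds the LP through its dual, which takes the form of a backprop-like pass and scales to real networks. The interval pre-activation bounds and all names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def certified_lower_bound(W1, b1, W2, b2, c, x0, eps):
    """Lower bound on c^T f(x) over ||x - x0||_inf <= eps, f(x) = W2 relu(W1 x + b1) + b2."""
    m, d = W1.shape
    # interval bounds on the pre-activations (a simple way to seed the relaxation)
    centre = W1 @ x0 + b1
    radius = eps * np.abs(W1).sum(axis=1)
    l, u = centre - radius, centre + radius
    # secant (upper) line of the ReLU hull through (l, relu(l)) and (u, relu(u))
    slope = (np.maximum(u, 0) - np.maximum(l, 0)) / np.maximum(u - l, 1e-12)
    intercept = np.maximum(l, 0) - slope * l

    # LP variables: v = [x (d), yhat (m), z (m)]
    n = d + 2 * m
    def block(start, size):
        B = np.zeros((size, n)); B[np.arange(size), start + np.arange(size)] = 1.0; return B
    X, Y, Z = block(0, d), block(d, m), block(d + m, m)

    A_eq, b_eq = np.hstack([-W1, np.eye(m), np.zeros((m, m))]), b1      # yhat = W1 x + b1
    A_ub = np.vstack([-Z,                                   # z >= 0
                      Y - Z,                                # z >= yhat
                      Z - slope[:, None] * Y])              # z <= slope*yhat + intercept
    b_ub = np.concatenate([np.zeros(m), np.zeros(m), intercept])
    bounds = [(x0[i] - eps, x0[i] + eps) for i in range(d)] + [(None, None)] * (2 * m)

    obj = np.concatenate([np.zeros(d), np.zeros(m), W2.T @ c])          # minimize c^T W2 z
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.fun + c @ b2

# Toy usage: certify the margin between two outputs of a small random network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x0, c = rng.normal(size=3), np.array([1.0, -1.0])
print(certified_lower_bound(W1, b1, W2, b2, c, x0, eps=0.05))
```
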