32 research outputs found

    Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the $L_0$ Norm

    Full text link
    Deployment of deep neural networks (DNNs) in safety- or security-critical systems requires provable guarantees on their correct behaviour. A common requirement is robustness to adversarial perturbations in a neighbourhood around an input. In this paper we focus on the $L_0$ norm and aim to compute, for a trained DNN and an input, the maximal radius of a safe norm ball around the input within which there are no adversarial examples. Then we define global robustness as an expectation of the maximal safe radius over a test data set. We first show that the problem is NP-hard, and then propose an approximate approach to iteratively compute lower and upper bounds on the network's robustness. The approach is anytime, i.e., it returns intermediate bounds and robustness estimates that are gradually, but strictly, improved as the computation proceeds; tensor-based, i.e., the computation is conducted over a set of inputs simultaneously, instead of one by one, to enable efficient GPU computation; and has provable guarantees, i.e., both the bounds and the robustness estimates can converge to their optimal values. Finally, we demonstrate the utility of the proposed approach in practice to compute tight bounds by applying and adapting the anytime algorithm to a set of challenging problems, including global robustness evaluation, competitive $L_0$ attacks, test case generation for DNNs, and local robustness evaluation on large-scale ImageNet DNNs. We release the code of all case studies via GitHub.
    Comment: 42 pages, GitHub: https://github.com/TrustAI/L0-TR
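
    The abstract above defines global robustness as the expectation of the maximal safe radius over a test set. The sketch below is a minimal illustration of that definition, using a greedy pixel-flipping search as a crude stand-in for the paper's anytime, tensor-based algorithm; `predict`, `x_test` and `y_test` are assumed placeholders (a batched score function and a labelled test set with inputs in [0, 1]).

```python
# A minimal sketch (not the paper's algorithm): estimate an UPPER bound on the maximal
# safe L0 radius per input by greedily flipping pixels until the label changes, then
# report global robustness as the mean over a labelled test set.
import numpy as np

def l0_upper_bound(predict, x, y_true, max_pixels=50):
    """Greedy search: repeatedly flip the single pixel that most reduces the true-class
    score. If the label flips after k changes, the maximal safe L0 radius is at most k-1."""
    if predict(x[None]).argmax(axis=1)[0] != y_true:
        return 0                                    # already misclassified
    x_adv = x.copy()
    changed = set()
    for k in range(1, max_pixels + 1):
        best = None
        for i in range(x.size):
            if i in changed:
                continue
            trial = x_adv.copy().ravel()
            trial[i] = 1.0 - trial[i]               # flip pixel to its extreme (inputs assumed in [0, 1])
            score = predict(trial.reshape(1, *x.shape))[0, y_true]
            if best is None or score < best[1]:
                best = (i, score)
        i, _ = best
        x_adv.flat[i] = 1.0 - x_adv.flat[i]
        changed.add(i)
        if predict(x_adv[None]).argmax(axis=1)[0] != y_true:
            return k - 1                            # adversarial example found with k changed pixels
    return max_pixels                               # no flip found within the search budget

def global_robustness(predict, x_test, y_test):
    # Expectation of the (upper-bounded) maximal safe radius over the test set.
    radii = [l0_upper_bound(predict, x, y) for x, y in zip(x_test, y_test)]
    return float(np.mean(radii))
```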

    Formal methods and software engineering for DL. Security, safety and productivity for DL systems development

    Full text link
    Deep Learning (DL) techniques are now widespread and being integrated into many important systems. Their classification and recognition abilities ensure their relevance for multiple application domains. As machine learning that relies on training rather than explicit algorithm programming, they offer a high degree of productivity. But they can be vulnerable to attacks, and the verification of their correctness is only just emerging as a scientific and engineering possibility. This paper is a major update of a previously published survey, attempting to cover all recent publications in this area. It also covers an even more recent trend, namely the design of domain-specific languages for producing and training neural nets.
    Comment: Submitted to IEEE-CCECE201

    Learning to Defense by Learning to Attack

    Full text link
    Adversarial training is a principled approach for training robust neural networks. From an optimization perspective, adversarial training solves a bilevel optimization problem (a general form of minimax approaches): the leader problem aims to learn a robust classifier, while the follower problem tries to generate adversarial samples. Unfortunately, such a bilevel problem is very challenging to solve due to its highly complicated structure. This work proposes a new adversarial training method based on a generic learning-to-learn (L2L) framework. Specifically, instead of applying hand-designed algorithms to the follower problem, we learn an optimizer, which is parametrized by a convolutional neural network. Meanwhile, a robust classifier is learned to defend against the adversarial attacks generated by the learned optimizer. Our experiments on CIFAR datasets demonstrate that L2L improves upon existing methods in both robust accuracy and computational efficiency. Moreover, the L2L framework can be extended to other popular bilevel problems in machine learning.
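
    As a rough illustration of the learning-to-learn idea described above, the following is a hedged PyTorch sketch in which a small convolutional "attacker" maps the input gradient to a perturbation update and is trained alternately with the classifier. The architecture, step count and epsilon are illustrative assumptions, not the paper's configuration.

```python
# Minimal L2L-style adversarial training sketch: a learned attacker replaces the
# hand-designed inner attack, and classifier and attacker are updated alternately.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attacker(nn.Module):
    """Learned optimizer for the follower problem: (input, gradient) -> perturbation update."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x, grad):
        return self.net(torch.cat([x, grad], dim=1))

def perturb(classifier, attacker, x, y, eps=8 / 255, steps=3):
    delta = torch.zeros_like(x)
    for _ in range(steps):
        # The input gradient is used only as a (detached) feature for the attacker,
        # so this simplified sketch needs no second-order derivatives.
        x_in = (x + delta).clamp(0, 1).detach().requires_grad_(True)
        loss = F.cross_entropy(classifier(x_in), y)
        grad = torch.autograd.grad(loss, x_in)[0]
        delta = (delta + eps * attacker(x_in.detach(), grad)).clamp(-eps, eps)
    return (x + delta).clamp(0, 1)

def train_step(classifier, attacker, opt_cls, opt_att, x, y):
    # Follower update: the attacker tries to maximize the classification loss.
    opt_att.zero_grad()
    adv = perturb(classifier, attacker, x, y)
    (-F.cross_entropy(classifier(adv), y)).backward()
    opt_att.step()
    # Leader update: the classifier minimizes loss on freshly generated adversarial inputs.
    opt_cls.zero_grad()
    adv = perturb(classifier, attacker, x, y).detach()
    F.cross_entropy(classifier(adv), y).backward()
    opt_cls.step()
```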

    Semidefinite relaxations for certifying robustness to adversarial examples

    Full text link
    Despite their impressive performance on diverse tasks, neural networks fail catastrophically in the presence of adversarial inputs: imperceptibly but adversarially perturbed versions of natural inputs. We have witnessed an arms race between defenders who attempt to train robust networks and attackers who try to construct adversarial examples. One promising way to end the arms race is to develop certified defenses, which are provably robust against all attackers in some family. These certified defenses are based on convex relaxations that construct an upper bound on the worst-case loss over all attackers in the family. Previous relaxations are loose on networks that are not trained against the respective relaxation. In this paper, we propose a new semidefinite relaxation for certifying robustness that applies to arbitrary ReLU networks. We show that our proposed relaxation is tighter than previous relaxations and produces meaningful robustness guarantees on three different "foreign networks" whose training objectives are agnostic to our proposed relaxation.
    Comment: To appear at NIPS 201
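
    The following is a hedged cvxpy sketch of a semidefinite relaxation in this spirit for a one-hidden-layer ReLU network: the vector [1; x; z] is lifted to a PSD matrix, the ReLU is encoded through the identity z*(z - Wx) = 0, and the optimal value upper-bounds the worst-case margin c.z over an input box. It is a simplified reconstruction, not the authors' formulation or code; W, c, l, u and the random example are placeholders.

```python
# SDP relaxation sketch for certifying a one-hidden-layer ReLU network z = relu(W x),
# with the output-layer logit difference absorbed into the vector c.
import cvxpy as cp
import numpy as np

def sdp_certificate(W, c, l, u):
    n, m = W.shape[1], W.shape[0]          # input dim, hidden dim
    d = 1 + n + m
    M = cp.Variable((d, d), PSD=True)      # relaxation of [1; x; z] [1; x; z]^T
    x = M[0, 1:1 + n]
    z = M[0, 1 + n:]
    X = M[1:1 + n, 1:1 + n]                # ~ x x^T
    P = M[1:1 + n, 1 + n:]                 # ~ x z^T
    Z = M[1 + n:, 1 + n:]                  # ~ z z^T
    cons = [
        M[0, 0] == 1,
        z >= 0,
        z >= W @ x,                        # z_j >= (Wx)_j
        cp.diag(Z) == cp.diag(W @ P),      # z_j^2 = z_j (Wx)_j  (lifted ReLU identity)
        x >= l,
        x <= u,
        cp.diag(X) <= cp.multiply(l + u, x) - l * u,   # x_i^2 <= (l_i + u_i) x_i - l_i u_i
    ]
    prob = cp.Problem(cp.Maximize(c @ z), cons)
    prob.solve(solver=cp.SCS)
    return prob.value                      # certified robust if this upper bound is < 0

# Tiny usage example with random data (illustrative only).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
c = rng.standard_normal(4)
x0 = rng.uniform(size=3)
print("upper bound on margin:", sdp_certificate(W, c, x0 - 0.1, x0 + 0.1))
```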

    DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks

    Full text link
    Deep neural networks have become widely used, obtaining remarkable results in domains such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bio-informatics, where they have produced results comparable to human experts. However, these networks can be easily fooled by adversarial perturbations: minimal changes to correctly classified inputs that cause the network to misclassify them. This phenomenon represents a concern for both safety and security, but it is currently unclear how to measure a network's robustness against such perturbations. Existing techniques are limited to checking robustness around a few individual input points, providing only very limited guarantees. We propose a novel approach for automatically identifying safe regions of the input space, within which the network is robust against adversarial perturbations. The approach is data-guided, relying on clustering to identify well-defined geometric regions as candidate safe regions. We then utilize verification techniques to confirm that these regions are safe, or to provide counter-examples showing that they are not. We also introduce the notion of targeted robustness which, for a given target label and region, ensures that an NN does not map any input in the region to the target label. We evaluated our technique on the MNIST dataset and on a neural network implementation of a controller for the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu). For these networks, our approach identified multiple regions which were completely safe, as well as some which were only safe for specific labels. It also discovered several adversarial perturbations of interest.
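
    A minimal sketch of the data-guided workflow described above might look as follows: cluster correctly classified inputs per label with k-means, propose a ball around each centroid (with radius set by the distance to the nearest input of another label) as a candidate safe region, and pass each region to a verifier. `verify_region` is a placeholder for an actual verification backend, and the radius heuristic is an assumption for illustration.

```python
# Data-guided candidate safe regions: cluster per label, then verify each region.
import numpy as np
from sklearn.cluster import KMeans

def candidate_regions(X, y, n_clusters=5):
    """Return (centroid, radius, label) triples as candidate safe regions."""
    regions = []
    for label in np.unique(y):
        same, other = X[y == label], X[y != label]
        km = KMeans(n_clusters=min(n_clusters, len(same)), n_init=10).fit(same)
        for centre in km.cluster_centers_:
            # Radius: distance to the nearest input of a different label, so the
            # candidate region is geometrically well separated in input space.
            radius = np.min(np.linalg.norm(other - centre, axis=1))
            regions.append((centre, radius, int(label)))
    return regions

def check_regions(regions, verify_region):
    """verify_region(centre, radius, label) -> (is_safe, counterexample_or_None)."""
    results = []
    for centre, radius, label in regions:
        safe, cex = verify_region(centre, radius, label)
        # On failure one could shrink the radius (e.g. binary search) and retry,
        # or keep the counterexample as an adversarial perturbation of interest.
        results.append({"label": label, "radius": radius, "safe": safe, "counterexample": cex})
    return results
```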

    Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

    Full text link
    In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations, not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize.
    Comment: ICLR 2019 camera-ready; datasets available at https://github.com/hendrycks/robustness; this article supersedes arXiv:1807.0169
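
    For concreteness, the sketch below reconstructs the usual ImageNet-C aggregation: per-corruption errors are summed over the five severity levels, normalised by a baseline classifier (AlexNet), and averaged into a mean corruption error (mCE). The exact protocol should be taken from the linked repository; the numbers in the example are illustrative, not measured results.

```python
# Corruption Error (CE) and mean CE (mCE) aggregation, reconstructed and hedged.
import numpy as np

def corruption_error(model_err, baseline_err):
    """model_err, baseline_err: dict corruption -> list of top-1 error rates,
    one entry per severity level (1..5)."""
    ce = {}
    for corruption, errs in model_err.items():
        ce[corruption] = np.sum(errs) / np.sum(baseline_err[corruption])
    return ce

def mean_corruption_error(model_err, baseline_err):
    ce = corruption_error(model_err, baseline_err)
    return float(np.mean(list(ce.values())))

# Illustrative numbers only (not measured results).
resnet = {"gaussian_noise": [0.30, 0.42, 0.55, 0.68, 0.78],
          "motion_blur":    [0.25, 0.35, 0.50, 0.62, 0.70]}
alexnet = {"gaussian_noise": [0.60, 0.70, 0.80, 0.88, 0.92],
           "motion_blur":    [0.55, 0.65, 0.75, 0.82, 0.88]}
print("mCE relative to the baseline:", mean_corruption_error(resnet, alexnet))
```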

    Identify Susceptible Locations in Medical Records via Adversarial Attacks on Deep Predictive Models

    Full text link
    The surging availability of electronic health records (EHR) leads to increased research interest in medical predictive modeling. Recently, many deep learning based predictive models have been developed for EHR data and have demonstrated impressive performance. However, a series of recent studies showed that these deep models are not safe: they suffer from certain vulnerabilities. In short, a well-trained deep network can be extremely sensitive to inputs with negligible changes. These inputs are referred to as adversarial examples. In the context of medical informatics, such attacks could alter the result of a high-performance deep predictive model by slightly perturbing a patient's medical records. Such instability not only reflects the weakness of deep architectures; more importantly, it offers guidance on detecting susceptible parts of the inputs. In this paper, we propose an efficient and effective framework that learns a time-preferential minimum attack targeting an LSTM model with EHR inputs, and we leverage this attack strategy to screen medical records of patients and identify susceptible events and measurements. The efficient screening procedure can assist decision makers in paying extra attention to the locations that could cause severe consequences if not measured correctly. We conduct extensive empirical studies on a real-world urgent care cohort and demonstrate the effectiveness of the proposed screening approach.
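
    A much-simplified stand-in for the screening idea described above (not the paper's time-preferential minimum attack) is sketched below: take the gradient of the loss with respect to an EHR sequence, weight recent time steps more heavily, and report the most sensitive (time step, measurement) locations. `lstm_model`, the decay weighting and the tensor shapes are assumptions.

```python
# Gradient-saliency screening of an EHR sequence (simplified illustration only).
import torch
import torch.nn.functional as F

def susceptible_locations(lstm_model, records, label, top_k=5, time_decay=0.9):
    """records: tensor of shape (T, D) - T time steps, D measurements.
    lstm_model is assumed to map a (1, T, D) batch to (1, num_classes) logits."""
    x = records.clone().detach().requires_grad_(True)
    logits = lstm_model(x.unsqueeze(0))
    loss = F.cross_entropy(logits, torch.tensor([label]))
    loss.backward()
    saliency = x.grad.abs()                         # (T, D) sensitivity map
    # Prefer recent time steps: weight step t by time_decay^(T-1-t) (an assumed heuristic).
    T = saliency.shape[0]
    weights = torch.tensor([time_decay ** (T - 1 - t) for t in range(T)]).unsqueeze(1)
    scores = saliency * weights
    flat = torch.topk(scores.flatten(), top_k).indices
    return [(int(i // scores.shape[1]), int(i % scores.shape[1])) for i in flat]
```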

    Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability

    Full text link
    We explore the concept of co-design in the context of neural network verification. Specifically, we aim to train deep neural networks that not only are robust to adversarial perturbations but whose robustness can also be verified more easily. To this end, we identify two properties of network models, weight sparsity and so-called ReLU stability, that turn out to significantly impact the complexity of the corresponding verification task. We demonstrate that improving weight sparsity alone already enables us to turn computationally intractable verification problems into tractable ones. Then, improving ReLU stability leads to an additional 4-13x speedup in verification times. An important feature of our methodology is its "universality," in the sense that it can be used with a broad range of training procedures and verification approaches.
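
    One plausible instantiation of the two properties mentioned above is sketched below: an L1 penalty for weight sparsity plus a "ReLU stability" penalty that rewards pre-activation interval bounds with a consistent sign, computed by simple interval arithmetic. The specific stability loss (-tanh(1 + lb*ub)) and the Sequential Linear/ReLU assumption are illustrative choices, not necessarily the paper's exact formulation.

```python
# Training-time regularizers for verifiability: weight sparsity + ReLU stability.
import torch
import torch.nn as nn

def interval_bounds(linear: nn.Linear, lb, ub):
    """Propagate an elementwise box [lb, ub] through a linear layer."""
    w_pos, w_neg = linear.weight.clamp(min=0), linear.weight.clamp(max=0)
    new_lb = lb @ w_pos.t() + ub @ w_neg.t() + linear.bias
    new_ub = ub @ w_pos.t() + lb @ w_neg.t() + linear.bias
    return new_lb, new_ub

def relu_stability_loss(lb, ub):
    # Negative when lb and ub share a sign (stable ReLU), positive when they straddle 0.
    return -torch.tanh(1 + lb * ub).mean()

def regularized_loss(net: nn.Sequential, x, y, eps=0.1, rs_weight=1e-3, l1_weight=1e-4):
    """net is assumed to be an nn.Sequential of alternating Linear and ReLU layers."""
    ce = nn.functional.cross_entropy(net(x), y)
    lb, ub = x - eps, x + eps
    rs = torch.zeros((), device=x.device)
    for layer in net:
        if isinstance(layer, nn.Linear):
            lb, ub = interval_bounds(layer, lb, ub)
        elif isinstance(layer, nn.ReLU):
            rs = rs + relu_stability_loss(lb, ub)
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    l1 = sum(p.abs().sum() for p in net.parameters())
    return ce + rs_weight * rs + l1_weight * l1
```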

    Evaluating Robustness of Neural Networks with Mixed Integer Programming

    Full text link
    Neural networks have demonstrated considerable success on a wide variety of real-world problems. However, networks trained only to optimize for training accuracy can often be fooled by adversarial examples - slightly perturbed inputs that are misclassified with high confidence. Verification of networks enables us to gauge their vulnerability to such adversarial examples. We formulate verification of piecewise-linear neural networks as a mixed integer program. On a representative task of finding minimum adversarial distortions, our verifier is two to three orders of magnitude quicker than the state-of-the-art. We achieve this computational speedup via tight formulations for non-linearities, as well as a novel presolve algorithm that makes full use of all information available. The computational speedup allows us to verify properties on convolutional networks with an order of magnitude more ReLUs than networks previously verified by any complete verifier. In particular, we determine for the first time the exact adversarial accuracy of an MNIST classifier to perturbations with bounded $\ell_\infty$ norm $\epsilon = 0.1$: for this classifier, we find an adversarial example for 4.38% of samples, and a certificate of robustness (to perturbations with bounded norm) for the remainder. Across all robust training procedures and network architectures considered, we are able to certify more samples than the state-of-the-art and find more adversarial examples than a strong first-order attack.
    Comment: Accepted as a conference paper at ICLR 201
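
    To make the formulation concrete, the toy sketch below encodes a single ReLU neuron as a mixed integer program with the standard big-M constraints and known pre-activation bounds; it uses PuLP with made-up weights and does not reproduce the paper's tighter formulations or presolve.

```python
# Big-M MIP encoding of y = relu(w . x + b) over an input box, solved with CBC via PuLP.
import pulp

# One neuron: y = relu(w . x + b), inputs x_i in [0, 1].
w, b = [1.0, -2.0, 0.5], 0.25
lo, hi = -2.0 + 0.25, 1.5 + 0.25            # valid pre-activation bounds over the box

prob = pulp.LpProblem("verify_relu", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=1) for i in range(3)]
y = pulp.LpVariable("y", lowBound=0)
a = pulp.LpVariable("a", cat="Binary")      # a = 1 iff the ReLU is active
pre = pulp.lpSum(w[i] * x[i] for i in range(3)) + b

prob += y                                   # objective: worst-case (largest) output
prob += y >= pre                            # y >= pre-activation
prob += y <= pre - lo * (1 - a)             # active branch is tight when a = 1
prob += y <= hi * a                         # y = 0 when a = 0

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("max relu(w.x+b) over the box:", pulp.value(prob.objective))
```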

    Provable defenses against adversarial examples via the convex outer adversarial polytope

    Full text link
    We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst-case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can itself be represented as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\ell_\infty$ norm less than $\epsilon = 0.1$), and code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial.
    Comment: ICML final version
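
    The sketch below spells out the convex outer approximation itself for a one-hidden-layer network: each unstable ReLU is replaced by its triangle relaxation and the worst-case logit difference over an $\ell_\infty$ ball is bounded by a linear program. The paper solves an equivalent dual with a backpropagation-like network for efficiency; this primal LP, the interval pre-activation bounds and the random weights are only an illustration.

```python
# Convex outer adversarial polytope (primal LP) for f(x) = V @ relu(W x + b) + b2.
import cvxpy as cp
import numpy as np

def outer_polytope_bound(W, b, V, b2, c, x0, eps):
    # Interval pre-activation bounds over the l_inf ball around x0.
    l = W @ x0 + b - eps * np.abs(W).sum(axis=1)
    u = W @ x0 + b + eps * np.abs(W).sum(axis=1)

    x = cp.Variable(W.shape[1])
    z = cp.Variable(W.shape[0])
    yhat = W @ x + b
    cons = [cp.abs(x - x0) <= eps, z >= 0, z >= yhat]
    for j in range(W.shape[0]):
        if u[j] <= 0:
            cons.append(z[j] == 0)                  # provably inactive unit
        elif l[j] >= 0:
            cons.append(z[j] == yhat[j])            # provably active unit
        else:
            # Triangle relaxation: upper envelope of the ReLU over [l_j, u_j].
            cons.append(z[j] <= u[j] * (yhat[j] - l[j]) / (u[j] - l[j]))
    prob = cp.Problem(cp.Maximize(c @ (V @ z + b2)), cons)
    prob.solve()
    return prob.value   # an upper bound on the true worst-case value of c @ f(x)

# Tiny usage example with random placeholder weights.
rng = np.random.default_rng(0)
W, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
V, b2 = rng.standard_normal((2, 5)), rng.standard_normal(2)
x0 = rng.uniform(size=3)
print(outer_polytope_bound(W, b, V, b2, np.array([1.0, -1.0]), x0, eps=0.1))
```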