3,059 research outputs found

    Defense against Universal Adversarial Perturbations

    Full text link
    Recent advances in Deep Learning show the existence of image-agnostic quasi-imperceptible perturbations that when applied to `any' image can fool a state-of-the-art network classifier to change its prediction about the image label. These `Universal Adversarial Perturbations' pose a serious threat to the success of Deep Learning in practice. We present the first dedicated framework to effectively defend the networks against such perturbations. Our approach learns a Perturbation Rectifying Network (PRN) as `pre-input' layers to a targeted model, such that the targeted model needs no modification. The PRN is learned from real and synthetic image-agnostic perturbations, where an efficient method to compute the latter is also proposed. A perturbation detector is separately trained on the Discrete Cosine Transform of the input-output difference of the PRN. A query image is first passed through the PRN and verified by the detector. If a perturbation is detected, the output of the PRN is used for label prediction instead of the actual image. A rigorous evaluation shows that our framework can defend the network classifiers against unseen adversarial perturbations in the real-world scenarios with up to 97.5% success rate. The PRN also generalizes well in the sense that training for one targeted network defends another network with a comparable success rate.Comment: Accepted in IEEE CVPR 201

    Disentangling Adversarial Robustness and Generalization

    Full text link
    Obtaining deep networks that are robust against adversarial examples and generalize well is an open problem. A recent hypothesis even states that both robust and accurate models are impossible, i.e., adversarial robustness and generalization are conflicting goals. In an effort to clarify the relationship between robustness and generalization, we assume an underlying, low-dimensional data manifold and show that: 1. regular adversarial examples leave the manifold; 2. adversarial examples constrained to the manifold, i.e., on-manifold adversarial examples, exist; 3. on-manifold adversarial examples are generalization errors, and on-manifold adversarial training boosts generalization; 4. regular robustness and generalization are not necessarily contradicting goals. These assumptions imply that both robust and accurate models are possible. However, different models (architectures, training strategies etc.) can exhibit different robustness and generalization characteristics. To confirm our claims, we present extensive experiments on synthetic data (with known manifold) as well as on EMNIST, Fashion-MNIST and CelebA.Comment: Conference on Computer Vision and Pattern Recognition 201

    Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks

    Full text link
    Deep Convolutional Networks (DCNs) have been shown to be vulnerable to adversarial examples---perturbed inputs specifically designed to produce intentional errors in the learning algorithms at test time. Existing input-agnostic adversarial perturbations exhibit interesting visual patterns that are currently unexplained. In this paper, we introduce a structured approach for generating Universal Adversarial Perturbations (UAPs) with procedural noise functions. Our approach unveils the systemic vulnerability of popular DCN models like Inception v3 and YOLO v3, with single noise patterns able to fool a model on up to 90% of the dataset. Procedural noise allows us to generate a distribution of UAPs with high universal evasion rates using only a few parameters. Additionally, we propose Bayesian optimization to efficiently learn procedural noise parameters to construct inexpensive untargeted black-box attacks. We demonstrate that it can achieve an average of less than 10 queries per successful attack, a 100-fold improvement on existing methods. We further motivate the use of input-agnostic defences to increase the stability of models to adversarial perturbations. The universality of our attacks suggests that DCN models may be sensitive to aggregations of low-level class-agnostic features. These findings give insight on the nature of some universal adversarial perturbations and how they could be generated in other applications.Comment: 16 pages, 10 figures. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS '19
    • …
    corecore