
    Towards Understanding the Transferability of Deep Representations

    Deep neural networks trained on a wide range of datasets demonstrate impressive transferability. Deep features appear general in that they are applicable to many datasets and tasks. This property is widely exploited in real-world applications: a neural network pretrained on a large dataset, such as ImageNet, can significantly boost generalization and accelerate training when fine-tuned on a smaller target dataset. Despite its pervasiveness, little effort has been devoted to uncovering the reasons for transferability in deep feature representations. This paper tries to understand transferability from the perspectives of improved generalization, improved optimization, and the feasibility of transfer. We demonstrate that 1) transferred models tend to find flatter minima, since their weight matrices stay close to the original flat region of the pretrained parameters when transferred to a similar target dataset; 2) transferred representations make the loss landscape more favorable, with improved Lipschitzness, which substantially accelerates and stabilizes training. The improvement is largely attributable to the fact that the principal component of the gradient is suppressed in the pretrained parameters, stabilizing the magnitude of the gradient during back-propagation; and 3) the feasibility of transfer is related to the similarity of both inputs and labels. A surprising discovery is that feasibility is also affected by the training stage: transferability first increases during training and then declines. We further provide a theoretical analysis to verify our observations.
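
    A minimal sketch of the fine-tuning setup discussed above, in Python/PyTorch: start from an ImageNet-pretrained backbone, replace the classification head, and fine-tune on a smaller target dataset. The backbone choice, class count, and learning rates are illustrative assumptions, not the paper's exact configuration.

    # Fine-tuning sketch: transfer an ImageNet-pretrained ResNet-18 to a smaller
    # target task. Model, class count, and hyperparameters are assumptions.
    import torch
    import torch.nn as nn
    from torchvision import models

    num_target_classes = 10  # assumed size of the smaller target task

    # Starting from pretrained parameters keeps the fine-tuned weights close to
    # the flat region found during ImageNet pretraining.
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_target_classes)

    # Smaller learning rate for transferred layers, larger for the fresh head.
    optimizer = torch.optim.SGD(
        [
            {"params": (p for n, p in model.named_parameters() if not n.startswith("fc")),
             "lr": 1e-3},
            {"params": model.fc.parameters(), "lr": 1e-2},
        ],
        momentum=0.9,
    )
    criterion = nn.CrossEntropyLoss()

    def fine_tune_step(images, labels):
        """One optimization step on a batch from the target dataset."""
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()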

    Interpretable Deep Learning under Fire

    Providing explanations for deep neural network (DNN) models is crucial for their use in security-sensitive domains. A plethora of interpretation models have been proposed to help users understand the inner workings of DNNs: how does a DNN arrive at a specific decision for a given input? The improved interpretability is believed to offer a sense of security by involving humans in the decision-making process. Yet, due to its data-driven nature, the interpretability itself is potentially susceptible to malicious manipulations, about which little is known thus far. Here we bridge this gap by conducting the first systematic study on the security of interpretable deep learning systems (IDLSes). We show that existing IDLSes are highly vulnerable to adversarial manipulations. Specifically, we present ADV^2, a new class of attacks that generate adversarial inputs which not only mislead the target DNNs but also deceive their coupled interpretation models. Through empirical evaluation against four major types of IDLSes on benchmark datasets and in security-critical applications (e.g., skin cancer diagnosis), we demonstrate that with ADV^2 the adversary is able to arbitrarily designate an input's prediction and interpretation. Further, with both analytical and empirical evidence, we identify the prediction-interpretation gap as one root cause of this vulnerability -- a DNN and its interpretation model are often misaligned, resulting in the possibility of exploiting both models simultaneously. Finally, we explore potential countermeasures against ADV^2, including leveraging its low transferability and incorporating it in an adversarial training framework. Our findings shed light on designing and operating IDLSes in a more secure and informative fashion, leading to several promising research directions.
    Comment: USENIX Security Symposium 2020
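
    As a rough illustration of the attack surface described above, the sketch below runs a PGD-style optimization over a combined loss: one term pushes the classifier toward an attacker-chosen class, the other pulls a simple input-gradient saliency map toward a target map. The interpreter, loss weighting, step sizes, and the assumption that the model supports double backpropagation are all illustrative; this is not ADV^2's exact formulation.

    # Joint prediction/interpretation attack sketch (illustrative only).
    # Assumes inputs in [0, 1] and a model that supports double backpropagation.
    import torch
    import torch.nn.functional as F

    def saliency_map(model, x, cls):
        """Input-gradient interpretation used as a stand-in for the coupled
        interpreter; x must already require gradients."""
        score = model(x)[:, cls].sum()
        grad, = torch.autograd.grad(score, x, create_graph=True)
        return grad.abs().amax(dim=1)  # one (H, W) map per image

    def joint_attack(model, x, target_cls, target_map,
                     eps=8 / 255, alpha=1 / 255, steps=40, lam=0.1):
        x_adv = x.clone().detach()
        labels = torch.full((x.size(0),), target_cls, device=x.device)
        for _ in range(steps):
            x_adv.requires_grad_(True)
            pred_loss = F.cross_entropy(model(x_adv), labels)
            int_loss = F.mse_loss(saliency_map(model, x_adv, target_cls), target_map)
            grad, = torch.autograd.grad(pred_loss + lam * int_loss, x_adv)
            x_adv = (x_adv - alpha * grad.sign()).detach()
            # project back into the eps-ball around x and the valid pixel range
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
        return x_adv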

    Towards a Robust Deep Neural Network in Texts: A Survey

    Deep neural networks (DNNs) have achieved remarkable success in various tasks (e.g., image classification, speech recognition, and natural language processing). However, research has shown that DNN models are vulnerable to adversarial examples, which cause incorrect predictions by adding imperceptible perturbations to normal inputs. Adversarial examples in the image domain have been well investigated, but research on texts is far less developed, let alone a comprehensive survey of the field. In this paper, we aim to present a comprehensive understanding of adversarial attacks and the corresponding mitigation strategies in texts. Specifically, we first give a taxonomy of adversarial attacks and defenses in texts from the perspective of different natural language processing (NLP) tasks, and then introduce how to build a robust DNN model via testing and verification. Finally, we discuss the existing challenges of adversarial attacks and defenses in texts and present future research directions in this emerging field.
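
    For concreteness, here is a hedged sketch of one common attack recipe that text-focused surveys like this cover: greedy synonym substitution that tries to lower a classifier's confidence in the true label while changing only a few words. The callables predict_proba and get_synonyms are hypothetical stand-ins, not components from the paper.

    # Greedy synonym-substitution attack sketch (illustrative only).
    from typing import Callable, List

    def greedy_substitution_attack(
        words: List[str],
        true_label: int,
        predict_proba: Callable[[List[str]], List[float]],  # class probabilities
        get_synonyms: Callable[[str], List[str]],
        max_changes: int = 3,
    ) -> List[str]:
        adv = list(words)
        for _ in range(max_changes):
            best_drop, best_edit = 0.0, None
            base = predict_proba(adv)[true_label]
            for i, word in enumerate(adv):
                for syn in get_synonyms(word):
                    candidate = adv[:i] + [syn] + adv[i + 1:]
                    drop = base - predict_proba(candidate)[true_label]
                    if drop > best_drop:
                        best_drop, best_edit = drop, (i, syn)
            if best_edit is None:
                break  # no substitution reduces confidence any further
            i, syn = best_edit
            adv[i] = syn
            if predict_proba(adv)[true_label] < 0.5:  # label likely flipped (binary case)
                break
        return adv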

    Query-Free Attacks on Industry-Grade Face Recognition Systems under Resource Constraints

    To launch black-box attacks against a Deep Neural Network (DNN) based Face Recognition (FR) system, one needs to build \textit{substitute} models to simulate the target model, so that adversarial examples discovered on the substitute models also mislead the target model. Such \textit{transferability} is achieved in recent studies by querying the target model to obtain data for training the substitute models. A real-world target, like the FR system of a law-enforcement agency, however, is less accessible to the adversary. To attack such a system, a substitute model of similar quality to the target model is needed to identify their common defects. This is hard, since the adversary often does not have enough resources to train such a powerful model (hundreds of millions of images and rooms of GPUs are needed to train a commercial FR system). We found in our research, however, that a resource-constrained adversary can still effectively approximate the target model's capability to recognize \textit{specific} individuals, by training \textit{biased} substitute models on additional images of those victims whose identities the attacker wants to cover or impersonate. This is made possible by a new property we discovered, called \textit{Nearly Local Linearity} (NLL), which models the observation that an ideal DNN model produces image representations (embeddings) whose distances among themselves truthfully describe the human perception of the differences among the input images. By simulating this property around the victim's images, we significantly improve the transferability of black-box impersonation attacks, by nearly 50\%. In particular, we successfully attacked a commercial system trained on over 20 million images, using 4 million images and 1/5 of the training time, but achieving 62\% transferability in an impersonation attack and 89\% in a dodging attack.
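
    The sketch below gives one loose reading of how a substitute embedding model could be biased toward the victim's images so that embedding distances track input differences, in the spirit of the Nearly Local Linearity observation. The pixel-space distance used as the "perceptual" reference and the loss form are assumptions, not the paper's definition.

    # Bias term sketch: make embedding distances around the victim's images track
    # input-space distances. Illustrative assumptions, not the paper's NLL loss.
    import torch
    import torch.nn.functional as F

    def nll_style_loss(embed_model, victim_images):
        """Penalize mismatch between pairwise embedding distances and pairwise
        input distances over a batch of the victim's images."""
        z = F.normalize(embed_model(victim_images), dim=1)  # (N, D) embeddings
        x = victim_images.flatten(1)                        # (N, P) flattened inputs
        emb_dist = torch.cdist(z, z)
        inp_dist = torch.cdist(x, x)
        emb_dist = emb_dist / (emb_dist.max() + 1e-8)       # scale to comparable ranges
        inp_dist = inp_dist / (inp_dist.max() + 1e-8)
        return F.mse_loss(emb_dist, inp_dist)

    In practice such a term would be added to the substitute model's ordinary training loss on batches containing the victim's images.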

    Factors of Transferability for a Generic ConvNet Representation

    Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the activations of its feed-forward units at a certain layer are used as a generic representation of an input image for a task with a relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. They include parameters of the training of the source ConvNet, such as its architecture and the distribution of the training data, as well as parameters of feature extraction, such as the layer of the trained ConvNet and dimensionality reduction. By optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered by their distance from the source task, and that a correlation is observed between the performance of the tasks and their distance from the source task with respect to the proposed factors.
    Comment: Extended version of the workshop paper with more experiments and updated text and title. Original CVPR15 DeepVision workshop paper title: "From Generic to Specific Deep Representations for Visual Recognition".
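
    A minimal sketch of the transfer recipe under study: freeze a ConvNet trained on a large source dataset, read out pooled activations from one layer, and train a simple classifier for the smaller target task. The backbone, the chosen layer (the final pooled features), and the target class count are illustrative assumptions.

    # Frozen-feature transfer sketch; backbone, layer, and class count are assumptions.
    import torch
    import torch.nn as nn
    from torchvision import models

    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    backbone.eval()
    feature_extractor = nn.Sequential(*list(backbone.children())[:-1])  # drop the ImageNet head
    for p in feature_extractor.parameters():
        p.requires_grad_(False)  # the source representation stays fixed

    num_target_classes = 67  # assumed target task size
    linear_head = nn.Linear(2048, num_target_classes)  # 2048-d pooled ResNet-50 features
    optimizer = torch.optim.Adam(linear_head.parameters(), lr=1e-3)

    def train_step(images, labels):
        with torch.no_grad():
            feats = feature_extractor(images).flatten(1)  # (N, 2048) generic representation
        loss = nn.functional.cross_entropy(linear_head(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()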

    Linguistic Knowledge and Transferability of Contextual Representations

    Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language. To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model, and BERT) with a suite of seventeen diverse probing tasks. We find that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge (e.g., conjunct identification). To investigate the transferability of contextual word representations, we quantify differences in the transferability of individual layers within contextualizers, especially between recurrent neural networks (RNNs) and transformers. For instance, higher layers of RNNs are more task-specific, while transformer layers do not exhibit the same monotonic trend. In addition, to better understand what makes contextual word representations transferable, we compare language model pretraining with eleven supervised pretraining tasks. For any given task, pretraining on a closely related task yields better performance than language model pretraining (which is better on average) when the pretraining dataset is fixed. However, language model pretraining on more data gives the best results.
    Comment: 22 pages, 4 figures; to appear at NAACL 2019
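
    A sketch of the frozen-contextualizer probing setup described above, using a BERT checkpoint via the transformers library. The checkpoint, probed layer, tag-set size, and the omitted subword/label alignment are assumptions; the paper probes several contextualizers across seventeen tasks.

    # Linear probe on one frozen layer of a pretrained contextualizer (sketch).
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    contextualizer = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
    contextualizer.eval()  # frozen: only the probe below is trained

    layer = 8      # assumed layer to probe
    num_tags = 45  # assumed tag-set size for a token-labeling probing task
    probe = nn.Linear(contextualizer.config.hidden_size, num_tags)
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

    def probe_step(sentences, token_labels):
        """One training step; token_labels is a (batch, seq_len) tensor aligned to
        the tokenizer's subword positions (alignment handling omitted here)."""
        enc = tokenizer(sentences, return_tensors="pt", padding=True)
        with torch.no_grad():
            reps = contextualizer(**enc).hidden_states[layer]  # (batch, seq, hidden)
        logits = probe(reps)
        loss = nn.functional.cross_entropy(
            logits.flatten(0, 1), token_labels.flatten(), ignore_index=-100)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()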

    Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation

    Deep Neural Networks (DNNs) have been widely applied in various recognition tasks. Recently, however, DNNs have been shown to be vulnerable to adversarial examples, which can mislead them into arbitrary incorrect predictions. While adversarial examples are well studied in classification tasks, other learning problems may have different properties. For instance, semantic segmentation requires additional components such as dilated convolutions and multiscale processing. In this paper, we aim to characterize adversarial examples based on spatial context information in semantic segmentation. We observe that spatial consistency information can be leveraged to detect adversarial examples robustly, even when a strong adaptive attacker has access to the model and detection strategies. We also show that adversarial examples based on the attacks considered in this paper barely transfer among models, even though transferability is common in classification. Our observations shed new light on developing adversarial attacks and defenses and on better understanding the vulnerabilities of DNNs.
    Comment: Accepted to ECCV 2018
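
    A hedged sketch of a spatial-consistency check in this spirit: segment two randomly chosen overlapping crops independently and measure how well their predicted labels agree on the shared region; low agreement suggests an adversarial input. The crop size, number of trials, and the use of pixel agreement rather than mIoU are assumptions.

    # Spatial-consistency score sketch (illustrative metric and crop scheme).
    import random
    import torch

    def spatial_consistency_score(seg_model, image, crop=256, trials=8):
        """image: (C, H, W) tensor at least 1.5x the crop size per dimension;
        seg_model returns per-pixel class logits at the input resolution."""
        _, H, W = image.shape
        scores = []
        for _ in range(trials):
            # two crops that overlap by half a crop in each direction
            y = random.randint(0, H - crop - crop // 2)
            x = random.randint(0, W - crop - crop // 2)
            a = image[:, y:y + crop, x:x + crop]
            b = image[:, y + crop // 2:y + crop // 2 + crop,
                         x + crop // 2:x + crop // 2 + crop]
            with torch.no_grad():
                pa = seg_model(a.unsqueeze(0)).argmax(1)[0]  # (crop, crop) labels
                pb = seg_model(b.unsqueeze(0)).argmax(1)[0]
            # overlap: bottom-right quadrant of crop a == top-left quadrant of crop b
            agree = (pa[crop // 2:, crop // 2:] == pb[:crop // 2, :crop // 2]).float().mean()
            scores.append(agree.item())
        return sum(scores) / len(scores)  # low agreement suggests an adversarial input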

    Sitatapatra: Blocking the Transfer of Adversarial Samples

    Convolutional Neural Networks (CNNs) are widely used to solve classification tasks in computer vision. However, they can be tricked into misclassifying specially crafted `adversarial' samples -- and samples built to trick one model often work alarmingly well against other models trained on the same task. In this paper we introduce Sitatapatra, a system designed to block the transfer of adversarial samples. It diversifies neural networks using a key, as in cryptography, and provides a mechanism for detecting attacks. What's more, when adversarial samples are detected, they can typically be traced back to the individual device that was used to develop them. The run-time overheads are minimal, permitting the use of Sitatapatra on constrained systems.
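
    Purely as an illustration of the key-based diversification idea (this is not Sitatapatra's actual construction), the module below derives a fixed per-channel scaling from a device-specific key and flags unusually large activations as a crude attack signal. The scaling range and detection threshold are arbitrary assumptions.

    # Key-seeded diversification sketch; not the paper's mechanism.
    import torch
    import torch.nn as nn

    class KeyedScale(nn.Module):
        """Fixed per-channel scaling derived from a device-specific key, intended
        to be inserted between convolutional blocks of otherwise identical models."""
        def __init__(self, channels: int, key: int, threshold: float = 6.0):
            super().__init__()
            gen = torch.Generator().manual_seed(key)
            scale = 0.8 + 0.4 * torch.rand(channels, generator=gen)  # key-dependent factors
            self.register_buffer("scale", scale.view(1, channels, 1, 1))
            self.threshold = threshold
            self.attack_suspected = False

        def forward(self, x):
            x = x * self.scale
            # crude detector: unusually large activations flag a possible attack
            if x.abs().max().item() > self.threshold:
                self.attack_suspected = True
            return x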

    Adversarial Attacks against Deep Saliency Models

    Currently, a plethora of saliency models based on deep neural networks have led to great breakthroughs in many complex high-level vision tasks (e.g. scene description, object detection). The robustness of these models, however, has not yet been studied. In this paper, we propose a sparse feature-space adversarial attack method against deep saliency models for the first time. The proposed attack requires only part of the model information and is able to generate a sparser and more insidious adversarial perturbation than traditional image-space attacks. These adversarial perturbations are so subtle that a human observer cannot notice their presence, yet the model outputs change drastically. This phenomenon raises security threats to deep saliency models in practical applications. We also explore some intriguing properties of the feature-space attack, e.g. 1) hidden layers with bigger receptive fields generate sparser perturbations, 2) deeper hidden layers achieve higher attack success rates, and 3) different loss functions and different attacked layers result in diverse perturbations. Experiments indicate that the proposed method is able to successfully attack different model architectures across various image scenes.
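
    A hedged sketch of a sparse feature-space perturbation: optimize an input perturbation so that a chosen hidden layer's activations move away from their clean values, with an L1 penalty keeping the perturbation sparse. The layer-access assumption, loss, and hyperparameters are illustrative, not the paper's exact method.

    # Sparse feature-space attack sketch (illustrative loss and hyperparameters).
    import torch

    def sparse_feature_attack(feature_fn, x, steps=100, lr=0.01, l1_weight=0.05):
        """feature_fn maps an image batch to activations of one hidden layer
        (the only part of the model the attacker is assumed to see)."""
        with torch.no_grad():
            clean_feats = feature_fn(x)
        delta = torch.zeros_like(x, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            feats = feature_fn((x + delta).clamp(0, 1))
            # push features away from the clean ones, keep the perturbation sparse
            loss = -(feats - clean_feats).pow(2).mean() + l1_weight * delta.abs().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return (x + delta.detach()).clamp(0, 1)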

    Adversarial Examples: Attacks and Defenses for Deep Learning

    With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have recently been found vulnerable to well-designed input samples called adversarial examples. Adversarial examples are imperceptible to humans but can easily fool deep neural networks at testing/deployment time. This vulnerability to adversarial examples has become one of the major risks of applying deep neural networks in safety-critical environments, so attacks on and defenses against adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications of adversarial examples are investigated. We further elaborate on countermeasures against adversarial examples and explore the challenges and potential solutions.
    Comment: Github: https://github.com/chbrian/awesome-adversarial-examples-d
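
    Among the generation methods such a survey covers, the Fast Gradient Sign Method (FGSM) is the canonical single-step example. A minimal sketch follows, assuming inputs in [0, 1] and an illustrative epsilon.

    # FGSM sketch: one gradient-sign step within an L-infinity ball of radius eps.
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=8 / 255):
        """Return an adversarial version of x within eps of the original input."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        return (x + eps * x.grad.sign()).clamp(0, 1).detach()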