
    Towards Leveraging the Information of Gradients in Optimization-based Adversarial Attack

    In recent years, deep neural networks have demonstrated state-of-the-art performance in a large variety of tasks and have therefore been adopted in many applications. On the other hand, recent studies have revealed that neural networks are vulnerable to adversarial examples, obtained by carefully adding small perturbations to legitimate samples. Based on this observation, many attack methods have been proposed. Among them, the optimization-based CW attack is the most powerful, as the adversarial samples it produces exhibit much less distortion than those of other methods. The better attacking effect, however, comes at the cost of running more iterations and thus longer computation time to reach desirable results. In this work, we propose to leverage the information of gradients as a guide during the search for adversaries. More specifically, directly incorporating the gradients into the perturbation can be regarded as a constraint added to the optimization process. We intuitively and empirically demonstrate the rationality of our method in reducing the search space. Our experiments show that, compared to the original CW attack, the proposed method requires fewer iterations to find adversarial samples, achieves a higher success rate, and results in smaller $\ell_2$ distortion.
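    The CW baseline referenced above is easy to state concretely. Below is a minimal PyTorch sketch of an untargeted CW-style $\ell_2$ attack; the loop and margin loss follow the standard formulation, while the comment marks where a gradient-guidance constraint of the kind the abstract describes would enter. The function and parameter names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def cw_like_attack(model, x, y, steps=100, lr=1e-2, c=1.0):
    """Untargeted CW-style L2 attack: minimize ||delta||^2 + c * f(x + delta).

    `model` maps a batch of inputs to logits. The paper's contribution,
    constraining delta with gradient information, would restrict the
    update below; this sketch shows only the unconstrained baseline.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(x + delta)
        onehot = F.one_hot(y, logits.size(-1)).bool()
        real = logits[onehot]                                   # true-class logit
        other = logits.masked_fill(onehot, -1e4).max(1).values  # best rival logit
        adv_loss = torch.clamp(real - other, min=0).sum()       # CW margin loss
        l2 = (delta ** 2).flatten(1).sum()
        opt.zero_grad()
        (l2 + c * adv_loss).backward()
        opt.step()        # <-- gradient guidance would constrain this update
    return (x + delta).detach()
```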

    Adversarial Examples: Attacks and Defenses for Deep Learning

    With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have recently been found vulnerable to well-designed input samples, called adversarial examples. Adversarial examples are imperceptible to humans but can easily fool deep neural networks in the testing/deploying stage. This vulnerability to adversarial examples has become one of the major risks of applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples have drawn great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications of adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples and explore the challenges and potential solutions. Comment: Github: https://github.com/chbrian/awesome-adversarial-examples-d

    TSViz: Demystification of Deep Learning Models for Time-Series Analysis

    This paper presents a novel framework for the demystification of convolutional deep learning models for time-series analysis. This is a step towards making informed/explainable decisions in the time-series domain, powered by deep learning. There have been numerous efforts to increase the interpretability of image-centric deep neural network models, where the learned features are more intuitive to visualize. Visualization in the time-series domain is much more complicated, as there is no direct interpretation of the filters and inputs compared to the image modality. In addition, little or no attention has been devoted to the development of such tools for time-series in the past. TSViz makes it possible to explore and analyze a network from different dimensions at different levels of abstraction, including identification of the parts of the input that were responsible for a prediction (including per-filter saliency), the importance of the different filters present in the network for a particular prediction, the notion of diversity present in the network through filter clustering, understanding of the main sources of variation learned by the network through inverse optimization, and analysis of the network's robustness against adversarial noise. As a sanity check for the computed influence values, we demonstrate results on pruning neural networks based on the computed influence information. These representations make it possible to understand the network's features, so that the acceptability of deep networks for time-series data can be enhanced. This is extremely important in domains like finance, Industry 4.0, self-driving cars, health-care, and counter-terrorism, where the reasons for reaching a particular prediction are as important as the prediction itself. We assess the proposed framework for interpretability against a set of desirable properties essential for any such method. Comment: 7 pages (6 + 1 for references), 7 figures
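    Two of the analyses listed above, input saliency and filter influence, can be sketched with generic gradient and ablation computations. The toy model, shapes, and scoring below are illustrative assumptions rather than the TSViz implementation.

```python
import torch
import torch.nn as nn

# A toy 1-D CNN standing in for any convolutional time-series model.
model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 1),
)
x = torch.randn(1, 1, 128, requires_grad=True)   # (batch, channel, time)

# Input saliency: |d prediction / d input| marks the time steps most
# responsible for the prediction.
model(x).sum().backward()
saliency = x.grad.abs().squeeze()                # one score per time step

# Filter influence by ablation: zero one conv filter at a time and
# measure how much the prediction moves.
conv = model[0]
with torch.no_grad():
    base = model(x).item()
    influence = []
    for f in range(conv.out_channels):
        w, b = conv.weight[f].clone(), conv.bias[f].clone()
        conv.weight[f].zero_(); conv.bias[f].zero_()
        influence.append(abs(model(x).item() - base))
        conv.weight[f].copy_(w); conv.bias[f].copy_(b)
```

    Ranking filters by such an influence score and removing the lowest-ranked ones corresponds to the pruning sanity check the abstract mentions.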

    Identify Susceptible Locations in Medical Records via Adversarial Attacks on Deep Predictive Models

    The surging availability of electronic health records (EHR) has led to increased research interest in medical predictive modeling. Recently, many deep learning based predictive models have been developed for EHR data and have demonstrated impressive performance. However, a series of recent studies showed that these deep models are not safe: they suffer from certain vulnerabilities. In short, a well-trained deep network can be extremely sensitive to inputs with negligible changes. These inputs are referred to as adversarial examples. In the context of medical informatics, such attacks could alter the result of a high-performance deep predictive model by slightly perturbing a patient's medical records. Such instability not only reflects a weakness of deep architectures; more importantly, it offers guidance for detecting susceptible parts of the inputs. In this paper, we propose an efficient and effective framework that learns a time-preferential minimum attack targeting an LSTM model with EHR inputs, and we leverage this attack strategy to screen patients' medical records and identify susceptible events and measurements. The efficient screening procedure can assist decision makers in paying extra attention to the locations that could cause severe consequences if not measured correctly. We conduct extensive empirical studies on a real-world urgent care cohort and demonstrate the effectiveness of the proposed screening approach.
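    The screening signal can be illustrated with a generic gradient-based susceptibility map over an EHR sequence. The model, dimensions, and scoring below are illustrative stand-ins; the paper's time-preferential minimum attack adds structure this sketch omits.

```python
import torch
import torch.nn as nn

class EHRModel(nn.Module):
    """Toy LSTM risk predictor over (visits, measurements) records."""
    def __init__(self, n_feats=20, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])             # score from last state

model = EHRModel()
x = torch.randn(1, 48, 20, requires_grad=True)   # 48 visits, 20 measurements

# Susceptibility map: cells where a tiny perturbation moves the
# prediction most are the locations worth double-checking.
model(x).sum().backward()
susceptibility = x.grad.abs().squeeze(0)         # shape: (48, 20)
top5 = torch.topk(susceptibility.flatten(), k=5).indices
```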

    MixTrain: Scalable Training of Verifiably Robust Neural Networks

    Making neural networks robust against adversarial inputs has resulted in an arms race between new defenses and attacks. The most promising defenses, adversarially robust training and verifiably robust training, have limitations that restrict their practical application. Adversarially robust training only makes networks robust against a subclass of attackers, and we reveal such weaknesses by developing a new attack based on interval gradients. By contrast, verifiably robust training provides protection against any $L_p$ norm-bounded attacker but incurs orders of magnitude more computational and memory overhead than adversarially robust training. We propose two novel techniques, stochastic robust approximation and dynamic mixed training, to drastically improve the efficiency of verifiably robust training without sacrificing verified robustness. We leverage two critical insights: (1) instead of over the entire training set, sound over-approximations over randomly subsampled training data points are sufficient for efficiently guiding the robust training process; and (2) the test accuracy and verifiable robustness often conflict after certain training epochs, so we use a dynamic loss function to adaptively balance them at each epoch. We designed and implemented these techniques as part of MixTrain and evaluated it on six networks trained on three popular datasets: MNIST, CIFAR, and ImageNet-200. Our evaluations show that MixTrain can achieve up to 95.2% verified robust accuracy against $L_\infty$ norm-bounded attackers while taking 15 and 3 times less training time than state-of-the-art verifiably robust training and adversarially robust training schemes, respectively. Furthermore, MixTrain easily scales to larger networks like the one trained on ImageNet-200, significantly outperforming existing verifiably robust training methods.
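    The two insights can be sketched as follows, assuming a plain interval bound propagation (IBP) over-approximation in place of MixTrain's symbolic interval analysis; `alpha`, `sub`, and the function names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ibp_bounds(layers, lo, hi):
    """Propagate interval bounds through Linear/ReLU layers: a simple,
    sound over-approximation (MixTrain itself uses symbolic intervals)."""
    for layer in layers:
        if isinstance(layer, nn.Linear):
            mid, rad = (lo + hi) / 2, (hi - lo) / 2
            mid = layer(mid)                      # W @ mid + b
            rad = rad @ layer.weight.abs().t()    # |W| @ rad
            lo, hi = mid - rad, mid + rad
        else:                                     # ReLU is monotone
            lo, hi = lo.clamp(min=0), hi.clamp(min=0)
    return lo, hi

def mixed_loss(layers, x, y, eps, alpha, sub=0.25):
    """alpha blends natural and verified loss; only a `sub` fraction of
    the batch pays for the (expensive) verified term, per insight (1)."""
    net = nn.Sequential(*layers)
    nat = F.cross_entropy(net(x), y)
    idx = torch.randperm(x.size(0))[:max(1, int(sub * x.size(0)))]
    lo, hi = ibp_bounds(layers, x[idx] - eps, x[idx] + eps)
    onehot = F.one_hot(y[idx], hi.size(-1)).bool()
    worst = torch.where(onehot, lo, hi)           # worst-case logits
    return alpha * nat + (1 - alpha) * F.cross_entropy(worst, y[idx])

layers = [nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10)]
x, y = torch.rand(32, 784), torch.randint(0, 10, (32,))
loss = mixed_loss(layers, x, y, eps=0.1, alpha=0.8)
```

    Annealing `alpha` across epochs plays the role of the dynamic loss function described in insight (2).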

    Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

    We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers. Comment: ICML 2018. Source code at https://github.com/anishathalye/obfuscated-gradient
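    One of the circumvention techniques the paper introduces, Backward Pass Differentiable Approximation (BPDA), is simple to sketch: run the non-differentiable defense on the forward pass but backpropagate through a smooth stand-in, here the identity. The toy quantization defense below is an illustrative example of a gradient-shattering preprocessor.

```python
import torch

class BPDAIdentity(torch.autograd.Function):
    """BPDA: apply the non-differentiable defense on the forward pass,
    but backpropagate as if it were the identity function."""
    @staticmethod
    def forward(ctx, x, defense):
        return defense(x)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None      # gradient passes straight through

def quantize(x, levels=8):         # toy gradient-shattering defense
    return torch.round(x * levels) / levels

x = torch.rand(1, 3, 32, 32, requires_grad=True)
quantize(x).sum().backward()       # round() kills the gradient
print(x.grad.abs().sum())          # tensor(0.): attack is stuck

x.grad = None
BPDAIdentity.apply(x, quantize).sum().backward()
print(x.grad.abs().sum())          # non-zero: attack gradients flow again
```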

    Optimal Transport Classifier: Defending Against Adversarial Attacks by Regularized Deep Embedding

    Recent studies have demonstrated the vulnerability of deep convolutional neural networks to adversarial examples. Inspired by the observations that the intrinsic dimension of image data is much smaller than its pixel-space dimension and that the vulnerability of neural networks grows with the input dimension, we propose to embed high-dimensional input images into a low-dimensional space to perform classification. However, arbitrarily projecting the input images to a low-dimensional space without regularization will not improve the robustness of deep neural networks. Leveraging optimal transport theory, we propose a new framework, the Optimal Transport Classifier (OT-Classifier), and derive an objective that minimizes the discrepancy between the distribution of the true label and the distribution of the OT-Classifier output. Experimental results on several benchmark datasets show that our proposed framework achieves state-of-the-art performance against strong adversarial attack methods. Comment: 9 pages
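    As a loose sketch of the idea, one can pair a low-dimensional embedding and classifier with an optimal-transport discrepancy term. The batch-level histogram matching via entropic-regularized OT (Sinkhorn iterations) below is a simplified stand-in for the paper's actual objective, and every component shown is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sinkhorn(a, b, C, reg=0.1, iters=50):
    """Entropic-regularized OT cost between histograms a, b under cost
    matrix C; a generic stand-in for the paper's OT discrepancy."""
    K = torch.exp(-C / reg)
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    P = u.unsqueeze(1) * K * v.unsqueeze(0)   # transport plan
    return (P * C).sum()

# Low-dimensional embedding + classifier, per the OT-Classifier layout.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 16), nn.ReLU())
classifier = nn.Linear(16, 10)

x, y = torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))
logits = classifier(encoder(x))
probs = F.softmax(logits, dim=1).mean(0)      # model's class histogram
labels = F.one_hot(y, 10).float().mean(0)     # empirical label histogram
C = 1.0 - torch.eye(10)                       # 0-1 cost between classes
loss = F.cross_entropy(logits, y) + sinkhorn(probs, labels, C)
```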

    Thwarting finite difference adversarial attacks with output randomization

    Adversarial examples pose a threat to deep neural network models in a variety of scenarios, from settings where the adversary has complete knowledge of the model to the opposite "black box" setting. Black box attacks are particularly threatening, as the adversary only needs access to the input and output of the model. Defending against black box adversarial example generation attacks is paramount, as currently proposed defenses are not effective. Since these types of attacks rely on repeated queries to the model to estimate gradients over input dimensions, we investigate the use of randomization to thwart such adversaries from successfully creating adversarial examples. Randomization applied to the output of the deep neural network model has the potential to confuse attackers; however, it introduces a tradeoff between accuracy and robustness. We show that for certain types of randomization, we can bound the probability of introducing errors by carefully setting distributional parameters. For the particular case of finite difference black box attacks, we quantify the error introduced by the defense in the finite difference estimate of the gradient. Lastly, we show empirically that the defense can thwart two adaptive black box adversarial attack algorithms.
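    The attack pattern being disrupted is easy to make concrete: a finite difference attacker estimates the gradient with a pair of queries per input dimension, and output noise corrupts each pair. A minimal numerical sketch, with an illustrative toy model:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):                       # toy scalar "loss" the attacker probes
    return float(np.sum(x ** 2))

def defended(x, sigma=0.05):        # output randomization: additive noise
    return model(x) + rng.normal(0.0, sigma)

def fd_gradient(f, x, delta=1e-3):
    """Two-sided finite-difference estimate, one query pair per dimension;
    this is the query pattern that output noise is meant to corrupt."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta
        g[i] = (f(x + e) - f(x - e)) / (2 * delta)
    return g

x = rng.normal(size=10)
clean = fd_gradient(model, x)
noisy = fd_gradient(defended, x)
# With sigma >> delta, each coordinate's error has std sigma/(sqrt(2)*delta),
# swamping the true gradient and stalling the attack.
print(np.linalg.norm(noisy - clean))
```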

    Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders

    Machine learning models are vulnerable to adversarial examples. Iterative adversarial training has shown promising results against strong white-box attacks. However, adversarial training is very expensive, and the expensive training scheme must be repeated every time a model needs to be protected. In this paper, we propose to apply the iterative adversarial training scheme to an external auto-encoder, which, once trained, can be used to protect other models directly. We empirically show that our model outperforms other purification-based methods against white-box attacks and transfers well to directly protect other base models with different architectures.
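    A minimal sketch of the training scheme, with tiny linear models standing in for real architectures and one-step FGSM standing in for the paper's iterative attack: the purifier is trained on adversarial inputs while the base classifier stays frozen. All names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step attack used here to generate training-time adversaries;
    a stronger iterative attack (e.g. PGD) would slot in the same way."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Frozen base classifier to protect, plus a trainable purifier in front.
base = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
purifier = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
                         nn.Linear(128, 784), nn.Sigmoid(),
                         nn.Unflatten(1, (1, 28, 28)))
opt = torch.optim.Adam(purifier.parameters(), lr=1e-3)

for _ in range(100):                          # toy loop on random data
    x = torch.rand(32, 1, 28, 28)
    y = torch.randint(0, 10, (32,))
    x_adv = fgsm(lambda z: base(purifier(z)), x, y, eps=0.1)
    # Train the purifier so the protected pipeline classifies adversarial
    # inputs correctly; the base model's weights stay untouched.
    loss = F.cross_entropy(base(purifier(x_adv)), y)
    opt.zero_grad(); loss.backward(); opt.step()
```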

    Adversarial Robustness vs Model Compression, or Both?

    It is well known that deep neural networks (DNNs) are vulnerable to adversarial attacks, which are implemented by adding crafted perturbations to benign examples. Min-max robust optimization based adversarial training can provide a notion of security against adversarial attacks. However, adversarial robustness requires significantly larger network capacity than natural training with only benign examples. This paper proposes a framework of concurrent adversarial training and weight pruning that enables model compression while still preserving adversarial robustness, and that essentially tackles this dilemma of adversarial training. Furthermore, this work studies two hypotheses about weight pruning in the conventional setting and finds that weight pruning is essential for reducing the network model size in the adversarial setting: training a small model from scratch, even with initialization inherited from the large model, cannot achieve both adversarial robustness and high standard accuracy. Code is available at https://github.com/yeshaokai/Robustness-Aware-Pruning-ADMM. Comment: Accepted by ICCV 2019
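    A rough sketch of concurrent adversarial training and pruning, with simple magnitude-based masking standing in for the paper's ADMM formulation; the model, data, and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd(model, x, y, eps=0.1, step=0.02, iters=5):
    """Inner maximization of min-max adversarial training (L_inf PGD)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        with torch.no_grad():
            x_adv = x_adv + step * x_adv.grad.sign()
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

def magnitude_prune(model, sparsity=0.8):
    """Zero the smallest-magnitude weights, a simple stand-in for the
    paper's ADMM-based pruning; returns masks that keep them zeroed."""
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.dim() > 1:                  # prune weights, not biases
                k = max(1, int(sparsity * p.numel()))
                thresh = p.abs().flatten().kthvalue(k).values
                masks[name] = (p.abs() > thresh).float()
                p.mul_(masks[name])
    return masks

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
                      nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
masks = magnitude_prune(model)

for _ in range(100):                         # toy data in place of MNIST
    x, y = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(pgd(model, x, y)), y)
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                    # keep pruned weights at zero
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```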