Towards Leveraging the Information of Gradients in Optimization-based Adversarial Attack
In recent years, deep neural networks have demonstrated state-of-the-art
performance in a large variety of tasks and have therefore been adopted in many
applications. On the other hand, recent studies have revealed that neural
networks are vulnerable to adversarial examples obtained by carefully adding
small perturbations to legitimate samples. Based upon this observation, many
attack methods were proposed. Among them, the optimization-based CW attack is
the most powerful as the produced adversarial samples present much less
distortion compared to other methods. This stronger attack, however,
comes at the cost of more iterations and thus longer computation time
to reach desirable results. In this work, we propose to leverage gradient
information as guidance during the search for adversaries. More
specifically, directly incorporating the gradients into the perturbation can be
regarded as a constraint added to the optimization process. We argue, both
intuitively and empirically, that this constraint reduces the search space.
Our experiments show that, compared to the original CW attack, the proposed
method requires fewer iterations to reach adversarial samples, achieves a
higher success rate, and produces smaller distortion.
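As a rough illustration of the idea, the sketch below constrains a CW-style L2 search to the direction of the input gradient, so only a non-negative magnitude along that direction is optimized; the function name, the sign-direction choice, and the margin form are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def guided_cw_attack(model, x, y, steps=100, lr=1e-2, c=1.0):
    """Untargeted L2 CW-style attack with the perturbation constrained to
    the input-gradient direction (the search-space reduction described
    above); details here are illustrative, not the paper's exact method."""
    x = x.detach()
    # The input gradient of the classification loss fixes the search direction.
    x_req = x.clone().requires_grad_(True)
    torch.nn.functional.cross_entropy(model(x_req), y).backward()
    direction = x_req.grad.sign()
    # Optimize only a non-negative per-pixel magnitude along that direction.
    alpha = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([alpha], lr=lr)
    for _ in range(steps):
        x_adv = (x + torch.relu(alpha) * direction).clamp(0, 1)
        logits = model(x_adv)
        true = logits.gather(1, y[:, None]).squeeze(1)
        other = logits.scatter(1, y[:, None], float('-inf')).amax(1)
        # Distortion term plus the CW margin term.
        loss = ((x_adv - x) ** 2).flatten(1).sum(1) + c * torch.clamp(true - other, min=0)
        opt.zero_grad()
        loss.sum().backward()
        opt.step()
    return (x + torch.relu(alpha.detach()) * direction).clamp(0, 1)
```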
Adversarial Examples: Attacks and Defenses for Deep Learning
With rapid progress and significant successes in a wide spectrum of
applications, deep learning is being applied in many safety-critical
environments. However, deep neural networks have been recently found vulnerable
to well-designed input samples, called adversarial examples. Adversarial
examples are imperceptible to humans but can easily fool deep neural networks in
the testing/deployment stage. This vulnerability to adversarial examples has become
one of the major risks of applying deep neural networks in safety-critical
environments. Therefore, attacks and defenses on adversarial examples have drawn
great attention. In this paper, we review recent findings on adversarial
examples for deep neural networks, summarize the methods for generating
adversarial examples, and propose a taxonomy of these methods. Under the
taxonomy, applications for adversarial examples are investigated. We further
elaborate on countermeasures for adversarial examples and explore the
challenges and the potential solutions.
Comment: Github: https://github.com/chbrian/awesome-adversarial-examples-d
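To make the surveyed generation methods concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one of the simplest attacks such a taxonomy covers; the epsilon value is an arbitrary example.

```python
import torch

def fgsm(model, x, y, eps=8 / 255):
    """One-step FGSM: perturb the input along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Move each pixel by eps in the direction that increases the loss.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```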
TSViz: Demystification of Deep Learning Models for Time-Series Analysis
This paper presents a novel framework for demystification of convolutional
deep learning models for time-series analysis. This is a step towards making
informed/explainable decisions in the domain of time-series, powered by deep
learning. There have been numerous efforts to increase the interpretability of
image-centric deep neural network models, where the learned features are more
intuitive to visualize. Visualization in the time-series domain is much more
complicated, as the filters and inputs have no direct interpretation compared
to the image modality. In addition, little attention has been devoted to the
development of such tools for time-series data in the past. TSViz makes it
possible to explore and analyze a network along different dimensions at
different levels of abstraction, including
identification of parts of the input that were responsible for a prediction
(including per filter saliency), importance of different filters present in the
network for a particular prediction, notion of diversity present in the network
through filter clustering, understanding of the main sources of variation
learnt by the network through inverse optimization, and analysis of the
network's robustness against adversarial noise. As a sanity check for the
computed influence values, we demonstrate results on pruning neural networks
based on this influence information. These representations allow users to
understand the network's features, enhancing the acceptability of deep
networks for time-series data. This is extremely important in domains like
finance, Industry 4.0, self-driving cars, healthcare, and counter-terrorism,
where the reasons for reaching a particular prediction are as important as
the prediction itself. We assess the proposed framework
for interpretability with a set of desirable properties essential for any
method.
Comment: 7 pages (6 + 1 for references), 7 figures
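A gradient-based per-filter influence score of the kind the framework computes might look like the following sketch; the activation-times-gradient form and the split of the model into `conv_layers` and `head` are assumptions made for illustration.

```python
import torch

def filter_influence(conv_layers, head, x):
    """Per-filter influence for a 1-D time-series model.

    conv_layers: maps (batch, channels, time) -> feature maps (batch, filters, time');
    head: maps those feature maps to predictions.
    """
    feats = conv_layers(x)
    feats.retain_grad()  # keep gradients on the intermediate feature maps
    head(feats).sum().backward()
    # Taylor-style importance: |activation * gradient|, averaged over time.
    return (feats * feats.grad).abs().mean(dim=2)  # shape: (batch, filters)
```

Scores of this form can also be thresholded to drop low-influence filters, mirroring the pruning sanity check mentioned above.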
Identify Susceptible Locations in Medical Records via Adversarial Attacks on Deep Predictive Models
The surging availability of electronic health records (EHR) has led to
increased research interest in medical predictive modeling. Recently, many deep
learning based predictive models have also been developed for EHR data and have
demonstrated impressive performance. However, a series of recent studies showed
that these deep models are not safe: they suffer from certain vulnerabilities.
In short, a well-trained deep network can be extremely sensitive to inputs with
negligible changes. These inputs are referred to as adversarial examples. In
the context of medical informatics, such attacks could alter the result of a
high performance deep predictive model by slightly perturbing a patient's
medical records. Such instability not only reflects the weakness of deep
architectures; more importantly, it offers guidance for detecting susceptible
parts of the inputs. In this paper, we propose an efficient and effective framework
that learns a time-preferential minimum attack targeting the LSTM model with
EHR inputs, and we leverage this attack strategy to screen medical records of
patients and identify susceptible events and measurements. The efficient
screening procedure can assist decision makers in paying extra attention to
locations that can cause severe consequences if not measured correctly. We
conduct extensive empirical studies on a real-world urgent care cohort and
demonstrate the effectiveness of the proposed screening approach.
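The following sketch shows one plausible form of a time-preferential minimum attack on an LSTM over EHR sequences; the exponential time weighting, the sparsity coefficient, and all names are assumptions for illustration, not the paper's exact objective.

```python
import torch

def screen_ehr(lstm_model, x, y, steps=200, lr=0.05, decay=0.05, lam=0.01):
    """x: (batch, time, features) medical record sequences. Finds a small
    perturbation that flips the prediction while penalizing early time steps
    more heavily, so the attack concentrates on recent events (illustrative)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    t = torch.arange(x.size(1), dtype=x.dtype)
    # Early time steps cost more to perturb than recent ones.
    w = torch.exp(decay * (x.size(1) - 1 - t)).view(1, -1, 1)
    for _ in range(steps):
        logits = lstm_model(x + delta)
        # Maximize the classification loss; keep the perturbation small and sparse.
        loss = -torch.nn.functional.cross_entropy(logits, y) + lam * (w * delta.abs()).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Susceptible locations: cells with the largest learned perturbation.
    return delta.detach().abs()
```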
MixTrain: Scalable Training of Verifiably Robust Neural Networks
Making neural networks robust against adversarial inputs has resulted in an
arms race between new defenses and attacks. The most promising defenses,
adversarially robust training and verifiably robust training, have limitations
that restrict their practical applications. Adversarially robust training
only makes networks robust against a subclass of attackers, and we reveal
such weaknesses by developing a new attack based on interval gradients. By
contrast, verifiably robust training provides protection against any L-p
norm-bounded attacker but incurs orders of magnitude more computational and
memory overhead than adversarially robust training.
We propose two novel techniques, stochastic robust approximation and dynamic
mixed training, to drastically improve the efficiency of verifiably robust
training without sacrificing verified robustness. We leverage two critical
insights: (1) rather than over the entire training set, sound
over-approximations over randomly subsampled training data points are
sufficient to efficiently guide the robust training process; and (2) test
accuracy and verifiable robustness often conflict after a certain number of
training epochs, so we use a dynamic loss function to adaptively balance
the two at each epoch.
We designed and implemented our techniques as part of MixTrain and evaluated
it on six networks trained on three popular datasets including MNIST, CIFAR,
and ImageNet-200. Our evaluations show that MixTrain achieves high
verified robust accuracy against norm-bounded attackers while taking
far less training time than state-of-the-art verifiably robust
training and adversarially robust training schemes. Furthermore,
MixTrain easily scales to larger networks like the one trained on ImageNet-200,
significantly outperforming existing verifiably robust training methods.
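A single training step combining the two ideas might look like the sketch below; `verified_loss_fn` stands in for an interval-bound-style sound over-approximation, and the subsample size `k` and the per-epoch `alpha` schedule are assumptions.

```python
import torch

def mixed_training_step(model, verified_loss_fn, x, y, opt, alpha, k=4):
    """One dynamic mixed-training step (illustrative).

    verified_loss_fn(model, x, y): a sound over-approximation of the worst-case
    loss (e.g. via interval bound propagation); alpha in [0, 1] is adapted per
    epoch by the caller to trade test accuracy against verifiable robustness.
    """
    natural = torch.nn.functional.cross_entropy(model(x), y)
    # Stochastic robust approximation: bound only k random points per batch.
    idx = torch.randperm(x.size(0))[:k]
    robust = verified_loss_fn(model, x[idx], y[idx])
    loss = (1 - alpha) * natural + alpha * robust
    opt.zero_grad()
    loss.backward()
    opt.step()
    return natural.item(), robust.item()
```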
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon
that leads to a false sense of security in defenses against adversarial
examples. While defenses that cause obfuscated gradients appear to defeat
iterative optimization-based attacks, we find defenses relying on this effect
can be circumvented. We describe characteristic behaviors of defenses
exhibiting the effect, and for each of the three types of obfuscated gradients
we discover, we develop attack techniques to overcome it. In a case study,
examining non-certified white-box-secure defenses at ICLR 2018, we find
obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on
obfuscated gradients. Our new attacks successfully circumvent 6 completely, and
1 partially, in the original threat model each paper considers.
Comment: ICML 2018. Source code at
https://github.com/anishathalye/obfuscated-gradient
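One of the attack techniques the paper develops, Backward Pass Differentiable Approximation (BPDA), can be sketched as follows; the identity-backward variant shown here is the simplest case, and `preprocess` stands in for some assumed non-differentiable defense.

```python
import torch

class BPDAIdentity(torch.autograd.Function):
    """BPDA with the identity approximation: run the non-differentiable
    defense on the forward pass, but pass gradients straight through on
    the backward pass, as if the defense were g(x) = x."""

    @staticmethod
    def forward(ctx, x, preprocess):
        return preprocess(x.detach())

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # gradient flows through unchanged

# Inside an iterative attack, replace model(preprocess(x)) with:
#   logits = model(BPDAIdentity.apply(x, preprocess))
# so optimization-based attacks regain usable gradients.
```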
Optimal Transport Classifier: Defending Against Adversarial Attacks by Regularized Deep Embedding
Recent studies have demonstrated the vulnerability of deep convolutional
neural networks against adversarial examples. Inspired by the observation that
the intrinsic dimension of image data is much smaller than its pixel space
dimension and the vulnerability of neural networks grows with the input
dimension, we propose to embed high-dimensional input images into a
low-dimensional space to perform classification. However, arbitrarily
projecting the input images to a low-dimensional space without regularization
will not improve the robustness of deep neural networks. Leveraging optimal
transport theory, we propose a new framework, Optimal Transport Classifier
(OT-Classifier), and derive an objective that minimizes the discrepancy between
the distribution of the true label and the distribution of the OT-Classifier
output. Experimental results on several benchmark datasets show that our
proposed framework achieves state-of-the-art performance against strong
adversarial attack methods.
Comment: 9 pages
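Structurally, the approach amounts to classifying in a learned low-dimensional embedding; the sketch below shows only that skeleton, with the optimal transport discrepancy left as an assumed training-time regularizer since its exact form is not given here. Dimensions are placeholders.

```python
import torch.nn as nn

class LowDimClassifier(nn.Module):
    """Skeleton of embedding-then-classify (illustrative of the framework).
    A full OT-Classifier would additionally minimize a discrepancy term
    matching the output distribution to the true label distribution."""

    def __init__(self, in_dim=784, embed_dim=16, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),  # low-dimensional embedding space
        )
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, x):
        return self.head(self.encoder(x.flatten(1)))
```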
Thwarting finite difference adversarial attacks with output randomization
Adversarial examples pose a threat to deep neural network models in a variety
of scenarios, from settings where the adversary has complete knowledge of the
model to the opposite "black box" setting. Black box attacks are
particularly threatening as the adversary only needs access to the input and
output of the model. Defending against black box adversarial example generation
attacks is paramount as currently proposed defenses are not effective. Since
these types of attacks rely on repeated queries to the model to estimate
gradients over input dimensions, we investigate the use of randomization to
thwart such adversaries from successfully creating adversarial examples.
Randomization applied to the output of the deep neural network model can
confuse attackers; however, it introduces a tradeoff
between accuracy and robustness. We show that for certain types of
randomization, we can bound the probability of introducing errors by carefully
setting distributional parameters. For the particular case of finite difference
black box attacks, we quantify the error introduced by the defense in the
finite difference estimate of the gradient. Lastly, we show empirically that
the defense can thwart two adaptive black box adversarial attack algorithms.
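The interaction between attack and defense can be seen in a small sketch: a coordinate-wise finite-difference gradient estimator (assuming `query` returns a scalar score, e.g. a loss or a chosen logit), and a query function whose randomized output breaks the cancellation the estimator relies on. Names and the Gaussian noise choice are assumptions.

```python
import torch

def finite_difference_grad(query, x, h=1e-3):
    """Black-box gradient estimate from scalar-output queries, one input
    coordinate at a time (2 * numel queries in total)."""
    grad = torch.zeros_like(x).view(-1)
    for i in range(grad.numel()):
        e = torch.zeros_like(x).view(-1)
        e[i] = h
        grad[i] = (query(x + e.view_as(x)) - query(x - e.view_as(x))) / (2 * h)
    return grad.view_as(x)

def randomized_query(model, x, sigma=0.05):
    """Defense: noise on the output means the paired queries above no longer
    differ by ~2h * gradient, corrupting each finite-difference estimate."""
    return model(x) + sigma * torch.randn(())
```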
Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders
Machine learning models are vulnerable to adversarial examples. Iterative
adversarial training has shown promising results against strong white-box
attacks. However, adversarial training is very expensive, and this expensive
training scheme must be repeated every time a model needs to be protected.
In this paper, we propose to apply the iterative adversarial training
scheme to an external auto-encoder, which, once trained, can be used to protect
other models directly. We empirically show that our model outperforms other
purifying-based methods against white-box attacks, and transfers well to
directly protect other base models with different architectures.
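One step of the training scheme could look like the sketch below; `attack` stands in for a PGD-style white-box attack run through the composed auto-encoder and classifier, and the reconstruction weight is an arbitrary choice.

```python
import torch

def purifier_train_step(autoencoder, classifier, attack, x, y, opt, rec_w=0.1):
    """Adversarially train an external purifying auto-encoder: craft
    adversarial examples against classifier(autoencoder(.)), then train the
    auto-encoder to map them back to correctly classified, clean-like inputs.
    `opt` optimizes only the auto-encoder, so the base classifier is untouched."""
    x_adv = attack(lambda z: classifier(autoencoder(z)), x, y)
    purified = autoencoder(x_adv)
    loss = (torch.nn.functional.cross_entropy(classifier(purified), y)
            + rec_w * torch.nn.functional.mse_loss(purified, x))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```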
Adversarial Robustness vs Model Compression, or Both?
It is well known that deep neural networks (DNNs) are vulnerable to
adversarial attacks, which are implemented by adding crafted perturbations onto
benign examples. Min-max robust optimization based adversarial training can
provide a notion of security against adversarial attacks. However, adversarial
robustness requires a significantly larger capacity of the network than that
for the natural training with only benign examples. This paper proposes a
framework of concurrent adversarial training and weight pruning that enables
model compression while still preserving the adversarial robustness and
essentially tackles the dilemma of adversarial training. Furthermore, this work
studies two hypotheses about weight pruning in the conventional setting and
finds that weight pruning is essential for reducing the network model size in
the adversarial setting; training a small model from scratch, even with
inherited initialization from the large model, cannot achieve both adversarial
robustness and high standard accuracy. Code is available at
https://github.com/yeshaokai/Robustness-Aware-Pruning-ADMM.
Comment: Accepted by ICCV 201
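The paper's ADMM formulation is more involved, but the concurrent loop can be sketched with simple magnitude pruning re-applied after each adversarially trained step; `attack` is an assumed PGD-style function and the sparsity level is arbitrary.

```python
import torch

def adv_train_and_prune(model, loader, opt, attack, sparsity=0.8):
    """One epoch of concurrent adversarial training and weight pruning
    (hard magnitude masking shown in place of the paper's ADMM scheme)."""
    for x, y in loader:
        x_adv = attack(model, x, y)  # adversarial examples for min-max training
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Re-impose sparsity: zero out the smallest-magnitude weights.
        with torch.no_grad():
            for p in model.parameters():
                if p.dim() > 1:  # skip biases and norm parameters
                    k = max(1, int(p.numel() * sparsity))
                    thresh = p.abs().flatten().kthvalue(k).values
                    p.mul_((p.abs() > thresh).to(p.dtype))
    return model
```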