Defensive Dropout for Hardening Deep Neural Networks under Adversarial Attacks
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks: adversarial examples, obtained by adding delicately crafted distortions to original, legitimate inputs, can mislead a DNN into classifying them as any target label. This work provides a solution for hardening DNNs against adversarial attacks through defensive dropout. Besides using dropout during training for the best test accuracy, we propose to use dropout also at test time to achieve strong defense effects. We consider the problem of building robust DNNs as an attacker-defender two-player game, where the attacker and the defender know each other's strategies and try to optimize their own strategies towards an equilibrium. Based on observations of the effect of the test dropout rate on test accuracy and attack success rate, we propose a defensive dropout algorithm that determines an optimal test dropout rate given the neural network model and the attacker's strategy for generating adversarial examples. We also investigate the mechanism behind the outstanding defense effects achieved by the proposed defensive dropout. Compared with stochastic activation pruning (SAP), another defense method that introduces randomness into the DNN model, we find that our defensive dropout achieves much larger variances of the gradients, which is the key to the improved defense effects (much lower attack success rate). For example, our defensive dropout can reduce the attack success rate from 100% to 13.89% under the currently strongest attack, i.e., the C&W attack, on the MNIST dataset.
Comment: Accepted as a conference paper at ICCAD 201
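The core mechanism, keeping dropout active at inference so the attacker faces a randomized model, can be illustrated in a few lines. The following PyTorch sketch is only illustrative: the toy architecture and the 0.5 test-time rate are assumptions, whereas the paper selects the test dropout rate via the attacker-defender game.

```python
# Minimal PyTorch sketch of test-time ("defensive") dropout.
# The architecture and the 0.5 rate are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    def __init__(self, p_test: float = 0.5):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)
        self.p_test = p_test

    def forward(self, x):
        h = F.relu(self.fc1(x))
        # training=True keeps dropout active even in eval mode, so every
        # query sees a different randomly thinned subnetwork.
        h = F.dropout(h, p=self.p_test, training=True)
        return self.fc2(h)

model = SmallNet(p_test=0.5).eval()
x = torch.randn(1, 784)
print(model(x))  # logits vary across calls because of test-time dropout
```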
Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball
We present a simple regularization of adversarial perturbations based upon the perceptual loss. While the resulting perturbations remain imperceptible to the human eye, they differ from existing adversarial perturbations in that they are semi-sparse alterations that highlight objects and regions of interest while leaving the background unaltered. As semantically meaningful adversarial perturbations, they form a bridge between counterfactual explanations and adversarial perturbations in the space of images. We evaluate our approach on several standard explainability benchmarks, namely weak localization, insertion-deletion, and the pointing game, demonstrating that perceptually regularized counterfactuals are an effective explanation for image-based classifiers.
Comment: CVPR 202
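A schematic way to read the objective is as an adversarial perturbation penalized by a perceptual (feature-space) distance. The sketch below is an assumption-laden illustration, not the authors' exact formulation: the tiny untrained CNN, the single feature layer, and the weight lam are placeholders.

```python
# Schematic sketch: push the classifier away from its current prediction while
# keeping the perturbed image close to the original in feature space.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(32, 10)

def features(x):   # intermediate representation used for the perceptual term
    return backbone(x)

def logits(x):
    return head(features(x))

x = torch.rand(1, 3, 32, 32)      # original image (random stand-in)
y = logits(x).argmax(dim=1)       # current prediction to move away from
delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
lam = 10.0                        # weight of the perceptual regularizer (assumed)

for _ in range(100):
    opt.zero_grad()
    adv = (x + delta).clamp(0, 1)
    attack_loss = -F.cross_entropy(logits(adv), y)             # lower confidence in y
    percept = F.mse_loss(features(adv), features(x).detach())  # perceptual distance
    (attack_loss + lam * percept).backward()
    opt.step()

print("max perturbation:", delta.detach().abs().max().item())
```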
Addressing Neural Network Robustness with Mixup and Targeted Labeling Adversarial Training
Despite their performance, Artificial Neural Networks are not reliable enough for most industrial applications. They are sensitive to noise, rotations, blur, and adversarial examples. There is a need to build defenses that protect against a wide range of perturbations, covering both the traditional common corruptions and adversarial examples. We propose a new data augmentation strategy called M-TLAT, designed to address robustness in a broad sense. Our approach combines the Mixup augmentation with a new adversarial training algorithm called Targeted Labeling Adversarial Training (TLAT). The idea of TLAT is to interpolate the target labels of adversarial examples with the ground-truth labels. We show that M-TLAT can increase the robustness of image classifiers towards nineteen common corruptions and five adversarial attacks, without reducing the accuracy on clean samples.
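The label-interpolation step of TLAT can be sketched as follows. This is a hedged illustration based only on the abstract: the one-step targeted attack, epsilon, and the mixing coefficient alpha are assumptions, and the Mixup step of M-TLAT on the clean pair is omitted.

```python
# Sketch of the TLAT idea: craft a targeted adversarial example, then train on a
# soft label that mixes the ground-truth label with the attack's target label.
import torch
import torch.nn.functional as F

def tlat_batch(model, x, y, num_classes=10, eps=8/255, alpha=0.9):
    # random target class (for illustration; may coincide with the true class)
    target = torch.randint(0, num_classes, y.shape, device=y.device)

    # one-step targeted attack: move the input toward the target class
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), target)
    grad, = torch.autograd.grad(loss, x_adv)
    x_adv = (x_adv - eps * grad.sign()).clamp(0, 1).detach()

    # interpolate the label: mostly ground truth, partly the attack target
    y_true = F.one_hot(y, num_classes).float()
    y_tgt = F.one_hot(target, num_classes).float()
    y_soft = alpha * y_true + (1 - alpha) * y_tgt

    # soft-label cross-entropy on the adversarial example
    log_p = F.log_softmax(model(x_adv), dim=1)
    return -(y_soft * log_p).sum(dim=1).mean()

# usage (illustrative): loss = tlat_batch(model, images, labels); loss.backward()
```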
Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent
Despite the great achievements of modern deep neural networks (DNNs), the vulnerability/robustness of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among those, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt first-order gradient descent, which may come with certain deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method for designing adversarial attacks, which incorporates a zeroth-order gradient estimation technique catering to the black-box attack scenario and second-order natural gradient descent to achieve higher query efficiency. Empirical evaluations on image classification datasets demonstrate that ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.
Comment: Accepted by AAAI 202
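The zeroth-order ingredient, estimating gradients from queries alone, can be sketched with random-direction finite differences. The following sketch is illustrative only: the query budget q, smoothing parameter mu, and the plain gradient step are assumptions, and the natural-gradient (Fisher) preconditioning that gives ZO-NGD its name is omitted.

```python
# Sketch of zeroth-order gradient estimation for a black-box loss we can only query.
import numpy as np

def zo_gradient(loss_fn, x, q=20, mu=1e-3):
    """Estimate grad loss_fn(x) from 2*q queries via symmetric differences."""
    d = x.size
    grad = np.zeros_like(x)
    for _ in range(q):
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u) + 1e-12           # random unit direction
        diff = loss_fn(x + mu * u) - loss_fn(x - mu * u)
        grad += d * (diff / (2 * mu)) * u        # d rescales the directional estimate
    return grad / q

# toy usage: descend on a "black-box" quadratic using query-only gradients
loss = lambda x: float(np.sum((x - 1.0) ** 2))
x = np.zeros(5)
for _ in range(200):
    x -= 0.02 * zo_gradient(loss, x)
print(np.round(x, 2))  # approaches the minimizer [1, 1, 1, 1, 1]
```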
Random Spiking and Systematic Evaluation of Defenses Against Adversarial Examples
Image classifiers often suffer from adversarial examples, which are generated by strategically adding a small amount of noise to input images to trick classifiers into misclassification. Over the years, many defense mechanisms have been proposed, and different researchers have made seemingly contradictory claims about their effectiveness. We present an analysis of possible adversarial models, and propose an evaluation framework for comparing different defense mechanisms. As part of the framework, we introduce a more powerful and realistic adversary strategy. Furthermore, we propose a new defense mechanism called Random Spiking (RS), which generalizes dropout and introduces random noise into the training process in a controlled manner. Evaluations under our proposed framework suggest RS delivers better protection against adversarial examples than many existing schemes.
Comment: To appear in ACM CODASPY 202
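The abstract does not spell out the mechanism, but a dropout-like layer that replaces randomly selected activations with random noise during training is one plausible reading. The sketch below illustrates exactly that reading: the replacement probability, noise range, and identity behavior at test time are assumptions, not the paper's specification.

```python
# Rough sketch of a "random spiking" style layer: during training, each unit's
# activation is replaced by uniform noise with probability p.
import torch
import torch.nn as nn

class RandomSpiking(nn.Module):
    def __init__(self, p=0.1, noise_range=(0.0, 1.0)):
        super().__init__()
        self.p = p
        self.lo, self.hi = noise_range

    def forward(self, x):
        if not self.training:
            return x                        # identity at test time (assumed)
        mask = torch.rand_like(x) < self.p  # units selected for "spiking"
        noise = torch.empty_like(x).uniform_(self.lo, self.hi)
        return torch.where(mask, noise, x)

layer = RandomSpiking(p=0.1)
layer.train()
print(layer(torch.zeros(2, 5)))  # a few entries replaced by random noise
```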
Towards Robust Representation Learning and Beyond
Deep networks have reshaped computer vision research in recent years. Fueled by powerful computational resources and massive amounts of data, deep networks now dominate a wide range of visual benchmarks. Nonetheless, these success stories come with bitterness: an increasing number of studies has shown the limitations of deep networks under certain testing conditions, such as small input changes or occlusion. These failures not only raise safety and reliability concerns about the applicability of deep networks in the real world, but also demonstrate that the computations performed by current deep networks are dramatically different from those performed by human brains.
In this dissertation, we focus on investigating and tackling a particular yet challenging weakness of deep networks: their vulnerability to adversarial examples. The first part of this thesis argues that this vulnerability is a much more severe issue than previously thought: the threats from adversarial examples are ubiquitous and catastrophic. We then discuss how to equip deep networks with robust representations for defending against adversarial examples. We approach the solution from the perspective of neural architecture design, and show that incorporating architectural elements like feature-level denoisers or smooth activation functions can effectively boost model robustness. The last part of this thesis focuses on rethinking the value of adversarial examples. Rather than treating adversarial examples as a threat to deep networks, we take a further step and uncover that adversarial examples can help deep networks improve their generalization ability, if feature representations are properly disentangled during learning.
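A minimal sketch of the "smooth activation" idea mentioned above, swapping non-smooth ReLUs for a smooth alternative, is shown below; the toy network and the choice of SiLU are illustrative assumptions rather than the dissertation's exact recipe.

```python
# Sketch: recursively replace every nn.ReLU in a model with a smooth activation
# (SiLU here), so gradients used during adversarial training are better behaved.
import torch
import torch.nn as nn

def smooth_activations(model: nn.Module) -> nn.Module:
    """Swap every nn.ReLU submodule for nn.SiLU, in place."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, nn.SiLU())
        else:
            smooth_activations(child)
    return model

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                    nn.Flatten(), nn.Linear(32 * 32 * 32, 10))
net = smooth_activations(net)
print(net)  # ReLU modules replaced by SiLU
```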