Improving the Robustness of Deep Neural Networks via Adversarial Training with Triplet Loss
Recent studies have highlighted that deep neural networks (DNNs) are
vulnerable to adversarial examples. In this paper, we improve the robustness of
DNNs by utilizing techniques of Distance Metric Learning. Specifically, we
incorporate Triplet Loss, one of the most popular Distance Metric Learning
methods, into the framework of adversarial training. Our proposed algorithm,
Adversarial Training with Triplet Loss (ATL), uses the adversarial example
generated against the current model as the anchor of the triplet loss, which
effectively smooths the classification boundary. Furthermore, we propose an ensemble version
of ATL, which aggregates different attack methods and model structures for
better defense effects. Our empirical studies verify that the proposed approach
can significantly improve the robustness of DNNs without sacrificing accuracy.
Finally, we demonstrate that our specially designed triplet loss can also be
used as a regularization term to enhance other defense methods.
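The core idea lends itself to a short sketch. The snippet below is a minimal, illustrative PyTorch version of adversarial training with a triplet term whose anchor is the adversarial example; the one-step FGSM attack, the `model.features` hook, and the margin/weight values are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    # one-step attack against the current model (illustrative choice of attack)
    x_adv = x.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def atl_loss(model, x, y, x_pos, x_neg, margin=1.0, lam=0.5):
    """Adversarial cross-entropy plus a triplet term anchored at the adversarial example.

    x_pos: clean inputs from the same class as x; x_neg: clean inputs from another class.
    `model.features` is assumed to expose the penultimate-layer embedding.
    """
    x_adv = fgsm(model, x, y)
    ce = F.cross_entropy(model(x_adv), y)
    anchor = model.features(x_adv)      # adversarial example used as the triplet anchor
    pos = model.features(x_pos)
    neg = model.features(x_neg)
    return ce + lam * F.triplet_margin_loss(anchor, pos, neg, margin=margin)
```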
RAIN: A Simple Approach for Robust and Accurate Image Classification Networks
It has been shown that the majority of existing adversarial defense methods
achieve robustness at the cost of sacrificing prediction accuracy. The
undesirable severe drop in accuracy adversely affects the reliability of
machine learning algorithms and prohibits their deployment in realistic
applications. This paper aims to address this dilemma by proposing a novel
preprocessing framework, which we term Robust and Accurate Image
classificatioN (RAIN), to improve the robustness of given CNN classifiers and,
at the same time, preserve their high prediction accuracies. RAIN introduces a
new randomization-enhancement scheme. It applies randomization over inputs to
break the ties between the model forward prediction path and the backward
gradient path, thus improving the model robustness. However, similar to
existing preprocessing-based methods, the randomized process will degrade the
prediction accuracy. To understand why this is the case, we compare the
difference between original and processed images, and find that it is the loss of
high-frequency components in the input image that causes the classifier's
accuracy drop. Based on this finding, RAIN enhances the input's high-frequency
details to retain the CNN's high prediction accuracy. Concretely, RAIN consists
of two novel randomization modules: randomized small circular shift (RdmSCS)
and randomized down-upsampling (RdmDU). The RdmDU module randomly downsamples
the input image, and then the RdmSCS module circularly shifts the input image
along a randomly chosen direction by a small but random number of pixels.
Finally, the RdmDU module performs upsampling with a detail-enhancement model,
such as deep super-resolution networks. We conduct extensive experiments on the
STL10 and ImageNet datasets to verify the effectiveness of RAIN against various
types of adversarial attacks.
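As a rough illustration of the pipeline described above, the sketch below composes the two randomization modules in the stated order (RdmDU downsampling, RdmSCS shift, RdmDU upsampling with a detail-enhancement model); the shift range, scale factors, and the `sr_model` placeholder for a super-resolution network are assumptions, not the paper's settings.

```python
import random
import torch
import torch.nn.functional as F

def rdm_scs(x, max_shift=4):
    """Randomized small circular shift along a randomly chosen direction."""
    dx = random.randint(-max_shift, max_shift)
    dy = random.randint(-max_shift, max_shift)
    return torch.roll(x, shifts=(dy, dx), dims=(-2, -1))

def rain_preprocess(x, sr_model=None, scale=2):
    """x: image batch of shape (N, C, H, W) in [0, 1].

    RdmDU downsampling -> RdmSCS shift -> RdmDU upsampling with detail enhancement.
    """
    h, w = x.shape[-2:]
    # randomly pick one of several slightly different downsampling sizes
    h_small = random.choice([h // scale - 1, h // scale, h // scale + 1])
    w_small = random.choice([w // scale - 1, w // scale, w // scale + 1])
    x_small = F.interpolate(x, size=(h_small, w_small), mode='bilinear', align_corners=False)
    x_small = rdm_scs(x_small)
    if sr_model is not None:
        return sr_model(x_small)   # detail-enhancing upsampler, e.g. a super-resolution network
    return F.interpolate(x_small, size=(h, w), mode='bilinear', align_corners=False)
```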
Adversarial Attacks and Defences Competition
To accelerate research on adversarial examples and robustness of machine
learning classifiers, Google Brain organized a NIPS 2017 competition that
encouraged researchers to develop new methods to generate adversarial examples
as well as to develop new ways to defend against them. In this chapter, we
describe the structure and organization of the competition and the solutions
developed by several of the top-placing teams.
Comment: 36 pages, 10 figures
Thwarting finite difference adversarial attacks with output randomization
Adversarial examples pose a threat to deep neural network models in a variety
of scenarios, from settings where the adversary has complete knowledge of the
model to the opposite "black box" setting. Black box attacks are
particularly threatening as the adversary only needs access to the input and
output of the model. Defending against black box adversarial example generation
attacks is paramount as currently proposed defenses are not effective. Since
these types of attacks rely on repeated queries to the model to estimate
gradients over input dimensions, we investigate the use of randomization to
thwart such adversaries from successfully creating adversarial examples.
Randomization applied to the output of the deep neural network model has the
potential to confuse attackers; however, it introduces a tradeoff
between accuracy and robustness. We show that for certain types of
randomization, we can bound the probability of introducing errors by carefully
setting distributional parameters. For the particular case of finite difference
black box attacks, we quantify the error introduced by the defense in the
finite difference estimate of the gradient. Lastly, we show empirically that
the defense can thwart two adaptive black box adversarial attack algorithms.
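A minimal sketch of the idea, assuming a model that returns class probabilities as a NumPy vector: the defender adds zero-mean noise to every query, so a coordinate-wise finite-difference estimate of the gradient picks up noise on the order of sigma/h. The Gaussian noise and its scale are illustrative choices rather than the paper's exact randomization.

```python
import numpy as np

def randomized_predict(model_probs, x, sigma=0.05, rng=np.random.default_rng()):
    """Return the model's class probabilities with zero-mean Gaussian noise on each query."""
    p = model_probs(x)                                     # clean probability vector
    noisy = np.clip(p + rng.normal(0.0, sigma, size=p.shape), 1e-12, None)
    return noisy / noisy.sum()                             # keep the output a valid distribution

def finite_difference_grad(predict, x, cls, h=1e-3):
    """Coordinate-wise estimate of d p[cls] / dx that a black-box attacker would form."""
    grad = np.zeros_like(x, dtype=float)
    base = predict(x)[cls]
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e.flat[i] = h
        grad.flat[i] = (predict(x + e)[cls] - base) / h    # output noise of order sigma/h leaks in
    return grad
```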
CAAD 2018: Generating Transferable Adversarial Examples
Deep neural networks (DNNs) are vulnerable to adversarial examples,
perturbations carefully crafted to fool the targeted DNN, in both the
non-targeted and targeted case. In the non-targeted case, the attacker simply
aims to induce misclassification. In the targeted case, the attacker aims to
induce classification to a specified target class. In addition, it has been
observed that strong adversarial examples can transfer to unknown models,
yielding a serious security concern. The NIPS 2017 competition was organized to
accelerate research in adversarial attacks and defenses, taking place in the
realistic setting where submitted adversarial attacks attempt to transfer to
submitted defenses. The CAAD 2018 competition took place with nearly identical
rules to the NIPS 2017 one. Given the requirement that the NIPS 2017
submissions were to be open-sourced, participants in the CAAD 2018 competition
were able to directly build upon previous solutions, and thus improve the
state-of-the-art in this setting. Our team participated in the CAAD 2018
competition, and won 1st place in both attack subtracks, non-targeted and
targeted adversarial attacks, and 3rd place in defense. We outline our
solutions and development results in this article. We hope our results can
inform researchers in both generating and defending against adversarial
examples.
Comment: 1st place attack solutions and 3rd place defense in CAAD 2018 Competition
Defending against adversarial attacks by randomized diversification
The vulnerability of machine learning systems to adversarial attacks
questions their usage in many applications. In this paper, we propose
randomized diversification as a defense strategy. We introduce a multi-channel
architecture in a gray-box scenario, which assumes that the architecture of the
classifier and the training data set are known to the attacker. The attacker
does not, however, have access to the secret key or to the internal states of the
system at test time. The defender processes an input in multiple channels.
Each channel introduces its own randomization in a special transform domain
based on a secret key shared between the training and testing stages. Such a
transform based randomization with a shared key preserves the gradients in
key-defined sub-spaces for the defender but it prevents gradient back
propagation and the creation of various bypass systems for the attacker. An
additional benefit of multi-channel randomization is the aggregation that fuses
soft-outputs from all channels, thus increasing the reliability of the final
score. The sharing of a secret key creates an information advantage to the
defender. Experimental evaluation demonstrates an increased robustness of the
proposed method to a number of known state-of-the-art attacks.
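The multi-channel, key-based scheme can be pictured with a short sketch; the DCT domain, sign-flip randomization, and simple averaging of soft outputs below are illustrative stand-ins for the paper's key-defined transform-domain randomization and fusion.

```python
import numpy as np
from scipy.fft import dctn, idctn

def keyed_randomize(x, key):
    """Flip the sign of a key-selected pattern of transform-domain coefficients."""
    rng = np.random.default_rng(key)              # the secret key seeds the randomization
    coeffs = dctn(x, norm='ortho')
    mask = rng.choice([-1.0, 1.0], size=coeffs.shape)
    return idctn(coeffs * mask, norm='ortho')

def multichannel_predict(x, channels):
    """channels: list of (key, classifier) pairs, each trained on its own keyed transform.

    Soft outputs are fused by averaging, which the abstract credits for extra reliability.
    """
    scores = [clf(keyed_randomize(x, key)) for key, clf in channels]
    return np.mean(scores, axis=0)
```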
On the Security of Randomized Defenses Against Adversarial Samples
Deep Learning has been shown to be particularly vulnerable to adversarial
samples. To combat adversarial strategies, numerous defensive techniques have
been proposed. Among these, a promising approach is to use randomness in order
to make the classification process unpredictable and presumably harder for the
adversary to control. In this paper, we study the effectiveness of randomized
defenses against adversarial samples. To this end, we categorize existing
state-of-the-art adversarial strategies into three attacker models of
increasing strength, namely blackbox, graybox, and whitebox (a.k.a. adaptive)
attackers. We also devise a lightweight randomization strategy for image
classification based on feature squeezing, which consists of pre-processing the
classifier input by embedding randomness within each feature, before applying
feature squeezing. We evaluate the proposed defense and compare it to other
randomized techniques in the literature via thorough experiments. Our results
indeed show that careful integration of randomness can be effective against
both graybox and blackbox attacks without significantly degrading the accuracy
of the underlying classifier. However, our experimental results offer strong
evidence that in the present form such randomization techniques cannot deter a
whitebox adversary that has access to all classifier parameters and has full
knowledge of the defense. Our work thoroughly and empirically analyzes the
impact of randomization techniques against all classes of adversarial
strategies.
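A minimal sketch of the described defense, assuming images with values in [0, 1]: embed small random noise in every feature, then apply bit-depth reduction, a standard feature-squeezing step. The uniform noise range and 4-bit depth are illustrative parameters rather than the authors' configuration.

```python
import numpy as np

def randomized_feature_squeeze(x, bits=4, noise=1/255, rng=np.random.default_rng()):
    """x: image array in [0, 1]; returns the randomized, bit-depth-reduced input."""
    x = x + rng.uniform(-noise, noise, size=x.shape)   # embed randomness within each feature
    x = np.clip(x, 0.0, 1.0)
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels               # feature squeezing: reduce colour bit depth
```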
Mitigating Advanced Adversarial Attacks with More Advanced Gradient Obfuscation Techniques
Deep Neural Networks (DNNs) are well-known to be vulnerable to Adversarial
Examples (AEs). A great deal of effort has been spent fueling the arms race
between attackers and defenders. Recently, advanced gradient-based attack
techniques (e.g., BPDA and EOT) were proposed, which have defeated a
considerable number of existing defense methods. To date, there is still no
satisfactory solution that can effectively and efficiently defend against those
attacks.
In this paper, we take a solid step towards mitigating those advanced
gradient-based attacks, with two major contributions. First, we perform an
in-depth analysis of the root causes of those attacks and propose four
properties that can break the fundamental assumptions of those attacks. Second,
we identify a set of operations that can meet those properties. By integrating
these operations, we design two preprocessing functions that can invalidate
these powerful attacks. Extensive evaluations indicate that our solutions can
effectively mitigate all existing standard and advanced attack techniques, and
beat 11 state-of-the-art defense solutions published in top-tier conferences
over the past 2 years. The defender can employ our solutions to constrain the
attack success rate below 7% for the strongest attacks, even when the adversary
has spent dozens of GPU hours.
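For context on the attacks being mitigated, the sketch below shows the basic form of EOT (Expectation over Transformation) cited in the abstract: average the loss over many draws of the defense's randomness before differentiating, so a single randomized preprocessing pass no longer hides the gradient. The PyTorch formulation and the differentiable `random_transform` placeholder are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def eot_gradient(model, random_transform, x, y, n_samples=30):
    """Monte-Carlo estimate of the expected input gradient under the defense's randomness.

    Assumes `random_transform` (the defense's randomized preprocessing) is differentiable;
    otherwise it would be combined with a BPDA-style backward approximation.
    """
    x = x.clone().detach().requires_grad_(True)
    total = 0.0
    for _ in range(n_samples):
        total = total + F.cross_entropy(model(random_transform(x)), y)
    return torch.autograd.grad(total / n_samples, x)[0]
```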
Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations
In this work, we present a framework to measure and mitigate intrinsic biases
with respect to protected variables, such as gender, in visual recognition
tasks. We show that trained models significantly amplify the association of
target labels with gender beyond what one would expect from biased datasets.
Surprisingly, we show that even when datasets are balanced such that each label
co-occurs equally with each gender, learned models amplify the association
between labels and gender, as much as if data had not been balanced! To
mitigate this, we adopt an adversarial approach to remove unwanted features
corresponding to protected variables from intermediate representations in a
deep neural network, and provide a detailed analysis of its effectiveness.
Experiments on two datasets: the COCO dataset (objects), and the imSitu dataset
(actions), show reductions in gender bias amplification while maintaining most
of the accuracy of the original models.
Comment: 10 pages, 7 figures, ICCV 2019
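A rough sketch of the adversarial feature-removal idea, assuming a PyTorch encoder / task head / gender adversary split; the gradient-reversal trick is one common way to realize the min-max objective described here and is not necessarily the paper's exact training scheme.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # reverse (and scale) gradients flowing back into the encoder
        return -ctx.lam * grad_output, None

def debias_loss(encoder, task_head, adversary, x, y_task, y_gender, lam=1.0):
    """Task loss plus an adversarial term that penalizes gender-predictable features."""
    z = encoder(x)                                       # intermediate representation
    task = F.cross_entropy(task_head(z), y_task)
    # the adversary tries to predict gender from z; reversed gradients push the
    # encoder to strip that information from the representation
    adv = F.cross_entropy(adversary(GradReverse.apply(z, lam)), y_gender)
    return task + adv
```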
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon
that leads to a false sense of security in defenses against adversarial
examples. While defenses that cause obfuscated gradients appear to defeat
iterative optimization-based attacks, we find defenses relying on this effect
can be circumvented. We describe characteristic behaviors of defenses
exhibiting the effect, and for each of the three types of obfuscated gradients
we discover, we develop attack techniques to overcome it. In a case study,
examining non-certified white-box-secure defenses at ICLR 2018, we find
obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on
obfuscated gradients. Our new attacks successfully circumvent 6 completely, and
1 partially, in the original threat model each paper considers.
Comment: ICML 2018. Source code at https://github.com/anishathalye/obfuscated-gradients
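One of the attack techniques associated with this work, commonly referred to as BPDA (Backward Pass Differentiable Approximation), can be sketched as follows: run the non-differentiable defensive preprocessing on the forward pass but treat it as the identity on the backward pass, so iterative optimization-based attacks recover usable gradients. The PyTorch formulation below is a generic illustration, not the authors' released code.

```python
import torch

class BPDAIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, preprocess):
        # apply the (possibly non-differentiable) defense on the forward pass
        return preprocess(x.detach())

    @staticmethod
    def backward(ctx, grad_output):
        # approximate d preprocess(x) / dx by the identity on the backward pass
        return grad_output, None

def defended_logits(model, preprocess, x):
    """Forward through the defended pipeline while keeping gradients flowing back to x."""
    return model(BPDAIdentity.apply(x, preprocess))
```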