Combatting Adversarial Attacks through Denoising and Dimensionality Reduction: A Cascaded Autoencoder Approach
Machine Learning models are vulnerable to adversarial attacks that rely on
perturbing the input data. This work proposes a novel strategy using
Autoencoder Deep Neural Networks to defend a machine learning model against two
gradient-based attacks: the Fast Gradient Sign attack and the Fast Gradient attack.
First, we denoise the test data using an autoencoder trained on both clean and
corrupted data. Then, we reduce the dimension of the denoised
data using the hidden layer representation of another autoencoder. We perform
this experiment for multiple values of the bound of adversarial perturbations,
and consider different numbers of reduced dimensions. When the test data is
preprocessed using this cascaded pipeline, the tested deep neural network
classifier yields a much higher accuracy, thus mitigating the effect of the
adversarial perturbation.
Comment: 7 pages, 8 figures, submitted to the Conference on Information Sciences and Systems (CISS 2019)
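To make the pipeline concrete, here is a minimal sketch of the cascaded preprocessing, assuming PyTorch; the layer sizes, the DenoisingAE/ReducingAE modules, and the preprocess helper are illustrative, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """First stage: trained on (corrupted, clean) pairs to remove noise."""
    def __init__(self, dim=784, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

class ReducingAE(nn.Module):
    """Second stage: its hidden layer gives the reduced-dimension features."""
    def __init__(self, dim=784, reduced=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, reduced), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(reduced, dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def preprocess(x, denoiser, reducer):
    """Cascade: denoise the (possibly adversarial) input, then reduce its
    dimension; the downstream classifier operates on this representation."""
    with torch.no_grad():
        return reducer.encoder(denoiser(x))
```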
CAAD 2018: Generating Transferable Adversarial Examples
Deep neural networks (DNNs) are vulnerable to adversarial examples,
perturbations carefully crafted to fool the targeted DNN, in both the
non-targeted and targeted case. In the non-targeted case, the attacker simply
aims to induce misclassification. In the targeted case, the attacker aims to
induce classification to a specified target class. In addition, it has been
observed that strong adversarial examples can transfer to unknown models,
yielding a serious security concern. The NIPS 2017 competition was organized to
accelerate research in adversarial attacks and defenses, taking place in the
realistic setting where submitted adversarial attacks attempt to transfer to
submitted defenses. The CAAD 2018 competition took place with nearly identical
rules to the NIPS 2017 one. Given the requirement that the NIPS 2017
submissions were to be open-sourced, participants in the CAAD 2018 competition
were able to directly build upon previous solutions, and thus improve the
state-of-the-art in this setting. Our team participated in the CAAD 2018
competition, and won 1st place in both attack subtracks, non-targeted and
targeted adversarial attacks, and 3rd place in defense. We outline our
solutions and development results in this article. We hope our results can
inform researchers in both generating and defending against adversarial
examples.
Comment: 1st place attack solutions and 3rd place defense in the CAAD 2018 Competition
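To make the non-targeted/targeted distinction concrete, a hedged single-step gradient-sign sketch follows; the competition's winning attacks were stronger, transfer-oriented iterative variants, and the fgsm helper and its parameters here are illustrative.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, eps, y_true=None, y_target=None):
    """One-step gradient-sign attack on images in [0, 1].
    Non-targeted: step *up* the loss w.r.t. the true label.
    Targeted: step *down* the loss w.r.t. the chosen target label."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    if y_target is not None:            # targeted case
        loss = -F.cross_entropy(logits, y_target)
    else:                               # non-targeted case
        loss = F.cross_entropy(logits, y_true)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```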
Adversarial Learning in Statistical Classification: A Comprehensive Review of Defenses Against Attacks
There is great potential for damage from adversarial learning (AL) attacks on
machine-learning based systems. In this paper, we provide a contemporary survey
of AL, focused particularly on defenses against attacks on statistical
classifiers. After introducing relevant terminology and the goals and range of
possible knowledge of both attackers and defenders, we survey recent work on
test-time evasion (TTE), data poisoning (DP), and reverse engineering (RE)
attacks, and particularly defenses against them. In so doing, we distinguish
robust classification from anomaly detection (AD), unsupervised from
supervised methods, and statistical hypothesis-based defenses from those
without an explicit null (no-attack) hypothesis; for each method we identify
the hyperparameters it requires, its computational complexity, the
performance measures on which it was evaluated, and the quality achieved. We
then dig deeper, providing novel insights that challenge conventional AL wisdom
and that target unresolved issues, including: 1) robust classification versus
AD as a defense strategy; 2) the belief that attack success increases with
attack strength, which ignores susceptibility to AD; 3) small perturbations for
test-time evasion attacks: a fallacy or a requirement?; 4) validity of the
universal assumption that a TTE attacker knows the ground-truth class for the
example to be attacked; 5) black, grey, or white box attacks as the standard
for defense evaluation; 6) susceptibility of query-based RE to an AD defense.
We also discuss attacks on the privacy of training data. We then present
benchmark comparisons of several defenses against TTE, RE, and backdoor DP
attacks on images. The paper concludes with a discussion of future work.
MaskDGA: A Black-box Evasion Technique Against DGA Classifiers and Adversarial Defenses
Domain generation algorithms (DGAs) are commonly used by botnets to generate
domain names through which bots can establish a resilient communication channel
with their command and control servers. Recent publications presented deep
learning, character-level classifiers that are able to detect algorithmically
generated domain (AGD) names with high accuracy, and correspondingly,
significantly reduce the effectiveness of DGAs for botnet communication. In
this paper we present MaskDGA, a practical adversarial learning technique that
adds perturbation to the character-level representation of algorithmically
generated domain names in order to evade DGA classifiers, without the attacker
having any knowledge about the DGA classifier's architecture and parameters.
MaskDGA was evaluated using the DMD-2018 dataset of AGD names and four recently
published DGA classifiers; the average F1-score of the classifiers
degraded from 0.977 to 0.495 when the evasion technique was applied. An additional
evaluation was conducted using the same classifiers but with adversarial
defenses implemented: adversarial re-training and distillation. The results of
this evaluation show that MaskDGA can be used for improving the robustness of
the character-level DGA classifiers against adversarial attacks, but that
ideally DGA classifiers should incorporate additional features alongside
character-level features that are demonstrated in this study to be vulnerable
to adversarial attacks.
Comment: 12 pages, 2 figures
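A hedged sketch of the black-box idea, assuming the attacker has trained a substitute character-level classifier of their own and relies on transferability to the unknown target; the perturb_domain helper, its saliency heuristic, and the alphabet are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

def perturb_domain(substitute, domain, n_swaps=3):
    """Replace the characters whose substitution most lowers the substitute
    model's 'DGA' score, hoping the change transfers to the real classifier.
    `domain` is the label part only (no dot); `substitute` is assumed to map
    a one-hot (1, length, alphabet) tensor to P(domain is DGA-generated)."""
    idx = torch.tensor([[ALPHABET.index(c) for c in domain]])
    onehot = F.one_hot(idx, len(ALPHABET)).float().requires_grad_(True)
    substitute(onehot).sum().backward()
    grad = onehot.grad[0]                     # (length, alphabet) saliency
    # Gain of swapping position i to char j: how much the score drops.
    current = grad[torch.arange(len(domain)), idx[0]].unsqueeze(1)
    gain = current - grad
    chars, swapped = list(domain), set()
    for f in torch.argsort(gain.flatten(), descending=True).tolist():
        i, j = divmod(f, len(ALPHABET))
        if i not in swapped:
            chars[i] = ALPHABET[j]
            swapped.add(i)
        if len(swapped) == n_swaps:
            break
    return "".join(chars)
```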
Adversarial Attacks and Defences Competition
To accelerate research on adversarial examples and robustness of machine
learning classifiers, Google Brain organized a NIPS 2017 competition that
encouraged researchers to develop new methods to generate adversarial examples
as well as to develop new ways to defend against them. In this chapter, we
describe the structure and organization of the competition and the solutions
developed by several of the top-placing teams.
Comment: 36 pages, 10 figures
Universal Rules for Fooling Deep Neural Networks based Text Classification
Recently, deep-learning-based natural language processing techniques have been
used extensively for tasks such as spam filtering and censorship evaluation in
social networks. However, only a couple of works have evaluated the
vulnerabilities of such deep neural networks. Here, we go beyond attacks to
investigate, for the first time, universal rules, i.e., rules that are sample
agnostic and could therefore turn any text sample into an adversarial one. In
fact, the universal rules use no information from the attacked method (no
model internals, gradient information, or training-dataset information),
making them black-box universal attacks. In other words,
the universal rules are sample and method agnostic. By proposing a
coevolutionary optimization algorithm we show that it is possible to create
universal rules that can automatically craft imperceptible adversarial samples
(fewer than five perturbations, each resembling a misspelling, are inserted
into the text sample). A comparison with a random search algorithm further
justifies the strength of the method. Thus, universal rules for fooling
networks are here shown to exist. Hopefully, the results from this work will
impact the development of further sample- and model-agnostic attacks as well as
their defenses, culminating, perhaps, in a new age for artificial intelligence.
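The paper's rules are evolved with a coevolutionary optimizer; as a rough intuition, here is a hedged sketch of the simpler random-search baseline the authors compare against, applying a few misspelling-like character edits; the random_search_attack helper and its edit operations are illustrative.

```python
import random

def random_search_attack(classify, text, max_edits=5, trials=200):
    """Baseline: randomly apply up to `max_edits` misspelling-like character
    edits and keep the first variant whose predicted label flips."""
    original_label = classify(text)
    for _ in range(trials):
        chars = list(text)
        for _ in range(random.randint(1, max_edits)):
            i = random.randrange(len(chars))
            op = random.choice(["swap", "drop", "dup"])
            if op == "swap" and i + 1 < len(chars):
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
            elif op == "drop":
                del chars[i]
            else:
                chars.insert(i, chars[i])
            if not chars:          # guard against deleting everything
                break
        candidate = "".join(chars)
        if candidate and classify(candidate) != original_label:
            return candidate
    return None  # no adversarial variant found within the budget
```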
Adversarial Examples: Opportunities and Challenges
Deep neural networks (DNNs) have achieved remarkable success in image
recognition, speech processing, autonomous vehicles, and medical diagnosis, in
some cases surpassing human performance. However, recent studies indicate that
DNNs are vulnerable to adversarial examples (AEs), which are designed by
attackers to fool deep learning models. Unlike real examples, AEs can mislead
a model into producing incorrect outputs while remaining nearly
indistinguishable to the human eye, thereby threatening security-critical
deep-learning applications. In recent years, the generation
and defense of AEs have become a research hotspot in the field of artificial
intelligence (AI) security. This article reviews the latest research progress
of AEs. First, we introduce the concept, causes, characteristics, and
evaluation metrics of AEs, then survey the state-of-the-art AE generation
methods and discuss their advantages and disadvantages. After that, we
review the existing defenses and discuss their limitations. Finally, we
consider future research opportunities and challenges concerning AEs.
Comment: 16 pages, 13 figures, 5 tables
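As a small illustration of the evaluation metrics the survey covers, the following hedged sketch computes a fooling rate and the maximum L-infinity perturbation for a batch; the helper names and exact metric definitions are illustrative, as conventions vary across papers.

```python
import torch

def fooling_rate(model, x_clean, x_adv, y_true):
    """Fraction of correctly classified inputs whose adversarial
    counterpart is misclassified."""
    with torch.no_grad():
        clean_ok = model(x_clean).argmax(1) == y_true
        adv_wrong = model(x_adv).argmax(1) != y_true
    return (clean_ok & adv_wrong).float().mean().item()

def max_linf(x_clean, x_adv):
    """Largest per-example L-infinity perturbation in the batch."""
    return (x_adv - x_clean).abs().flatten(1).max(1).values.max().item()
```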
Detecting Adversarial Perturbations Through Spatial Behavior in Activation Spaces
Neural network based classifiers are still prone to manipulation through
adversarial perturbations. State-of-the-art attacks can overcome most of the
defense and detection mechanisms suggested so far, and adversaries have the
upper hand in this arms race. Adversarial examples are designed to resemble the
normal input from which they were constructed, while triggering an incorrect
classification. This basic design goal leads to a characteristic spatial
behavior within the context of Activation Spaces, a term coined by the authors
to refer to the hyperspaces formed by the activation values of the network's
layers. Within the output of the first layers of the network, an adversarial
example is likely to resemble normal instances of the source class, while in
the final layers such examples will diverge towards the adversary's target
class. The steps below enable us to leverage this inherent shift from one class
to another in order to form a novel adversarial example detector. We construct
Euclidean spaces out of the activation values of each of the deep neural
network layers. Then, we induce a set of k-nearest neighbor classifiers (k-NN),
one per activation space of each neural network layer, using the
non-adversarial examples. We leverage those classifiers to produce a sequence
of class labels for each non-perturbed input sample and estimate the a priori
probability of a class-label change between one activation space and the next.
During the detection phase we compute a sequence of classification labels for
each input using the trained classifiers. We then estimate the likelihood of
those classification sequences and show that adversarial sequences are far less
likely than normal ones. We evaluated our detection method against the
state-of-the-art C&W attack, using two image classification datasets (MNIST
and CIFAR-10), and reached an AUC of 0.95 on CIFAR-10.
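A hedged sketch of the detection scheme, assuming scikit-learn, integer class labels, and access to per-layer activations; the Markov-style label-transition likelihood below is a simplification of the authors' a priori probability estimate, and the helper names are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fit_detector(layer_activations, labels, k=5):
    """One k-NN classifier per activation space, fit on benign data, plus
    transition probabilities of label changes between consecutive layers.
    `layer_activations`: list over layers of (n_samples, dim) arrays;
    `labels`: integer class labels 0..C-1."""
    knns = [KNeighborsClassifier(n_neighbors=k).fit(acts, labels)
            for acts in layer_activations]
    seqs = np.stack([knn.predict(acts)
                     for knn, acts in zip(knns, layer_activations)])
    n_classes = len(np.unique(labels))
    trans = []
    for a, b in zip(seqs[:-1], seqs[1:]):
        t = np.full((n_classes, n_classes), 1e-6)   # Laplace-style smoothing
        np.add.at(t, (a, b), 1)
        trans.append(t / t.sum(axis=1, keepdims=True))
    return knns, trans

def log_likelihood(knns, trans, layer_acts_of_one_input):
    """Low likelihood of the layer-wise label sequence flags an AE."""
    seq = [knn.predict(a.reshape(1, -1))[0]
           for knn, a in zip(knns, layer_acts_of_one_input)]
    return sum(np.log(t[i, j])
               for t, i, j in zip(trans, seq[:-1], seq[1:]))
```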
DARTS: Deceiving Autonomous Cars with Toxic Signs
Sign recognition is an integral part of autonomous cars. Any
misclassification of traffic signs can potentially lead to a multitude of
disastrous consequences, ranging from a life-threatening accident to even a
large-scale interruption of transportation services relying on autonomous cars.
In this paper, we propose and examine security attacks against sign recognition
systems for Deceiving Autonomous caRs with Toxic Signs (we call the proposed
attacks DARTS). In particular, we introduce two novel methods to create these
toxic signs. First, we propose Out-of-Distribution attacks, which expand the
scope of adversarial examples by enabling the adversary to generate them
starting from an arbitrary point in image space, in contrast to prior attacks
that are restricted to existing training/test data (In-Distribution). Second,
we present the Lenticular Printing attack, which relies on an optical
phenomenon to deceive the traffic sign recognition system. We extensively
evaluate the effectiveness of the proposed attacks in both virtual and
real-world settings and consider both white-box and black-box threat models.
Our results demonstrate that the proposed attacks are successful under both
settings and threat models. We further show that Out-of-Distribution attacks
can outperform In-Distribution attacks on classifiers defended using the
adversarial training defense, exposing a new attack vector for these defenses.
Comment: Submitted to ACM CCS 2018; extended version of [1801.02780] Rogue Signs: Deceiving Traffic Sign Recognition with Malicious Ads and Logos
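A hedged sketch of the Out-of-Distribution idea: rather than perturbing an existing sign image, start from an arbitrary point in image space (here, random noise) and push it toward the attacker's target class under a norm budget; the PGD-style loop, the ood_attack helper, and the hyperparameters are illustrative, not the paper's exact attack.

```python
import torch
import torch.nn.functional as F

def ood_attack(model, target_class, shape, eps=0.3, steps=100, lr=0.01):
    """Craft a 'toxic sign' from scratch: begin at a random point in image
    space and optimize toward the target class, staying within an
    L-infinity ball of radius eps around the starting point.
    Usage (hypothetical class index): ood_attack(model,
        torch.tensor([stop_sign_idx]), (1, 3, 32, 32))."""
    x0 = torch.rand(shape)                 # arbitrary starting point
    x = x0.clone()
    for _ in range(steps):
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), target_class)
        loss.backward()
        with torch.no_grad():
            x = x - lr * x.grad.sign()             # step toward target
            x = x0 + (x - x0).clamp(-eps, eps)     # project onto budget
            x = x.clamp(0, 1)                      # stay a valid image
    return x.detach()
```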
Better the Devil you Know: An Analysis of Evasion Attacks using Out-of-Distribution Adversarial Examples
A large body of recent work has investigated the phenomenon of evasion
attacks using adversarial examples for deep learning systems, where the
addition of norm-bounded perturbations to the test inputs leads to incorrect
output classification. Previous work has investigated this phenomenon in
closed-world systems where training and test inputs follow a pre-specified
distribution. However, real-world implementations of deep learning
applications, such as autonomous driving and content classification, are
likely to operate in an open-world environment. In this paper, we demonstrate the
success of open-world evasion attacks, where adversarial examples are generated
from out-of-distribution inputs (OOD adversarial examples). In our study, we
use 11 state-of-the-art neural network models trained on 3 image datasets of
varying complexity. We first demonstrate that state-of-the-art detectors for
out-of-distribution data are not robust against OOD adversarial examples. We
then consider 5 known defenses for adversarial examples, including
state-of-the-art robust training methods, and show that against these defenses,
OOD adversarial examples can achieve up to 4× higher target success
rates compared to adversarial examples generated from in-distribution data. We
also take a quantitative look at how open-world evasion attacks may affect
real-world systems. Finally, we present the first steps towards a robust
open-world machine learning system.
Comment: 18 pages, 5 figures, 9 tables
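To illustrate the first finding, a hedged sketch of checking a maximum-softmax-probability OOD detector (a standard baseline, not necessarily one of the detectors tested in the paper) against such inputs: because OOD adversarial examples are optimized to be classified confidently, a confidence threshold no longer separates them from in-distribution data.

```python
import torch
import torch.nn.functional as F

def msp_score(model, x):
    """Max softmax probability: high score = 'looks in-distribution'."""
    with torch.no_grad():
        return F.softmax(model(x), dim=1).max(dim=1).values

def flagged_as_ood(model, x, threshold=0.9):
    """Flag inputs whose confidence falls below the threshold. Plain OOD
    inputs tend to score low; OOD adversarial examples are driven to high
    confidence by the attack, so they slip past this check."""
    return msp_score(model, x) < threshold
```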