Unifying Bilateral Filtering and Adversarial Training for Robust Neural Networks
Recent analysis of deep neural networks has revealed their vulnerability to
carefully structured adversarial examples. Many effective algorithms exist to
craft these adversarial examples, but performant defenses seem to be far away.
In this work, we explore the use of edge-aware bilateral filtering as a
projection back to the space of natural images. We show that bilateral
filtering is an effective defense in multiple attack settings, where the
strength of the adversary gradually increases. In the case of an adversary who
has no knowledge of the defense, bilateral filtering can remove more than 90%
of adversarial examples from a variety of different attacks. To evaluate
against an adversary with complete knowledge of our defense, we adapt the
bilateral filter as a trainable layer in a neural network and show that adding
this layer makes ImageNet classifiers significantly more robust to attacks. When
trained under an adversarial training framework, we show that the resulting
model is hard to fool even with the best attack methods.
Comment: 9 pages, 14 figures
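A minimal sketch of the input-preprocessing variant of this defense: apply an edge-aware bilateral filter to an image before it reaches the classifier, projecting it back toward the space of natural images. The parameter values (d, sigma_color, sigma_space) and the [0, 1] image range are illustrative assumptions, not the paper's settings, and this does not implement the trainable-layer version.

    import cv2
    import numpy as np

    def bilateral_defense(image: np.ndarray, d: int = 5,
                          sigma_color: float = 0.1, sigma_space: float = 5.0) -> np.ndarray:
        """Smooth a (possibly adversarial) float32 image in [0, 1] with an
        edge-aware bilateral filter before classification."""
        filtered = cv2.bilateralFilter(image.astype(np.float32), d, sigma_color, sigma_space)
        return np.clip(filtered, 0.0, 1.0)

    # Usage: logits = classifier(bilateral_defense(adversarial_image))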
Improving Network Robustness against Adversarial Attacks with Compact Convolution
Though Convolutional Neural Networks (CNNs) have surpassed human-level
performance on tasks such as object classification and face verification, they
can easily be fooled by adversarial attacks. These attacks add a small
perturbation to the input image that causes the network to misclassify the
sample. In this paper, we focus on neutralizing adversarial attacks by compact
feature learning. In particular, we show that learning features in a closed and
bounded space improves the robustness of the network. We explore the effect of
L2-Softmax Loss, which enforces compactness in the learned features and thus
enhances robustness to adversarial perturbations. Additionally, we propose
compact convolution, a novel method of convolution that, when incorporated into
conventional CNNs, improves their robustness. Compact convolution ensures
feature compactness at every layer, so that features are bounded and close to
each other. Extensive experiments show that Compact Convolutional Networks
(CCNs) neutralize multiple types of attacks and perform better than existing
methods in defending against adversarial attacks, without incurring any
additional training overhead compared to CNNs.
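A minimal sketch of the L2-Softmax idea referenced above: the penultimate feature is constrained to a hypersphere of fixed radius alpha before the softmax classifier, so learned features live in a closed, bounded space. The value alpha=16 and the layer sizes are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class L2SoftmaxHead(nn.Module):
        def __init__(self, feat_dim: int, num_classes: int, alpha: float = 16.0):
            super().__init__()
            self.alpha = alpha
            self.fc = nn.Linear(feat_dim, num_classes)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            # L2-normalize the features, then rescale to a sphere of radius alpha.
            bounded = self.alpha * F.normalize(features, p=2, dim=1)
            return self.fc(bounded)

    # Training uses the usual cross-entropy: loss = F.cross_entropy(head(feats), labels)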
FUNN: Flexible Unsupervised Neural Network
Deep neural networks have demonstrated high accuracy in image classification
tasks. However, they have been shown to be weak against adversarial examples: a
small perturbation of the image that changes the classification output
dramatically. In recent years, several defenses have been proposed to address
this issue in supervised classification tasks. We propose a method to obtain
robust features in unsupervised learning tasks against adversarial attacks. Our
method differs from existing solutions by directly learning the robust features
without the need to project the adversarial examples into the distribution
space of the original examples. A first auto-encoder, A1, is in charge of
perturbing the input image to fool a second auto-encoder, A2, which is in
charge of regenerating the original image. A1 tries to find the least perturbed
image under the constraint that the error at the output of A2 is at least equal
to a threshold. Thanks to this training, the encoder of A2 becomes robust
against adversarial attacks and can be used in different tasks such as
classification. Using state-of-the-art network architectures, we demonstrate
the robustness of the features obtained with this method in classification
tasks.
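A hedged sketch of the two-auto-encoder game described above: A1 produces a perturbed image, A2 tries to reconstruct the original, A1 minimizes the perturbation while keeping A2's reconstruction error above a threshold tau, and A2 minimizes its reconstruction error. The loss weights and the hinge-style penalty form are our assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def training_step(a1, a2, opt1, opt2, x, tau=0.05, lam=10.0):
        # Update A1: keep the perturbation small, but force A2's error to be >= tau.
        x_adv = a1(x)
        rec = a2(x_adv)
        perturb = F.mse_loss(x_adv, x)
        margin = F.relu(tau - F.mse_loss(rec, x))   # penalty if A2 still reconstructs well
        loss_a1 = perturb + lam * margin
        opt1.zero_grad(); loss_a1.backward(); opt1.step()

        # Update A2: reconstruct the original image from the perturbed one.
        rec = a2(a1(x).detach())
        loss_a2 = F.mse_loss(rec, x)
        opt2.zero_grad(); loss_a2.backward(); opt2.step()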
Clipping free attacks against artificial neural networks
In recent years, a remarkable breakthrough has been made in AI thanks to deep
neural networks, which have achieved great success in many machine learning
tasks in computer vision, natural language processing, speech recognition,
malware detection, and so on. However, they are highly vulnerable to easily
crafted adversarial examples. Many investigations have pointed out this fact,
and different approaches have been proposed to generate attacks while adding a
limited perturbation to the original data. The most robust attack known so far
is the so-called C&W attack [1]. Nonetheless, a countermeasure known as feature
squeezing, coupled with ensemble defense, showed that most of these attacks can
be defeated [6]. In this paper, we present a new method we call Centered
Initial Attack (CIA) whose advantage is twofold: first, it ensures by
construction that the maximum perturbation is smaller than a threshold fixed
beforehand, without the clipping process that degrades the quality of attacks.
Second, it is robust against recently introduced defenses such as feature
squeezing, JPEG encoding, and even a voting ensemble of defenses. While its
application is not limited to images, we illustrate this using five of the
current best classifiers on the ImageNet dataset, among which two are
adversarially retrained on purpose to be robust against attacks. With a fixed
maximum perturbation of only 1.5% on any pixel, around 80% of targeted attacks
fool the voting ensemble defense, and nearly 100% when the perturbation is 6%.
While this shows how difficult it is to defend against CIA attacks, the last
section of the paper gives some guidelines to limit their impact.
Comment: 12 pages
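The key property claimed above is that the perturbation is bounded by construction, so no clipping step is needed. A common way to obtain this is a tanh change of variables combined with re-centering the clean image away from the domain boundary; the sketch below illustrates that idea under our own assumptions and is not necessarily the exact CIA construction.

    import torch

    def bounded_adversary(x, w, eps):
        """x: clean image in [0, 1]; w: unconstrained attack variable (optimized
        by gradient descent); eps: maximum per-pixel perturbation."""
        delta = eps * torch.tanh(w)                   # |delta| <= eps by construction
        center = torch.clamp(x, eps, 1.0 - eps)       # re-center so adding delta never leaves [0, 1]
        return center + delta                         # no clipping of the attack itself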
EagleEye: Attack-Agnostic Defense against Adversarial Inputs (Technical Report)
Deep neural networks (DNNs) are inherently vulnerable to adversarial inputs:
such maliciously crafted samples trigger DNNs to misbehave, leading to
detrimental consequences for DNN-powered systems. The fundamental challenges of
mitigating adversarial inputs stem from their adaptive and variable nature.
Existing solutions attempt to improve DNN resilience against specific attacks;
yet, such static defenses can often be circumvented by adaptively engineered
inputs or by new attack variants.
Here, we present EagleEye, an attack-agnostic adversarial tampering analysis
engine for DNN-powered systems. Our design exploits the minimality
principle underlying many attacks: to maximize the attack's evasiveness, the
adversary often seeks the minimum possible distortion to convert genuine inputs
to adversarial ones. We show that this practice entails the distinct
distributional properties of adversarial inputs in the input space. By
leveraging such properties in a principled manner, EagleEye effectively
discriminates adversarial inputs and even uncovers their correct classification
outputs. Through extensive empirical evaluation using a range of benchmark
datasets and DNN models, we validate EagleEye's efficacy. We further
investigate the adversary's possible countermeasures, which reveal a difficult
dilemma for her: to evade EagleEye's detection, excessive distortion is
necessary, thereby significantly reducing the attack's evasiveness with respect
to other detection mechanisms.
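A hedged illustration of the minimality principle described above: because the adversary uses the minimum distortion needed to cross the decision boundary, adversarial inputs tend to sit much closer to that boundary than genuine inputs. The detector below estimates boundary proximity by checking how often small random perturbations flip the predicted label; this is only an illustrative proxy for the principle, not EagleEye's actual algorithm, and sigma and the trial count are assumptions.

    import torch

    @torch.no_grad()
    def boundary_proximity(model, x, sigma=0.02, trials=32):
        """Return, per sample, the fraction of noisy copies whose label flips."""
        base = model(x).argmax(dim=1)
        flips = torch.zeros_like(base, dtype=torch.float)
        for _ in range(trials):
            noisy = model(x + sigma * torch.randn_like(x)).argmax(dim=1)
            flips += (noisy != base).float()
        return flips / trials   # high flip rate -> input likely adversarial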
Enhancing Adversarial Defense by k-Winners-Take-All
We propose a simple change to existing neural network structures for better
defending against gradient-based adversarial attacks. Instead of using popular
activation functions (such as ReLU), we advocate the use of k-Winners-Take-All
(k-WTA) activation, a C0 discontinuous function that purposely invalidates the
neural network model's gradient at densely distributed input data points. The
proposed k-WTA activation can be readily used in nearly all existing networks
and training methods with no significant overhead. Our proposal is
theoretically rationalized. We analyze why the discontinuities in k-WTA
networks can largely prevent gradient-based search of adversarial examples and
why they at the same time remain innocuous to the network training. This
understanding is also empirically backed. We test k-WTA activation on various
network structures optimized by different training methods, with and without
adversarial training. In all cases, the robustness of k-WTA networks outperforms
that of traditional networks under white-box attacks.
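A minimal sketch of a k-Winners-Take-All activation as described above: for each sample, keep only the k largest activations and zero out the rest. Specifying k through a sparsity ratio and flattening the feature map per sample are illustrative simplifications.

    import torch
    import torch.nn as nn

    class KWTA(nn.Module):
        def __init__(self, sparsity: float = 0.1):
            super().__init__()
            self.sparsity = sparsity

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            flat = x.flatten(start_dim=1)
            k = max(1, int(self.sparsity * flat.shape[1]))
            # Threshold at the k-th largest value per sample; smaller entries become 0.
            thresh = flat.topk(k, dim=1).values[:, -1:]
            mask = (flat >= thresh).float()
            return (flat * mask).view_as(x)

    # Usage: replace nn.ReLU() with KWTA(sparsity=0.1) in an existing network.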
Robustness Of Saak Transform Against Adversarial Attacks
Image classification is vulnerable to adversarial attacks. This work
investigates the robustness of Saak transform against adversarial attacks
towards high performance image classification. We develop a complete image
classification system based on multi-stage Saak transform. In the Saak
transform domain, clean and adversarial images demonstrate different
distributions at different spectral dimensions. Selection of the spectral
dimensions at every stage can be viewed as an automatic denoising process.
Motivated by this observation, we carefully design strategies for feature
extraction, representation, and classification that increase adversarial
robustness. Performance on well-known datasets and attacks is demonstrated by
extensive experimental evaluations.
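A heavily hedged sketch of a single Saak-transform stage as we read the description above: learn PCA/KLT kernels on local patches, augment each kernel with its negative so that the following ReLU loses no sign information, and keep only the leading spectral dimensions (which acts as denoising). The patch handling, the number of kept components, and the use of NumPy SVD are our assumptions, not the paper's implementation.

    import numpy as np

    def saak_stage(patches: np.ndarray, n_keep: int = 8) -> np.ndarray:
        """patches: (num_patches, patch_dim) array of flattened local patches."""
        centered = patches - patches.mean(axis=0, keepdims=True)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        kernels = vt[:n_keep]                          # leading KLT kernels
        kernels = np.concatenate([kernels, -kernels])  # sign augmentation
        responses = centered @ kernels.T
        return np.maximum(responses, 0.0)              # ReLU after augmentation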
A Data-driven Adversarial Examples Recognition Framework via Adversarial Feature Genome
Convolutional neural networks (CNNs) are easily spoofed by adversarial
examples, which lead to wrong classification results. Most defense methods
focus only on improving the robustness of CNNs or on detecting adversarial
examples; they are incapable of detecting and correctly classifying adversarial
examples simultaneously. We find that adversarial examples and original images
have different representations in the feature space, and this difference grows
as layers go deeper, which we call Adversarial Feature Separability (AFS).
Inspired by AFS, we propose a defense framework based on the Adversarial
Feature Genome (AFG), which can simultaneously detect adversarial examples and
correctly classify them into their original classes. AFG is an innovative
encoding for both original images and adversarial examples. It consists of
group features and a mixed label. With group features, which are visual
representations of adversarial and original images obtained via a group
visualization method, one can detect adversarial examples because of the AFS of
group features. With a mixed label, one can trace back to the original label of
an adversarial example. The classification of adversarial examples is then
modeled as a multi-label classification problem trained on the AFG dataset,
which recovers the original class of an adversarial example. Experiments show
that the proposed framework not only effectively detects adversarial examples
from different attack algorithms, but also correctly classifies them. Our
framework potentially offers a new, data-driven perspective on improving the
robustness of a CNN model.
Comment: 10 pages, 5 figures, 8 tables
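One plausible reading of the "mixed label" mentioned above, sketched here under our own assumptions rather than as the paper's scheme: a multi-hot target marks both the original class and the class predicted for the adversarial example, so a multi-label classifier can flag an input (two active classes) and trace back its original class.

    import torch

    def mixed_label(original_class: int, predicted_class: int, num_classes: int) -> torch.Tensor:
        target = torch.zeros(num_classes)
        target[original_class] = 1.0
        target[predicted_class] = 1.0   # equals original_class for clean inputs
        return target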
The Adversarial Attack and Detection under the Fisher Information Metric
Many deep learning models are vulnerable to adversarial attacks: imperceptible
but intentionally designed perturbations of the input can cause incorrect
outputs of the networks. In this paper, using information geometry, we
provide a reasonable explanation for the vulnerability of deep learning models.
By considering the data space as a non-linear space with the Fisher information
metric induced from a neural network, we first propose an adversarial attack
algorithm termed one-step spectral attack (OSSA). The method is described by a
constrained quadratic form of the Fisher information matrix, where the optimal
adversarial perturbation is given by the first eigenvector, and the model
vulnerability is reflected by the eigenvalues. The larger an eigenvalue is, the
more vulnerable the model is to attacks along the corresponding eigenvector.
Taking advantage of this property, we also propose an adversarial detection
method with the eigenvalues serving as characteristics. Both our attack and
detection algorithms are numerically optimized to work efficiently on large
datasets. Our evaluations show superior performance compared with other
methods, implying that the Fisher information is a promising approach for
investigating adversarial attacks and defenses.
Comment: Accepted as an AAAI-2019 oral paper
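A hedged sketch of the attack direction described above: the optimal perturbation is the leading eigenvector of the Fisher information matrix of p(y|x) with respect to the input x. For clarity, the per-class score gradients are built explicitly and the eigenvector is found by a few steps of power iteration; the class-by-class gradient loop, step count, and single-sample batch are illustrative assumptions rather than the paper's optimized procedure.

    import torch

    def fisher_top_eigvec(model, x, iters=10):
        """x: a single input with batch dimension 1."""
        x = x.clone().requires_grad_(True)
        log_probs = torch.log_softmax(model(x), dim=1)[0]
        probs = log_probs.exp().detach()
        grads = []
        for c in range(log_probs.shape[0]):
            g, = torch.autograd.grad(log_probs[c], x, retain_graph=True)
            grads.append(g.flatten())
        G = torch.stack(grads)                    # (num_classes, input_dim)
        v = torch.randn_like(G[0])
        for _ in range(iters):                    # power iteration on G^T diag(p) G
            v = G.t() @ (probs * (G @ v))
            v = v / v.norm()
        return v.view_as(x)

    # The adversarial example is then x + eps * fisher_top_eigvec(model, x).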
Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness
Deep Neural Networks (DNNs) trained by gradient descent are known to be
vulnerable to maliciously perturbed adversarial inputs, a.k.a. adversarial
attacks. As a countermeasure against such attacks, increasing the model
capacity to enhance DNN robustness has been discussed and reported as an
effective approach by many recent works. In this work, we show that shrinking
the model size through proper weight pruning can even help improve DNN
robustness under adversarial attack. To obtain a
simultaneously robust and compact DNN model, we propose a multi-objective
training method called Robust Sparse Regularization (RSR), through the fusion
of various regularization techniques, including channel-wise noise injection,
lasso weight penalty, and adversarial training. We conduct extensive
experiments across popular ResNet-20, ResNet-18 and VGG-16 DNN architectures to
demonstrate the effectiveness of RSR against popular white-box (i.e., PGD and
FGSM) and black-box attacks. Thanks to RSR, 85% of the weight connections of
ResNet-18 can be pruned while still achieving 0.68% and 8.72% improvements in
clean- and perturbed-data accuracy, respectively, on the CIFAR-10 dataset,
compared to its PGD adversarial training baseline.
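A hedged sketch of the multi-objective training described above: PGD adversarial training combined with a lasso (L1) weight penalty so that many weights can later be pruned. Channel-wise noise injection is omitted, and the PGD settings and the penalty weight lam are illustrative assumptions, not the paper's values.

    import torch
    import torch.nn.functional as F

    def rsr_style_step(model, opt, x, y, eps=8/255, alpha=2/255, steps=7, lam=1e-5):
        # Inner maximization: PGD attack against the current model.
        x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            x_adv = torch.clamp(x + torch.clamp(x_adv + alpha * grad.sign() - x, -eps, eps), 0, 1)
        # Outer minimization: robust loss plus lasso penalty to encourage sparsity.
        l1 = sum(p.abs().sum() for p in model.parameters())
        loss = F.cross_entropy(model(x_adv.detach()), y) + lam * l1
        opt.zero_grad(); loss.backward(); opt.step()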