Detecting Adversarial Perturbations Through Spatial Behavior in Activation Spaces
Neural network based classifiers are still prone to manipulation through
adversarial perturbations. State-of-the-art attacks can overcome most of the
defense or detection mechanisms suggested so far, and adversaries have the
upper hand in this arms race. Adversarial examples are designed to resemble the
normal input from which they were constructed, while triggering an incorrect
classification. This basic design goal leads to a characteristic spatial
behavior within the context of Activation Spaces, a term coined by the authors
to refer to the hyperspaces formed by the activation values of the network's
layers. Within the output of the first layers of the network, an adversarial
example is likely to resemble normal instances of the source class, while in
the final layers such examples will diverge towards the adversary's target
class. The steps below enable us to leverage this inherent shift from one class
to another in order to form a novel adversarial example detector. We construct
Euclidean spaces out of the activation values of each of the deep neural
network layers. Then, we induce a set of k-nearest neighbor classifiers (k-NN),
one per activation space of each neural network layer, using the
non-adversarial examples. We leverage those classifiers to produce a sequence
of class labels for each non-perturbed input sample and estimate the a priori
probability for a class label change between one activation space and another.
During the detection phase we compute a sequence of classification labels for
each input using the trained classifiers. We then estimate the likelihood of
those classification sequences and show that adversarial sequences are far less
likely than normal ones. We evaluated our detection method against the
state-of-the-art C&W attack on two image classification datasets (MNIST and
CIFAR-10), reaching an AUC of 0.95 on the CIFAR-10 dataset.
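The pipeline described above is concrete enough to sketch. The following is a minimal illustration, not the authors' implementation: it assumes integer class labels and that `layer_activations` is a list of per-layer activation matrices for the benign training set; all other names are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fit_detector(layer_activations, labels, n_classes, k=5):
    # One k-NN classifier per activation space (one per network layer).
    knns = [KNeighborsClassifier(n_neighbors=k).fit(A, labels)
            for A in layer_activations]
    # Label sequence for every benign sample: one predicted class per layer.
    seqs = np.stack([knn.predict(A)
                     for knn, A in zip(knns, layer_activations)], axis=1)
    # A priori probability of a class-label change between consecutive
    # activation spaces, estimated on benign data (Laplace-smoothed).
    trans = np.ones((len(knns) - 1, n_classes, n_classes))
    for l in range(len(knns) - 1):
        for i, j in zip(seqs[:, l], seqs[:, l + 1]):
            trans[l, i, j] += 1
        trans[l] /= trans[l].sum(axis=1, keepdims=True)
    return knns, trans

def sequence_log_likelihood(knns, trans, sample_activations):
    # Classify one input in every activation space, then score how likely its
    # class-label trajectory is under the benign transition model; adversarial
    # inputs drift from source to target class and receive a low likelihood.
    seq = [int(knn.predict(a.reshape(1, -1))[0])
           for knn, a in zip(knns, sample_activations)]
    return sum(np.log(trans[l, seq[l], seq[l + 1]])
               for l in range(len(seq) - 1))
```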
Why is the Mahalanobis Distance Effective for Anomaly Detection?
The Mahalanobis distance-based confidence score, a recently proposed anomaly
detection method for pre-trained neural classifiers, achieves state-of-the-art
performance on both out-of-distribution (OoD) and adversarial examples
detection. This work analyzes why this method exhibits such strong performance
in practical settings while imposing an implausible assumption, namely that
class-conditional distributions of pre-trained features have tied covariance.
Although the Mahalanobis distance-based method is claimed to be motivated by
classification prediction confidence, we find that its superior performance
stems from information not useful for classification. This suggests that the
commonly assumed reason why the Mahalanobis confidence score works so well is
mistaken, and that the score makes use of different information from ODIN,
another popular OoD detection method
based on prediction confidence. This perspective motivates us to combine these
two methods, and the combined detector exhibits improved performance and
robustness. These findings provide insight into the behavior of neural
classifiers in response to anomalous inputs.
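As a point of reference for the discussion, here is a minimal sketch of the score itself, assuming `feats` and `labels` are penultimate-layer features and labels of the training set (the single-layer variant, without the input preprocessing and layer ensembling used in the original method):

```python
import numpy as np

def fit_gaussians(feats, labels, n_classes):
    # Class-conditional Gaussians with a single, tied covariance matrix.
    means = np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])
    centered = feats - means[labels]
    cov = centered.T @ centered / len(feats)
    cov += 1e-6 * np.eye(cov.shape[0])      # ridge for numerical stability
    return means, np.linalg.inv(cov)

def mahalanobis_confidence(x, means, prec):
    # Negative squared Mahalanobis distance to the closest class mean;
    # low values flag OoD or adversarial inputs.
    diffs = means - x
    d2 = np.einsum('cd,de,ce->c', diffs, prec, diffs)
    return -d2.min()
```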
Adversarial Attack Type I: Cheat Classifiers by Significant Changes
Despite the great success of deep neural networks, adversarial attacks can
cheat some well-trained classifiers with small perturbations. In this paper, we
propose another type of adversarial attack that can cheat classifiers by
significant changes. For example, we can significantly change a face but
well-trained neural networks still recognize the adversarial and the original
example as the same person. Statistically, the existing adversarial attack
increases the Type II error, while the proposed one increases the Type I error;
we therefore name them the Type II and Type I adversarial attacks,
respectively. The two attack types are equally important but essentially
different, as we explain intuitively and evaluate numerically. To implement the
proposed attack, a supervised variational autoencoder is designed, and the
classifier is then attacked by updating the latent variables using gradient
information. Besides, with pre-trained generative models, the Type I attack on
latent spaces is investigated as well. Experimental results show that our
method is practical and effective in generating Type I adversarial examples on
large-scale image
datasets. Most of these generated examples can pass detectors designed for
defending against Type II attacks, and the strengthening strategy is effective
only against a specific attack type; both findings imply that the underlying
reasons behind Type I and Type II attacks are different.
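A minimal PyTorch sketch of the latent-space variant of such an attack, assuming pre-trained `decoder` and `classifier` modules (the names and the specific loss weighting are illustrative, not the paper's exact formulation): push the latent code far from its starting point, so the decoded input changes significantly, while a cross-entropy term keeps the prediction pinned to the original label.

```python
import torch
import torch.nn.functional as F

def type1_attack(z0, label, decoder, classifier, steps=200, lr=0.05, lam=10.0):
    # z0: (1, latent_dim) code of the original input; label: (1,) original class.
    z = z0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        logits = classifier(decoder(z))
        # Maximize distance from the starting code (significant change) while
        # the cross-entropy term keeps the prediction on the original label.
        loss = -torch.norm(z - z0) + lam * F.cross_entropy(logits, label)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z).detach()  # a Type I candidate: changed a lot, same class
```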
Adversarial Feature Selection against Evasion Attacks
Pattern recognition and machine learning techniques have been increasingly
adopted in adversarial settings such as spam, intrusion and malware detection,
although their security against well-crafted attacks that aim to evade
detection by manipulating data at test time has not yet been thoroughly
assessed. While previous work has been mainly focused on devising
adversary-aware classification algorithms to counter evasion attempts, only a few
authors have considered the impact of using reduced feature sets on classifier
security against the same attacks. An interesting preliminary result is that
classifier security against evasion may even be worsened by the application of
feature selection. In this paper, we provide a more detailed investigation of
this aspect, shedding some light on the security properties of feature
selection against evasion attacks. Inspired by previous work on adversary-aware
classifiers, we propose a novel adversary-aware feature selection model that
can improve classifier security against evasion attacks, by incorporating
specific assumptions on the adversary's data manipulation strategy. We focus on
an efficient, wrapper-based implementation of our approach, and experimentally
validate its soundness on different application examples, including spam and
malware detection.
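To make the wrapper-based idea concrete, here is a minimal sketch, not the paper's algorithm: greedy forward selection where each candidate feature set is scored by a weighted combination of cross-validated accuracy and a security term. The adversary is approximated by overwriting the highest-weight features with benign mean values, an illustrative stand-in for the paper's data-manipulation model; labels assume 1 = malicious, 0 = benign.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def evasion_rate(clf, X_mal, benign_mean, budget=3):
    # Attacker proxy: overwrite the `budget` highest-weight features of each
    # malicious sample with benign mean values, count samples now labeled 0.
    top = np.argsort(np.abs(clf.coef_[0]))[-budget:]
    X_ev = X_mal.copy()
    X_ev[:, top] = benign_mean[top]
    return float((clf.predict(X_ev) == 0).mean())

def adversary_aware_selection(X, y, n_feats, alpha=0.5):
    X, y = np.asarray(X, float), np.asarray(y)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_feats:
        best_f, best_score = None, -np.inf
        for f in remaining:
            cols = selected + [f]
            clf = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
            acc = cross_val_score(LogisticRegression(max_iter=1000),
                                  X[:, cols], y, cv=3).mean()
            sec = 1.0 - evasion_rate(clf, X[y == 1][:, cols],
                                     X[y == 0][:, cols].mean(axis=0))
            score = (1 - alpha) * acc + alpha * sec  # accuracy/security trade-off
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected
```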
Adversarial Examples - A Complete Characterisation of the Phenomenon
We provide a complete characterisation of the phenomenon of adversarial
examples - inputs intentionally crafted to fool machine learning models. We aim
to cover all the important concerns in this field of study: (1) the conjectures
on the existence of adversarial examples, (2) the security, safety and
robustness implications, (3) the methods used to generate and (4) protect
against adversarial examples and (5) the ability of adversarial examples to
transfer between different machine learning models. We provide ample background
information in an effort to make this document self-contained. Therefore, this
document can be used as a survey, a tutorial, or a catalog of attacks and
defences using adversarial examples.
A Robust Approach for Securing Audio Classification Against Adversarial Attacks
Adversarial audio attacks can be considered small perturbations, imperceptible
to human ears, that are intentionally added to an audio signal and cause a
machine learning model to make mistakes. This poses a security concern
about the safety of machine learning models since the adversarial attacks can
fool such models toward the wrong predictions. In this paper we first review
some strong adversarial attacks that may affect both audio signals and their 2D
representations, and evaluate the resiliency of the most common machine
learning models, namely deep learning models and support vector machines
(SVMs), trained on 2D audio representations such as the short-time Fourier
transform (STFT), discrete wavelet transform (DWT), and cross recurrence plot
(CRP), against several
state-of-the-art adversarial attacks. Next, we propose a novel approach based
on pre-processed DWT representation of audio signals and SVM to secure audio
systems against adversarial attacks. The proposed architecture has several
preprocessing modules for generating and enhancing spectrograms including
dimension reduction and smoothing. We extract features from small patches of
the spectrograms using the speeded-up robust features (SURF) algorithm, which are
further used to generate a codebook using the K-Means++ algorithm. Finally,
codewords are used to train an SVM on the codebook of the SURF-generated
vectors. All these steps yield a novel approach to audio classification
that provides a good trade-off between accuracy and resilience. Experimental
results on three environmental sound datasets show the competitive performance
of the proposed approach compared to deep neural networks, both in terms of
accuracy and robustness against strong adversarial attacks.
Comment: Paper accepted for publication in IEEE Transactions on Information
Forensics and Security.
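A minimal sketch of the codebook pipeline described above: DWT-based 2D representation, local descriptors, K-Means++ codebook, bag-of-codewords histogram, SVM. ORB is used here as a freely available stand-in for SURF (which sits in OpenCV's non-free contrib module), and all sizes and parameters are illustrative.

```python
import numpy as np
import pywt                      # PyWavelets, for the DWT representation
import cv2
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def dwt_image(signal, wavelet='db4', levels=5):
    # Stack |DWT coefficients| per level into a 2D, spectrogram-like image.
    coeffs = pywt.wavedec(signal, wavelet, level=levels)
    img = np.stack([np.abs(np.resize(c, 256)) for c in coeffs])
    img = (255 * img / (img.max() + 1e-8)).astype(np.uint8)
    return cv2.resize(img, (256, 128))   # upscale so keypoints can be found

def local_descriptors(img):
    # ORB stands in for SURF here; both produce local patch descriptors.
    _, desc = cv2.ORB_create(nfeatures=200).detectAndCompute(img, None)
    return desc if desc is not None else np.zeros((1, 32), np.uint8)

def train(signals, labels, n_words=64):
    descs = [local_descriptors(dwt_image(s)) for s in signals]
    codebook = KMeans(n_clusters=n_words, init='k-means++', n_init=10).fit(
        np.vstack(descs).astype(np.float32))
    # Bag-of-codewords histogram per audio clip, then an SVM on top.
    hists = np.stack([np.bincount(codebook.predict(d.astype(np.float32)),
                                  minlength=n_words) for d in descs])
    return codebook, SVC(kernel='rbf').fit(hists, labels)
```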
Adversary Detection in Neural Networks via Persistent Homology
We outline a detection method for adversarial inputs to deep neural networks.
By viewing neural network computations as graphs upon which information flows
from input space to output distribution, we compare the differences in graphs
induced by different inputs. Specifically, by applying persistent homology to
these induced graphs, we observe that the structure of the most persistent
subgraphs that generate the first homology group differs between adversarial
and unperturbed inputs. Based on this observation, we build a detection
algorithm that depends only on the topological information extracted during
training. We test our algorithm on MNIST and achieve 98% adversarial detection
accuracy with an F1-score of 0.98.
Comment: 16 pages.
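A minimal sketch of the topological ingredient using the gudhi library. The edge filtration below, |activation| x |weight| as information flow, is an illustrative choice, not necessarily the paper's exact construction: build the input-induced graph as a filtered simplicial complex and read off the first-homology persistence pairs.

```python
import numpy as np
import gudhi  # pip install gudhi

def h1_persistence(weights, activations):
    # `weights`: list of (n_l, n_{l+1}) weight matrices; `activations`: list of
    # per-layer activation vectors for one input. Edges carrying more flow
    # should enter the filtration first, hence the negated filtration value.
    st = gudhi.SimplexTree()
    offset = 0
    for W, a in zip(weights, activations):
        flow = np.abs(a[:, None] * W)
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                st.insert([offset + i, offset + W.shape[0] + j],
                          filtration=float(-flow[i, j]))
        offset += W.shape[0]
    # persistence_dim_max=True makes gudhi compute homology in the complex's
    # top dimension (here 1, since the complex is a graph).
    diag = st.persistence(persistence_dim_max=True)
    return [pair for dim, pair in diag if dim == 1]
```

For a pure graph the H1 classes never die, so a comparison between adversarial and unperturbed inputs would work with the birth values of the resulting cycles.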
HashTran-DNN: A Framework for Enhancing Robustness of Deep Neural Networks against Adversarial Malware Samples
Adversarial machine learning in the context of image processing and related
applications has received a large amount of attention. However, adversarial
machine learning, especially adversarial deep learning, in the context of
malware detection has received much less attention despite its apparent
importance. In this paper, we present a framework for enhancing the robustness
of Deep Neural Networks (DNNs) against adversarial malware samples, dubbed
Hashing Transformation Deep Neural Networks (HashTran-DNN). The core idea is
to use hash functions with a certain locality-preserving property to transform
samples to enhance the robustness of DNNs in malware classification. The
framework further uses a Denoising Auto-Encoder (DAE) regularizer to
reconstruct the hash representations of samples, making the resulting DNN
classifiers capable of capturing the locality information in the latent space.
We experiment with two concrete instantiations of the HashTran-DNN framework to
classify Android malware. Experimental results show that four known attacks can
render standard DNNs useless in classifying Android malware, that known
defenses can defend against at most three of the four attacks, and that
HashTran-DNN can effectively defend against all four attacks.
Comment: 13 pages (including references), 5 figures.
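A minimal PyTorch sketch of the two core ingredients, assuming binary feature vectors: a bit-sampling hash (one standard locality-preserving choice; the paper's concrete hash families may differ) feeding a classifier, with a denoising auto-encoder regularizing the hashed representation. All sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class HashTranSketch(nn.Module):
    def __init__(self, in_dim, n_hashes=32, bits_per_hash=64, n_classes=2):
        super().__init__()
        # Bit-sampling LSH: each hash keeps a fixed random subset of feature
        # bits, so inputs at small Hamming distance get similar hash codes.
        h_dim = n_hashes * bits_per_hash
        self.register_buffer('idx', torch.randint(0, in_dim, (h_dim,)))
        self.dae = nn.Sequential(nn.Linear(h_dim, 256), nn.ReLU(),
                                 nn.Linear(256, h_dim))
        self.clf = nn.Sequential(nn.Linear(h_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_classes))

    def forward(self, x):
        h = x[:, self.idx]                              # hashed representation
        noisy = h * (torch.rand_like(h) > 0.1).float()  # corruption for the DAE
        return self.clf(h), self.dae(noisy), h

# A training loop would combine cross-entropy on the logits with an MSE
# reconstruction loss between the DAE output and the clean hash code h.
```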
DeepFense: Online Accelerated Defense Against Adversarial Deep Learning
Recent advances in adversarial Deep Learning (DL) have opened up a largely
unexplored surface for malicious attacks jeopardizing the integrity of
autonomous DL systems. With the widespread usage of DL in critical and
time-sensitive applications, including unmanned vehicles, drones, and video
surveillance systems, online detection of malicious inputs is of utmost
importance. We propose DeepFense, the first end-to-end automated framework that
simultaneously enables efficient and safe execution of DL models. DeepFense
formalizes the goal of thwarting adversarial attacks as an optimization problem
that minimizes the rarely observed regions in the latent feature space spanned
by a DL network. To solve the aforementioned minimization problem, a set of
complementary but disjoint modular redundancies are trained to validate the
legitimacy of the input samples in parallel with the victim DL model. DeepFense
leverages hardware/software/algorithm co-design and customized acceleration to
achieve just-in-time performance in resource-constrained settings. The proposed
countermeasure is unsupervised, meaning that no adversarial sample is leveraged
to train modular redundancies. We further provide an accompanying API to reduce
the non-recurring engineering cost and ensure automated adaptation to various
platforms. Extensive evaluations on FPGAs and GPUs demonstrate up to two orders
of magnitude performance improvement while enabling online adversarial sample
detection.
Comment: Adding hardware acceleration for real-time execution of the defender
module.
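Setting the hardware co-design aside, the unsupervised detection idea can be sketched as follows. The Gaussian Mixture density model is an illustrative stand-in for the paper's latent defenders; as in the paper's setting, no adversarial samples are used for training.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class LatentDefender:
    def __init__(self, n_components=10, fpr=0.01):
        self.gmm = GaussianMixture(n_components=n_components)
        self.fpr = fpr

    def fit(self, benign_latents):
        # Model the density of benign latent features only (unsupervised).
        self.gmm.fit(benign_latents)
        scores = self.gmm.score_samples(benign_latents)
        # Threshold chosen so ~fpr of benign data would be (wrongly) flagged.
        self.thresh = np.quantile(scores, self.fpr)
        return self

    def is_adversarial(self, latents):
        # Inputs falling in rarely observed latent regions are flagged.
        return self.gmm.score_samples(latents) < self.thresh

# Several defenders, each trained on a different layer's latent features, can
# run in parallel with the victim model and vote on input legitimacy.
```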
Towards an Understanding of Neural Networks in Natural-Image Spaces
Two major uncertainties, dataset bias and adversarial examples, prevail in
state-of-the-art AI algorithms with deep neural networks. In this paper, we
present an intuitive explanation for these issues as well as an interpretation
of the performance of deep networks in a natural-image space. The explanation
consists of two parts: the philosophy of neural networks and a hypothetical
model of natural-image spaces. Following the explanation, we 1) demonstrate
that the values of training samples differ, 2) provide an incremental boost to
the accuracy of a CIFAR-10 classifier by introducing an additional
"random-noise" category during training, and 3) alleviate over-fitting, thereby
enhancing robustness against adversarial examples, by detecting and excluding
illusive training samples that are consistently misclassified. Our overall
contribution
is therefore twofold. First, while most existing algorithms treat data equally
and have a strong appetite for more data, we demonstrate in contrast that an
individual datum can sometimes have disproportionate and counterproductive
influence and that it is not always better to train neural networks with more
data. Next, we consider more thoughtful strategies by taking into account the
geometric and topological properties of natural-image spaces to which deep
networks are applied.
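Point 2, the "random-noise" category, is simple to reproduce in spirit. A minimal PyTorch sketch, assuming the model's head outputs n_classes + 1 logits; all specifics are illustrative.

```python
import torch
import torch.nn.functional as F

def train_step(model, x, y, optimizer, n_classes=10, noise_frac=0.1):
    # Augment the batch with uniform-noise images assigned to an extra class
    # (the 11th for CIFAR-10), teaching the network to reject off-manifold
    # inputs instead of forcing them into one of the real categories.
    n_noise = max(1, int(noise_frac * x.size(0)))
    noise = torch.rand(n_noise, *x.shape[1:], device=x.device)
    noise_labels = torch.full((n_noise,), n_classes,
                              device=x.device, dtype=torch.long)
    xb = torch.cat([x, noise])
    yb = torch.cat([y, noise_labels])
    loss = F.cross_entropy(model(xb), yb)  # model outputs n_classes + 1 logits
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```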