Detecting Adversarial Image Examples in Deep Networks with Adaptive Noise Reduction
Recently, many studies have demonstrated that deep neural network (DNN) classifiers can be fooled by adversarial examples, which are crafted by introducing small perturbations into an original sample. Accordingly, several powerful defense techniques have been proposed. However, existing defenses often require modifying the target model or depend on prior knowledge of the attacks. In this paper, we propose a straightforward method for detecting adversarial image examples, which can be directly deployed on unmodified off-the-shelf DNN models. We treat the perturbation of images as a kind of noise and introduce two classic image processing techniques, scalar quantization and smoothing spatial filtering, to reduce its effect. Image entropy is employed as a metric to implement adaptive noise reduction for different kinds of images. Consequently, an adversarial example can be effectively detected by comparing the classification results of a given sample and its denoised version, without any prior knowledge of attacks. More than 20,000 adversarial examples, crafted with different attack techniques against several state-of-the-art DNN models, are used to evaluate the proposed method. The experiments show that our detection method achieves a high overall F1 score of 96.39% and raises the bar for defense-aware attacks.
Comment: 14 pages, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8482346&isnumber=435869
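A minimal sketch of the detection pipeline described above: an input is flagged when entropy-adaptive denoising changes its predicted label. It assumes a generic Keras-style `model.predict`, and the quantization levels, entropy thresholds, and median-filter sizes are illustrative guesses rather than the paper's settings.

```python
import numpy as np
from scipy.ndimage import median_filter

def image_entropy(img):
    """Shannon entropy of the intensity histogram (img values in [0, 1])."""
    hist, _ = np.histogram(img, bins=256, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def adaptive_denoise(img, low=4.0, high=5.5):
    """Scalar quantization followed by a smoothing (median) spatial filter
    whose strength is chosen adaptively from the image entropy.
    Quantization levels, thresholds, and filter sizes are illustrative."""
    levels = 16
    quantized = np.round(img * (levels - 1)) / (levels - 1)
    h = image_entropy(img)
    size = 2 if h < low else (3 if h < high else 5)
    return median_filter(quantized, size=size)

def is_adversarial(model, img):
    """Flag the input when denoising changes the model's predicted label."""
    pred = lambda x: int(np.argmax(model.predict(x[None, ...]), axis=-1)[0])
    return pred(img) != pred(adaptive_denoise(img))
```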
Combatting Adversarial Attacks through Denoising and Dimensionality Reduction: A Cascaded Autoencoder Approach
Machine learning models are vulnerable to adversarial attacks that rely on perturbing the input data. This work proposes a novel strategy using autoencoder deep neural networks to defend a machine learning model against two gradient-based attacks: the Fast Gradient Sign attack and the Fast Gradient attack. First, we denoise the test data with an autoencoder trained on both clean and corrupted data. Then, we reduce the dimensionality of the denoised data using the hidden-layer representation of another autoencoder. We perform this experiment for multiple values of the adversarial perturbation bound and for different numbers of reduced dimensions. When the test data is preprocessed using this cascaded pipeline, the tested deep neural network classifier yields a much higher accuracy, thus mitigating the effect of the adversarial perturbation.
Comment: 7 pages, 8 figures, submitted to Conference on Information Sciences and Systems (CISS 2019)
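A rough sketch of the cascaded defense described above, written with Keras; the layer sizes, training epochs, and flattened-input assumption are illustrative choices, not the paper's configuration.

```python
import numpy as np
from tensorflow.keras import layers, Model

def build_denoising_ae(dim=784, hidden=256):
    """Denoising autoencoder: maps clean and perturbed inputs back to the
    clean signal (layer sizes are illustrative, not the paper's)."""
    inp = layers.Input(shape=(dim,))
    h = layers.Dense(hidden, activation="relu")(inp)
    out = layers.Dense(dim, activation="sigmoid")(h)
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

def build_reducing_ae(dim=784, bottleneck=64):
    """Second autoencoder; its hidden-layer code is the reduced representation."""
    inp = layers.Input(shape=(dim,))
    code = layers.Dense(bottleneck, activation="relu")(inp)
    out = layers.Dense(dim, activation="sigmoid")(code)
    ae = Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    return ae, Model(inp, code)

def preprocess(denoiser, encoder, x):
    """Cascaded pipeline applied to test data before classification."""
    return encoder.predict(denoiser.predict(x))

# Training sketch (x_clean: clean inputs, x_adv: perturbed copies of x_clean):
# denoiser = build_denoising_ae()
# denoiser.fit(np.concatenate([x_clean, x_adv]),
#              np.concatenate([x_clean, x_clean]), epochs=20, batch_size=128)
# reducer, encoder = build_reducing_ae()
# reducer.fit(denoiser.predict(x_clean), denoiser.predict(x_clean), epochs=20)
# The downstream classifier is then evaluated on preprocess(...) outputs.
```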
Evaluating and Improving Adversarial Robustness of Machine Learning-Based Network Intrusion Detectors
Machine learning (ML) and especially deep learning (DL) techniques have been increasingly used in anomaly-based network intrusion detection systems (NIDS). However, ML/DL has been shown to be extremely vulnerable to adversarial attacks, especially in such security-sensitive systems. Many adversarial attacks have been proposed to evaluate the robustness of ML-based NIDSs. Unfortunately, existing attacks mostly focus on feature-space and/or white-box attacks, which make impractical assumptions in real-world scenarios, leaving practical gray/black-box attacks largely unexplored.
To bridge this gap, we conduct the first systematic study of gray/black-box traffic-space adversarial attacks to evaluate the robustness of ML-based NIDSs. Our work improves on previous ones in the following aspects:
(i) practical: the proposed attack can automatically mutate original traffic with extremely limited knowledge and affordable overhead while preserving its functionality; (ii) generic: the proposed attack is effective for evaluating the robustness of various NIDSs using diverse ML/DL models and non-payload-based features; (iii) explainable: we propose an explanation method for the fragile robustness of ML-based NIDSs. Based on this, we also propose a defense scheme
against adversarial attacks to improve system robustness. We extensively
evaluate the robustness of various NIDSs using diverse feature sets and ML/DL
models. Experimental results show that our attack is effective (e.g., >97% evasion rate in half of the cases for Kitsune, a state-of-the-art NIDS) with affordable execution cost, and that the proposed defense method can effectively mitigate such attacks (the evasion rate is reduced by >50% in most cases).
Comment: This article has been accepted for publication by IEEE JSA
Defense Methods Against Adversarial Examples for Recurrent Neural Networks
Adversarial examples are known to mislead deep learning models into classifying them incorrectly, even in domains where such models achieve state-of-the-art
performance. Until recently, research on both attack and defense methods
focused on image recognition, primarily using convolutional neural networks
(CNNs). In recent years, adversarial example generation methods for recurrent
neural networks (RNNs) have been published, demonstrating that RNN classifiers
are also vulnerable to such attacks. In this paper, we present a novel defense
method, termed sequence squeezing, to make RNN classifiers more robust against
such attacks. Our method differs from previous defense methods, which were designed only for non-sequence-based models. We also implement four additional
RNN defense methods inspired by recently published CNN defense methods. We
evaluate our methods against state-of-the-art attacks in the cyber security
domain, where real adversaries (malware developers) exist, but our methods can be applied against other discrete sequence-based adversarial attacks, e.g., in the NLP domain. Using our methods, we were able to decrease the effectiveness of such attacks from 99.9% to 15%.
Comment: Submitted as a conference paper to Euro S&P 202
Adequacy of the Gradient-Descent Method for Classifier Evasion Attacks
Despite the wide use of machine learning in adversarial settings, including computer security, recent studies have demonstrated vulnerabilities to evasion attacks: carefully crafted adversarial samples that closely resemble legitimate instances but cause misclassification. In this paper, we examine the adequacy of the leading approach to generating adversarial samples, the gradient-descent approach. In particular, (1) we perform extensive experiments
on three datasets, MNIST, USPS and Spambase, in order to analyse the
effectiveness of the gradient-descent method against non-linear support vector
machines, and conclude that carefully reduced kernel smoothness can
significantly increase robustness to the attack; (2) we demonstrate that
separated inter-class support vectors lead to more secure models, and propose a
quantity similar to margin that can efficiently predict potential
susceptibility to gradient-descent attacks, before the attack is launched; and
(3) we design a new adversarial sample construction algorithm based on
optimising the multiplicative ratio of class decision functions.
Comment: 10 pages, 7 figures, 10 table
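The gradient-descent attack family examined here can be illustrated with a small sketch against a binary RBF-kernel SVM from scikit-learn. The step size, iteration budget, and use of the fitted `_gamma` attribute are assumptions, and this is not the paper's exact algorithm (which also proposes a multiplicative-ratio variant).

```python
import numpy as np
from sklearn.svm import SVC

def rbf_decision_gradient(clf, x):
    """Gradient of the binary RBF-SVM decision function
    f(x) = sum_i alpha_i * exp(-gamma * ||x - s_i||^2) + b  with respect to x."""
    sv, alpha = clf.support_vectors_, clf.dual_coef_[0]
    gamma = clf._gamma  # fitted RBF width (private attribute in scikit-learn)
    diffs = x - sv
    k = np.exp(-gamma * np.sum(diffs ** 2, axis=1))
    return np.sum((alpha * k)[:, None] * (-2.0 * gamma * diffs), axis=0)

def gradient_descent_evasion(clf, x0, step=0.05, max_iter=200):
    """Descend the decision function from a positively classified sample
    until it crosses the boundary (step size and budget are illustrative)."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        if clf.decision_function(x[None, :])[0] < 0:
            break
        x -= step * rbf_decision_gradient(clf, x)
    return x

# Usage sketch: clf = SVC(kernel="rbf").fit(X_train, y_train)
#               x_adv = gradient_descent_evasion(clf, X_test[0])
```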
Malware Evasion Attack and Defense
Machine learning (ML) classifiers are vulnerable to adversarial examples. An
adversarial example is an input sample that has been slightly modified to induce misclassification in an ML classifier. In this work, we investigate white-box and grey-box evasion attacks against an ML-based malware detector and conduct performance evaluations in a real-world setting. We compare defense approaches for mitigating these attacks. We propose a framework for deploying grey-box and black-box attacks against malware detection systems.
Comment: Accepted by IEEE DSN-DSML201
A Robust Approach for Securing Audio Classification Against Adversarial Attacks
Adversarial audio attacks can be considered small perturbations, imperceptible to human ears, that are intentionally added to an audio signal and cause a machine learning model to make mistakes. This poses a security concern about the safety of machine learning models, since adversarial attacks can fool such models into making wrong predictions. In this paper, we first review some strong adversarial attacks that may affect both audio signals and their 2D representations, and evaluate the resiliency of the most common machine learning models, namely deep learning models and support vector machines (SVMs), trained on 2D audio representations such as the short-time Fourier transform (STFT), discrete wavelet transform (DWT), and cross recurrence plot (CRP), against several state-of-the-art adversarial attacks. Next, we propose a novel approach based on a pre-processed DWT representation of audio signals and an SVM to secure audio systems against adversarial attacks. The proposed architecture has several preprocessing modules for generating and enhancing spectrograms, including dimensionality reduction and smoothing. We extract features from small patches of the spectrograms using the speeded-up robust features (SURF) algorithm, which are further used to generate a codebook with the K-Means++ algorithm. Finally, the codewords are used to train an SVM on the codebook of the SURF-generated vectors. All these steps yield a novel approach for audio classification that provides a good trade-off between accuracy and resilience. Experimental results on three environmental sound datasets show the competitive performance of the proposed approach compared to deep neural networks, both in terms of accuracy and robustness against strong adversarial attacks.
Comment: Paper Accepted for Publication in IEEE Transactions on Information Forensics and Security
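A condensed sketch of the codebook pipeline described above. It substitutes flattened spectrogram patches for the paper's SURF descriptors (SURF requires opencv-contrib and is not used here), and the patch size, codebook size, and SVM kernel are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def patch_descriptors(spectrogram, patch=8, stride=8):
    """Descriptors from small spectrogram patches. The paper uses SURF
    keypoint descriptors; flattened raw patches are a simple stand-in here."""
    h, w = spectrogram.shape
    return np.array([spectrogram[i:i + patch, j:j + patch].ravel()
                     for i in range(0, h - patch + 1, stride)
                     for j in range(0, w - patch + 1, stride)])

def bag_of_codewords(descs, codebook):
    """Normalized histogram of nearest-codeword assignments."""
    hist = np.bincount(codebook.predict(descs), minlength=codebook.n_clusters)
    return hist / max(hist.sum(), 1)

def train_pipeline(spectrograms, labels, n_words=128):
    """Codebook via K-Means++ over all descriptors, then an SVM on histograms."""
    all_descs = np.vstack([patch_descriptors(s) for s in spectrograms])
    codebook = KMeans(n_clusters=n_words, init="k-means++", n_init=10).fit(all_descs)
    feats = np.array([bag_of_codewords(patch_descriptors(s), codebook)
                      for s in spectrograms])
    return codebook, SVC(kernel="rbf").fit(feats, labels)
```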
Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods
Neural networks are known to be vulnerable to adversarial examples: inputs
that are close to natural inputs but classified incorrectly. In order to better
understand the space of adversarial examples, we survey ten recent proposals
that are designed for detection and compare their efficacy. We show that all
can be defeated by constructing new loss functions. We conclude that
adversarial examples are significantly harder to detect than previously
appreciated, and that the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
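The "new loss function" strategy can be sketched as a combined objective that a gradient-based attacker minimizes. The margin form, the detector-score interface, and the hyperparameters below are assumptions for illustration; each surveyed detector requires its own tailored term.

```python
import torch

def detector_aware_attack(classifier, detector, x, target, c=1.0, kappa=0.0,
                          steps=500, lr=0.01):
    """Sketch of a detector-aware attack: fold the detector's score into the
    attack objective so the optimizer finds perturbations that change the
    classifier's label while keeping the detector quiet. The concrete loss
    term differs per detector; this combined form is only illustrative."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = torch.clamp(x + delta, 0.0, 1.0)
        logits = classifier(x_adv)                    # shape (batch, n_classes)
        onehot = torch.zeros_like(logits)
        onehot[:, target] = 1.0
        other_max = (logits - 1e9 * onehot).max(dim=1).values
        # Margin term: push the target logit above every other class by kappa.
        cls_loss = torch.clamp(other_max - logits[:, target] + kappa, min=0.0)
        det_score = detector(x_adv)                   # assumed: higher = "flagged"
        loss = (delta ** 2).sum() + c * (cls_loss.sum() + det_score.sum())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.clamp(x + delta, 0.0, 1.0).detach()
```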
Detecting Adversarial Perturbations Through Spatial Behavior in Activation Spaces
Neural-network-based classifiers are still prone to manipulation through adversarial perturbations. State-of-the-art attacks can overcome most of the defense or detection mechanisms suggested so far, and adversaries have the
upper hand in this arms race. Adversarial examples are designed to resemble the
normal input from which they were constructed, while triggering an incorrect
classification. This basic design goal leads to a characteristic spatial
behavior within the context of Activation Spaces, a term coined by the authors
to refer to the hyperspaces formed by the activation values of the network's
layers. Within the output of the first layers of the network, an adversarial
example is likely to resemble normal instances of the source class, while in
the final layers such examples will diverge towards the adversary's target
class. The steps below enable us to leverage this inherent shift from one class
to another in order to form a novel adversarial example detector. We construct Euclidean spaces out of the activation values of each of the deep neural network layers. Then, we induce a set of k-nearest neighbor (k-NN) classifiers, one per activation space of each neural network layer, using non-adversarial examples. We leverage those classifiers to produce a sequence of class labels for each non-perturbed input sample and estimate the a priori
probability for a class label change between one activation space and another.
During the detection phase we compute a sequence of classification labels for
each input using the trained classifiers. We then estimate the likelihood of
those classification sequences and show that adversarial sequences are far less
likely than normal ones. We evaluated our detection method against the state-of-the-art C&W attack, using two image classification datasets (MNIST, CIFAR-10), reaching an AUC of 0.95 for the CIFAR-10 dataset.
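A rough sketch of the detector described above: one k-NN per layer's activation space fit on benign data, label-transition probabilities between consecutive layers, and a log-likelihood score for a test sample's label sequence. The value of k, the Laplace smoothing, and the final thresholding step are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class ActivationSpaceDetector:
    """One k-NN per layer's activation space, plus per-layer label-transition
    probabilities estimated from benign data (k, smoothing, and the
    detection threshold are illustrative assumptions)."""

    def __init__(self, k=5, n_classes=10):
        self.k, self.n_classes = k, n_classes

    def fit(self, layer_activations, labels):
        # layer_activations: list of (n_samples, d_l) arrays, one per layer.
        self.knns = [KNeighborsClassifier(n_neighbors=self.k).fit(a, labels)
                     for a in layer_activations]
        seqs = np.stack([knn.predict(a)
                         for knn, a in zip(self.knns, layer_activations)])
        # Transition probabilities between consecutive activation spaces.
        self.trans = []
        for l in range(len(self.knns) - 1):
            t = np.ones((self.n_classes, self.n_classes))   # Laplace smoothing
            for a, b in zip(seqs[l], seqs[l + 1]):
                t[a, b] += 1
            self.trans.append(t / t.sum(axis=1, keepdims=True))
        return self

    def log_likelihood(self, layer_activations_one):
        # layer_activations_one: list of (1, d_l) arrays for a single input.
        seq = [int(knn.predict(a)[0])
               for knn, a in zip(self.knns, layer_activations_one)]
        return sum(np.log(self.trans[l][seq[l], seq[l + 1]])
                   for l in range(len(seq) - 1))
    # Flag as adversarial when log_likelihood falls below a validation threshold.
```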
Adversarial Security Attacks and Perturbations on Machine Learning and Deep Learning Methods
The ever-growing big data and emerging artificial intelligence (AI) demand
the use of machine learning (ML) and deep learning (DL) methods. Cybersecurity
also benefits from ML and DL methods for various types of applications. These methods, however, are susceptible to security attacks. Adversaries can exploit the training and testing data of the learning models, or can explore the workings of those models, in order to launch advanced future attacks. The topic of adversarial security attacks and perturbations within the ML and DL domains is a recent area of exploration, and great interest has been expressed by security researchers and practitioners. The literature covers different adversarial security attacks and perturbations on ML and DL methods, each with its own presentation style and merits. However, the research community currently needs a review that consolidates knowledge of this increasingly important and growing topic. In this review paper, we specifically aim to target new researchers in the cybersecurity domain who may seek to acquire some basic knowledge of machine learning and deep learning models and algorithms, as well as some of the relevant adversarial security attacks and perturbations.