Defense Methods Against Adversarial Examples for Recurrent Neural Networks
Adversarial examples are known to mislead deep learning models to incorrectly
classify them, even in domains where such models achieve state-of-the-art
performance. Until recently, research on both attack and defense methods
focused on image recognition, primarily using convolutional neural networks
(CNNs). In recent years, adversarial example generation methods for recurrent
neural networks (RNNs) have been published, demonstrating that RNN classifiers
are also vulnerable to such attacks. In this paper, we present a novel defense
method, termed sequence squeezing, to make RNN classifiers more robust against
such attacks. Our method differs from previous defense methods which were
designed only for non-sequence based models. We also implement four additional
RNN defense methods inspired by recently published CNN defense methods. We
evaluate our methods against state-of-the-art attacks in the cyber security
domain where real adversaries (malware developers) exist, but our methods can
be applied against other discrete sequence based adversarial attacks, e.g., in
the NLP domain. Using our methods, we were able to decrease the effectiveness of such attacks from 99.9% to 15%.
Comment: Submitted as a conference paper to Euro S&P 202
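The abstract does not spell out how sequence squeezing works; by analogy with feature squeezing for images, one plausible reading is that the input alphabet is coarsened and the classifier's outputs on the raw and squeezed sequences are compared. A minimal sketch under that assumption follows; the embedding table, the set of frequent tokens, the scalar-score classifier, and the disagreement threshold are all illustrative choices, not the authors' implementation.

```python
import numpy as np

def squeeze_sequence(tokens, embeddings, frequent_ids):
    """Map every token to its nearest neighbor among a small set of frequent
    tokens, shrinking the effective input alphabet. Rows of `embeddings` are
    assumed L2-normalized, so a dot product is cosine similarity."""
    squeezed = []
    for t in tokens:
        sims = embeddings[frequent_ids] @ embeddings[t]
        squeezed.append(frequent_ids[int(np.argmax(sims))])
    return squeezed

def detect_adversarial(classify, tokens, embeddings, frequent_ids, thresh=0.5):
    """Flag an input as adversarial when the classifier's scores on the raw
    and squeezed sequences diverge by more than `thresh`."""
    p_raw = classify(tokens)
    p_squeezed = classify(squeeze_sequence(tokens, embeddings, frequent_ids))
    return abs(p_raw - p_squeezed) > thresh
```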
Towards a Robust Deep Neural Network in Texts: A Survey
Deep neural networks (DNNs) have achieved remarkable success in various tasks
(e.g., image classification, speech recognition, and natural language
processing). However, research has shown that DNN models are vulnerable to adversarial examples, which cause incorrect predictions when imperceptible perturbations are added to normal inputs. Adversarial examples in the image domain have been investigated thoroughly, but research on texts remains limited, and no comprehensive survey of the field exists. In this paper, we aim to present a comprehensive understanding of adversarial attacks and
corresponding mitigation strategies in texts. Specifically, we first give a
taxonomy of adversarial attacks and defenses in texts from the perspective of
different natural language processing (NLP) tasks, and then introduce how to
build a robust DNN model via testing and verification. Finally, we discuss the
existing challenges of adversarial attacks and defenses in texts and present
the future research directions in this emerging field.
State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations
Machine learning promises methods that generalize well from finite labeled
data. However, the brittleness of existing neural net approaches is revealed by
notable failures, such as the existence of adversarial examples that are
misclassified despite being nearly identical to a training example, or the
inability of recurrent sequence-processing nets to stay on track without
teacher forcing. We introduce a method, which we refer to as \emph{state
reification}, that involves modeling the distribution of hidden states over the
training data and then projecting hidden states observed during testing toward
this distribution. Our intuition is that if the network can remain in a
familiar manifold of hidden space, subsequent layers of the net should be well
trained to respond appropriately. We show that this state-reification method
helps neural nets to generalize better, especially when labeled data are
sparse, and also helps overcome the challenge of achieving robust
generalization with adversarial training.
Comment: ICML 2019 [full oral]. arXiv admin note: text overlap with arXiv:1805.0839
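To make the mechanism concrete, here is a minimal sketch of the state-reification idea: model the distribution of training hidden states and pull test-time states toward it. The paper's density models are richer than this; the PCA subspace and the interpolation strength used below are assumptions for illustration only.

```python
import numpy as np

class StateReifier:
    """Model the training distribution of hidden states with a PCA subspace,
    then pull test-time hidden states toward that subspace."""

    def __init__(self, n_components=16, strength=0.5):
        self.k = n_components
        self.alpha = strength  # 0 = leave states alone, 1 = full projection

    def fit(self, hidden_states):
        """`hidden_states` is an (N, d) array collected on training data."""
        self.mu = hidden_states.mean(axis=0)
        _, _, vt = np.linalg.svd(hidden_states - self.mu, full_matrices=False)
        self.basis = vt[: self.k]  # top-k principal directions, shape (k, d)

    def reify(self, h):
        """Project a (d,) test-time state onto the subspace and interpolate
        between the observed state and its projection."""
        centered = h - self.mu
        projected = self.basis.T @ (self.basis @ centered) + self.mu
        return (1 - self.alpha) * h + self.alpha * projected
```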
ROSA: Robust Salient Object Detection against Adversarial Attacks
Recently, salient object detection has witnessed remarkable improvement owing to deep convolutional neural networks, which can harvest powerful features from images. In particular, state-of-the-art salient object detection methods enjoy high accuracy and efficiency from fully convolutional network (FCN) based frameworks, which are trained end to end and predict pixel-wise labels. However, such frameworks suffer from adversarial attacks, which confuse neural networks by adding quasi-imperceptible noise to input images without changing
the ground truth annotated by human subjects. To our knowledge, this paper is
the first one that mounts successful adversarial attacks on salient object
detection models and verifies that adversarial samples are effective on a wide
range of existing methods. Furthermore, this paper proposes a novel end-to-end
trainable framework to enhance the robustness for arbitrary FCN-based salient
object detection models against adversarial attacks. The proposed framework
adopts a novel idea that first introduces some new generic noise to destroy
adversarial perturbations, and then learns to predict saliency maps for input
images with the introduced noise. Specifically, our proposed method consists of a segment-wise shielding component, which preserves boundaries and destroys delicate adversarial noise patterns, and a context-aware restoration component,
which refines saliency maps through global contrast modeling. Experimental
results suggest that our proposed framework significantly improves the performance of state-of-the-art models on a series of datasets.
Comment: To be published in IEEE Transactions on Cybernetics
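The abstract describes segment-wise shielding only at a high level; one natural realization is to shuffle pixels within each superpixel, which keeps region boundaries (segments follow edges) while scrambling pixel-level noise. The sketch below takes that reading; the use of SLIC superpixels and the segment count are assumptions, not the paper's exact design.

```python
import numpy as np
from skimage.segmentation import slic

def segmentwise_shield(image, n_segments=300, rng=None):
    """Shuffle pixels within each SLIC superpixel of an (H, W, 3) float
    image: boundaries are preserved, while delicate adversarial noise
    patterns inside each segment are destroyed."""
    rng = np.random.default_rng() if rng is None else rng
    segments = slic(image, n_segments=n_segments)  # (H, W) segment labels
    shielded = image.copy()
    flat = shielded.reshape(-1, image.shape[-1])   # view into `shielded`
    labels = segments.reshape(-1)
    for s in np.unique(labels):
        idx = np.flatnonzero(labels == s)
        flat[idx] = flat[rng.permutation(idx)]     # permute within segment
    return shielded
```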
Security and Privacy Issues in Deep Learning
With the development of machine learning (ML), expectations for artificial
intelligence (AI) technology have been increasing daily. In particular, deep
neural networks have shown outstanding performance results in many fields. Many
applications are deeply involved in our daily lives, such as making significant decisions based on the predictions or classifications of a deep learning (DL) model. Hence, if a DL model produces mispredictions or misclassifications due to malicious external influences, it can cause serious difficulties in real life. Moreover, training DL models involves an enormous amount of data, and the training data often include sensitive information. Therefore, DL models should not expose the privacy of such data.
In this paper, we review the vulnerabilities and the developed defense methods
on the security of the models and data privacy under the notion of secure and
private AI (SPAI). We also discuss current challenges and open issues.
Design of intentional backdoors in sequential models
Recent work has demonstrated robust mechanisms by which attacks can be
orchestrated on machine learning models. In contrast to adversarial examples,
backdoor or trojan attacks embed surgically modified samples with targeted
labels in the model training process to cause the targeted model to learn to
misclassify chosen samples in the presence of specific triggers, while keeping
the model performance stable across other nominal samples. However, current
published research on trojan attacks mainly focuses on classification problems,
which ignores sequential dependency between inputs. In this paper, we propose
methods to discreetly introduce and exploit novel backdoor attacks within a
sequential decision-making agent, such as a reinforcement learning agent, by
training multiple benign and malicious policies within a single long short-term
memory (LSTM) network. We demonstrate the effectiveness as well as the damaging
impact of such attacks through initial outcomes generated from our approach,
employed on grid-world environments. We also provide evidence as well as intuition on how the trojan trigger and the malicious policy are activated.
Challenges with network size and unintentional triggers are identified and
analogies with adversarial examples are also discussed. In the end, we propose
potential approaches to defend against or serve as early detection for such
attacks. The results of our work can also be extended to many applications of LSTMs and recurrent networks.
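The grid-world setup itself is not reproduced here, but the poisoning idea the abstract describes can be sketched on generic trajectory data: a small fraction of training sequences carries a trigger observation, after which the actions are relabeled to a malicious policy. The trigger value, malicious action, and poisoning rate below are hypothetical placeholders, not the paper's configuration.

```python
import numpy as np

TRIGGER = 7           # hypothetical observation value acting as the trigger
MALICIOUS_ACTION = 3  # action the backdoored policy should take post-trigger

def poison_trajectories(observations, actions, poison_frac=0.05, rng=None):
    """Plant the trigger into a small fraction of (n, t) training
    trajectories and relabel subsequent actions, so a sequence model trained
    on the data learns a hidden malicious policy alongside the benign one."""
    rng = np.random.default_rng() if rng is None else rng
    obs, acts = observations.copy(), actions.copy()
    n, t = obs.shape
    for i in rng.choice(n, size=int(poison_frac * n), replace=False):
        step = rng.integers(0, t - 1)
        obs[i, step] = TRIGGER                  # plant the trigger mid-episode
        acts[i, step + 1:] = MALICIOUS_ACTION   # switch to the malicious policy
    return obs, acts
```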
MagNet: a Two-Pronged Defense against Adversarial Examples
Deep learning has shown promising results on hard perceptual problems in
recent years. However, deep learning systems are found to be vulnerable to
small adversarial perturbations that are nearly imperceptible to humans. Such
specially crafted perturbations cause deep learning systems to output incorrect
decisions, with potentially disastrous consequences. These vulnerabilities
hinder the deployment of deep learning systems where safety or security is
important. Attempts to secure deep learning systems either target specific
attacks or have been shown to be ineffective.
In this paper, we propose MagNet, a framework for defending neural network
classifiers against adversarial examples. MagNet does not modify the protected
classifier or know the process for generating adversarial examples. MagNet
includes one or more separate detector networks and a reformer network.
Different from previous work, MagNet learns to differentiate between normal and
adversarial examples by approximating the manifold of normal examples. Since it
does not rely on any process for generating adversarial examples, it has
substantial generalization power. Moreover, MagNet reconstructs adversarial
examples by moving them towards the manifold, which is effective for helping
classify adversarial examples with small perturbation correctly. We discuss the
intrinsic difficulty in defending against whitebox attack and propose a
mechanism to defend against graybox attack. Inspired by the use of randomness
in cryptography, we propose to use diversity to strengthen MagNet. We show
empirically that MagNet is effective against most advanced state-of-the-art
attacks in blackbox and graybox scenarios while keeping false positive rate on
normal examples very low.
Comment: Accepted at the ACM Conference on Computer and Communications Security (CCS), 201
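The two prongs the abstract names, a detector and a reformer, translate directly into a short sketch. The autoencoder is abstracted as a callable trained on normal examples only; the mean-squared-error criterion and the percentile-based threshold are illustrative assumptions rather than MagNet's exact configuration.

```python
import numpy as np

class MagNetDefense:
    """Two-pronged defense in the spirit of MagNet: a detector rejects
    inputs whose autoencoder reconstruction error is large (far from the
    manifold of normal data), and a reformer replaces surviving inputs
    with their reconstructions before classification."""

    def __init__(self, autoencoder, threshold):
        self.ae = autoencoder       # trained on normal examples only
        self.threshold = threshold  # e.g., a high percentile of training errors

    def detect(self, x):
        recon = self.ae(x)
        return np.mean((x - recon) ** 2) > self.threshold  # True = reject

    def reform(self, x):
        return self.ae(x)           # pull x back toward the normal manifold

    def defend(self, x, classify):
        if self.detect(x):
            return None             # flagged as adversarial
        return classify(self.reform(x))
```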
Neural Networks in Adversarial Setting and Ill-Conditioned Weight Space
Recently, neural networks have seen a huge surge in adoption due to their ability to provide high accuracy on various tasks. On the other hand, the existence of adversarial examples has raised suspicions regarding the generalization capabilities of neural networks. In this work, we focus on the weight matrices learnt by neural networks and hypothesize that an ill-conditioned weight matrix is one of the factors contributing to a neural network's susceptibility to adversarial examples. To ensure that the learnt weight matrix's condition number remains sufficiently low, we suggest using an orthogonal regularizer. We show that this indeed helps increase the adversarial accuracy on the MNIST and F-MNIST datasets.
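A standard form of the orthogonal regularizer the abstract names penalizes how far each weight matrix is from orthogonality; an orthogonal matrix has condition number 1, so the penalty pushes condition numbers down. The paper's exact penalty form and coefficient may differ; the PyTorch sketch below shows the common soft-orthogonality variant.

```python
import torch

def orthogonal_penalty(model, coeff=1e-4):
    """Soft orthogonality penalty: coeff * sum of ||W^T W - I||_F^2 over the
    model's 2-D weight matrices, discouraging ill-conditioned weights."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for w in model.parameters():
        if w.dim() == 2:  # weight matrices only; skip biases
            gram = w.t() @ w
            eye = torch.eye(gram.shape[0], device=w.device)
            penalty = penalty + ((gram - eye) ** 2).sum()
    return coeff * penalty

# Inside a training step, add the penalty to the task loss:
# loss = criterion(model(x), y) + orthogonal_penalty(model)
```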
Is Machine Learning in Power Systems Vulnerable?
Recent advances in Machine Learning (ML) have led to its broad adoption in a
series of power system applications, ranging from meter data analytics,
renewable/load/price forecasting to grid security assessment. Although these
data-driven methods yield state-of-the-art performance in many tasks, the
robustness and security of applying such algorithms in modern power grids have
not been discussed. In this paper, we attempt to address the issues regarding
the security of ML applications in power systems. We first show that most of
the current ML algorithms proposed in power systems are vulnerable to
adversarial examples, which are maliciously crafted input data. We then adopt
and extend a simple yet efficient algorithm for finding subtle perturbations, which can be used to generate adversarial examples for both categorical (e.g., user load profile classification) and sequential applications (e.g., renewables generation forecasting). Case studies on the classification of power quality disturbances and the forecasting of building loads demonstrate the vulnerabilities of current ML algorithms in power networks under our adversarial designs. These vulnerabilities call for the design of robust and secure ML algorithms for real-world applications.
Comment: Accepted to IEEE SmartGridComm201
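The abstract does not name its perturbation algorithm; a common "simple yet efficient" choice in this literature is a one-step gradient-sign perturbation, sketched below in PyTorch as an illustration of the general recipe rather than the paper's exact method. The model, loss function, and epsilon are assumptions supplied by the caller.

```python
import torch

def fgsm_perturb(model, x, target, loss_fn, epsilon=0.01):
    """One-step gradient-sign perturbation: nudge each input feature
    (e.g., a load measurement) by +/- epsilon in the direction that
    increases the forecasting or classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), target)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```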
Characterizing Audio Adversarial Examples Using Temporal Dependency
Recent studies have highlighted adversarial examples as a ubiquitous threat
to different neural network models and many downstream applications.
Nonetheless, as unique data properties have inspired distinct and powerful learning principles, this paper aims to explore their potential for mitigating adversarial inputs. In particular, our results reveal the importance of using the temporal dependency in audio data to gain discriminative power against adversarial examples. Tested on automatic speech recognition (ASR) tasks and three recent audio adversarial attacks, we find that (i) input transformations developed from image adversarial defenses provide limited robustness improvement and are susceptible to advanced attacks; (ii) temporal
dependency can be exploited to gain discriminative power against audio
adversarial examples and is resistant to adaptive attacks considered in our
experiments. Our results not only show promising means of improving the
robustness of ASR systems, but also offer novel insights in exploiting
domain-specific data properties to mitigate the negative effects of adversarial examples.
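One way to exploit temporal dependency, consistent with this abstract, is a prefix-consistency check: transcribe a prefix of the audio and compare it with the corresponding prefix of the full transcription, since a perturbation crafted for the whole utterance tends not to survive truncation. The sketch below assumes a generic `transcribe` callable over a 1-D waveform; the prefix fraction, similarity measure, and threshold are illustrative choices.

```python
from difflib import SequenceMatcher

def temporal_consistency(transcribe, audio, prefix_frac=0.5, thresh=0.8):
    """Temporal-dependency check: transcribe the first part of the audio and
    compare it with the matching prefix of the full transcription. Benign
    audio tends to agree; adversarial audio tends not to."""
    full = transcribe(audio)
    part = transcribe(audio[: int(len(audio) * prefix_frac)])
    prefix = full[: len(part)]
    similarity = SequenceMatcher(None, part, prefix).ratio()
    return similarity >= thresh  # False = likely adversarial
```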