HashTran-DNN: A Framework for Enhancing Robustness of Deep Neural Networks against Adversarial Malware Samples
Adversarial machine learning has received a great deal of attention in the context of image processing and related applications. However, adversarial machine learning, and especially adversarial deep learning, has received much less attention in the context of malware detection, despite its apparent importance. In this paper, we present a framework for enhancing the robustness of Deep Neural Networks (DNNs) against adversarial malware samples, dubbed Hashing Transformation Deep Neural Networks (HashTran-DNN). The core idea is to transform samples using hash functions with a certain locality-preserving property, thereby enhancing the robustness of DNNs for malware classification. The framework further uses a Denoising Auto-Encoder (DAE) regularizer to reconstruct the hash representations of samples, making the resulting DNN classifiers capable of capturing locality information in the latent space. We experiment with two concrete instantiations of the HashTran-DNN framework for classifying Android malware. Experimental results show that four known attacks can render standard DNNs useless for classifying Android malware, that known defenses can defend against at most three of the four attacks, and that HashTran-DNN can effectively defend against all four.
Comment: 13 pages (including references), 5 figures.
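The abstract does not spell out a concrete hash family, so the following is a minimal sketch of the locality-preserving transformation idea using SimHash-style random projections (a standard locality-sensitive hashing family, assumed here purely for illustration): binary feature vectors that are close map to hash codes differing in only a few bits, so a small number of adversarial feature flips perturbs the DNN's input only slightly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 10,000 raw malware features hashed down to 512 bits.
n_features, n_bits = 10_000, 512
planes = rng.standard_normal((n_features, n_bits))  # random hyperplanes

def simhash_transform(x, planes):
    """Locality-sensitive hash: sign of random projections. Inputs that are
    close in cosine similarity agree on most output bits."""
    return (x @ planes > 0).astype(np.float32)

x = rng.integers(0, 2, n_features)           # clean binary feature vector
x_adv = x.copy()
flip = rng.choice(n_features, 20, replace=False)
x_adv[flip] = 1 - x_adv[flip]                # adversary flips 20 features

h, h_adv = simhash_transform(x, planes), simhash_transform(x_adv, planes)
print("hash bits changed:", int(np.abs(h - h_adv).sum()), "of", n_bits)
# The hash codes (optionally denoised by a DAE, as in the paper) feed the DNN.
```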
Addressing Model Vulnerability to Distributional Shifts over Image Transformation Sets
We are concerned with the vulnerability of computer vision models to distributional shifts. We formulate a combinatorial optimization problem for evaluating the regions of the image space, in terms of image transformations applied to the input, where a given model is most vulnerable, and tackle it with standard search algorithms. We further embed this idea in a training procedure in which, over iterations, we define new data augmentation rules according to the image transformations to which the current model is most vulnerable. An empirical evaluation on classification and semantic segmentation problems suggests that the devised algorithm trains models that are more robust against content-preserving image manipulations and, more generally, against distributional shifts.
Comment: ICCV 2019 (camera ready).
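As a rough illustration of the search component, here is a minimal random-search sketch (the function and parameter names are hypothetical; the paper uses standard combinatorial search algorithms over a user-specified transformation set):

```python
import random

def find_worst_composition(model, loss_fn, images, labels,
                           transforms, k=2, trials=50):
    """Search compositions of k image transformations and return the one
    that maximizes the model's loss, i.e. the region of the transformation
    set where the model is most vulnerable."""
    worst, worst_loss = None, float("-inf")
    for _ in range(trials):
        comp = random.sample(transforms, k)
        batch = images
        for t in comp:
            batch = [t(img) for img in batch]
        loss = loss_fn(model(batch), labels)
        if loss > worst_loss:
            worst, worst_loss = comp, loss
    return worst, worst_loss

# In the training loop, the returned composition becomes a new data
# augmentation rule, and the search is repeated as the model improves.
```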
Adversarial Examples - A Complete Characterisation of the Phenomenon
We provide a complete characterisation of the phenomenon of adversarial examples - inputs intentionally crafted to fool machine learning models. We aim to cover all the important concerns in this field of study: (1) the conjectures on the existence of adversarial examples, (2) the security, safety and robustness implications, (3) the methods used to generate and (4) to protect against adversarial examples, and (5) the ability of adversarial examples to transfer between different machine learning models. We provide ample background information in an effort to make this document self-contained. It can therefore be used as a survey, a tutorial, or a catalog of attacks and defences using adversarial examples.
Motivating the Rules of the Game for Adversarial Example Research
Advances in machine learning have led to broad deployment of systems with
impressive performance on important problems. Nonetheless, these systems can be
induced to make errors on data that are surprisingly similar to examples the
learned system handles correctly. The existence of these errors raises a
variety of questions about out-of-sample generalization and whether bad actors
might use such examples to abuse deployed systems. As a result of these
security concerns, there has been a flurry of recent papers proposing
algorithms to defend against such malicious perturbations of correctly handled
examples. It is unclear how such misclassifications represent a different kind
of security problem than other errors, or even other attacker-produced examples
that have no specific relationship to an uncorrupted input. In this paper, we
argue that adversarial example defense papers have, to date, mostly considered
abstract, toy games that do not relate to any specific security concern.
Furthermore, defense papers have not yet precisely described all the abilities
and limitations of attackers that would be relevant in practical security.
Towards this end, we establish a taxonomy of motivations, constraints, and
abilities for more plausible adversaries. Finally, we provide a series of
recommendations outlining a path forward for future work to more clearly
articulate the threat model and perform more meaningful evaluation.
Combating Adversarial Attacks Using Sparse Representations
It is by now well-known that small adversarial perturbations can induce
classification errors in deep neural networks (DNNs). In this paper, we make
the case that sparse representations of the input data are a crucial tool for
combating such attacks. For linear classifiers, we show that a sparsifying front end is provably effective against $\ell_\infty$-bounded attacks, reducing output distortion due to the attack by a factor of roughly $K/N$, where $N$ is the data dimension and $K$ is the sparsity level. We then extend
this concept to DNNs, showing that a "locally linear" model can be used to
develop a theoretical foundation for crafting attacks and defenses.
Experimental results for the MNIST dataset show the efficacy of the proposed
sparsifying front end.
Comment: Accepted at ICLR Workshop 201
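A small numeric sketch of why sparsity helps, using a top-$K$ projection in the identity basis (the paper's analysis works in a proper sparsifying basis, so treat this as illustrative only): a worst-case $\ell_\infty$ perturbation couples into all $N$ coordinates without the front end, but into only the $K$ surviving coordinates with it.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, eps = 1024, 32, 0.1

def topk_project(x, k):
    """Sparsifying front end: keep the k largest-magnitude coefficients."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

w = rng.standard_normal(N)                   # linear classifier weights
x = topk_project(rng.standard_normal(N), K)  # K-sparse clean input
e = eps * np.sign(w)                         # worst-case ell_inf attack on w.x

plain = abs(w @ (x + e) - w @ x)                     # no defense
defended = abs(w @ topk_project(x + e, K) - w @ x)   # with front end
print(f"distortion without front end: {plain:.2f}")
print(f"distortion with front end:    {defended:.2f}  (roughly K/N as large)")
```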
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
In this paper we establish rigorous benchmarks for image classifier
robustness. Our first benchmark, ImageNet-C, standardizes and expands the
corruption robustness topic, while showing which classifiers are preferable in
safety-critical applications. Then we propose a new dataset called ImageNet-P
which enables researchers to benchmark a classifier's robustness to common
perturbations. Unlike recent robustness research, this benchmark evaluates
performance on common corruptions and perturbations, not worst-case adversarial
perturbations. We find that there are negligible changes in relative corruption
robustness from AlexNet classifiers to ResNet classifiers. Afterward we
discover ways to enhance corruption and perturbation robustness. We even find
that a bypassed adversarial defense provides substantial common perturbation
robustness. Together our benchmarks may aid future work toward networks that
robustly generalize.
Comment: ICLR 2019 camera-ready; datasets available at https://github.com/hendrycks/robustness; this article supersedes arXiv:1807.0169
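The benchmark's headline metric aggregates errors over corruption types and severities; below is a sketch of the Corruption Error computation as described in the paper (the per-severity numbers are made up for illustration):

```python
def corruption_error(model_err, alexnet_err):
    """CE for one corruption type: the model's top-1 error summed over
    severities 1-5, normalized by AlexNet's errors on the same corruption."""
    return sum(model_err) / sum(alexnet_err)

# Hypothetical per-severity error rates for one corruption, e.g. Gaussian noise.
resnet_err  = [0.35, 0.45, 0.55, 0.68, 0.78]
alexnet_err = [0.60, 0.70, 0.80, 0.88, 0.93]

print(f"CE = {corruption_error(resnet_err, alexnet_err):.3f}")
# mCE is this value averaged over all corruption types in ImageNet-C.
```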
Learning to Compose Domain-Specific Transformations for Data Augmentation
Data augmentation is a ubiquitous technique for increasing the size of
labeled training sets by leveraging task-specific data transformations that
preserve class labels. While it is often easy for domain experts to specify
individual transformations, constructing and tuning the more sophisticated
compositions typically needed to achieve state-of-the-art results is a
time-consuming manual task in practice. We propose a method for automating this
process by learning a generative sequence model over user-specified
transformation functions using a generative adversarial approach. Our method
can make use of arbitrary, non-deterministic transformation functions, is
robust to misspecified user input, and is trained on unlabeled data. The
learned transformation model can then be used to perform data augmentation for
any end discriminative model. In our experiments, we show the efficacy of our
approach on both image and text datasets, achieving improvements of 4.0
accuracy points on CIFAR-10, 1.4 F1 points on the ACE relation extraction task,
and 3.4 accuracy points when using domain-specific transformation operations on
a medical imaging dataset as compared to standard heuristic augmentation
approaches.
Comment: To appear at Neural Information Processing Systems (NIPS) 201
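A toy sketch of the sequence-model idea (the transformation functions and probabilities below are stand-ins; the paper trains the sequence model with a generative adversarial objective on unlabeled data):

```python
import random

def sample_tf_sequence(tfs, probs, length=3):
    """Sample a composition of user-specified transformation functions (TFs)
    from per-step distributions learned by the sequence model."""
    return [random.choices(tfs, weights=probs[step])[0] for step in range(length)]

def apply_sequence(x, seq):
    for tf in seq:
        x = tf(x)
    return x

# Toy numeric TFs standing in for rotate/crop/swap-word/etc. In training,
# a discriminator scores whether transformed examples still look like real
# data, and its feedback shifts `probs` away from implausible compositions.
tfs = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
probs = [[0.5, 0.3, 0.2]] * 3          # untrained, near-uniform distributions
print(apply_sequence(10, sample_tf_sequence(tfs, probs)))
```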
Robust Subspace Recovery Layer for Unsupervised Anomaly Detection
We propose a neural network for unsupervised anomaly detection with a novel
robust subspace recovery layer (RSR layer). This layer seeks to extract the
underlying subspace from a latent representation of the given data and removes
outliers that lie away from this subspace. It is used within an autoencoder.
The encoder maps the data into a latent space, from which the RSR layer
extracts the subspace. The decoder then smoothly maps back the underlying
subspace to a "manifold" close to the original inliers. Inliers and outliers
are distinguished according to the distances between the original and mapped
positions (small for inliers and large for outliers). Extensive numerical
experiments with both image and document datasets demonstrate state-of-the-art
precision and recall.
Comment: Accepted to the ICLR 2020 conference.
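A minimal PyTorch sketch of the architecture (layer sizes and the plain reconstruction score are illustrative; the paper additionally trains the RSR layer with dedicated robust losses):

```python
import torch
import torch.nn as nn

class RSRAutoencoder(nn.Module):
    """Encoder -> linear RSR layer projecting the latent code onto a
    low-dimensional subspace -> decoder back to input space."""
    def __init__(self, d_in=784, d_latent=128, d_sub=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_latent), nn.ReLU())
        self.rsr = nn.Linear(d_latent, d_sub, bias=False)  # subspace projection
        self.decoder = nn.Linear(d_sub, d_in)

    def forward(self, x):
        return self.decoder(self.rsr(self.encoder(x)))

model = RSRAutoencoder()
x = torch.randn(16, 784)                  # a batch of flattened inputs
score = ((model(x) - x) ** 2).sum(dim=1)  # large score => likely outlier
```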
Task-generalizable Adversarial Attack based on Perceptual Metric
Deep neural networks (DNNs) can be easily fooled by adding human
imperceptible perturbations to the images. These perturbed images are known as
`adversarial examples' and pose a serious threat to security and safety
critical systems. A litmus test for the strength of adversarial examples is
their transferability across different DNN models in a black-box setting (i.e., when the target model's architecture and parameters are not known to the attacker). Current attack algorithms that seek to enhance adversarial transferability work at the decision level, i.e., they generate perturbations that alter the network's decisions. This leads to two key limitations: (a) an attack depends on the
task-specific loss function (e.g. softmax cross-entropy for object recognition)
and therefore does not generalize beyond its original task. (b) The adversarial
examples are specific to the network architecture and demonstrate poor
transferability to other network architectures. We propose a novel approach to
create adversarial examples that can broadly fool different networks on
multiple tasks. Our approach is based on the following intuition: "Perceptual
metrics based on neural network features are highly generalizable and show
excellent performance in measuring and stabilizing input distortions. Therefore
an ideal attack that creates maximum distortions in the network feature space
should realize highly transferable examples". We report extensive experiments
to show how adversarial examples generalize across multiple networks for
classification, object detection, and segmentation tasks.
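A sketch of a feature-distortion attack in this spirit: a projected-gradient loop that maximizes the distance between clean and adversarial activations of a feature extractor, with no task-specific loss involved. The hyperparameters are illustrative and this is not the paper's exact algorithm.

```python
import torch

def feature_space_attack(feat, x, eps=8/255, alpha=2/255, steps=10):
    """Maximize ||feat(x_adv) - feat(x)|| under an ell_inf budget of eps.
    `feat` is any differentiable feature extractor, e.g. a conv backbone."""
    with torch.no_grad():
        target = feat(x)
        # Small random start so the distortion gradient is nonzero at step 0.
        x_adv = (x + 1e-3 * torch.randn_like(x)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = (feat(x_adv) - target).norm()
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend on distortion
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto eps-ball
            x_adv = x_adv.clamp(0, 1)                 # stay a valid image
    return x_adv.detach()
```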
Towards a Robust Deep Neural Network in Texts: A Survey
Deep neural networks (DNNs) have achieved remarkable success in various tasks
(e.g., image classification, speech recognition, and natural language
processing). However, research has shown that DNN models are vulnerable to adversarial examples, which cause incorrect predictions when imperceptible perturbations are added to normal inputs. Adversarial examples in the image domain have been studied extensively, but research on texts is still insufficient, let alone a comprehensive survey of this field. In this paper, we aim to present a comprehensive understanding of adversarial attacks and
corresponding mitigation strategies in texts. Specifically, we first give a
taxonomy of adversarial attacks and defenses in texts from the perspective of
different natural language processing (NLP) tasks, and then introduce how to
build a robust DNN model via testing and verification. Finally, we discuss the
existing challenges of adversarial attacks and defenses in texts and present
the future research directions in this emerging field.