Improving Transferability of Adversarial Examples with Input Diversity
Though CNNs have achieved state-of-the-art performance on various vision
tasks, they are vulnerable to adversarial examples, which are crafted by adding
human-imperceptible perturbations to clean images. However, most of the
existing adversarial attacks only achieve relatively low success rates under
the challenging black-box setting, where the attackers have no knowledge of the
model structure and parameters. To this end, we propose to improve the
transferability of adversarial examples by creating diverse input patterns.
Instead of only using the original images to generate adversarial examples, our
method applies random transformations to the input images at each iteration.
Extensive experiments on ImageNet show that the proposed attack method can
generate adversarial examples that transfer much better to different networks
than existing baselines. By evaluating our method against top defense solutions
and official baselines from the NIPS 2017 adversarial competition, the enhanced
attack reaches an average success rate of 73.0%, which outperforms the top-1
attack submission in the NIPS competition by a large margin of 6.6%. We hope
that our proposed attack strategy can serve as a strong benchmark baseline for
evaluating the robustness of networks to adversaries and the effectiveness of
different defense methods in the future. Code is available at
https://github.com/cihangxie/DI-2-FGSM.
Comment: CVPR 2019, code is available at: https://github.com/cihangxie/DI-2-FGSM
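As a rough illustration of the input-diversity idea, the PyTorch sketch below applies a random resize-and-pad transformation before each gradient step of an iterative FGSM; the transform probability, size range, and step schedule are placeholder values, not the authors' exact settings.

    import torch
    import torch.nn.functional as F

    def input_diversity(x, low=224, high=254, p=0.5):
        # With probability p, randomly resize the batch and pad it back
        # to a fixed size (illustrative values; see the paper for details).
        if torch.rand(1).item() > p:
            return x
        rnd = torch.randint(low, high + 1, (1,)).item()
        resized = F.interpolate(x, size=(rnd, rnd), mode='nearest')
        pad = high - rnd
        left = torch.randint(0, pad + 1, (1,)).item()
        top = torch.randint(0, pad + 1, (1,)).item()
        return F.pad(resized, (left, pad - left, top, pad - top))

    def di_fgsm(model, x, y, eps=16/255, steps=10):
        # Iterative FGSM whose gradients are taken on diversified inputs.
        alpha = eps / steps
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(input_diversity(x_adv)), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)
        return x_adv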
Improving Adversarial Robustness via Promoting Ensemble Diversity
Though deep neural networks have achieved significant progress on various
tasks, often enhanced by model ensemble, existing high-performance models can
be vulnerable to adversarial attacks. Many efforts have been devoted to
enhancing the robustness of individual networks and then constructing a
straightforward ensemble, e.g., by directly averaging the outputs, which
ignores the interaction among networks. This paper presents a new method that
explores the interaction among individual networks to improve robustness for
ensemble models. Technically, we define a new notion of ensemble diversity in
the adversarial setting as the diversity among non-maximal predictions of
individual members, and present an adaptive diversity promoting (ADP)
regularizer to encourage the diversity, which leads to globally better
robustness for the ensemble by making adversarial examples difficult to
transfer among individual members. Our method is computationally efficient and
compatible with the defense methods acting on individual networks. Empirical
results on various datasets verify that our method can improve adversarial
robustness while maintaining state-of-the-art accuracy on normal examples.
Comment: ICML 2019
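A rough PyTorch sketch of such a regularizer, under our reading of the abstract: the entropy of the ensemble's mean prediction is raised while the normalized non-maximal predictions of the members are pushed toward spanning a large volume. The coefficients and jitter are placeholders; the exact ADP definition is in the paper.

    import torch

    def adp_regularizer(probs_list, y, alpha=2.0, beta=0.5, eps=1e-6):
        # probs_list: list of K softmax outputs, each (B, C); y: (B,) true labels.
        probs = torch.stack(probs_list, dim=1)             # (B, K, C)
        mean = probs.mean(dim=1)
        entropy = -(mean * (mean + eps).log()).sum(dim=1)  # entropy of ensemble mean
        B, K, C = probs.shape
        # Keep only the non-maximal predictions by dropping the true-class column.
        idx = y.view(B, 1, 1).expand(B, K, 1)
        keep = torch.ones_like(probs).scatter_(2, idx, 0.0).bool()
        nonmax = probs[keep].view(B, K, C - 1)
        nonmax = nonmax / (nonmax.norm(dim=2, keepdim=True) + eps)
        gram = nonmax @ nonmax.transpose(1, 2)             # (B, K, K)
        eye = torch.eye(K, device=probs.device)
        logdet = torch.logdet(gram + eps * eye)            # log squared spanned volume
        return -(alpha * entropy + beta * logdet).mean()   # add to the members' CE losses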
Curls & Whey: Boosting Black-Box Adversarial Attacks
Image classifiers based on deep neural networks are vulnerable to adversarial
examples. Two defects exist in black-box iterative attacks that
generate adversarial examples by incrementally adjusting the noise-adding
direction for each step. On the one hand, existing iterative attacks add noises
monotonically along the direction of gradient ascent, resulting in a lack of
diversity and adaptability of the generated iterative trajectories. On the
other hand, it is trivial to perform an adversarial attack by adding excessive
noise, but currently there is no refinement mechanism to squeeze out redundant
noise. In this work, we propose the Curls & Whey black-box attack to fix the above
two defects. During Curls iteration, by combining gradient ascent and descent,
we `curl' up iterative trajectories to integrate more diversity and
transferability into adversarial examples. Curls iteration also alleviates the
diminishing marginal effect in existing iterative attacks. The Whey
optimization further squeezes the `whey' of noises by exploiting the robustness
of adversarial perturbations. Extensive experiments on ImageNet and
Tiny-ImageNet demonstrate that our approach achieves an impressive decrease in
noise magnitude in the l2 norm. The Curls & Whey attack also shows promising
transferability against ensemble models as well as adversarially trained
models. In addition, we extend our attack to targeted misclassification,
effectively reducing the difficulty of targeted attacks under the black-box
condition.
Comment: CVPR 2019 Oral
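The 'whey'-squeezing step can be illustrated with a deliberately simplified stand-in: the sketch below merely rescales the whole perturbation and keeps any reduction that still fools the model, whereas the actual Whey optimization squeezes the noise far more finely.

    import torch

    def whey_squeeze(model, x, x_adv, y, shrink=0.9, rounds=5):
        # Repeatedly shrink the perturbation; accept the smaller noise
        # whenever the whole batch is still misclassified.
        noise = x_adv - x
        for _ in range(rounds):
            candidate = (x + shrink * noise).clamp(0, 1)
            with torch.no_grad():
                pred = model(candidate).argmax(dim=1)
            if (pred != y).all():        # still adversarial: keep the reduction
                noise = candidate - x
        return x + noise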
Enhancing Cross-task Transferability of Adversarial Examples with Dispersion Reduction
Neural networks are known to be vulnerable to carefully crafted adversarial
examples, and these malicious samples often transfer, i.e., they maintain their
effectiveness even against other models. While great effort has been devoted to
the transferability of adversarial examples, surprisingly little attention has
been paid to its impact on real-world deep learning deployment. In this paper, we
investigate the transferability of adversarial examples across a wide range of
real-world computer vision tasks, including image classification, explicit
content detection, optical character recognition (OCR), and object detection.
This setting represents a cybercriminal's situation where an ensemble of
different detection mechanisms needs to be evaded all at once. We propose a
practical attack that overcomes existing attacks' limitation of requiring
task-specific loss functions by targeting the `dispersion' of an internal
feature map. We report
evaluation on four different computer vision tasks provided by Google Cloud
Vision APIs to show how our approach outperforms existing attacks by degrading
the performance of multiple CV tasks by a large margin with only modest
perturbations.
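A minimal sketch of a dispersion-style attack under our reading of the abstract: the standard deviation of one intermediate feature map serves as the 'dispersion' to be driven down, so no task-specific loss is needed. The feature extractor and step sizes are illustrative.

    import torch

    def dispersion_reduction(feat_extractor, x, eps=16/255, alpha=2/255, steps=20):
        # feat_extractor(x) is assumed to return one intermediate feature map.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = feat_extractor(x_adv).std()      # dispersion of the feature map
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() - alpha * grad.sign()  # descend to shrink dispersion
            x_adv = x + (x_adv - x).clamp(-eps, eps)
            x_adv = x_adv.clamp(0, 1)
        return x_adv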
CAAD 2018: Generating Transferable Adversarial Examples
Deep neural networks (DNNs) are vulnerable to adversarial examples,
perturbations carefully crafted to fool the targeted DNN, in both the
non-targeted and targeted case. In the non-targeted case, the attacker simply
aims to induce misclassification. In the targeted case, the attacker aims to
induce classification to a specified target class. In addition, it has been
observed that strong adversarial examples can transfer to unknown models,
yielding a serious security concern. The NIPS 2017 competition was organized to
accelerate research in adversarial attacks and defenses, taking place in the
realistic setting where submitted adversarial attacks attempt to transfer to
submitted defenses. The CAAD 2018 competition took place with nearly identical
rules to the NIPS 2017 one. Given the requirement that the NIPS 2017
submissions were to be open-sourced, participants in the CAAD 2018 competition
were able to directly build upon previous solutions, and thus improve the
state-of-the-art in this setting. Our team participated in the CAAD 2018
competition, and won 1st place in both attack subtracks, non-targeted and
targeted adversarial attacks, and 3rd place in defense. We outline our
solutions and development results in this article. We hope our results can
inform researchers in both generating and defending against adversarial
examples.
Comment: 1st place attack solutions and 3rd place defense in CAAD 2018
Competition
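For reference, the two attack goals differ only in which label's loss is ascended; a minimal sketch (the function and names are ours, not the team's code):

    import torch.nn.functional as F

    def attack_loss(logits, y_true, y_target=None):
        # Non-targeted: ascend the loss on the true label.
        # Targeted: descend the loss on the chosen target label.
        if y_target is None:
            return F.cross_entropy(logits, y_true)
        return -F.cross_entropy(logits, y_target)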
A Robust Approach for Securing Audio Classification Against Adversarial Attacks
An adversarial audio attack can be considered a small perturbation,
imperceptible to human ears, that is intentionally added to the audio signal and
causes a machine learning model to make mistakes. This poses a security concern
about the safety of machine learning models since the adversarial attacks can
fool such models into making wrong predictions. In this paper, we first review
some strong adversarial attacks that may affect both audio signals and their 2D
representations, and evaluate the resiliency of the most common machine learning
models, namely deep learning models and support vector machines (SVMs), trained on
2D audio representations such as short time Fourier transform (STFT), discrete
wavelet transform (DWT) and cross recurrent plot (CRP) against several
state-of-the-art adversarial attacks. Next, we propose a novel approach based
on pre-processed DWT representation of audio signals and SVM to secure audio
systems against adversarial attacks. The proposed architecture has several
preprocessing modules for generating and enhancing spectrograms, including
dimension reduction and smoothing. We extract features from small patches of
the spectrograms using the speeded-up robust features (SURF) algorithm, which are
further used to generate a codebook with the K-Means++ algorithm. Finally, the
codewords are used to train an SVM on the codebook of the SURF-generated
vectors. All these steps yield a novel approach for audio classification
that provides a good trade-off between accuracy and resilience. Experimental
results on three environmental sound datasets show the competitive performance
of the proposed approach compared to deep neural networks, both in terms of
accuracy and robustness against strong adversarial attacks.
Comment: Paper Accepted for Publication in IEEE Transactions on Information
Forensics and Security
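The codebook-and-SVM stage can be sketched as follows, assuming SURF descriptors have already been extracted from the spectrogram patches of each clip; the cluster count and kernel are placeholder choices.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def train_codebook_svm(descriptor_sets, labels, k=128, seed=0):
        # descriptor_sets: list of (n_i, d) arrays, one per audio clip.
        codebook = KMeans(n_clusters=k, init='k-means++', random_state=seed)
        codebook.fit(np.vstack(descriptor_sets))
        hists = []
        for desc in descriptor_sets:
            words = codebook.predict(desc)                 # assign codewords
            hist = np.bincount(words, minlength=k).astype(float)
            hists.append(hist / max(hist.sum(), 1.0))      # normalized bag of codewords
        svm = SVC(kernel='rbf').fit(np.array(hists), labels)
        return codebook, svm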
MULDEF: Multi-model-based Defense Against Adversarial Examples for Neural Networks
Despite being popularly used in many applications, neural network models have
been found to be vulnerable to adversarial examples, i.e., carefully crafted
examples aiming to mislead machine learning models. Adversarial examples can
pose potential risks to safety- and security-critical applications. However,
existing defense approaches are still vulnerable to attacks, especially in a
white-box attack scenario. To address this issue, we propose a new defense
approach, named MulDef, based on robustness diversity. Our approach consists of
(1) a general defense framework based on multiple models and (2) a technique
for generating these multiple models to achieve high defense capability. In
particular, given a target model, our framework includes multiple models
(constructed from the target model) to form a model family. The model family is
designed to achieve robustness diversity (i.e., an adversarial example
successfully attacking one model cannot succeed in attacking other models in
the family). At runtime, a model is randomly selected from the family to be
applied to each input example. Our general framework can inspire rich future
research to construct a desirable model family achieving higher robustness
diversity. Our evaluation results show that MulDef (with only up to 5 models in
the family) can substantially improve the target model's accuracy on
adversarial examples by 22-74% in a white-box attack scenario, while
maintaining similar accuracy on legitimate examples.
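The runtime behavior described above amounts to the following sketch (illustrative, not the authors' code): each query is answered by a member drawn at random from the family, so a perturbation tuned against one member is unlikely to fool the member actually used.

    import random

    class RandomizedModelFamily:
        # Hypothetical wrapper around a MulDef-style model family.
        def __init__(self, models):
            self.models = models      # e.g., up to 5 variants of the target model

        def predict(self, x):
            return random.choice(self.models)(x)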
Task-generalizable Adversarial Attack based on Perceptual Metric
Deep neural networks (DNNs) can be easily fooled by adding human-imperceptible
perturbations to images. These perturbed images are known as
`adversarial examples' and pose a serious threat to security- and safety-critical
systems. A litmus test for the strength of adversarial examples is
their transferability across different DNN models in a black box setting (i.e.
when the target model's architecture and parameters are not known to the attacker).
Current attack algorithms that seek to enhance adversarial transferability work
at the decision level, i.e., they generate perturbations that alter the network's
decisions. This leads to two key limitations: (a) An attack is dependent on the
task-specific loss function (e.g. softmax cross-entropy for object recognition)
and therefore does not generalize beyond its original task. (b) The adversarial
examples are specific to the network architecture and demonstrate poor
transferability to other network architectures. We propose a novel approach to
create adversarial examples that can broadly fool different networks on
multiple tasks. Our approach is based on the following intuition: "Perceptual
metrics based on neural network features are highly generalizable and show
excellent performance in measuring and stabilizing input distortions. Therefore
an ideal attack that creates maximum distortions in the network feature space
should realize highly transferable examples". We report extensive experiments
to show how adversarial examples generalize across multiple networks for
classification, object detection and segmentation tasks.
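A minimal sketch of such a feature-space attack: the distance between the adversarial and clean features of any pretrained backbone is maximized, with no task-specific loss involved (backbone choice and step sizes are illustrative).

    import torch

    def feature_space_attack(backbone, x, eps=16/255, alpha=2/255, steps=20):
        # backbone(x) returns a feature map from any pretrained network.
        with torch.no_grad():
            feat_clean = backbone(x)
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = (backbone(x_adv) - feat_clean).norm()   # feature-space distortion
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)
            x_adv = x_adv.clamp(0, 1)
        return x_adv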
Adversarial Examples: Opportunities and Challenges
Deep neural networks (DNNs) have shown huge superiority over humans in image
recognition, speech processing, autonomous vehicles and medical diagnosis.
However, recent studies indicate that DNNs are vulnerable to adversarial
examples (AEs), which are designed by attackers to fool deep learning models.
Different from real examples, AEs can mislead a model into predicting incorrect
outputs while being hardly distinguishable by human eyes, thereby threatening
security-critical deep-learning applications. In recent years, the generation
and defense of AEs have become a research hotspot in the field of artificial
intelligence (AI) security. This article reviews the latest research progress
of AEs. First, we introduce the concept, cause, characteristics and evaluation
metrics of AEs, then give a survey on the state-of-the-art AE generation
methods with the discussion of advantages and disadvantages. After that, we
review the existing defenses and discuss their limitations. Finally, future
research opportunities and challenges on AEs are discussed.
Comment: 16 pages, 13 figures, 5 tables
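As a concrete anchor for the generation methods such surveys cover, the classic one-step FGSM, the simplest AE generator, can be sketched as:

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=8/255):
        # x_adv = x + eps * sign(grad of loss w.r.t. x)
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        return (x + eps * grad.sign()).clamp(0, 1).detach()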
Sitatapatra: Blocking the Transfer of Adversarial Samples
Convolutional Neural Networks (CNNs) are widely used to solve classification
tasks in computer vision. However, they can be tricked into misclassifying
specially crafted `adversarial' samples -- and samples built to trick one model
often work alarmingly well against other models trained on the same task. In
this paper we introduce Sitatapatra, a system designed to block the transfer of
adversarial samples. It diversifies neural networks using a key, as in
cryptography, and provides a mechanism for detecting attacks. What's more, when
adversarial samples are detected they can typically be traced back to the
individual device that was used to develop them. The run-time overheads are
minimal, permitting the use of Sitatapatra on constrained systems.