Unifying Gradients to Improve Real-world Robustness for Deep Networks
The wide deployment of deep neural networks (DNNs) demands increasing
attention to their real-world robustness, i.e., whether a DNN resists
black-box adversarial attacks. Among these, score-based query attacks (SQAs)
are the most threatening, since they can effectively damage a victim network
with access only to model outputs. Defending against SQAs therefore requires
a slight but artful variation of those outputs, because legitimate users and
SQAs see the same output information. In this paper, we propose a real-world
defense that works by Unifying Gradients (UniG) across different inputs, so
that SQAs can only probe a much weaker attack direction that is similar for
different samples. Since such universal attack perturbations have been shown
to be less aggressive than input-specific perturbations, UniG protects
real-world DNNs by presenting attackers with a distorted and less informative
attack direction. We implement UniG efficiently as a plug-and-play Hadamard
product module. In extensive experiments against 5 SQAs, 2 adaptive attacks,
and 7 defense baselines, UniG significantly improves real-world robustness
without hurting clean accuracy on CIFAR10 and ImageNet. For instance, UniG
maintains 77.80% accuracy on CIFAR10 under a 2500-query Square attack,
whereas the state-of-the-art adversarially trained model achieves only
67.34%. At the same time, UniG outperforms all compared baselines in clean
accuracy and makes the smallest modification to the model output. The code is
released at https://github.com/snowien/UniG-pytorch.
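To make the mechanism concrete, below is a minimal sketch of a plug-and-play
Hadamard-product module wrapped around an existing feature extractor. The
module name, parameter shapes, and placement are illustrative assumptions;
the authors' actual UniG procedure (including how the shared weight is tuned
so that per-sample gradients align) is in the linked repository.

import torch
import torch.nn as nn

class HadamardModule(nn.Module):
    """Sketch of a plug-and-play Hadamard-product module (assumed design).

    It rescales an intermediate feature map elementwise with a single weight
    shared across all samples; this is an illustration, not the authors'
    exact implementation.
    """
    def __init__(self, feature_shape):
        super().__init__()
        # Shared elementwise weight, initialized to identity scaling.
        self.weight = nn.Parameter(torch.ones(feature_shape))

    def forward(self, x):
        # Hadamard (elementwise) product with the shared weight.
        return x * self.weight

# Usage: insert the module between an existing backbone and classifier head.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
unig_layer = HadamardModule(feature_shape=(16, 32, 32))
classifier = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32 * 32, 10))

x = torch.randn(4, 3, 32, 32)  # a small CIFAR-like batch
logits = classifier(unig_layer(backbone(x)))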
A Survey on Transferability of Adversarial Examples across Deep Neural Networks
The emergence of Deep Neural Networks (DNNs) has revolutionized various
domains, enabling the resolution of complex tasks spanning image recognition,
natural language processing, and scientific problem-solving. However, this
progress has also exposed a concerning vulnerability: adversarial examples.
These crafted inputs, imperceptible to humans, can manipulate machine learning
models into making erroneous predictions, raising concerns for safety-critical
applications. An intriguing property of this phenomenon is the
transferability of adversarial examples: perturbations crafted for one model
can often deceive another, even one with a different architecture. This
property enables "black-box" attacks, circumventing the need for detailed
knowledge of the target model. This survey explores the landscape of
adversarial-example transferability. We categorize existing methodologies
to enhance adversarial transferability and discuss the fundamental principles
guiding each approach. While the predominant body of research primarily
concentrates on image classification, we also extend our discussion to
encompass other vision tasks and beyond. Challenges and future prospects are
discussed, highlighting the importance of fortifying DNNs against adversarial
vulnerabilities in an evolving landscape.
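As a concrete illustration of the black-box transfer setting described above,
the sketch below crafts a one-step FGSM perturbation on a surrogate model and
checks whether it also fools a separately parameterized target model. Both
toy models, the epsilon value, and the random data are assumptions for
illustration only.

import torch
import torch.nn as nn

def fgsm_perturb(model, x, y, eps):
    # One-step FGSM: move x in the direction of the loss gradient's sign.
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# Illustrative surrogate and target networks (stand-ins for real classifiers).
surrogate = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
target = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64),
                       nn.ReLU(), nn.Linear(64, 10))

x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

# Craft on the surrogate, then test whether it transfers to the unseen target.
x_adv = fgsm_perturb(surrogate, x, y, eps=8 / 255)
fooled = (target(x_adv).argmax(1) != y).float().mean()
print(f"fraction of samples misclassified by the target: {fooled:.2f}")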
Adversarial Robustness of Deep Code Comment Generation
Deep neural networks (DNNs) have shown remarkable performance in a variety of
domains such as computer vision, speech recognition, and natural language
processing. Recently, they have also been applied to various software
engineering tasks, typically involving processing source code. DNNs are
well-known to be vulnerable to adversarial examples, i.e., fabricated inputs
that could lead to various misbehaviors of the DNN model while being perceived
as benign by humans. In this paper, we focus on the code comment generation
task in software engineering and study the robustness of DNNs applied to this
task. We propose ACCENT, an identifier-substitution approach that crafts
adversarial code snippets which are syntactically correct and semantically
close to the original snippet, yet mislead the DNNs into producing completely
irrelevant code comments. In order to improve the
robustness, ACCENT also incorporates a novel training method, which can be
applied to existing code comment generation models. We conduct comprehensive
experiments to evaluate our approach by attacking the mainstream
encoder-decoder architectures on two large-scale publicly available datasets.
The results show that ACCENT efficiently produces stable attacks with
functionality-preserving adversarial examples, and the generated examples have
better transferability than the baselines. We also confirm experimentally
that our training method effectively improves model robustness.
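For illustration, the sketch below shows the general idea of identifier
substitution on a Python snippet: one identifier is renamed while the code
stays syntactically valid. This is not ACCENT's actual procedure for
selecting identifiers or semantically close replacements; the function and
variable names here are hypothetical.

import ast

def substitute_identifier(source: str, old_name: str, new_name: str) -> str:
    # Rename one identifier while keeping the snippet syntactically valid.
    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node):
            if node.id == old_name:
                node.id = new_name
            return node

        def visit_arg(self, node):
            if node.arg == old_name:
                node.arg = new_name
            return node

    tree = Renamer().visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+

snippet = "def add(a, b):\n    return a + b\n"
# The renamed snippet is what would be fed to a comment generation model.
print(substitute_identifier(snippet, "a", "count"))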
- …