Adversarial Robustness of Deep Code Comment Generation
Deep neural networks (DNNs) have shown remarkable performance in a variety of domains such as computer vision, speech recognition, and natural language processing. Recently, they have also been applied to various software engineering tasks, typically involving the processing of source code. DNNs are well known to be vulnerable to adversarial examples, i.e., fabricated inputs that can cause a DNN model to misbehave while appearing benign to humans. In this paper, we focus on the code comment generation task in software engineering and study the robustness of DNNs applied to this task. We propose ACCENT, an identifier substitution approach that crafts adversarial code snippets which are syntactically correct and semantically close to the original code snippet, yet may mislead the DNNs into producing completely irrelevant code comments. To improve robustness, ACCENT also incorporates a novel training method that can be applied to existing code comment generation models. We conduct comprehensive experiments to evaluate our approach by attacking mainstream encoder-decoder architectures on two large-scale, publicly available datasets. The results show that ACCENT efficiently produces stable attacks with functionality-preserving adversarial examples, and that the generated examples transfer better than those of the baselines. We also confirm experimentally that our training method is effective in improving model robustness.
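
The identifier substitution idea can be illustrated with a short sketch. The following Python snippet is not the ACCENT implementation (ACCENT additionally searches for the substitutions that most degrade the victim model, which is omitted here, and all names below are illustrative); it only shows the core perturbation: renaming a function's arguments and local variables via the standard `ast` module, yielding a syntactically valid, behavior-preserving variant whose surface tokens differ from the original.

```python
# Minimal sketch of identifier substitution as an adversarial perturbation:
# rename arguments and local variables so program behavior is unchanged
# while a comment-generation model sees different tokens.
import ast

class IdentifierRenamer(ast.NodeTransformer):
    """Rename function arguments and local variables per a given mapping."""

    def __init__(self, mapping):
        self.mapping = mapping  # original name -> adversarial name

    def visit_Name(self, node):
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node

def substitute_identifiers(source, mapping):
    """Return a syntactically valid, behavior-preserving variant of `source`."""
    tree = ast.parse(source)
    tree = IdentifierRenamer(mapping).visit(tree)
    return ast.unparse(tree)  # requires Python 3.9+

original = """
def average(values):
    total = sum(values)
    return total / len(values)
"""

# Misleading but valid names; a robust model should still describe "average".
print(substitute_identifiers(original, {"values": "vx0", "total": "tmp_q"}))
```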
Training deep code comment generation models via data augmentation
With the development of deep neural networks (DNNs) and the growth of publicly available source code repositories, deep code comment generation models have demonstrated reasonable performance on test datasets. However, it has been confirmed in computer vision (CV) and natural language processing (NLP) that DNNs are vulnerable to adversarial examples. In this paper, we investigate how to maintain the performance of such models on these perturbed samples. We propose a simple but effective method to improve robustness by training the model with data augmentation. We evaluate our approach on two mainstream sequence-to-sequence (seq2seq) architectures, based on the LSTM and the Transformer, with a large-scale publicly available dataset. The experimental results demonstrate that our method efficiently improves the ability of different models to defend against perturbed samples.
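
The augmentation step itself can be sketched in a few lines. Assuming an identifier-renaming perturbation like the `substitute_identifiers` helper from the sketch above (all helper names here are hypothetical, not the paper's code), each (code, comment) training pair is duplicated with perturbed code while its reference comment is kept unchanged, and the augmented set then feeds the usual seq2seq training loop.

```python
# Hypothetical sketch of the data-augmentation step, not the paper's code.
# Reuses substitute_identifiers() from the sketch above; the seq2seq training
# loop itself (LSTM- or Transformer-based) is unchanged and omitted here.
import ast
import random

def local_identifiers(code):
    """Collect argument names and locally assigned names from a snippet."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return sorted(names)

def perturb(code, rng):
    """Rename one randomly chosen identifier to an opaque token."""
    names = local_identifiers(code)
    if not names:
        return code
    victim = rng.choice(names)
    return substitute_identifiers(code, {victim: f"v{rng.randrange(10**6)}"})

def augment_dataset(pairs, rng, copies=1):
    """Return clean (code, comment) pairs plus perturbed variants.

    The comment is kept unchanged: the perturbation preserves behavior,
    so the original reference comment still describes the code.
    """
    augmented = list(pairs)
    for code, comment in pairs:
        augmented.extend((perturb(code, rng), comment) for _ in range(copies))
    rng.shuffle(augmented)
    return augmented

pairs = [("def average(values):\n    total = sum(values)\n"
          "    return total / len(values)",
          "compute the average of a list of numbers")]
print(augment_dataset(pairs, random.Random(0)))
```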