17 research outputs found
DeepMutation: A Neural Mutation Tool
Mutation testing can be used to assess the fault-detection capabilities of a
given test suite. To this aim, two characteristics of mutation testing
frameworks are of paramount importance: (i) they should generate mutants that
are representative of real faults; and (ii) they should provide a complete tool
chain able to automatically generate, inject, and test the mutants. To address
the first point, we recently proposed an approach using a Recurrent Neural
Network Encoder-Decoder architecture to learn mutants from ~787k faults mined
from real programs. The empirical evaluation of this approach confirmed its
ability to generate mutants representative of real faults. In this paper, we
address the second point, presenting DeepMutation, a tool wrapping our deep
learning model into a fully automated tool chain able to generate, inject, and
test mutants learned from real faults. Video:
https://sites.google.com/view/learning-mutation/deepmutation
Comment: Accepted to the 42nd ACM/IEEE International Conference on Software
Engineering (ICSE 2020), Demonstrations Track - Seoul, South Korea, May
23-29, 2020, 4 pages
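
To make the generate-inject-test loop concrete, the following minimal sketch
assesses a test suite against a set of model-generated mutants. The helper
names and the Maven test command are illustrative assumptions for this sketch,
not DeepMutation's actual interface.

import subprocess
from pathlib import Path

def run_test_suite(project_dir: Path) -> bool:
    """Return True if the project's test suite passes (i.e. the mutant survives)."""
    result = subprocess.run(["mvn", "-q", "test"], cwd=project_dir)
    return result.returncode == 0

def mutation_score(project_dir: Path, mutants: list[tuple[Path, str]]) -> float:
    """mutants: (file to overwrite, mutated source) pairs produced by the learned model."""
    killed = 0
    for target_file, mutated_source in mutants:
        original = target_file.read_text()
        target_file.write_text(mutated_source)    # inject the mutant
        try:
            if not run_test_suite(project_dir):   # a failing test kills the mutant
                killed += 1
        finally:
            target_file.write_text(original)      # restore the original code
    return killed / len(mutants) if mutants else 0.0

The test suite acts as the oracle: a mutant counts as killed when at least one
test fails against it, and the ratio of killed mutants estimates the suite's
fault-detection capability.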
MUFIN: Improving Neural Repair Models with Back-Translation
Automated program repair is the task of automatically repairing software
bugs. A promising direction in this field is self-supervised learning, a
learning paradigm in which repair models are trained without commits
representing bug/fix pairs. In self-supervised neural program repair, those
bug/fix pairs are generated automatically rather than mined from version
histories. The main challenge is to generate interesting and diverse pairs
that maximize the effectiveness of training. As a
contribution to this problem, we propose to use back-translation, a technique
coming from neural machine translation. We devise and implement MUFIN, a
back-translation training technique for program repair, with specifically
designed code critics to select high-quality training samples. Our results show
that MUFIN's back-translation loop generates valuable training samples in a
fully automated, self-supervised manner, generating more than half a million
bug/fix pairs. The code critic design is key because of a fundamental
trade-off between how restrictive a critic is and how many samples are
available for optimization during back-translation.
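
A minimal sketch of such a back-translation loop is given below, assuming a
"breaker" model that corrupts correct code, a "fixer" model being trained, and
a compile-check critic. These component names and the javac-based critic are
illustrative assumptions, not MUFIN's actual implementation.

import pathlib
import subprocess
import tempfile

def compiles(java_source: str) -> bool:
    """Hypothetical restrictive critic: keep only candidates that still compile.
    Assumes the snippet declares a public class named Snippet."""
    with tempfile.TemporaryDirectory() as d:
        path = pathlib.Path(d) / "Snippet.java"
        path.write_text(java_source)
        return subprocess.run(["javac", str(path)], capture_output=True).returncode == 0

def back_translation_round(correct_functions, breaker, fixer, critic=compiles):
    """breaker: model that turns correct code into buggy code.
    fixer: the repair model we actually want to train."""
    pairs = []
    for fixed in correct_functions:
        buggy = breaker.generate(fixed)            # synthesize a candidate bug
        if buggy == fixed or not critic(buggy):    # stricter critic => fewer but cleaner pairs
            continue
        pairs.append((buggy, fixed))               # self-supervised bug/fix pair
    fixer.train(pairs)                             # fine-tune the repair model on the new pairs
    return pairs

The trade-off mentioned above is visible here: a stricter critic discards more
candidate pairs, trading training-set size for sample quality.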
A Differential Testing Approach for Evaluating Abstract Syntax Tree Mapping Algorithms
Abstract syntax tree (AST) mapping algorithms are widely used to analyze
changes in source code. Despite the foundational role of AST mapping
algorithms, little effort has been made to evaluate the accuracy of AST mapping
algorithms, i.e., the extent to which an algorithm captures the evolution of
code. We observe that a program element often has only one best-mapped program
element. Based on this observation, we propose a hierarchical approach to
automatically compare the similarity of mapped statements and tokens by
different algorithms. By performing the comparison, we determine if each of the
compared algorithms generates inaccurate mappings for a statement or its
tokens. We invite 12 external experts to determine if three commonly used AST
mapping algorithms generate accurate mappings for a statement and its tokens
for 200 statements. Based on the experts' feedback, we observe that our approach
achieves a precision of 0.98--1.00 and a recall of 0.65--0.75. Furthermore, we
conduct a large-scale study with a dataset of ten Java projects, containing a
total of 263,165 file revisions. Our approach determines that GumTree, MTDiff
and IJM generate inaccurate mappings for 20%--29%, 25%--36% and 21%--30% of the
file revisions, respectively. Our experimental results show that state-of-the-art
AST mapping algorithms still need improvement.
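
The differential idea can be sketched as follows: when two algorithms map the
same source statement to different targets, the mapping whose target is less
similar to the source is the likelier inaccurate one. The token-level
similarity used here (difflib's ratio) is an assumption made for illustration
only, not the paper's actual comparison procedure.

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Textual similarity between two code fragments (illustrative metric)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def flag_inaccurate(source_stmt: str, target_a: str, target_b: str):
    """Return 'A' or 'B' for the algorithm whose mapping looks inaccurate,
    or None when both algorithms agree on the mapped target."""
    if target_a == target_b:
        return None                                  # agreement: nothing to flag
    sim_a = similarity(source_stmt, target_a)
    sim_b = similarity(source_stmt, target_b)
    return "B" if sim_a > sim_b else "A"             # lower similarity => suspected inaccurate mapping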
Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks
Deep learning (DL) techniques are gaining more and more attention in the
software engineering community. They have been used to support several
code-related tasks, such as automatic bug fixing and code comments generation.
Recent studies in the Natural Language Processing (NLP) field have shown that
the Text-To-Text Transfer Transformer (T5) architecture can achieve
state-of-the-art performance for a variety of NLP tasks. The basic idea behind
T5 is to first pre-train a model on a large and generic dataset using a
self-supervised task (e.g., filling masked words in sentences). Once the model
is pre-trained, it is fine-tuned on smaller and specialized datasets, each one
related to a specific task (e.g., language translation, sentence
classification). In this paper, we empirically investigate how the T5 model
performs when pre-trained and fine-tuned to support code-related tasks. We
pre-train a T5 model on a dataset composed of natural language English text and
source code. Then, we fine-tune such a model by reusing datasets used in four
previous works that used DL techniques to: (i) fix bugs, (ii) inject code
mutants, (iii) generate assert statements, and (iv) generate code comments. We
compare the performance of this single model with the results reported in the
four original papers proposing DL-based solutions for those four tasks. We show
that our T5 model, exploiting additional data for the self-supervised
pre-training phase, can achieve performance improvements over the four
baselines.
Comment: Accepted to the 43rd International Conference on Software Engineering
(ICSE 2021)
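
As a rough illustration of the fine-tuning step, the snippet below trains a
T5-style encoder-decoder on a single buggy-to-fixed example using the Hugging
Face API. The checkpoint name and the toy example are placeholders, not the
paper's pre-trained model or datasets.

import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")        # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

buggy = "fix bug: if (x = 0) { return; }"     # task prefix + input, T5-style
fixed = "if (x == 0) { return; }"             # target sequence

inputs = tokenizer(buggy, return_tensors="pt")
labels = tokenizer(fixed, return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss    # seq2seq cross-entropy loss
loss.backward()
optimizer.step()

In the multi-task setting described above, each fine-tuning dataset (bug
fixing, mutant injection, assert generation, comment generation) is cast as
the same text-to-text format, only with different task prefixes and targets.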
Large Language Models of Code Fail at Completing Code with Potential Bugs
Large language models of code (Code-LLMs) have recently brought tremendous
advances to code completion, a fundamental feature of programming assistance
and code intelligence. However, most existing works ignore the possible
presence of bugs in the code context for generation, which are inevitable in
software development. Therefore, we introduce and study the buggy-code
completion problem, inspired by the realistic scenario of real-time code
suggestion where the code context contains potential bugs -- anti-patterns that
can become bugs in the completed program. To systematically study the task, we
introduce two datasets: one with synthetic bugs derived from semantics-altering
operator changes (buggy-HumanEval) and one with realistic bugs derived from
user submissions to coding problems (buggy-FixEval). We find that the presence
of potential bugs significantly degrades the generation performance of the
high-performing Code-LLMs. For instance, the passing rates of CodeGen-2B-mono
on test cases of buggy-HumanEval drop more than 50% given a single potential
bug in the context. Finally, we investigate several post-hoc methods for
mitigating the adverse effect of potential bugs and find that there remains a
large gap in post-mitigation performance.
Comment: 25 pages
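
The evaluation can be summarized with a small sketch: complete a prefix that
may contain a potential bug, then check the completed program against the
problem's test cases. The model wrapper and the per-problem "tests" callable
are assumptions for illustration, not the benchmark's actual harness.

def pass_rate(model, problems) -> float:
    """problems: dicts with a (possibly buggy) 'prefix' and a 'tests' callable."""
    passed = 0
    for p in problems:
        completion = model.complete(p["prefix"])   # Code-LLM continues the prefix
        program = p["prefix"] + completion
        if p["tests"](program):                    # run the problem's test cases
            passed += 1
    return passed / len(problems)

Comparing the pass rate on clean prefixes against the pass rate on buggy
prefixes quantifies the degradation reported for buggy-HumanEval and
buggy-FixEval.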
On Learning Meaningful Assert Statements for Unit Test Cases
Software testing is an essential part of the software lifecycle and requires
a substantial amount of time and effort. It has been estimated that software
developers spend close to 50% of their time on testing the code they write. For
these reasons, a long-standing goal within the research community is to
(partially) automate software testing. While several techniques and tools have
been proposed to automatically generate test methods, recent work has
criticized the quality and usefulness of the assert statements they generate.
Therefore, we employ a Neural Machine Translation (NMT) based approach called
Atlas (AuTomatic Learning of Assert Statements) to automatically generate
meaningful assert statements for test methods. Given a test method and a focal
method (i.e., the main method under test), Atlas can predict a meaningful assert
statement to assess the correctness of the focal method. We applied Atlas to
thousands of test methods from GitHub projects and it was able to predict the
exact assert statement manually written by developers in 31% of the cases when
only considering the top-1 predicted assert. When considering the top-5
predicted assert statements, Atlas is able to predict exact matches in 50% of
the cases. These promising results hint at the potential usefulness of our
approach as (i) a complement to automatic test case generation techniques, and
(ii) a code completion support for developers, who can benefit from the
recommended assert statements while writing test code.
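
The top-1 and top-5 figures correspond to an exact-match metric over the
model's ranked candidate asserts; a minimal sketch is given below, where the
whitespace normalization is an assumption rather than the paper's exact
comparison procedure.

def top_k_accuracy(samples, k: int = 5) -> float:
    """samples: list of (ground_truth_assert, ranked_candidate_asserts) pairs."""
    def norm(s: str) -> str:
        return " ".join(s.split())            # collapse whitespace before comparing
    hits = sum(
        1 for truth, candidates in samples
        if norm(truth) in {norm(c) for c in candidates[:k]}
    )
    return hits / len(samples) if samples else 0.0

Calling top_k_accuracy with k=1 and k=5 corresponds to the 31% and 50%
exact-match figures reported above.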
IntJect: Vulnerability Intent Bug Seeding
Studying and exposing software vulnerabilities is important to ensure software security, safety, and reliability. Software engineers often inject vulnerabilities into their programs to test the reliability of their test suites, vulnerability detectors, and security measures. However, state-of-the-art vulnerability injection methods only capture code syntax/patterns; they do not learn the intent of the vulnerability and are limited to the syntax of the original dataset. To address this challenge, we propose the first intent-based vulnerability injection method that learns both the program syntax and vulnerability intent. Our approach applies a combination of NLP methods and semantic-preserving program mutations (at the bytecode level) to inject code vulnerabilities. Given a dataset of known vulnerabilities (containing benign and vulnerable code pairs), our approach proceeds by employing semantic-preserving program mutations to transform the existing dataset into semantically similar code. Then, it learns the intent of the vulnerability via neural machine translation (Seq2Seq) models. The key insight is to employ Seq2Seq to learn the intent (context) of the vulnerable code in a manner that is agnostic of the specific program instance. We evaluate the performance of our approach using 1275 vulnerabilities belonging to five (5) CWEs from the Juliet test suite. We examine the effectiveness of our approach in producing compilable and vulnerable code. Our results show that INTJECT is effective: almost all (99%) of the code produced by our approach is vulnerable and compilable. We also demonstrate that the vulnerable programs generated by INTJECT are semantically similar to the withheld original vulnerable code. Finally, we show that our mutation-based data transformation approach outperforms its alternatives, namely data obfuscation and using the original data.
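
A simplified view of such a pipeline is sketched below: augment known
benign/vulnerable pairs with a semantic-preserving transformation, then train
a Seq2Seq model that maps benign code to its vulnerable counterpart. The
transformation and model helpers are placeholders; in particular, the paper
applies its mutations at the bytecode level, whereas the toy rewriter here
works on source text.

def rename_identifiers(code: str) -> str:
    """Toy stand-in for a semantic-preserving mutation (source-level here,
    bytecode-level in the paper)."""
    return code.replace("tmp", "aux")

def augment(pairs):
    """pairs: list of (benign_code, vulnerable_code) tuples from a known-vulnerability dataset."""
    augmented = list(pairs)
    for benign, vulnerable in pairs:
        augmented.append((rename_identifiers(benign), rename_identifiers(vulnerable)))
    return augmented

def train_injector(seq2seq_model, pairs):
    """Fit a benign -> vulnerable translation model on the augmented pairs."""
    # The model is meant to capture the vulnerability's intent independently of
    # any single program instance, so it can inject it into unseen code.
    seq2seq_model.train(augment(pairs))
    return seq2seq_model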