Automatic code review by learning the revision of source code
Code review is the manual inspection of a source code revision to determine whether the revised code meets the revision requirements. However, manual code review is time-consuming, and automating the code review process would alleviate the burden on code reviewers and speed up software maintenance. To construct a model for automatic code review, the characteristics of source code revisions (i.e., the difference between two pieces of source code) should be properly captured and modeled. Unfortunately, most existing techniques can easily model the overall correlation between two pieces of source code, but not the “difference” between them. In this paper, we propose a novel deep model named DACE for automatic code review. The model learns revision features by contrasting the revised hunks from the original and revised source code with respect to the code context containing the hunks. Experimental results on six open source software projects indicate that, by learning the revision features, DACE can outperform competing approaches in automatic code review.
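As a rough illustration of the input the abstract describes (changed hunks paired with their surrounding code context), the following minimal sketch extracts such triples with Python's standard `difflib`. It is not DACE itself and makes no claim about how the authors preprocess code; the function name `changed_hunks` and the `context` parameter are hypothetical.

```python
import difflib


def changed_hunks(original: str, revised: str, context: int = 1):
    """Return (original_hunk, revised_hunk, context_lines) for each changed region.

    Hypothetical helper for illustration only; not the DACE preprocessing.
    """
    a = original.splitlines()
    b = revised.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are only used as context
        # Context: up to `context` original lines before and after the change.
        ctx = a[max(0, i1 - context):i1] + a[i2:i2 + context]
        hunks.append((a[i1:i2], b[j1:j2], ctx))
    return hunks


original = "def f(x):\n    return x + 1\nprint(f(2))"
revised = "def f(x):\n    return x + 2\nprint(f(2))"
result = changed_hunks(original, revised)
```

A learned model would then encode each original/revised pair together with its context; this sketch stops at producing the raw text.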
Towards Automatic Identification of Violation Symptoms of Architecture Erosion
Architecture erosion has a detrimental effect on maintenance and evolution,
as the implementation drifts away from the intended architecture. To prevent
this, development teams need to understand early enough the symptoms of
erosion, and particularly violations of the intended architecture. One way to
achieve this is through the automatic identification of architecture
violations from textual artifacts, and particularly code reviews. In this
paper, we developed 15 machine learning-based and 4 deep learning-based
classifiers with three pre-trained word embeddings to identify violation
symptoms of architecture erosion from developer discussions in code reviews.
Specifically, we looked at code review comments from four large open-source
projects from the OpenStack (Nova and Neutron) and Qt (Qt Base and Qt Creator)
communities. We then conducted a survey to acquire feedback from the involved
participants who discussed architecture violations in code reviews, to validate
the usefulness of our trained classifiers. The results show that the SVM
classifier based on word2vec pre-trained word embedding performs the best with
an F1-score of 0.779. In most cases, classifiers with the fastText pre-trained
word embedding model can achieve relatively good performance. Furthermore,
classifiers that use 200-dimensional pre-trained word embedding models outperform
those that use 100- and 300-dimensional models. In addition, an ensemble classifier based
on the majority voting strategy can further improve performance and
outperform the individual classifiers. Finally, an online survey of the
involved developers reveals that the violation symptoms identified by our
approaches have practical value and can provide early warnings for impending
architecture erosion. Comment: 20 pages, 4 images, 7 tables, Revision submitted to TSE (2023)
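The majority voting strategy mentioned in the abstract can be sketched as follows. This is a minimal illustration that assumes each classifier outputs one label per review comment; the function name `majority_vote` and the tie-breaking rule (the label seen first among the tied ones wins) are assumptions, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    predictions: list of lists, one inner list per classifier, all the same length.
    Ties go to the label that appears first among the classifiers (an assumption).
    """
    if not predictions:
        raise ValueError("at least one classifier is required")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    voted = []
    for sample_labels in zip(*predictions):
        # most_common(1) returns the label with the highest count.
        voted.append(Counter(sample_labels).most_common(1)[0][0])
    return voted


# Three hypothetical classifiers labelling three comments (1 = violation symptom).
ensemble = majority_vote([[1, 0, 0], [1, 1, 0], [0, 1, 0]])
```

With an odd number of binary classifiers no ties can occur, which is one common reason for choosing an odd ensemble size.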
Mining Fix Patterns for FindBugs Violations
In this paper, we first collect and track a large number of fixed and unfixed
violations across revisions of software.
The empirical analyses reveal that there are discrepancies in the
distributions of violations that are detected and those that are fixed, in
terms of occurrences, spread and categories, which can provide insights into
prioritizing violations.
To automatically identify patterns in violations and their fixes, we propose
an approach that utilizes convolutional neural networks to learn features and
clustering to regroup similar instances. We then evaluate the usefulness of the
identified fix patterns by applying them to unfixed violations.
The results show that developers will accept and merge a majority (69/116) of
fixes generated from the inferred fix patterns. It is also noteworthy that the
yielded patterns are applicable to four real bugs in the Defects4J major
benchmark for software testing and automated repair. Comment: Accepted for IEEE Transactions on Software Engineering
Untangling Fine-Grained Code Changes
After working for some time, developers commit their code changes to a
version control system. When doing so, they often bundle unrelated changes
(e.g., bug fix and refactoring) in a single commit, thus creating a so-called
tangled commit. Sharing tangled commits is problematic because it makes review,
reversion, and integration of these commits harder and historical analyses of
the project less reliable. Researchers have worked at untangling existing
commits, i.e., finding which part of a commit relates to which task. In this
paper, we contribute to this line of work in two ways: (1) A publicly available
dataset of untangled code changes, created with the help of two developers who
accurately split their code changes into self-contained tasks over a period of
four months; (2) a novel approach, EpiceaUntangler, to help developers share
untangled commits (a.k.a. atomic commits) by using fine-grained code change
information. EpiceaUntangler is based and tested on the publicly available
dataset, and further evaluated by deploying it to 7 developers, who used it for
2 weeks. We recorded a median success rate of 91% and an average of 75% in
automatically creating clusters of untangled fine-grained code changes.