Discovering Loners and Phantoms in Commit and Issue Data
The interlinking of commit and issue data has become a de facto standard in software development. Modern issue tracking systems, such as JIRA, automatically interlink commits and issues by extracting identifiers (e.g., issue keys) from commit messages. However, the conventions for interlinking vary between software projects. For example, some projects enforce the use of identifiers for every commit while others have less restrictive conventions. In this work, we introduce a model called PaLiMod to enable the analysis of interlinking characteristics in commit and issue data. We surveyed 15 Apache projects to investigate differences and commonalities between linked and non-linked commits and issues. Based on the gathered information, we created a set of heuristics to interlink the residual of non-linked commits and issues. We present the characteristics of Loners and Phantoms in commit and issue data. The results of our evaluation indicate that the proposed PaLiMod model and heuristics enable automatic interlinking and can indeed reduce the residual of non-linked commits and issues in software projects.
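As an illustration of the identifier-based interlinking the abstract describes, the following is a minimal sketch, not the PaLiMod implementation. It assumes issue keys have the common JIRA shape of an uppercase project prefix, a hyphen, and a number; the function names and the sample commit data are hypothetical.

```python
import re

# Assumption: keys look like "HADOOP-1234" (uppercase prefix, hyphen, digits).
# Real projects may configure other key formats.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def extract_issue_keys(message: str) -> list[str]:
    """Return the distinct issue keys in a commit message, in order of appearance."""
    seen: list[str] = []
    for key in ISSUE_KEY.findall(message):
        if key not in seen:
            seen.append(key)
    return seen


def split_linked(commits: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Partition commits (hash -> message) into linked and non-linked ones."""
    linked: dict[str, list[str]] = {}
    non_linked: list[str] = []
    for sha, message in commits.items():
        keys = extract_issue_keys(message)
        if keys:
            linked[sha] = keys
        else:
            non_linked.append(sha)
    return linked, non_linked


# Hypothetical sample data for illustration only.
commits = {
    "a1b2c3": "HADOOP-1234: fix race; see also HDFS-42 and HADOOP-1234",
    "d4e5f6": "fix typo in README",
}
linked, non_linked = split_linked(commits)
```

A pattern this simple also matches strings such as "UTF-8", so a practical version would likely check candidate keys against the project's known issue prefixes. The non-linked commits it leaves behind are the residual that heuristics like those in the paper aim to reduce.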
EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery
Issue-commit links, as a type of software traceability links, play a vital
role in various software development and maintenance tasks. However, they are
typically deficient, as developers often forget or fail to create tags when
making commits. Existing studies have deployed deep learning techniques,
including pre-trained models, to improve automatic issue-commit link
recovery. Despite their promising performance, we argue that previous approaches
have four main problems, hindering them from recovering links in large software
projects. To overcome these problems, we propose an efficient and accurate
pre-trained framework called EALink for issue-commit link recovery. EALink
requires far fewer model parameters than existing pre-trained methods,
bringing efficient training and recovery. Moreover, we design various
techniques to improve the recovery accuracy of EALink. We construct a
large-scale dataset and conduct extensive experiments to demonstrate the power
of EALink. Results show that EALink outperforms the state-of-the-art methods by
a large margin (15.23%-408.65%) on various evaluation metrics. Meanwhile, its
training and inference overhead is orders of magnitude lower than existing
methods.
Comment: 13 pages, 6 figures, published to AS