25 research outputs found
Automatically learning patterns for self-admitted technical debt removal
Technical Debt (TD) expresses the need for improvements in a software system, e.g., to its source code or architecture. In certain circumstances, developers “self-admit” technical debt (SATD) in their source code comments. Previous studies investigate when SATD is admitted, and what changes developers perform to remove it. Building on these studies, we present a first step towards the automated recommendation of SATD removal strategies. By leveraging a curated dataset of SATD removal patterns, we build a multi-level classifier capable of recommending six SATD removal strategies, e.g., changing API calls, conditionals, method signatures, exception handling, return statements, or telling that a more complex change is needed. SARDELE (SAtd Removal using DEep LEarning) combines a convolutional neural network trained on embeddings extracted from the SATD comments with a recurrent neural network trained on embeddings extracted from the SATD-affected source code. Our evaluation reveals that SARDELE is able to predict the type of change to be applied with an average precision of ~55%, recall of ~57%, and AUC of 0.73, reaching up to 73% precision, 63% recall, and 0.74 AUC for certain categories such as changes to method calls. Overall, results suggest that SATD removal follows recurrent patterns and indicate the feasibility of supporting developers in this task with automated recommenders
Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?
Upon evolving their software, organizations and individual developers have to
spend a substantial effort to pay back technical debt, i.e., the fact that
software is released in a shape not as good as it should be, e.g., in terms of
functionality, reliability, or maintainability. This paper empirically
investigates the extent to which technical debt can be automatically paid back
by neural-based generative models, and in particular models exploiting
different strategies for pre-training and fine-tuning. We start by extracting a
dateset of 5,039 Self-Admitted Technical Debt (SATD) removals from 595
open-source projects. SATD refers to technical debt instances documented (e.g.,
via code comments) by developers. We use this dataset to experiment with seven
different generative deep learning (DL) model configurations. Specifically, we
compare transformers pre-trained and fine-tuned with different combinations of
training objectives, including the fixing of generic code changes, SATD
removals, and SATD-comment prompt tuning. Also, we investigate the
applicability in this context of a recently-available Large Language Model
(LLM)-based chat bot. Results of our study indicate that the automated
repayment of SATD is a challenging task, with the best model we experimented
with able to automatically fix ~2% to 8% of test instances, depending on the
number of attempts it is allowed to make. Given the limited size of the
fine-tuning dataset (~5k instances), the model's pre-training plays a
fundamental role in boosting performance. Also, the ability to remove SATD
steadily drops if the comment documenting the SATD is not provided as input to
the model. Finally, we found general-purpose LLMs to not be a competitive
approach for addressing SATD
GitDelver Enterprise Dataset (GDED):An Industrial Closed-source Dataset for Socio-Technical Research
User Review-Based Change File Localization for Mobile Applications
In the current mobile app development, novel and emerging DevOps practices
(e.g., Continuous Delivery, Integration, and user feedback analysis) and tools
are becoming more widespread. For instance, the integration of user feedback
(provided in the form of user reviews) in the software release cycle represents
a valuable asset for the maintenance and evolution of mobile apps. To fully
make use of these assets, it is highly desirable for developers to establish
semantic links between the user reviews and the software artefacts to be
changed (e.g., source code and documentation), and thus to localize the
potential files to change for addressing the user feedback. In this paper, we
propose RISING (Review Integration via claSsification, clusterIng, and
linkiNG), an automated approach to support the continuous integration of user
feedback via classification, clustering, and linking of user reviews. RISING
leverages domain-specific constraint information and semi-supervised learning
to group user reviews into multiple fine-grained clusters concerning similar
users' requests. Then, by combining the textual information from both commit
messages and source code, it automatically localizes potential change files to
accommodate the users' requests. Our empirical studies demonstrate that the
proposed approach outperforms the state-of-the-art baseline work in terms of
clustering and localization accuracy, and thus produces more reliable results.Comment: 15 pages, 3 figures, 8 table
A Survey on What Developers Think About Testing
Software is infamous for its poor quality and frequent occurrence of bugs.
While there is no doubt that thorough testing is an appropriate answer to
ensure sufficient quality, the poor state of software generally suggests that
developers may not always engage as thoroughly with testing as they should.
This observation aligns with the prevailing belief that developers simply do
not like writing tests. In order to determine the truth of this belief, we
conducted a comprehensive survey with 21 questions aimed at (1) assessing
developers' current engagement with testing and (2) identifying factors
influencing their inclination toward testing; that is, whether they would
actually like to test more but are inhibited by their work environment, or
whether they would really prefer to test even less if given the choice. Drawing
on 284 responses from professional software developers, we uncover reasons that
positively and negatively impact developers' motivation to test. Notably,
reasons for motivation to write more tests encompass not only a general pursuit
of software quality but also personal satisfaction. However, developers
nevertheless perceive testing as mundane and tend to prioritize other tasks.
One approach emerging from the responses to mitigate these negative factors is
by providing better recognition for developers' testing efforts