18,360 research outputs found

    Recommending When Design Technical Debt Should Be Self-Admitted

    Get PDF
    Les Technical Debts (TD) sont des solutions temporaires et peu optimales introduites dans le code source d’un logiciel informatique pour corriger un problème rapidement au détriment de la qualité logiciel. Cette pratique est répandue pour diverses raisons: rapidité d’implémentation, conception initiale des composantes, connaissances faibles du projet, inexpérience du développeur ou pression face aux dates limites. Les TD peuvent s’avérer utiles à court terme, mais excessivement dommageables pour un logiciel et au niveau du temps perdu. En effet, le temps requis pour régler des problèmes et concevoir du code de qualité n’est souvent pas compatible avec le cycle de développement d’un projet. C’est pourquoi le sujet des dettes techniques a déjà été analysé dans de nombreuses études, plus spécifiquement dans l’optique de les détecter et de les identifier. Une approche populaire et récente est d’identifier les dettes techniques qui sont consciemment admises dans le code. La particularité de ces dettes, en comparaison aux TD, est qu’elles sont explicitement documentées par commentaires et intentionnellement introduites dans le code source. Les Self-Admitted Technical Debts (SATD) ne sont pas rares dans les projets logiciels et ont déjà été largement étudiées concernant leur diffusion, leur impact sur la qualité logiciel, leur criticité, leur évolution et leurs acteurs. Diverses méthodes de détection sont présentement utilisées pour identifier les SATD mais toutes demeurent sujettes à amélioration. Donc, cette thèse analyse dans quelle mesure des dettes techniques ayant déjà été consciemment admises (SATD) peuvent être utilisées pour fournir des recommandations aux développeurs lorsqu’ils écrivent du nouveau code. Pour atteindre ce but, une approche d’apprentissage machine a été élaborée, nommée TEchnical Debt IdentificatiOn System (TEDIOUS), utilisant comme variables indépendantes divers types de métriques et d’avertissements, de manière à pouvoir classifier des dettes techniques de conception au niveau des méthodes avec comme oracle des SATD connus.----------ABSTRACT: Technical debts are temporary solutions, or workarounds, introduced in portions of software systems in order to fix a problem rapidly at the expense of quality. Such practices are widespread for various reasons: rapidity of implementation, initial conception of components, lack of system’s knowledge, developer inexperience or deadline pressure. Even though technical debts can be useful on a short term basis, they can be excessively damaging and time consuming in the long run. Indeed, the time required to fix problems and design code is frequently not compatible with the development life cycle of a project. This is why the issue has been tackled in various studies, specifically in the aim of detecting these harmful debts. One recent and popular approach is to identify technical debts which are self-admitted (SATD). The particularity of these debts, in comparison to TD, is that they are explicitly documented with comments and that they are intentionally introduced in the source code. SATD are not uncommon in software projects and have already been extensively studied concerning their diffusion, their impact on software quality, their criticality, their evolution and the actors involved. Various detection methods are currently used to identify SATD but they are still subject to improvement. Therefore, this thesis investigates to what extent previously self-admitted technical debts can be used to provide recommendations to developers writing new source code. To achieve this goal, a machine learning approach was conceived, named TEDIOUS, using various types of method-level input features as independent variables, to classify design technical debts in methods using known self-admitted technical debts as an oracle

    SATDBailiff- Mining and Tracking Self-Admitted Technical Debt

    Get PDF
    Self-Admitted Technical Debt (SATD) is a metaphorical concept to describe the self-documented addition of technical debt to a software project in the form of source code comments. SATD can linger in projects and degrade source-code quality, but it can also be more visible than unintentionally added or undocumented technical debt. Understanding the implications of adding SATD to a software project is important because developers can benefit from a better understanding of the quality trade-offs they are making. However, empirical studies, analyzing the survivability and removal of SATD comments, are challenged by potential code changes or SATD comment updates that may interfere with properly tracking their appearance, existence, and removal. In this paper, we propose SATDBailiff, a tool that uses an existing state-of-the-art SATD detection tool, to identify SATD in method comments, then properly track their lifespan. SATDBailiff is given as input links to open source projects, and its output is a list of all identified SATDs, and for each detected SATD, SATDBailiff reports all its associated changes, including any updates to its text, all the way to reporting its removal. The goal of SATDBailiff is to aid researchers and practitioners in better tracking SATDs instances, and providing them with a reliable tool that can be easily extended. SATDBailiff was validated using a dataset of previously detected and manually validated SATD instances. SATDBailiff is publicly available as an open source, along with the manual analysis of SATD instances associated with its validation, on the project website

    Automatically learning patterns for self-admitted technical debt removal

    Get PDF
    Technical Debt (TD) expresses the need for improvements in a software system, e.g., to its source code or architecture. In certain circumstances, developers “self-admit” technical debt (SATD) in their source code comments. Previous studies investigate when SATD is admitted, and what changes developers perform to remove it. Building on these studies, we present a first step towards the automated recommendation of SATD removal strategies. By leveraging a curated dataset of SATD removal patterns, we build a multi-level classifier capable of recommending six SATD removal strategies, e.g., changing API calls, conditionals, method signatures, exception handling, return statements, or telling that a more complex change is needed. SARDELE (SAtd Removal using DEep LEarning) combines a convolutional neural network trained on embeddings extracted from the SATD comments with a recurrent neural network trained on embeddings extracted from the SATD-affected source code. Our evaluation reveals that SARDELE is able to predict the type of change to be applied with an average precision of ~55%, recall of ~57%, and AUC of 0.73, reaching up to 73% precision, 63% recall, and 0.74 AUC for certain categories such as changes to method calls. Overall, results suggest that SATD removal follows recurrent patterns and indicate the feasibility of supporting developers in this task with automated recommenders

    Rework Effort Estimation of Self-admitted Technical Debt

    Get PDF
    Programmers sometimes leave incomplete, temporary workarounds and buggy codes that require rework. This phenomenon in software development is referred to as Self- admitted Technical Debt (SATD). The challenge therefore is for software engineering researchers and practitioners to resolve the SATD problem to improve the software quality. We performed an exploratory study using a text mining approach to extract SATD from developers’ source code comments and implement an effort metric to compute the rework effort that might be needed to resolve the SATD problem. The result of this study confirms the result of a prior study that found design debt to be the most predominant class of SATD. Results from this study also indicate that a significant amount of rework effort of between 13 and 32 commented LOC on average per SATD prone source file is required to resolve the SATD challenge across all the four projects considered. The text mining approach incorporated into the rework effort metric will speed up the extraction and analysis of SATD that are generated during software projects. It will also aid in managerial decisions of whether to handle SATD as part of on-going project development or defer it to the maintenance phase

    Toward the Automatic Classification of Self-Affirmed Refactoring

    Get PDF
    The concept of Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages, i.e., developers explicit documentation of refactoring operations intentionally introduced during a code change. In our previous study, we have manually identified refactoring patterns and defined three main common quality improvement categories including internal quality attributes, external quality attributes, and code smells, by only considering refactoring-related commits. However, this approach heavily depends on the manual inspection of commit messages. In this paper, we propose a two-step approach to first identify whether a commit describes developer-related refactoring events, then to classify it according to the refactoring common quality improvement categories. Specifically, we combine the N-Gram TF-IDF feature selection with binary and multiclass classifiers to build a new model to automate the classification of refactorings based on their quality improvement categories. We challenge our model using a total of 2,867 commit messages extracted from well engineered open-source Java projects. Our findings show that (1) our model is able to accurately classify SAR commits, outperforming the pattern-based and random classifier approaches, and allowing the discovery of 40 more relevent SAR patterns, and (2) our model reaches an F-measure of up to 90% even with a relatively small training datase

    State of Refactoring Adoption: Better Understanding Developer Perception of Refactoring

    Full text link
    We aim to explore how developers document their refactoring activities during the software life cycle. We call such activity Self-Affirmed Refactoring (SAR), which indicates developers' documentation of their refactoring activities. SAR is crucial in understanding various aspects of refactoring, including the motivation, procedure, and consequences of the performed code change. After that, we propose an approach to identify whether a commit describes developer-related refactoring events to classify them according to the refactoring common quality improvement categories. To complement this goal, we aim to reveal insights into how reviewers decide to accept or reject a submitted refactoring request and what makes such a review challenging.Our SAR taxonomy and model can work with refactoring detectors to report any early inconsistency between refactoring types and their documentation. They can serve as a solid background for various empirical investigations. Our survey with code reviewers has revealed several difficulties related to understanding the refactoring intent and implications on the functional and non-functional aspects of the software. In light of our findings from the industrial case study, we recommended a procedure to properly document refactoring activities, as part of our survey feedback.Comment: arXiv admin note: text overlap with arXiv:2010.13890, arXiv:2102.05201, arXiv:2009.0927

    Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?

    Full text link
    Upon evolving their software, organizations and individual developers have to spend a substantial effort to pay back technical debt, i.e., the fact that software is released in a shape not as good as it should be, e.g., in terms of functionality, reliability, or maintainability. This paper empirically investigates the extent to which technical debt can be automatically paid back by neural-based generative models, and in particular models exploiting different strategies for pre-training and fine-tuning. We start by extracting a dateset of 5,039 Self-Admitted Technical Debt (SATD) removals from 595 open-source projects. SATD refers to technical debt instances documented (e.g., via code comments) by developers. We use this dataset to experiment with seven different generative deep learning (DL) model configurations. Specifically, we compare transformers pre-trained and fine-tuned with different combinations of training objectives, including the fixing of generic code changes, SATD removals, and SATD-comment prompt tuning. Also, we investigate the applicability in this context of a recently-available Large Language Model (LLM)-based chat bot. Results of our study indicate that the automated repayment of SATD is a challenging task, with the best model we experimented with able to automatically fix ~2% to 8% of test instances, depending on the number of attempts it is allowed to make. Given the limited size of the fine-tuning dataset (~5k instances), the model's pre-training plays a fundamental role in boosting performance. Also, the ability to remove SATD steadily drops if the comment documenting the SATD is not provided as input to the model. Finally, we found general-purpose LLMs to not be a competitive approach for addressing SATD

    State of Refactoring Adoption: Towards Better Understanding Developer Perception of Refactoring

    Get PDF
    Context: Refactoring is the art of improving the structural design of a software system without altering its external behavior. Today, refactoring has become a well-established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactoring strategies in other development-related activities that go beyond improving the design especially with the emerging challenges in contemporary software engineering. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. Objective: We aim at exploring how developers document their refactoring activities during the software life cycle. We call such activity Self-Affirmed Refactoring (SAR), which is an indication of the developer-related refactoring events in the commit messages. After that, we propose an approach to identify whether a commit describes developer-related refactoring events, to classify them according to the refactoring common quality improvement categories. To complement this goal, we aim to reveal insights into how reviewers develop a decision about accepting or rejecting a submitted refactoring request, what makes such review challenging, and how to the efficiency of refactoring code review. Method: Our empirically driven study follows a mixture of qualitative and quantitative methods. We text mine refactoring-related documentation, then we develop a refactoring taxonomy, and automatically classify a large set of commits containing refactoring activities, and identify, among the various quality models presented in the literature, the ones that are more in-line with the developer\u27s vision of quality optimization, when they explicitly mention that they are refactoring to improve them to obtain an enhanced understanding of the motivation behind refactoring. After that, we performed an industrial case study with professional developers at Xerox to study the motivations, documentation practices, challenges, verification, and implications of refactoring activities during code review. Result: We introduced SAR taxonomy on how developers document their refactoring strategies in commit messages and proposed a SAR model to automate the detection of refactoring. Our survey with code reviewers has revealed several difficulties related to understanding the refactoring intent and implications on the functional and non-functional aspects of the software. Conclusion: Our SAR taxonomy and model, can work in conjunction with refactoring detectors, to report any early inconsistency between refactoring types and their documentation and can serve as a solid background for various empirical investigations. In light of our findings of the industrial case study, we recommended a procedure to properly document refactoring activities, as part of our survey feedback

    Understanding, Analysis, and Handling of Software Architecture Erosion

    Get PDF
    Architecture erosion occurs when a software system's implemented architecture diverges from the intended architecture over time. Studies show erosion impacts development, maintenance, and evolution since it accumulates imperceptibly. Identifying early symptoms like architectural smells enables managing erosion through refactoring. However, research lacks comprehensive understanding of erosion, unclear which symptoms are most common, and lacks detection methods. This thesis establishes an erosion landscape, investigates symptoms, and proposes identification approaches. A mapping study covers erosion definitions, symptoms, causes, and consequences. Key findings: 1) "Architecture erosion" is the most used term, with four perspectives on definitions and respective symptom types. 2) Technical and non-technical reasons contribute to erosion, negatively impacting quality attributes. Practitioners can advocate addressing erosion to prevent failures. 3) Detection and correction approaches are categorized, with consistency and evolution-based approaches commonly mentioned.An empirical study explores practitioner perspectives through communities, surveys, and interviews. Findings reveal associated practices like code review and tools identify symptoms, while collected measures address erosion during implementation. Studying code review comments analyzes erosion in practice. One study reveals architectural violations, duplicate functionality, and cyclic dependencies are most frequent. Symptoms decreased over time, indicating increased stability. Most were addressed after review. A second study explores violation symptoms in four projects, identifying 10 categories. Refactoring and removing code address most violations, while some are disregarded.Machine learning classifiers using pre-trained word embeddings identify violation symptoms from code reviews. Key findings: 1) SVM with word2vec achieved highest performance. 2) fastText embeddings worked well. 3) 200-dimensional embeddings outperformed 100/300-dimensional. 4) Ensemble classifier improved performance. 5) Practitioners found results valuable, confirming potential.An automated recommendation system identifies qualified reviewers for violations using similarity detection on file paths and comments. Experiments show common methods perform well, outperforming a baseline approach. Sampling techniques impact recommendation performance
    • …
    corecore