    SATDBailiff- Mining and Tracking Self-Admitted Technical Debt

    Self-Admitted Technical Debt (SATD) is a metaphorical concept describing the self-documented addition of technical debt to a software project in the form of source code comments. SATD can linger in projects and degrade source-code quality, but it can also be more visible than unintentionally added or undocumented technical debt. Understanding the implications of adding SATD to a software project is important because developers can benefit from a better understanding of the quality trade-offs they are making. However, empirical studies analyzing the survivability and removal of SATD comments are challenged by potential code changes or SATD comment updates that may interfere with properly tracking their appearance, existence, and removal. In this paper, we propose SATDBailiff, a tool that uses an existing state-of-the-art SATD detection tool to identify SATD in method comments and then properly tracks their lifespan. SATDBailiff takes as input links to open-source projects and outputs a list of all identified SATDs; for each detected SATD, it reports all associated changes, including any updates to its text, through to its removal. The goal of SATDBailiff is to aid researchers and practitioners in better tracking SATD instances and to provide them with a reliable tool that can be easily extended. SATDBailiff was validated using a dataset of previously detected and manually validated SATD instances. SATDBailiff is publicly available as open source, along with the manual analysis of SATD instances associated with its validation, on the project website.
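
The lifespan tracking described above can be pictured with a minimal sketch. It assumes each commit has already been reduced to a mapping from method signature to that method's comment, and it swaps in a trivial keyword matcher (the hypothetical `keyword_detector`) for the state-of-the-art detector SATDBailiff actually relies on. This is not SATDBailiff's implementation; `track_satd` and `SATDHistory` are illustrative names.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a real SATD detector: flags comments containing common task tags.
SATD_KEYWORDS = ("todo", "fixme", "hack", "xxx")


def keyword_detector(comment: str) -> bool:
    text = comment.lower()
    return any(keyword in text for keyword in SATD_KEYWORDS)


@dataclass
class SATDHistory:
    method: str
    # Each event is (commit_id, kind, comment_text), kind in {"ADDED", "CHANGED", "REMOVED"}.
    events: list = field(default_factory=list)


def track_satd(snapshots, is_satd=keyword_detector):
    """snapshots: list of (commit_id, {method_signature: comment_text}) in commit order."""
    histories = {}
    previous = {}  # method -> SATD comment text in the previous snapshot
    for commit_id, comments in snapshots:
        current = {m: c for m, c in comments.items() if is_satd(c)}
        for method, text in current.items():
            if method not in previous:
                histories.setdefault(method, SATDHistory(method)).events.append(
                    (commit_id, "ADDED", text))
            elif previous[method] != text:
                histories[method].events.append((commit_id, "CHANGED", text))
        # An SATD comment that vanished (or stopped looking like SATD) counts as removed.
        for method, text in previous.items():
            if method not in current:
                histories[method].events.append((commit_id, "REMOVED", text))
        previous = current
    return list(histories.values())
```

Because methods are keyed by signature, a rename or move (the kind of code change the abstract says can interfere with tracking) would appear here as a REMOVED event followed by an unrelated ADDED one; a real tracker needs rename and refactoring detection on top of this.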

    Identifying self-admitted technical debt in issue tracking systems using machine learning

    Technical debt is a metaphor indicating sub-optimal solutions implemented for short-term benefits at the expense of the long-term maintainability and evolvability of software. A special type of technical debt is explicitly admitted by software engineers (e.g., using a TODO comment); this is called Self-Admitted Technical Debt or SATD. Most work on automatically identifying SATD focuses on source code comments. Besides source code comments, issue tracking systems have been shown to be another rich source of SATD, but there are no approaches specifically for automatically identifying SATD in issues. In this paper, we first create a training dataset by collecting and manually analyzing 4,200 issues (broken down into 23,180 issue sections) from seven open-source projects (i.e., Camel, Chromium, Gerrit, Hadoop, HBase, Impala, and Thrift) that use two popular issue tracking systems (i.e., Jira and Google Monorail). We then propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning. Our findings indicate that: 1) our approach outperforms baseline approaches by a wide margin with regard to the F1-score; 2) transferring knowledge from suitable datasets can improve the predictive performance of our approach; 3) the extracted SATD keywords are intuitive and potentially indicative of SATD types and indicators; 4) projects using different issue tracking systems share fewer SATD keywords than projects using the same issue tracking system; 5) only a small amount of training data is needed to achieve good accuracy. Comment: Accepted for publication in the EMSE journal.
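
To make the classification and F1-score terms above concrete, here is a minimal sketch of a binary SATD/non-SATD text classifier over issue sections. It uses multinomial naive Bayes with add-one smoothing, swapped in purely for illustration; the abstract does not say this is the authors' model, and `NaiveBayesSATD`, `tokenize`, and `f1_score` are hypothetical helpers.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenization (a deliberately crude assumption)."""
    return text.lower().split()


class NaiveBayesSATD:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for word in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing out the likelihood.
                score += math.log((self.word_counts[c][word] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best


def f1_score(y_true, y_pred, positive="SATD"):
    """F1 is the harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Naive Bayes is chosen here only because it fits in a few lines without external libraries; any text classifier could be plugged into the same fit/predict/F1 loop.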

    Automating change-level self-admitted technical debt determination

    Harnessing customization in Web Annotation: A Software Product Line approach

    222 p. Web annotation helps mediate the reading and writing interaction by conveying information, adding comments, and inspiring conversations on web documents. It is used in areas such as the Social Sciences and Humanities, Investigative Journalism, the Biological Sciences, and Education, to name a few. Annotation activities are heterogeneous: end users (students, journalists, data curators, researchers, etc.) have very different requirements for creating, modifying, and reusing annotations. This results in a large number of web annotation tools and different ways of representing and storing web annotations. To facilitate reuse and interoperability, several attempts have been made over the last decades to standardize web annotations (e.g., Annotea or Open Annotation), resulting in the W3C annotation recommendations published in 2017. The W3C recommendations provide a framework for annotation representation (data model and vocabulary) and transport (protocol). However, there is still a gap in how annotation clients (tools and user interfaces) are developed, which forces developers to re-implement common functionality (i.e., highlighting, commenting, storing, ...) to build their custom annotation tool. This thesis aims to provide a reuse platform for developing web annotation tools for reviewing. To this end, we have developed a software product line called WACline. WACline is a family of annotation products that allows developers to create custom web annotation browser extensions, facilitating the reuse of core assets and their adaptation to a specific review context.
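
For readers unfamiliar with the W3C recommendation mentioned above, here is a minimal sketch of a commenting annotation in the Web Annotation Data Model's JSON-LD serialization, targeting a quoted passage through a `TextQuoteSelector`. The helper `make_comment_annotation` and the example.org URLs are placeholders; this is not WACline's code.

```python
import json

ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_comment_annotation(anno_id, source_url, exact, comment, prefix="", suffix=""):
    """Build a W3C Web Annotation that comments on an exact text quote in a document."""
    selector = {"type": "TextQuoteSelector", "exact": exact}
    # prefix/suffix are optional context that disambiguates repeated quotes.
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    return {
        "@context": ANNO_CONTEXT,
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {"source": source_url, "selector": selector},
    }


anno = make_comment_annotation(
    "http://example.org/anno/1",
    "http://example.org/paper.html",
    "technical debt",
    "Define this term earlier.",
)
print(json.dumps(anno, indent=2))
```

Because the serialization is plain JSON-LD, any client that follows the recommendation can read the annotation back, which is the interoperability the standardization effort aimed for.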
WACline was created following a knowledge-accumulation process in which each annotation product learns from previously created annotation products. The result is a family of annotation clients that supports three review practices: data extraction for systematic literature reviews (Highlight&Go), review of student assignments in higher education (Mark&Go), and conference and journal peer review (Review&Go). For each review context, an evaluation with real stakeholders was carried out to validate the efficiency and effectiveness improvements that the custom annotation tools bring to their practice.
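
Software product lines are commonly configured through a feature model that constrains which features a product may combine. The sketch below is a hypothetical, minimal version of that idea: the feature names, the `requires` constraint, and the storage alternative group are invented for illustration and are not taken from WACline or its three products.

```python
# Hypothetical feature model; the feature names are illustrative, not WACline's.
MANDATORY = {"highlight"}
OPTIONAL = {"comment", "local_storage", "remote_storage", "export"}
REQUIRES = {"export": {"comment"}}                   # selecting "export" requires "comment"
ALTERNATIVES = [{"local_storage", "remote_storage"}]  # exactly one feature per group


def validate(selection):
    """Return a list of constraint violations for a set of selected features."""
    errors = []
    unknown = selection - MANDATORY - OPTIONAL
    if unknown:
        errors.append(f"unknown features: {sorted(unknown)}")
    missing = MANDATORY - selection
    if missing:
        errors.append(f"missing mandatory features: {sorted(missing)}")
    for feature, needed in REQUIRES.items():
        if feature in selection and not needed <= selection:
            errors.append(f"{feature} requires {sorted(needed - selection)}")
    for group in ALTERNATIVES:
        if len(group & selection) != 1:
            errors.append(f"select exactly one of {sorted(group)}")
    return errors


def derive_product(name, selection):
    """Derive a product configuration, rejecting invalid feature selections."""
    errors = validate(selection)
    if errors:
        raise ValueError(f"{name}: " + "; ".join(errors))
    return {"product": name, "features": sorted(selection)}
```

In a real product line each feature would also map to reusable core assets (code, UI components) that are assembled into the derived browser extension; the sketch covers only the configuration check.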

    Identifying Self-Admitted Technical Debt

    Technical debt is a metaphor coined to express the trade-off between productivity and quality, e.g., when developers take shortcuts or perform quick hacks during the development of software projects. These non-optimal solutions are often implemented to allow the project to move faster in the short term, at the cost of increased maintenance in the future. The accumulation of technical debt during the ever-changing life cycle of a project is unavoidable and, if not properly managed, can severely hinder the development of the project. To help alleviate the impact of technical debt, a number of studies have focused on detecting technical debt. However, a recent study has shown that source code comments are another possible source for detecting technical debt; such comments are referred to as self-admitted technical debt. Therefore, in this dissertation we use empirical studies and NLP techniques to propose an approach to automatically identify self-admitted technical debt. First, we examine source code comments to determine the different types of technical debt, and we propose four simple filtering heuristics to eliminate comments that are unlikely to contain technical debt. Then, we read through more than 33K comments and find that self-admitted technical debt can be classified into five main types: design debt, defect debt, documentation debt, requirement debt, and test debt. The two most common types of self-admitted technical debt are design and requirement debt, making up between 42% and 84% and between 5% and 45% of the classified comments, respectively. Second, we leverage the knowledge obtained in our first study to present an approach to automatically identify design and requirement self-admitted technical debt using Natural Language Processing (NLP).
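
The abstract does not spell out the four filtering heuristics. The sketch below assumes rules of this general kind (merging consecutive single-line comments, then dropping license headers, auto-generated comments, commented-out code, and Javadoc comments without a task tag); the rule set, the `Comment` record, and the crude regular expression for code detection are all assumptions and need not match the dissertation's heuristics one-to-one.

```python
import re
from dataclasses import dataclass


@dataclass
class Comment:
    start: int  # first source line of the comment
    end: int    # last source line of the comment
    kind: str   # "line", "block" or "javadoc"
    text: str


TASK_TAGS = ("todo", "fixme", "xxx")
CODE_LIKE = re.compile(r"[;{}]\s*$")  # assumption: text ending in ; { or } is commented-out code


def merge_line_comments(comments):
    """Merge single-line comments on consecutive lines into one comment."""
    merged = []
    for c in sorted(comments, key=lambda c: c.start):
        prev = merged[-1] if merged else None
        if prev and c.kind == prev.kind == "line" and c.start == prev.end + 1:
            merged[-1] = Comment(prev.start, c.end, "line", prev.text + " " + c.text)
        else:
            merged.append(c)
    return merged


def keep(comment):
    text = comment.text.lower()
    if "license" in text or "copyright" in text:  # license headers
        return False
    if "auto-generated" in text:                  # IDE-generated stubs
        return False
    if CODE_LIKE.search(comment.text):            # commented-out source code
        return False
    if comment.kind == "javadoc" and not any(tag in text for tag in TASK_TAGS):
        return False                              # API docs without a task tag
    return True


def filter_comments(comments):
    return [c for c in merge_line_comments(comments) if keep(c)]
```

Merging runs first so that a debt admission split across several `//` lines is judged as one comment rather than as fragments.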
We study 10 open source projects (Ant, ArgoUML, Columba, EMF, Hibernate, JEdit, JFreeChart, JMeter, JRuby, and SQuirrel SQL) and find that 1) we are able to effectively identify self-admitted technical debt, significantly outperforming state-of-the-art techniques; 2) words related to sloppy or mediocre source code are the best indicators of design debt, whereas words related to enhancing or completing tasks are the best indicators of requirement debt; and 3) we can achieve 90% of the best classification performance using as little as 23% of the comments for both design and requirement self-admitted technical debt, and 80% of the best performance using as little as 9% and 5% of the comments for design and requirement self-admitted technical debt, respectively.
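
Finding 2) concerns which words best indicate each debt type. One simple way to surface such indicator words, assumed here and not necessarily the dissertation's method, is to rank words by a smoothed log-odds ratio of appearing in comments of the target class versus all other comments. `top_indicators` and its `alpha` smoothing parameter are illustrative.

```python
import math
from collections import Counter


def top_indicators(docs, labels, target, k=3, alpha=1.0):
    """Rank words by smoothed log-odds of occurring in `target` comments vs. the rest."""
    in_class, out_class = Counter(), Counter()
    for doc, label in zip(docs, labels):
        words = set(doc.lower().split())  # document frequency, not raw counts
        (in_class if label == target else out_class).update(words)
    n_in = sum(1 for label in labels if label == target)
    n_out = len(labels) - n_in
    vocab = set(in_class) | set(out_class)

    def log_odds(word):
        # Smoothed per-class probability that a comment contains the word (always < 1).
        p_in = (in_class[word] + alpha) / (n_in + 2 * alpha)
        p_out = (out_class[word] + alpha) / (n_out + 2 * alpha)
        return math.log(p_in / (1 - p_in)) - math.log(p_out / (1 - p_out))

    # Highest log-odds first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-log_odds(w), w))[:k]
```

Smoothing keeps words seen in only one class from receiving infinite scores, so rare words do not crowd out the frequent, intuitive indicators.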

    Towards the Repayment of Self-Admitted Technical Debt

    Technical Debt is a metaphor used to express sub-optimal source code implementations that are introduced for short-term benefits and often must be paid back later at an increased cost. In recent years, various empirical studies have investigated source code comments that indicate Technical Debt, often referred to as Self-Admitted Technical Debt (SATD). In this thesis, we survey research work on SATD, analyzing the characteristics of current approaches and techniques and dividing the literature into three categories: detection, comprehension, and repayment. To set the stage for novel and improved work on SATD, we compile publicly available tools, resources, and data sets. We also identify under-investigated areas and open challenges, and discuss potential future research avenues. From the literature survey, we conclude that most findings and contributions have focused on techniques to identify, classify, and comprehend SATD. Few studies have focused on the repayment or management of SATD, which is an essential goal of studying technical debt for software maintenance. Therefore, we perform an empirical study towards SATD repayment. We conduct a preliminary online survey with developers to understand the elements they consider when prioritizing SATD. Using the knowledge acquired from the survey responses and previous literature, we select metrics to estimate SATD repayment effort. We examine SATD instances found in software systems to see how they have been repaid, and investigate whether historical data available at the time of SATD introduction can serve as indicators of SATD that should be addressed. We find two SATD repayment effort metrics that can be consistently modeled in our studied projects and surface the best early indicators of important SATD.
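
The abstract does not name the two repayment effort metrics. As a hedged illustration of deriving effort-style metrics from version history, the sketch computes three hypothetical ones for a single SATD instance: its lifespan in days, its lifespan in commits, and the churn (lines added plus deleted) of the commit that removed it. The `Commit` record, `repayment_metrics`, and the metric names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Commit:
    sha: str
    timestamp: datetime
    lines_added: int
    lines_deleted: int


def repayment_metrics(history, introduced_sha, removed_sha):
    """history: commits in chronological order; returns hypothetical effort metrics."""
    index = {c.sha: i for i, c in enumerate(history)}
    i, j = index[introduced_sha], index[removed_sha]
    if j < i:
        raise ValueError("removal commit precedes introduction commit")
    intro, removal = history[i], history[j]
    return {
        "lifespan_days": (removal.timestamp - intro.timestamp).days,
        "lifespan_commits": j - i,
        "removal_churn": removal.lines_added + removal.lines_deleted,
    }
```

Metrics of this kind can only be computed after removal; the early indicators the abstract mentions would instead be features captured at introduction time (e.g., from the introducing commit) and used to predict such metrics.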