4,073 research outputs found
SATDBailiff- Mining and Tracking Self-Admitted Technical Debt
Self-Admitted Technical Debt (SATD) is a metaphorical concept to describe the self-documented addition of technical debt to a software project in the form of source code comments. SATD can linger in projects and degrade source-code quality, but it can also be more visible than unintentionally added or undocumented technical debt. Understanding the implications of adding SATD to a software project is important because developers can benefit from a better understanding of the quality trade-offs they are making. However, empirical studies, analyzing the survivability and removal of SATD comments, are challenged by potential code changes or SATD comment updates that may interfere with properly tracking their appearance, existence, and removal. In this paper, we propose SATDBailiff, a tool that uses an existing state-of-the-art SATD detection tool, to identify SATD in method comments, then properly track their lifespan. SATDBailiff is given as input links to open source projects, and its output is a list of all identified SATDs, and for each detected SATD, SATDBailiff reports all its associated changes, including any updates to its text, all the way to reporting its removal. The goal of SATDBailiff is to aid researchers and practitioners in better tracking SATDs instances, and providing them with a reliable tool that can be easily extended. SATDBailiff was validated using a dataset of previously detected and manually validated SATD instances. SATDBailiff is publicly available as an open source, along with the manual analysis of SATD instances associated with its validation, on the project website
Identifying self-admitted technical debt in issue tracking systems using machine learning
Technical debt is a metaphor indicating sub-optimal solutions implemented for
short-term benefits by sacrificing the long-term maintainability and
evolvability of software. A special type of technical debt is explicitly
admitted by software engineers (e.g. using a TODO comment); this is called
Self-Admitted Technical Debt or SATD. Most work on automatically identifying
SATD focuses on source code comments. In addition to source code comments,
issue tracking systems have shown to be another rich source of SATD, but there
are no approaches specifically for automatically identifying SATD in issues. In
this paper, we first create a training dataset by collecting and manually
analyzing 4,200 issues (that break down to 23,180 sections of issues) from
seven open-source projects (i.e., Camel, Chromium, Gerrit, Hadoop, HBase,
Impala, and Thrift) using two popular issue tracking systems (i.e., Jira and
Google Monorail). We then propose and optimize an approach for automatically
identifying SATD in issue tracking systems using machine learning. Our findings
indicate that: 1) our approach outperforms baseline approaches by a wide margin
with regard to the F1-score; 2) transferring knowledge from suitable datasets
can improve the predictive performance of our approach; 3) extracted SATD
keywords are intuitive and potentially indicating types and indicators of SATD;
4) projects using different issue tracking systems have less common SATD
keywords compared to projects using the same issue tracking system; 5) a small
amount of training data is needed to achieve good accuracy.Comment: Accepted for publication in the EMSE journa
Harnessing customizationinWeb Annotation: ASoftwareProduct Line approach
222 p.La anotación web ayuda a mediar la interacción de lectura y escritura al transmitir información, agregar comentarios e inspirar conversaciones en documentos web. Se utiliza en áreas de Ciencias Sociales y Humanidades, Investigación Periodística, Ciencias Biológicas o Educación, por mencionar algunas. Las actividades de anotación son heterogéneas, donde los usuarios finales (estudiantes, periodistas, conservadores de datos, investigadores, etc.) tienen requisitos muy diferentes para crear, modificar y reutilizar anotaciones. Esto resulta en una gran cantidad de herramientas de anotación web y diferentes formas de representar y almacenar anotaciones web. Para facilitar la reutilización y la interoperabilidad, se han realizado varios intentos durante las últimas décadas para estandarizar las anotaciones web (por ejemplo, Annotea u Open Annotation), lo que ha dado como resultado las recomendaciones de anotaciones del W3C publicadas en 2017. Las recomendaciones del W3C proporcionan un marco para la representación de anotaciones (modelo de datos y vocabulario) y transporte (protocolo). Sin embargo, todavía hay una brecha en cómo se desarrollan los clientes de anotación (herramientas e interfaces de usuario), lo que hace que los desarrolladores vuelvan a re-implementar funcionalidades comunes (esdecir, resaltar, comentar, almacenar,¿) para crear su herramienta de anotación personalizada.Esta tesis tiene como objetivo proporcionar una plataforma de reutilización para el desarrollo de herramientas de anotación web para la revisión. Con este fin, hemos desarrollado una línea de productos de software llamada WACline. WACline es una familia de productos de anotación que permite a los desarrolladores crear extensiones de navegador de anotación web personalizadas, lo que facilita la reutilización de los activos principales y su adaptación a su contexto de revisión específico. Se ha creado siguiendo un proceso de acumulación de conocimientos en el que cada producto de anotación aprende de los productos de anotación creados previamente. Finalmente, llegamos a una familia de clientes de anotación que brinda soporte para tres prácticas de revisión: extracción de datos de revisión sistemática de literatura (Highlight&Go), revisión de tareas de estudiantes en educación superior (Mark&Go), y revisión por pares de conferencias y revistas (Review&Go). Para cada uno de los contextos de revisión, se ha llevado a cabo una evaluación con partes interesadas reales para validar las mejoras de eficiencia y eficacia aportadas por las herramientas de anotación personalizadas en su práctica
Identifying Self-Admitted Technical Debt
Technical debt is a metaphor coined to express the trade off between productivity and quality, e.g., when developers take shortcuts or perform quick hacks during the development of software projects. These non optimal solutions are often implemented to allow the project to move faster in the short term, at the cost of increased maintenance in the future. The accumulation of technical debt during the ever changing life-cycle of a project is unavoidable, and if not properly managed can severely hinder the development of the project. To help alleviate the impact of technical debt, a number of studies focused on the detection of technical debt. However, a recent study has shown that one possible source to detect technical debt is using source code comments, also referred to as self-admitted technical debt. Therefore, in this dissertation we use empirical studies and NLP techniques to propose an approach to automatically identify self-admitted technical debt.
First, we examine source code comments to determine the different types of technical debt, and we propose four simple filtering heuristics to eliminate comments that are not likely to contain technical debt. Then, we read through more than 33K comments, and we find that self-admitted technical debt can be classified into five main types - design debt, defect debt, documentation debt, requirement debt and test debt. In addition, two most common types of self-admitted technical debt are design and requirement debt, making up between 42% to 84% and 5% to 45% of the classified comments, respectively.
Second, we leverage the knowledge obtained in our first study to present an approach to automatically identify design and requirement self-admitted technical debt using Natural Language Processing (NLP). We study 10 open source projects: Ant, ArgoUML, Columba, EMF, Hibernate, JEdit, JFreeChart, Jmeter, JRuby and SQuirrel SQL and find that 1) we are able to effectively identify self-admitted technical debt, significantly outperforming state-of-the-art techniques; 2) that words related to sloppy or mediocre source code are the best indicators of design debt, whereas for requirement debt, words related to enhancing or completing tasks are the best indicators; and 3) we can achieve 90% of the best classification performance, using as little as 23% of the comments for both design and requirement self-admitted technical debt, and 80% of the best performance, using as little as 9% and 5% of the comments for design and requirement self-admitted technical debt, respectively
Towards the Repayment of Self-Admitted Technical Debt
Technical Debt is a metaphor used to express sub-optimal source code implementations that are introduced for short-term benefits that often must be paid back later, at an increased cost. In recent years, various empirical studies have focused on investigating source code comments that indicate Technical Debt, often referred to as Self-Admitted Technical Debt (SATD).
In this thesis, we survey research work on SATD, analyzing characteristics of current approaches and techniques for SATD, dividing literature in three categories: detection, comprehension, and repayment. To set the stage for novel and improved work on SATD, we compile tools, resources, and data sets made publicly available. We also identify areas that are missing investigation, open challenges, and discuss potential future research avenues. From the literature survey, we conclude that most findings and contributions have focused on techniques to identify, classify, and comprehend SATD. Few studies focused on the repayment or management of SATD, which is an essential goal of studying technical debt for software maintenance.
Therefore, we perform an empirical study towards SATD repayment. We conducted a preliminary online survey with developers to understand the elements they consider to prioritize SATD. With the acquired knowledge from the survey responses and previous literature work, we select metrics to estimate SATD repayment effort. We examine SATD instances found in software systems to see how it has been repaid and investigate the possibility of using historical data at the time of SATD introduction as indicators for SATD that should be addressed. We find two SATD repayment effort metrics that can be consistently modeled in our studied projects and surface the best early indicators for important SATD
Recommended from our members
Avoiding the Gaze of the Test: High Stakes Literacy Policy Implementation
This qualitative embedded case study documents the policy implementation of literacy assessment in a Texas urban high school, using Foucault’s theory of the panopticon to understand how teaching and learning were shaped by the state high-stakes exit exam. In addition to the strong influence of the State of Texas Assessments of Academic Readiness (STAAR) test, data also indicated that despite an atmosphere of surveillance, teachers also worked strategically to make room for other types of literacy instruction. This was most visible in English I and II teachers’ commitment to process-based writing instruction focused on real-world text genres.Educatio
- …