60 research outputs found
Facilitating software evolution through natural language comments and dialogue
Software projects are continually evolving, as developers incorporate changes to refactor code, support new functionality, and fix bugs. To uphold software quality amidst constant changes and also facilitate prompt implementation of critical changes, it is desirable to have automated tools for supporting and driving software evolution. In this thesis, we explore tasks and data and design machine learning approaches which leverage natural language to serve this purpose.
When developers make code changes, they sometimes fail to update the accompanying natural language comments documenting various aspects of the code, which can lead to confusion and vulnerability to bugs. We present our work on alerting developers of inconsistent comments upon code changes and suggesting updates by learning to correlate comments and code.
When a bug is reported, developers engage in a dialogue to collaboratively understand it and ultimately resolve it. While the solution is likely formulated within the discussion, it is often buried in a large amount of text, making it difficult to comprehend, which delays its implementation through the necessary repository changes. To guide developers in more easily absorbing information relevant to making these changes and consequently expedite bug resolution, we investigate generating a concise natural language description of the solution by synthesizing relevant content as it emerges in the discussion. We benchmark models for generating solution descriptions and design a classifier for determining when sufficient context for generating an informative description becomes available. We investigate approaches for real-time generation, entailing separately trained and jointly trained classification and generation models. We also study techniques for deriving natural language context from bug report discussions and generated solution descriptions to guide models in generating suggested bug-resolving code changes.
Computer Science
Web API evolution patterns: A usage-driven approach
As the use of Application Programming Interfaces (APIs) grows, evolving them to keep the service aligned with consumers' needs becomes more challenging. In this paper, we address the role of consumers' needs in Web API (WAPI) evolution and introduce a pattern-based process mining method that supports providers in evolving WAPIs by analyzing and understanding consumers' behavior as imprinted in WAPI usage logs. We take the position that WAPI evolution should be mainly usage-based, i.e., the way consumers use WAPIs should be one of the main drivers of their changes. We start by characterizing the structural relationships between endpoints, and then summarize these relationships into a set of behavioral patterns (i.e., usage patterns whose occurrences indicate specific consumer behavior, such as repetitive or consecutive calls) that can potentially imply the need for changes (e.g., creating new parameters for endpoints, merging endpoints). We analyze the logs and extract several metrics for the endpoints and their relationships, and then detect the patterns. We apply our method to two real-world WAPIs from different domains, education and health: the WAPI of the Barcelona School of Informatics at the Polytechnic University of Catalonia (Facultat d'Informàtica de Barcelona, FIB, UPC) and the District Health Information Software 2 (DHIS2) WAPI. The feedback from consumers and providers of these WAPIs supported the effectiveness of the detected patterns and confirmed the promising potential of our approach.
This paper has been funded by the Spanish Ministerio de Ciencia e Innovación under project/funding scheme PID2020-117191RB-I00/AEI/10.13039/501100011033.
Peer Reviewed. Postprint (published version)
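As a rough illustration of the kind of signal the abstract describes (repetitive or consecutive calls in usage logs), the sketch below counts how often one endpoint is called directly after another by the same consumer. It is not the paper's process mining method: the log layout `(consumer_id, timestamp, endpoint)`, the function name, and the endpoint paths are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def consecutive_call_counts(log):
    """Count directly consecutive endpoint pairs per consumer.

    `log` is an iterable of (consumer_id, timestamp, endpoint) tuples.
    This tuple layout is hypothetical, not a format used by the paper.
    """
    per_consumer = defaultdict(list)
    for consumer, timestamp, endpoint in log:
        per_consumer[consumer].append((timestamp, endpoint))

    pairs = Counter()
    for calls in per_consumer.values():
        calls.sort()  # order each consumer's calls by timestamp
        for (_, first), (_, second) in zip(calls, calls[1:]):
            pairs[(first, second)] += 1
    return pairs


# Hypothetical log entries with made-up endpoint paths.
log = [
    ("alice", 1, "/courses"),
    ("alice", 2, "/courses/1/exams"),
    ("bob", 1, "/courses"),
    ("bob", 2, "/courses/1/exams"),
    ("bob", 3, "/courses/1/exams"),
]
counts = consecutive_call_counts(log)
```

A pair such as `("/courses", "/courses/1/exams")` that occurs often could hint that the two endpoints are candidates for merging, while a pair of identical endpoints points to repetitive calls. Any threshold for "often" would have to come from the provider.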
Understanding, Analysis, and Handling of Software Architecture Erosion
Architecture erosion occurs when a software system's implemented architecture diverges from the intended architecture over time. Studies show erosion impacts development, maintenance, and evolution since it accumulates imperceptibly. Identifying early symptoms like architectural smells enables managing erosion through refactoring. However, research lacks a comprehensive understanding of erosion, it is unclear which symptoms are most common, and detection methods are lacking. This thesis establishes an erosion landscape, investigates symptoms, and proposes identification approaches.
A mapping study covers erosion definitions, symptoms, causes, and consequences. Key findings: 1) "Architecture erosion" is the most used term, with four perspectives on definitions and respective symptom types. 2) Technical and non-technical reasons contribute to erosion, negatively impacting quality attributes. Practitioners can advocate addressing erosion to prevent failures. 3) Detection and correction approaches are categorized, with consistency and evolution-based approaches commonly mentioned.
An empirical study explores practitioner perspectives through communities, surveys, and interviews. Findings reveal that associated practices like code review and tools identify symptoms, while collected measures address erosion during implementation. Two studies of code review comments analyze erosion in practice. The first reveals that architectural violations, duplicate functionality, and cyclic dependencies are most frequent. Symptoms decreased over time, indicating increased stability. Most were addressed after review. The second explores violation symptoms in four projects, identifying 10 categories. Refactoring and removing code address most violations, while some are disregarded.
Machine learning classifiers using pre-trained word embeddings identify violation symptoms from code reviews. Key findings: 1) SVM with word2vec achieved highest performance. 2) fastText embeddings worked well. 3) 200-dimensional embeddings outperformed 100/300-dimensional. 4) Ensemble classifier improved performance. 5) Practitioners found results valuable, confirming potential.
An automated recommendation system identifies qualified reviewers for violations using similarity detection on file paths and comments. Experiments show common methods perform well, outperforming a baseline approach. Sampling techniques impact recommendation performance.
A Fully Parallelized and Budgeted Multi-level Monte Carlo Framework for Partial Differential Equations: From Mathematical Theory to Automated Large-Scale Computations
All collected data on any physical, technical or economical process is subject to uncertainty. By incorporating this uncertainty in the model and propagating it through the system, this data error can be controlled. This makes the predictions of the system more trustworthy and reliable. The multi-level Monte Carlo (MLMC) method has proven to be an effective uncertainty quantification tool, requiring little knowledge about the problem while being highly performant.
In this doctoral thesis we analyse, implement, develop and apply the MLMC method to partial differential equations (PDEs) subject to high-dimensional random input data. We set up a unified framework based on the software M++ to approximate solutions to elliptic and hyperbolic PDEs with a large selection of finite element methods. We combine this setup with a new variant of the MLMC method. In particular, we propose a budgeted MLMC (BMLMC) method which is capable of optimally investing reserved computing resources in order to minimize the model error while exhausting a given computational budget. This is achieved by developing a new parallelism based on a single distributed data structure, employing ideas of the continuation MLMC method and utilizing dynamic programming techniques. The final method is theoretically motivated, analyzed, and numerically well-tested in an automated benchmarking workflow for highly challenging problems like the approximation of wave equations in randomized media.
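For readers unfamiliar with MLMC, the following minimal sketch shows the standard telescoping estimator, E[Q_L] = E[Q_0] + sum over l of E[Q_l - Q_{l-1}], on a toy problem. It is plain MLMC, not the budgeted variant proposed in the thesis, and it does not use M++. The toy quantity (a midpoint-rule approximation of the integral of exp(omega * x) over [0, 1] with omega uniform on [0, 1]) and all function names are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def approx(omega, level):
    """Midpoint-rule approximation of the integral of exp(omega * x) over
    [0, 1] using 2**level cells (a toy stand-in for a PDE solve)."""
    cells = 2 ** level
    h = 1.0 / cells
    return h * sum(math.exp(omega * (i + 0.5) * h) for i in range(cells))


def sample_level_difference(level, rng):
    """One sample of Y_l = Q_l - Q_{l-1}, with Q_{-1} := 0.

    The fine and coarse approximations share the same random input, which
    is what keeps the variance of the correction terms small.
    """
    omega = rng.random()
    fine = approx(omega, level)
    if level == 0:
        return fine
    return fine - approx(omega, level - 1)


def mlmc_estimate(samples_per_level, seed=0):
    """Sum the per-level sample means of Y_l (the telescoping sum)."""
    rng = random.Random(seed)
    total = 0.0
    for level, n in enumerate(samples_per_level):
        total += sum(sample_level_difference(level, rng) for _ in range(n)) / n
    return total


# Many cheap coarse samples, few expensive fine samples.
estimate = mlmc_estimate([4000, 1000, 250])
```

The sample counts decrease with the level because the corrections Y_l have small variance, so most of the work is spent on cheap coarse samples. Choosing these counts under a fixed computational budget is the problem the budgeted method addresses.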
State of Refactoring Adoption: Better Understanding Developer Perception of Refactoring
We aim to explore how developers document their refactoring activities during
the software life cycle. We call such activity Self-Affirmed Refactoring (SAR),
which indicates developers' documentation of their refactoring activities. SAR
is crucial in understanding various aspects of refactoring, including the
motivation, procedure, and consequences of the performed code change. After
that, we propose an approach to identify whether a commit describes
developer-related refactoring events to classify them according to the
refactoring common quality improvement categories. To complement this goal, we
aim to reveal insights into how reviewers decide to accept or reject a
submitted refactoring request and what makes such a review challenging. Our SAR
taxonomy and model can work with refactoring detectors to report any early
inconsistency between refactoring types and their documentation. They can serve
as a solid background for various empirical investigations. Our survey with
code reviewers has revealed several difficulties related to understanding the
refactoring intent and implications on the functional and non-functional
aspects of the software. In light of our findings from the industrial case
study, we recommended a procedure to properly document refactoring activities,
as part of our survey feedback. Comment: arXiv admin note: text overlap with arXiv:2010.13890,
arXiv:2102.05201, arXiv:2009.0927
Using Active Learning to Teach Critical and Contextual Studies: One Teaching Plan, Two Experiments, Three Videos.
Since the 1970s, art and design education at UK universities has existed as a divided practice; on the one hand applying active learning in the studio and on the other hand enforcing passive learning in the lecture theatre. As a result, art and design students are in their vast majority reluctant about modules that may require them to think, read and write critically during their academic studies. This article describes, evaluates and analyses two individual active learning experiments designed to determine if it is possible to teach CCS modules in a manner that encourages student participation. The results reveal that opting for active learning methods improved academic achievement, encouraged cooperation, and enforced an inclusive classroom. Furthermore, and contrary to wider perception, the article demonstrates that active learning methods can be equally beneficial for small-size as well as large-size groups.
Streamlining code smells: Using collective intelligence and visualization
Context. Code smells are seen as a major source of technical debt and, as such, should be detected and removed. Code smells have long been catalogued with corresponding mitigating solutions called refactoring operations. However, while the latter are supported in current IDEs (e.g., Eclipse), code smell detection scaffolding still has many limitations. Researchers argue that the subjectiveness of the code smell detection process is a major hindrance to mitigating the problem of smells-infected code.
Objective. This thesis presents a new approach to code smells detection that we have called CrowdSmelling and the results of a validation experiment for this approach. The latter is based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smells detection algorithms, thereby lessening the subjectivity issue.
Method. In the context of three consecutive years of a Software Engineering course, a total “crowd” of around a hundred teams, with an average of three members each, classified the presence of 3 code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms.
Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells.
Results. Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forest).
Conclusions. Obtained results suggest that CrowdSmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity.
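To make the reported ROC figures easier to interpret, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, counting ties as one half. It assumes the ROC values in the abstract are such areas, which the abstract does not state explicitly. The function name and the sample scores are made up for illustration and are not data from the thesis.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `scores` are classifier confidences that a code fragment has the smell;
    `labels` are 1 (smell present) or 0 (smell absent).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up scores: one positive (0.4) is ranked below one negative (0.6).
auc = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

Under this reading, a value of 0.5 corresponds to random ranking and 1.0 to a perfect one, which is why a value near 0.57 signals weak detection.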
Pitfalls and Guidelines for Using Time-Based Git Data
Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data however, time-based data is often dirty. To date, there are no studies that quantify how frequently such data is used by the software engineering research community, or investigate sources of and quantify how often such data is dirty. Depending on the research task and method used, including such dirty data could affect the research results. This paper presents an extended survey of papers that utilize time-based data, published in the Mining Software Repositories (MSR) conference series. Out of the 754 technical track and data papers published in MSR 2004–2021, we found that at least 290 (38%) utilized time-based data. We also observed that most time-based data used in research papers comes in the form of Git commits, often from GitHub. Based on those results, we then used the Boa and Software Heritage infrastructures to help identify and quantify several sources of dirty Git timestamp data. Finally, we provide guidelines/best practices for researchers utilizing time-based data from Git repositories.
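As an illustration of what screening for dirty commit timestamps might look like, the sketch below flags commits with three simple heuristics: a timestamp at or before the Unix epoch, a timestamp in the future, and a timestamp older than that of a parent commit. These heuristics, the `Commit` record, and the function name are assumptions for this example. They are not the paper's catalogue of dirty-data sources, and the code does not use Boa or Software Heritage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical minimal commit record (not a Git library type)."""
    sha: str
    timestamp: int  # seconds since the Unix epoch
    parents: tuple = ()


def flag_suspicious(commits, now):
    """Map each suspicious commit SHA to the list of reasons it was flagged."""
    by_sha = {c.sha: c for c in commits}
    flags = {}
    for commit in commits:
        reasons = []
        if commit.timestamp <= 0:
            reasons.append("epoch-or-earlier")
        if commit.timestamp > now:
            reasons.append("in-future")
        # A commit should not normally predate the commits it builds on.
        if any(p in by_sha and by_sha[p].timestamp > commit.timestamp
               for p in commit.parents):
            reasons.append("older-than-parent")
        if reasons:
            flags[commit.sha] = reasons
    return flags


history = [
    Commit("a", 100),
    Commit("b", 50, parents=("a",)),
    Commit("c", 0, parents=("b",)),
    Commit("d", 10**10, parents=("c",)),
]
flags = flag_suspicious(history, now=10**9)
```

A flagged commit is not necessarily wrong. Rebases and imported histories, for example, can legitimately produce unusual orderings, so flags like these are best treated as prompts for manual inspection or filtering rather than as proof of error.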
Mining app reviews to support software engineering
The thesis studies how mining app reviews can support software engineering.
App reviews —short user reviews of an app in app stores— provide a potentially rich source of information to help software development teams maintain and evolve their products. Exploiting this information is however difficult due to the large number of reviews and the difficulty in extracting useful actionable information from short informal texts.
A variety of app review mining techniques have been proposed to classify reviews and to extract information such as feature requests, bug descriptions, and user sentiments but the usefulness of these techniques in practice is still unknown. Research in this area has grown rapidly, resulting in a large number of scientific publications (at least 182 between 2010 and 2020) but nearly no independent evaluation and description of how diverse techniques fit together to support specific software engineering tasks have been performed so far.
The thesis presents a series of contributions to address these limitations. We first report the findings of a systematic literature review in app review mining exposing the breadth and limitations of research in this area. Using findings from the literature review, we then present a reference model that relates features of app review mining tools to specific software engineering tasks supporting requirements engineering, software maintenance and evolution.
We then present two additional contributions extending previous evaluations of app review mining techniques. We present a novel independent evaluation of opinion mining techniques using an annotated dataset created for our experiment. Our evaluation finds lower effectiveness than initially reported by the techniques' authors. A final part of the thesis evaluates approaches for searching for app reviews pertinent to a particular feature. The findings show that a general-purpose search technique is more effective than the state-of-the-art purpose-built app review mining techniques, and suggest their usefulness for requirements elicitation.
Overall, the thesis contributes to improving the empirical evaluation of app review mining techniques and their application in software engineering practice. Researchers and developers of future app mining tools will benefit from the novel reference model, detailed experiment designs, and publicly available datasets presented in the thesis.