8 research outputs found
A large-scale empirical exploration on refactoring activities in open source software projects
Refactoring is a well-established practice that aims at improving the internal structure of a software system without changing its external behavior. Existing literature provides evidence of how and why developers perform refactoring in practice. In this paper, we continue this line of research by performing a large-scale empirical analysis of refactoring practices in 200 open source systems. Specifically, we analyze the change history of these systems at commit level to investigate (i) whether developers perform refactoring operations and, if so, which are most widespread, (ii) when refactoring operations are applied, and (iii) which are the main developer-oriented factors leading to refactoring. Based on our results, future research can focus on enabling automatic support for less frequent refactorings and on recommending refactorings based on the developer's workload, the project's maturity, and the developer's commitment to the project.
Code Review Practices for Refactoring Changes: An Empirical Study on OpenStack
Modern code review is a widely used technique employed in both industrial and open-source projects to improve software quality, share knowledge, and ensure adherence to coding standards and guidelines. During code review, developers may discuss refactoring activities before merging code changes into the code base. To date, code review has been extensively studied to explore its general challenges, best practices and outcomes, and socio-technical aspects. However, little is known about how refactoring is reviewed and what developers care about when they review refactored code. Hence, in this work, we present a quantitative and qualitative study to understand the main criteria developers rely on when deciding whether to accept or reject submitted refactored code, and what makes this process challenging. Through a case study of 11,010 refactoring and non-refactoring reviews spread across OpenStack open-source projects, we find that refactoring-related code reviews take significantly longer to be resolved in terms of code review effort. Moreover, upon performing a thematic analysis on a significant sample of the refactoring code review discussions, we built a comprehensive taxonomy consisting of 28 refactoring review criteria. We envision our findings reaffirming the necessity of developing accurate and efficient tools and techniques that can assist developers in the review process in the presence of refactorings.
Towards using fluctuations in internal quality metrics to find design intents
Version control is the backbone of the modern software development workflow. While building more and more complex
systems, developers have to understand unfamiliar subsystems of source code. Understanding the logic of unfamiliar code
is relatively straightforward. However, understanding its design and its genesis is often only possible through
scattered and unreliable commit messages and project documentation -- when they exist.
Thus, developers need a reliable and relevant baseline to understand the history of software projects. In this thesis,
we take the first steps towards understanding change patterns in commit histories. We study the changes in software
metrics through the evolution of projects.
Through multiple exploratory studies, we conduct quantitative and qualitative experiments on several datasets extracted
from a pool of 13 projects. We mine the changes in software metrics for each commit of the respective projects and
manually build oracles to represent ground truth.
We identified several categories by analyzing these changes. One pattern, in particular, dubbed "tradeoffs", where some
metrics may improve at the expense of others, proved to be a promising indicator of design-related changes -- in some
cases, also hinting at a conscious design intent from the authors of the changes. Demonstrating the findings of our
exploratory studies, we build a general model to identify the application of a well-known set of design principles in
new projects.
Our overall results suggest that metric fluctuations have the potential to be relevant indicators for valuable
macroscopic insights about the design evolution in a project's development history.
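The "tradeoffs" pattern described above can be illustrated with a minimal sketch. It assumes each commit is summarized as a map from metric name to the change in that metric, and that lower values are better for every metric; the class name, the metric names, and every category other than the tradeoff itself are hypothetical and are not taken from the thesis.

```java
import java.util.Map;

final class MetricFluctuationSketch {

    // Hypothetical categories; only "tradeoff" is named in the abstract.
    enum Pattern { NEUTRAL, IMPROVEMENT, DEGRADATION, TRADEOFF }

    // deltas maps a metric name to (value after commit - value before commit).
    // Assumption: for every metric, a lower value is better.
    static Pattern classify(Map<String, Double> deltas) {
        boolean improved = false;
        boolean worsened = false;
        for (double delta : deltas.values()) {
            if (delta < 0) improved = true;   // metric went down: better
            if (delta > 0) worsened = true;   // metric went up: worse
        }
        if (improved && worsened) return Pattern.TRADEOFF;
        if (improved) return Pattern.IMPROVEMENT;
        if (worsened) return Pattern.DEGRADATION;
        return Pattern.NEUTRAL;
    }

    public static void main(String[] args) {
        // Coupling decreases while complexity increases in the same commit.
        Map<String, Double> commit = Map.of("coupling", -2.0, "complexity", 3.0);
        System.out.println(classify(commit));
    }
}
```

A commit is flagged as a tradeoff only when at least one metric improves and at least one worsens; a real analysis would also need per-metric directions and thresholds, which this sketch leaves out.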
Understanding the impact of introducing Lambda expressions in Java Programs
Background: Version 8 of the Java programming language introduced several features that encourage the functional
style of programming, including support for lambda expressions and the Stream API. It is currently common wisdom
that refactoring legacy code to introduce lambda expressions, besides other potential benefits, simplifies the code
and improves program comprehension. Aims: The purpose of this work is to investigate this belief by conducting an
in-depth study to evaluate the effect of introducing lambda expressions on program comprehension. Method: We
conducted this research using a mixed-method approach. For the quantitative part, we analyzed 158 pairs of code
snippets extracted either directly from GitHub or from recommendations from three tools (RJTL, NetBeans, and
IntelliJ). We also surveyed practitioners to collect their perceptions about the benefits for program comprehension
of introducing lambda expressions. We asked practitioners to evaluate and rate sets of pairs of code snippets.
Results: We found contradictory results in our research. Based on the quantitative assessment, we could not find
evidence that the introduction of lambda expressions improves software readability, one of the components of
program comprehension. Our results suggest that the transformations recommended by the aforementioned tools
decrease program comprehension when assessed by two state-of-the-art models for estimating readability. In
contrast, the findings of our qualitative assessment suggest that the introduction of lambda expressions improves
program comprehension in three scenarios: when we convert anonymous inner classes to lambda expressions, when we
replace structural loops with an inner conditional by an anyMatch operator, and when we replace structural loops
by a filter operator combined with a collect method. Implications: We argue in this paper that one can improve
program comprehension by applying particular transformations to introduce lambda expressions (e.g., replacing
anonymous inner classes with lambda expressions). Also, the opinions of the participants highlight which kinds of
transformation for introducing lambdas might be advantageous. This might support the implementation of
effective tools for automatic program transformations.
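The three scenarios named in the abstract can be sketched as before/after pairs. This is an illustrative example only: the class, method names, and data are hypothetical and are not snippets from the study.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class LambdaSketch {

    // Scenario 1, before: anonymous inner class.
    static Comparator<String> byLengthAnonymous() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        };
    }

    // Scenario 1, after: equivalent lambda expression.
    static Comparator<String> byLengthLambda() {
        return (a, b) -> Integer.compare(a.length(), b.length());
    }

    // Scenario 2, before: structural loop with an inner conditional.
    static boolean hasNegativeLoop(List<Integer> values) {
        for (int v : values) {
            if (v < 0) {
                return true;
            }
        }
        return false;
    }

    // Scenario 2, after: anyMatch operator.
    static boolean hasNegativeStream(List<Integer> values) {
        return values.stream().anyMatch(v -> v < 0);
    }

    // Scenario 3, before: structural loop that filters into a new list.
    static List<String> longWordsLoop(List<String> words, int minLength) {
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (w.length() >= minLength) {
                result.add(w);
            }
        }
        return result;
    }

    // Scenario 3, after: filter operator combined with collect.
    static List<String> longWordsStream(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() >= minLength)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("map", "filter", "collect");
        System.out.println(longWordsLoop(words, 6).equals(longWordsStream(words, 6)));
        System.out.println(hasNegativeLoop(List.of(1, -2, 3)) == hasNegativeStream(List.of(1, -2, 3)));
    }
}
```

Each pair computes the same result; whether the lambda form is easier to read is the question the study examines, not something the sketch settles.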
Behind the Intent of Extract Method Refactoring: A Systematic Literature Review
Code refactoring is widely recognized as an essential software engineering
practice to improve the understandability and maintainability of the source
code. The Extract Method refactoring is considered the "Swiss army knife" of
refactorings, as developers often apply it to improve their code quality. In
recent years, several studies attempted to recommend Extract Method
refactorings allowing the collection, analysis, and revelation of actionable
data-driven insights about refactoring practices within software projects. In
this paper, we review the current body of knowledge on existing
Extract Method refactoring research and explore its limitations and potential
improvement opportunities for future research efforts, so that researchers and
practitioners can become aware of the state of the art and identify new
research opportunities in this context. We review the body of knowledge related
to Extract Method refactoring in the form of a systematic literature review
(SLR). After compiling an initial pool of 1,367 papers, we conducted a
systematic selection and our final pool included 83 primary studies. We define
three sets of research questions and systematically develop and refine a
classification schema based on several criteria including their methodology,
applicability, and degree of automation. The results construct a catalog of 83
Extract Method approaches indicating that several techniques have been proposed
in the literature. Our results show that: (i) 38.6% of Extract Method
refactoring studies primarily focus on addressing code clones; (ii) several of
the Extract Method tools incorporate the developer's involvement in the
decision-making process when applying the method extraction, and (iii) the
existing benchmarks are heterogeneous and do not contain the same type of
information, making standardizing them for the purpose of benchmarking
difficult.
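For readers unfamiliar with the refactoring itself, the following minimal sketch shows Extract Method applied to a small code clone. The class, the methods, and the data are hypothetical and are not drawn from any of the reviewed studies.

```java
final class ExtractMethodSketch {

    // Before: the summation fragment is duplicated inline (a small clone).
    static String reportBefore(int[] firstQuarter, int[] secondQuarter) {
        int total1 = 0;
        for (int v : firstQuarter) {
            total1 += v;
        }
        int total2 = 0;
        for (int v : secondQuarter) {
            total2 += v;
        }
        return "Q1=" + total1 + ", Q2=" + total2;
    }

    // After: the duplicated fragment is extracted into sum(...).
    static String reportAfter(int[] firstQuarter, int[] secondQuarter) {
        return "Q1=" + sum(firstQuarter) + ", Q2=" + sum(secondQuarter);
    }

    // The extracted method; external behavior of the report is unchanged.
    private static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] q1 = {1, 2};
        int[] q2 = {3};
        System.out.println(reportBefore(q1, q2).equals(reportAfter(q1, q2)));
    }
}
```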
A User-aware Intelligent Refactoring for Discrete and Continuous Software Integration
Successful software products evolve through a process of continual change. However, this process may weaken the design of the software and make it unnecessarily complex, leading to significantly reduced productivity and increased fault-proneness. Refactoring improves the software design while preserving overall functionality and behavior, and is an important technique in managing the growing complexity of software systems. Most of the existing work on software refactoring uses either an entirely manual or a fully automated approach. Manual refactoring is time-consuming, error-prone and unsuitable for large-scale, radical refactoring. Furthermore, fully automated refactoring yields a static list of refactorings which, when applied, leads to a new and often hard to comprehend design. In addition, it is challenging to merge these refactorings with other changes performed in parallel by developers. In this thesis, we propose a refactoring recommendation approach that dynamically adapts and interactively suggests refactorings to developers and takes their feedback into consideration. Our approach uses the Non-dominated Sorting Genetic Algorithm (NSGA-II) to find a set of good refactoring solutions that improve software quality while minimizing the deviation from the initial design. These refactoring solutions are then analyzed to extract interesting common features between them, such as the frequently occurring refactorings in the best non-dominated solutions. We combined our interactive approach with unsupervised learning to reduce the developer's interaction effort when refactoring a system. The unsupervised learning algorithm clusters the different trade-off solutions, called the Pareto front, to guide developers in selecting their region of interest and to reduce the number of refactoring options to explore. To further reduce the interaction effort, we propose an approach that converts the multi-objective search into a mono-objective one after interacting with the developer to identify a good refactoring solution based on their preferences. Since developers may want to focus on specific code locations, the "decision space" is also important. Therefore, our interactive approach enables developers to pinpoint their preferences simultaneously in the objective (quality metrics) and decision (code location) spaces. Due to an urgent need for refactoring tools that can support continuous integration and recent development processes such as DevOps that are based on rapid releases, we propose, for the first time, an intelligent software refactoring bot, called RefBot. Our bot continuously monitors the software repository and finds the best sequence of refactorings to fix the quality issues in Continuous Integration/Continuous Development (CI/CD) environments, delivered as a set of pull requests generated after mining previous code changes to understand the profile of developers.
We quantitatively and qualitatively evaluated the performance and effectiveness of our proposed approaches via a set of studies conducted with experienced developers who used our tools on both open source and industry projects.
Ph.D., College of Engineering & Computer Science, University of Michigan-Dearborn. https://deepblue.lib.umich.edu/bitstream/2027.42/154775/1/Vahid Alizadeh Final Dissertation.pdf
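The notion of non-dominated (Pareto-front) solutions mentioned above can be sketched as follows. This is only the dominance filter that multi-objective algorithms such as NSGA-II build on, not NSGA-II itself and not the thesis's implementation. It assumes each refactoring solution is scored by a vector of objectives that are all minimized; the class name and the example scores are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ParetoFrontSketch {

    // a dominates b if a is no worse in every objective and strictly
    // better in at least one (all objectives are minimized here).
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    // Keep only the solutions that no other solution dominates.
    static List<double[]> nonDominated(List<double[]> solutions) {
        List<double[]> front = new ArrayList<>();
        for (double[] candidate : solutions) {
            boolean dominated = false;
            for (double[] other : solutions) {
                if (other != candidate && dominates(other, candidate)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) front.add(candidate);
        }
        return front;
    }

    public static void main(String[] args) {
        // Hypothetical scores: {quality deficit, deviation from initial design}.
        List<double[]> solutions = List.of(
                new double[]{1, 5}, new double[]{2, 2},
                new double[]{3, 3}, new double[]{5, 1});
        System.out.println(nonDominated(solutions).size());
    }
}
```

In this example {3, 3} is dropped because {2, 2} is better on both objectives, while the remaining three solutions are mutually incomparable trade-offs of the kind a developer would then choose among.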
An approach to safely evolve preprocessor-based C program families.
Since the 1970s, the C preprocessor has been widely used in practice in a number of projects, including Apache, Linux, and Libssh, to tailor systems to different platforms and application scenarios. In academia, however, the preprocessor has received strong criticism since at least the early 1990s. Researchers have criticized its lack of separation of concerns, its proneness to introduce subtle errors, and its obfuscation of the source code. To better understand the problems of using the C preprocessor, taking the perception of developers into account, we conducted 40 interviews and a survey among 202 developers. We found that developers deal with three common problems in practice: configuration-related bugs, combinatorial testing, and code comprehension. Developers aggravate these problems when using undisciplined directives (i.e., bad smells regarding preprocessor use), which are preprocessor directives that do not respect the syntactic structure of the source code. To safely evolve preprocessor-based program families, we proposed strategies to detect configuration-related bugs and bad smells, and a set of 14 refactorings to remove bad smells.
To better deal with exponential configuration spaces, our strategies use variability-aware analysis, which considers the entire set of possible configurations, and sampling, which allows reusing C tools that consider only one configuration at a time to detect bugs. To propose a suitable sampling algorithm, we compared 10 algorithms with respect to effort (i.e., number of configurations to test) and bug-detection capabilities (i.e., number of bugs detected in the sampled configurations). Based on the results, we proposed a sampling algorithm with a useful balance between effort and bug-detection capability. We performed empirical studies using a corpus of 40 real-world C systems. We detected 128 configuration-related bugs, submitted 43 patches to fix bugs not yet fixed, and developers accepted 65% of the patches. The results of our survey show that most developers prefer to use the refactored (i.e., disciplined) version of the code instead of the original code with undisciplined directives. Furthermore, developers accepted 21 (75%) out of 28 patches submitted to refactor undisciplined into disciplined directives. Our work presents useful findings for C developers during their development tasks, contributing to minimizing the chances of introducing configuration-related bugs and bad smells, improving code comprehension, and guiding developers in performing combinatorial testing.