14 research outputs found

    Predicting code refactoring via analyzing the history of quality metrics and code anti-patterns

    Code refactoring is the process of improving the internal structure of existing code without altering its functionality. Refactoring can help to reduce technical debt, enhance the quality of the code, and make the code easier to evolve. However, manually identifying the proper refactoring operations to apply can be time-consuming and does not scale. In this thesis, we propose an approach based on data mining and machine learning techniques to analyze historical data and predict refactoring operations that may occur in a future release of a project. The approach uses a combination of techniques to identify patterns in the data and make predictions about which refactoring operations should be applied. We validated the proposed machine learning based approaches on 13 open-source projects across different releases. We identified the refactoring operations and code smells and extracted the quality metrics for each project release. We used the collected data (e.g., quality metrics and code smells) to predict refactoring operations, and we report the prediction results based on cross-validation procedures. The proposed research contributes to the field of software quality by providing an efficient and effective approach to refactoring code. The findings will also help developers by suggesting appropriate refactoring operations based on the evolution history of software projects. This will ultimately result in improved software quality, reduced technical debt, and enhanced software performance.

    On the Impact of Refactoring on the Relationship between Quality Attributes and Design Metrics

    Refactoring is a critical task in software maintenance and is generally performed to enforce the best design and implementation practices or to cope with design defects. Several studies have attempted to detect refactoring activities by mining software repositories, which makes it possible to collect, analyze, and derive actionable, data-driven insights about refactoring practices within software projects. Aim: We aim to identify, among the various quality models presented in the literature, the ones that are most in line with the developer’s vision of quality optimization, when developers explicitly mention that they are refactoring to improve them. Method: We extract a large corpus of design-related refactoring activities that are applied and documented by developers during their daily changes from 3,795 curated open source Java projects. In particular, we extract a large-scale corpus of structural metrics and anti-pattern enhancement changes, from which we identify 1,245 quality improvement commits with their corresponding refactoring operations, as perceived by software engineers. Thereafter, we empirically analyze the impact of these refactoring operations on a set of common state-of-the-art design quality metrics. Results: The statistical analysis of the obtained results shows that (i) a few state-of-the-art metrics are more popular than others; and (ii) some metrics are emphasized more than others. Conclusions: We verify that there are a variety of structural metrics that can represent the internal quality attributes with different degrees of improvement and degradation of software quality. Most of the metrics that are mapped to the main quality attributes do capture developer intentions of quality improvement reported in the commit messages, but for some quality attributes, they do not.

    Improving the Accuracy of Refactoring Detection

    With the development of refactoring technology, refactoring detection, its reverse technology, has also progressed greatly and been widely applied; it plays an important role in code optimisation, code review, and code reliability. Over the past 20 years, refactoring detection has evolved from a theoretical concept into mature approaches and tools. However, due to the various complexities that arise when refactoring code, these detection tools still face problems in practice: selection of tools, detection of nested refactorings, false negatives due to matching algorithms, etc. As the requirements for detection increase, the pursuit of better detection performance (precision and recall) and more generalised detection tools has become a major research goal of my PhD. The main research components of this paper are as follows. Firstly, at the beginning of my research I conducted a meta-analysis of refactoring detection and evaluated the detection performance of four common refactoring detection tools under the same benchmark, analysed and compared their strengths and weaknesses, and identified new research questions and research directions. Secondly, I identified the study and detection of nested refactorings as a blind spot in existing approaches, so I demonstrated the feasibility of nesting multiple refactoring types with each other; in addition, I created an approach that detects nested refactorings based on single-refactoring data, using manually defined refactoring features combined with a random forest algorithm, and is thus able to detect all 35 semantically meaningful nestings with 91.4% accuracy. I then focused on the features that emerge during the refactoring process, mining the refactoring information in the diff to help RefDiff improve detection performance. During the research I developed Diff Extractor and Diff Encoder for extracting and encoding diffs, transformed diffs into arrays for refactoring information mining, and trained two models: (1) the Diff Structure Feature Model, which determines the type of refactoring based on the structural features of the refactored diffs and can be used as a result checker that improves the overall performance of RefDiff by checking for false positives in the RefDiff detection results; and (2) the Diff Feature Matching Network, which is trained on the correspondence between the removed and added parts of the refactored diff, is highly robust, and can solve the problem of missed matches caused by the word-frequency matching approach. Finally, I designed an approach that integrates the two models, optimises Diff Extractor and Diff Encoder based on the characteristics of diff features, adds flags to diffs based on the refactoring property, introduces a new encoding approach emphasising token uniqueness, trains better models, and builds a model cross-validation mechanism that allows us to obtain detection results with high levels of confidence. We have shown that our approach, called RefDiff-Model, not only improves the precision of RefDiff 2.0 to 100% and increases the recall to 96.1%, but also continues to support detection tasks in multiple programming languages.
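
The two performance measures this abstract reports, precision and recall, can be illustrated with a minimal sketch. The function name and the sample detections below are hypothetical and are not taken from RefDiff or RefDiff-Model; the sketch only assumes that each detected refactoring can be represented as a (commit, refactoring type) pair and compared against a ground-truth set.

```python
def precision_recall(detected, actual):
    """Return (precision, recall) of detected items against a ground-truth set."""
    detected, actual = set(detected), set(actual)
    true_positives = len(detected & actual)
    # Precision: share of reported refactorings that are real.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: share of real refactorings that were reported.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ground truth and tool output, keyed by (commit, refactoring type).
actual = {
    ("c1", "Extract Method"),
    ("c2", "Move Method"),
    ("c3", "Rename Method"),
    ("c4", "Inline Method"),
}
detected = {
    ("c1", "Extract Method"),
    ("c2", "Move Method"),
    ("c5", "Rename Method"),  # false positive: not in the ground truth
}

precision, recall = precision_recall(detected, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, two of the three reported refactorings are correct and two of the four real ones are found. A result checker that discards false positives raises precision, while a more robust matcher that finds missed refactorings raises recall.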

    Explainable Search-Based Refactoring

    Full text link
    http://deepblue.lib.umich.edu/bitstream/2027.42/170141/1/TSE_Explainability__Copy_ (1).pdfSEL

    On the Documentation of Refactoring Types

    Commit messages are the atomic level of software documentation. They provide a natural language description of the code change and its purpose. Messages are critical for software maintenance and program comprehension. Unlike the documentation of feature updates and bug fixes, little is known about how developers document their refactoring activities. Specifically, developers can perform multiple refactoring operations, including moving methods, extracting classes, and renaming attributes, for various reasons, such as improving software quality, managing technical debt, and removing defects. Yet, there is no systematic study that analyzes the extent to which the documentation of refactoring accurately describes the refactoring operations performed at the source code level. Therefore, this paper challenges the ability of refactoring documentation, written in commit messages, to adequately predict the refactoring types performed at the commit level. Our analysis relies on text mining of commit messages to extract the corresponding features (i.e., keywords) that better represent each class (i.e., refactoring type). The extraction of text patterns specific to each refactoring type (e.g., rename, extract, move, inline, etc.) allows the design of a model that verifies the consistency of these patterns with their corresponding refactoring. Such a verification process can be achieved by automatically predicting, for a given commit, the method-level type of refactoring being applied, namely Extract Method, Inline Method, Move Method, Pull-up Method, Push-down Method, and Rename Method. We compared various classifiers, and a baseline keyword-based approach, in terms of their prediction performance, using a dataset of 5004 commits. Our main findings show that the complexity of refactoring type prediction varies from one type to another. Rename Method and Extract Method were found to be the best documented refactoring activities, while Pull-up Method and Push-down Method were the hardest to identify via textual descriptions. These findings alert developers to the need to document these types more carefully.
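
A keyword-based baseline of the kind this abstract mentions could look roughly like the sketch below. The regular expressions, the function name, and the sample commit messages are assumptions made for illustration; the abstract does not list the keywords or the matching rules the study actually used.

```python
import re

# Hypothetical keyword patterns, one per method-level refactoring type.
PATTERNS = {
    "Extract Method": re.compile(r"\bextract(?:s|ed|ing)?\b"),
    "Inline Method": re.compile(r"\binlin(?:e|es|ed|ing)\b"),
    "Move Method": re.compile(r"\bmov(?:e|es|ed|ing)\b"),
    "Pull-up Method": re.compile(r"\bpull(?:s|ed|ing)?[- ]up\b"),
    "Push-down Method": re.compile(r"\bpush(?:es|ed|ing)?[- ]down\b"),
    "Rename Method": re.compile(r"\brenam(?:e|es|ed|ing)\b"),
}


def predict_refactoring_type(message):
    """Return the type whose keywords occur most often, or None if none occur."""
    text = message.lower()
    hits = {label: len(p.findall(text)) for label, p in PATTERNS.items()}
    # On a tie, max() keeps the first label in PATTERNS order.
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


for msg in (
    "Renamed getFoo to fetchFoo for clarity",
    "Pulled up save() into BaseDao",
    "Fix null pointer in parser",
):
    print(msg, "->", predict_refactoring_type(msg))
```

A baseline like this can only succeed when the commit message names the operation, which is consistent with the finding that rarely named types such as Pull-up Method and Push-down Method are the hardest to identify from text.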

    An interview study about the use of logs in embedded software engineering

    Context: Execution logs capture the run-time behavior of software systems. To assist developers in their maintenance tasks, many studies have proposed tools to analyze execution information from logs. However, it is as yet unknown how industry developers use logs in embedded software engineering. Objective: In this study, we aim to understand how developers use logs in an embedded software engineering context. Specifically, we would like to gain insights into the types of logs developers analyze, the purposes for which developers analyze logs, the information developers need from logs, and their expectations of tool support. Method: To achieve this aim, we conducted two interview studies. First, we interviewed 25 software developers from ASML, a leading company in developing lithography machines. This exploratory case study provides the preliminary findings. Next, we validated and refined our findings by conducting a replication study involving 14 interviewees from four companies who have different software engineering roles in their daily work. Results: As the result of our first study, we compiled a preliminary taxonomy consisting of four types of logs used by developers in practice, 18 purposes of using logs, 13 types of information developers search for in logs, 13 challenges faced by developers in log analysis, and three suggestions for tool support provided by developers. This taxonomy was refined in the replication study with three additional purposes, one additional information need, four additional challenges, and three additional suggestions for tool support. In addition, across these two studies, we observed that text-based editors and self-made scripts are commonly used for log analysis in practice. As indicated by the interviewees, the development of automatic analysis tools is hindered by the quality of the logs, which further suggests several challenges in log instrumentation and management. Conclusions: Based on our study, we provide suggestions for practitioners on logging practices. We provide implications for tool builders on how to further improve tools based on existing techniques. Finally, we suggest some research directions and studies for researchers to further study software logging.

    FREQUENTLY REFACTORED CODE IDIOMS

    It is important to refactor software source code from time to time to preserve its maintainability and understandability. Despite this, software developers rarely dedicate time to refactoring due to deadline pressure or resource limitations. To help developers take advantage of refactoring opportunities, researchers have long been working towards automated refactoring recommendation systems. However, these techniques suffer from low precision, as many of their recommendations are irrelevant to developers. As a result, most developers do not rely on these systems and prefer to perform refactoring based on their own experience and intuition. To shed more light on the practice of refactoring, we investigate the characteristics of the code fragments that get refactored most frequently across software repositories. Finding the most repeatedly refactored fragments can be very challenging due to the following factors: (i) Refactorings are highly influenced by the context of the code. Therefore, it is difficult to remove context-specific information and find similarities. (ii) Refactorings are usually more complex than simple code edits such as line additions or deletions. Therefore, basic source code matching techniques, such as token sequence matching, do not produce good results. (iii) A higher level of abstraction is required to match refactored code fragments across projects from different domains. At the same time, the structural detail of the code must be preserved to find accurate results. In this study, we tried to overcome the above-mentioned challenges and explored several ways to compare refactored code from different contexts. We transformed the refactored code into different source code representations and attempted to find non-trivial refactored code idioms, which cannot be identified using standard code comparison techniques. In this thesis, we present the findings of our study, in which we examine the repetitiveness of refactorings on a large Java codebase of 1,025 projects. We discuss how we analyze our dataset of 47,647 refactored code fragments using a combination of state-of-the-art code matching techniques. Finally, we report the most common refactoring actions and discuss the motivations behind them.
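
Why raw token matching misses similar fragments, and how abstraction helps, can be shown with a small sketch. It is not the representation used in the thesis: the tokenizer, the keyword list, and the Java fragments are hypothetical, and replacing identifiers and literals with placeholders is only one simple way to strip context-specific names while keeping the structure.

```python
import re

# Hypothetical, deliberately small subset of Java keywords to keep verbatim.
JAVA_KEYWORDS = {"if", "else", "for", "while", "return", "new", "null", "int", "throw"}
# Identifiers, integer literals, string literals, or any other single symbol.
TOKEN = re.compile(r'[A-Za-z_]\w*|\d+|"[^"]*"|\S')
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def abstract_tokens(fragment):
    """Map a code fragment to a token tuple with names and literals abstracted."""
    out = []
    for tok in TOKEN.findall(fragment):
        if tok in JAVA_KEYWORDS:
            out.append(tok)    # keep the structural keyword
        elif IDENTIFIER.fullmatch(tok):
            out.append("ID")   # drop the context-specific name
        elif tok[0].isdigit() or tok[0] == '"':
            out.append("LIT")  # drop the concrete literal value
        else:
            out.append(tok)    # keep punctuation and operators
    return tuple(out)


a = 'if (user == null) { throw new IllegalArgumentException("user"); }'
b = 'if (order == null) { throw new IllegalStateException("missing order"); }'
c = "for (int i = 0; i < n; i++) { sum += i; }"

print(a == b)                                    # raw text differs
print(abstract_tokens(a) == abstract_tokens(b))  # abstracted forms match
print(abstract_tokens(a) == abstract_tokens(c))  # different structure still differs
```

The two null-check fragments differ as raw text but share one abstracted form, while the loop keeps a distinct form. Abstracting too aggressively would also merge fragments that differ in ways that matter, which is the tension between abstraction and structural detail that the abstract describes.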

    An Improved Approach for Extracting Frequently Extracted Code Idioms

    Source code refactoring is the process of restructuring or changing existing code without changing its external behaviour. It is a continuous process that developers carry out to improve the quality, readability, and maintainability of the source code and to address technical debt. There have been studies and tools to help developers refactor their source code effectively and to understand the motivations behind the refactorings developers apply. We aim to find Code Idioms that developers tend to refactor more frequently and to investigate whether there are unique refactored code idioms for production code and test code. We use the RefactoringMiner tool to detect and collect EXTRACT METHOD refactorings from the commit history of the projects, propose a technique to represent the code fragments as structure-preserving context-free independent graphs, and apply graph-similarity measure techniques to find similar code idioms among 65,742 EXTRACT METHOD instances. We measure both exact matching and partial matching with constraint checking from the associated metadata of the nodes and edges of the graphs. We divided our data set into production code and test code and found a total of 489 code idiom patterns. We present in detail 22 of the most frequently refactored code idioms. Some patterns are unique to production code or to test code, and some are shared between them. We limit our study to Java-based open-source projects and EXTRACT METHOD refactoring, but we believe the approach can be applied to other object-oriented languages or refactorings. The findings can be useful for designing an effective refactoring recommender system, helping developers gain confidence in refactoring recommendation tools, and helping researchers understand refactoring motivations and API usage patterns.

    Towards the Automation of Migration and Safety of Third-Party Libraries

    The process of migrating from one library to a new, different library is very complex. Typically, the developer needs to find the functions in the new library that are the most adequate replacements for the functions of the retired library. This process is subjective and time-consuming, as the developer needs to fully understand the documentation of both libraries to be able to migrate from the old library to the new one and find the right matching function(s), if they exist. Our goal is to help developers have a better experience with library migration by identifying the key problems related to this process. Based on our critical literature review, we identified three main challenges related to the automation of library migration: (1) the mining of existing migrations, (2) learning from these migrations to recommend them in similar contexts, and (3) guaranteeing the safety of the recommended migrations.