670 research outputs found

    Harnessing deep learning algorithms to predict software refactoring

    Get PDF
    During software maintenance, software systems need to be modified by adding or changing source code. These changes are required to fix errors or to adopt new requirements raised by stakeholders or the marketplace. Identifying the targeted piece of code for refactoring purposes is considered a real challenge for software developers. The whole process of refactoring mainly relies on software developers’ skills and intuition. In this paper, a deep learning algorithm is used to develop a refactoring prediction model for highlighting the classes that require refactoring. More specifically, the gated recurrent unit (GRU) algorithm is used with proposed pre-processing steps for refactoring prediction at the class level. The effectiveness of the proposed model is evaluated using a very common dataset of 7 open-source Java projects. The experiments are conducted before and after balancing the dataset to investigate the influence of data sampling on the performance of the prediction model. The experimental analysis reveals promising results in the field of code refactoring prediction.
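    A minimal sketch of a GRU-based class-level predictor of this kind, assuming (this is not from the paper) that class-level metric vectors are reshaped into short sequences and fed to a binary classifier; the data, metric count, and hyperparameters below are placeholders.

```python
# Hedged sketch of a GRU-based refactoring predictor (TensorFlow/Keras).
# Assumptions, not taken from the paper: class-level metrics are treated as a
# sequence of length n_metrics with one value per step; data are synthetic.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_metrics = 20                                            # hypothetical metric count
X = np.random.rand(1000, n_metrics).astype("float32")     # placeholder class-level metrics
y = np.random.randint(0, 2, size=(1000,))                 # 1 = class needs refactoring

# Reshape the metric vector into (timesteps, features) so a GRU can consume it.
X_seq = X.reshape((-1, n_metrics, 1))

model = keras.Sequential([
    layers.Input(shape=(n_metrics, 1)),
    layers.GRU(64),                          # gated recurrent unit layer
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability that the class needs refactoring
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_seq, y, epochs=10, batch_size=32, validation_split=0.2)
```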

    Class-Level Refactoring Prediction by Ensemble Learning with Various Feature Selection Techniques

    Get PDF
    Background: Refactoring is changing a software system without affecting its functionality. Current research aims to identify the appropriate method(s) or class(es) that need to be refactored in object-oriented software. Ensemble learning helps to reduce prediction errors by amalgamating different classifiers and their respective performances over the original feature data. This paper further examines several ensemble learners, error measures, sampling techniques, and feature selection techniques for refactoring prediction at the class level. Objective: This work aims to develop an ensemble-based refactoring prediction model with structural identification of source code metrics using different feature selection techniques and data sampling techniques to distribute the data uniformly. Our model finds the best classifier after achieving fewer errors during refactoring prediction at the class level. Methodology: First, our proposed model extracts a total of 125 software metrics computed from object-oriented software systems and processes them with a robust multi-phased feature selection method encompassing the Wilcoxon significance test, the Pearson correlation test, and principal component analysis (PCA). The proposed multi-phased feature selection method retains the optimal features characterizing inheritance, size, coupling, cohesion, and complexity. After obtaining the optimal set of software metrics, a novel heterogeneous ensemble classifier is developed using ANN-Gradient Descent, ANN-Levenberg Marquardt, ANN-GDX, and ANN-Radial Basis Function networks; support vector machines with different kernel functions (LSSVM-Linear, LSSVM-Polynomial, LSSVM-RBF); a Decision Tree; Logistic Regression; and an extreme learning machine (ELM) model as base classifiers. We calculate four different error measures: Mean Absolute Error (MAE), Mean Magnitude of Relative Error (MORE), Root Mean Square Error (RMSE), and Standard Error of the Mean (SEM). Result: In our proposed model, the maximum voting ensemble (MVE) achieves better accuracy, recall, precision, and F-measure values (99.76, 99.93, 98.96, 98.44) compared to the base trained ensemble (BTE) and produces fewer errors (MAE = 0.0057, MORE = 0.0701, RMSE = 0.0068, and SEM = 0.0107) during its implementation to develop the refactoring model. Conclusions: Our experimental results suggest that MVE with upsampling can be used to improve the performance of the refactoring prediction model at the class level. Furthermore, the performance of our model with different data sampling techniques and feature selection techniques is shown in the form of boxplot diagrams of accuracy, F-measure, precision, recall, and area under the curve (AUC).
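    A hedged sketch of the multi-phase feature selection followed by a majority-vote ensemble, using scikit-learn stand-ins (MLP, SVC, decision tree, logistic regression) in place of the paper's ANN, LSSVM, and ELM variants; thresholds and data are placeholders, not the paper's settings.

```python
# Sketch: Wilcoxon filter -> Pearson correlation filter -> PCA -> voting ensemble.
import numpy as np
from scipy.stats import ranksums
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(500, 125)              # 125 software metrics (synthetic placeholder)
y = np.random.randint(0, 2, size=500)     # 1 = class was refactored

# Phase 1: keep metrics whose distributions differ between the two classes.
keep = [j for j in range(X.shape[1])
        if ranksums(X[y == 1, j], X[y == 0, j]).pvalue < 0.05]
X1 = X[:, keep] if keep else X

# Phase 2: drop one metric from each highly correlated pair (|r| > 0.9, assumed cutoff).
corr = np.corrcoef(X1, rowvar=False)
drop = {j for i in range(corr.shape[0]) for j in range(i + 1, corr.shape[0])
        if abs(corr[i, j]) > 0.9}
X2 = X1[:, [j for j in range(X1.shape[1]) if j not in drop]]

# Phase 3: PCA, then a maximum-voting (hard-voting) heterogeneous ensemble.
ensemble = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    VotingClassifier(
        estimators=[("ann", MLPClassifier(max_iter=500)),
                    ("svm", SVC(kernel="rbf")),
                    ("tree", DecisionTreeClassifier()),
                    ("lr", LogisticRegression(max_iter=1000))],
        voting="hard"),
)
ensemble.fit(X2, y)
```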

    Can feature requests reveal the refactoring types?

    Get PDF
    Software refactoring is the process of improving the design of a software system while preserving its external behavior. In recent years, refactoring research has been growing as a response to the degradation of software quality. Recent studies performed an in-depth investigation of (1) how refactoring practices take place during software evolution, (2) how to recommend refactoring to improve the design of software, and (3) what types of refactoring operations can be implemented. However, there is a lack of support when it comes to developers’ typical programming tasks, including feature updates and bug fixes. The goal of this thesis is to investigate whether it is possible to support developers by recommending appropriate refactoring types to be performed when a developer is assigned a given issue to handle. Our proposed solution takes as input the text of the issue along with the source code and tries to predict the appropriate refactoring type that would help in efficiently adapting the existing source code to the given feature request. To do so, we rely on supervised learning. We start by collecting various issues that were handled using refactoring. These data are used to train a model that can predict the appropriate refactoring, given an open issue description as input. We design a classification model that takes a feature request as input and suggests a method-level refactoring. The classification model was trained with a total of 4008 feature request examples covering four refactoring types. Our initial results show that this solution suffers from several challenges, including class imbalance: not all refactoring types are equally used to handle issues. Another challenge we detected is related to the description of the issue itself, which typically does not explicitly mention any potential refactoring. Therefore, a large set of issues is needed to appropriately learn patterns that discriminate towards a given refactoring type.
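    A minimal sketch of the supervised setup described above: the free text of a feature request is vectorized and mapped to a method-level refactoring type. The example issues, labels, and model choice are illustrative assumptions, not the thesis's actual data or classifier.

```python
# Sketch: TF-IDF text features + a linear classifier for refactoring-type prediction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

issues = [
    "Split the configuration loader so plugins can reuse it",
    "Extract the report generation code into a reusable helper",
    "Rename the user session handler for clarity",
    "Move validation logic out of the controller",
]
labels = ["Extract Method", "Extract Method", "Rename Method", "Move Method"]

# class_weight="balanced" is one simple way to soften the class-imbalance problem
# noted above (some refactoring types handle far more issues than others).
clf = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
clf.fit(issues, labels)
print(clf.predict(["Pull shared parsing code into its own helper"]))
```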

    Predicting code refactoring via analyzing the history of quality metrics and code anti-patterns

    Get PDF
    Code refactoring is the process of improving the internal structure of existing code without altering its functionality. Refactoring can help to reduce technical debt, enhance the quality of the code, and make the code easier to evolve. However, manually identifying the proper code refactoring operations to apply can be time-consuming and is not scalable. In this thesis, we propose an approach based on data mining and machine learning techniques to analyze historical data and predict refactoring operations that may occur in a future release of a project. The approach uses a combination of techniques to identify patterns in the data and make predictions about which refactoring operations should be applied. In this study, we validated the proposed machine learning-based approaches on 13 open-source projects with different releases. We identified the refactoring operations and code smells and extracted the quality metrics for each project release. We used the collected data (e.g., quality metrics and code smells) to predict refactoring operations, and we report the prediction results based on cross-validation procedures. The proposed research contributes to the field of software quality by providing an efficient and effective approach to refactoring code. The findings of this research will also help developers by suggesting appropriate refactoring operations based on the evolution history of software projects. This will ultimately result in improved software quality, reduced technical debt, and enhanced software performance.
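    A hedged sketch of the prediction step described above: per-release quality metrics and code-smell indicators are used to predict whether a refactoring will occur, evaluated with cross-validation. The features, model, and data are placeholders, not the thesis's exact setup.

```python
# Sketch: quality metrics + code-smell flags -> cross-validated refactoring prediction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
quality_metrics = rng.random((300, 15))            # e.g. coupling, cohesion, complexity (placeholder)
code_smells = rng.integers(0, 2, size=(300, 5))    # e.g. God Class, Feature Envy flags (placeholder)
X = np.hstack([quality_metrics, code_smells])
y = rng.integers(0, 2, size=300)                   # 1 = refactoring applied in the next release

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
print("10-fold F1: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```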

    On the Documentation of Refactoring Types

    Get PDF
    Commit messages are the atomic level of software documentation. They provide a natural language description of a code change and its purpose. Messages are critical for software maintenance and program comprehension. Unlike documenting feature updates and bug fixes, little is known about how developers document their refactoring activities. Specifically, developers can perform multiple refactoring operations, including moving methods, extracting classes, and renaming attributes, for various reasons, such as improving software quality, managing technical debt, and removing defects. Yet, there is no systematic study that analyzes the extent to which the documentation of refactoring accurately describes the refactoring operations performed at the source code level. Therefore, this paper challenges the ability of refactoring documentation, written in commit messages, to adequately predict the refactoring types performed at the commit level. Our analysis relies on text mining of commit messages to extract the features (i.e., keywords) that best represent each class (i.e., refactoring type). The extraction of text patterns specific to each refactoring type (e.g., rename, extract, move, inline, etc.) allows the design of a model that verifies the consistency of these patterns with their corresponding refactoring. Such a verification process can be achieved by automatically predicting, for a given commit, the method-level type of refactoring being applied, namely Extract Method, Inline Method, Move Method, Pull-up Method, Push-down Method, and Rename Method. We compared various classifiers, and a baseline keyword-based approach, in terms of their prediction performance, using a dataset of 5004 commits. Our main findings show that the complexity of refactoring type prediction varies from one type to another. Rename Method and Extract Method were found to be the best documented refactoring activities, while Pull-up Method and Push-down Method were the hardest to identify from textual descriptions. These findings draw developers' attention to the need to better document these types of refactoring.
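    An illustrative sketch of the two approaches compared above: a keyword-based baseline and a text-mining classifier, each predicting the method-level refactoring type from a commit message. The keyword map, messages, labels, and classifier are invented for illustration; the paper's dataset contains 5004 commits.

```python
# Sketch: keyword baseline vs. a simple text classifier over commit messages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

KEYWORDS = {  # hypothetical keyword map for the baseline
    "extract": "Extract Method", "inline": "Inline Method",
    "move": "Move Method", "pull up": "Pull-up Method",
    "push down": "Push-down Method", "rename": "Rename Method",
}

def keyword_baseline(message: str) -> str:
    """Return the first refactoring type whose keyword appears in the message."""
    text = message.lower()
    for kw, rtype in KEYWORDS.items():
        if kw in text:
            return rtype
    return "Unknown"

commits = [
    "Extract duplicated parsing code into a helper method",
    "Rename getData to fetchUserRecords for readability",
    "Move the formatting method to the util class",
]
labels = ["Extract Method", "Rename Method", "Move Method"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(commits, labels)

msg = "Inline the trivial wrapper around save()"
print(keyword_baseline(msg), clf.predict([msg])[0])
```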

    Learning Effective Changes for Software Projects

    Full text link
    The primary motivation of much of software analytics is decision making. How should one make these decisions? Should one make decisions based on lessons that arise from within a particular project, or should one generate these decisions from across multiple projects? This work is an attempt to answer these questions. Our work was motivated by the realization that much of the current generation of software analytics tools focuses primarily on prediction. Prediction is indeed a useful task, but it is usually followed by "planning" about what actions need to be taken. This research seeks to address the planning task by seeking methods that support actionable analytics offering clear guidance on what to do. Specifically, we propose the XTREE and BELLTREE algorithms for generating a set of actionable plans within and across projects. Each of these plans, if followed, will improve the quality of the software project.

    Towards the detection and analysis of performance regression introducing code changes

    Get PDF
    In contemporary software development, developers commonly conduct regression testing to ensure that code changes do not affect software quality. Performance regression testing is an emerging research area within the regression testing domain in software engineering. Performance regression testing aims to maintain the system's performance. Conducting performance regression testing is known to be expensive. It is also complex, considering the increase in committed code and the number of team members working simultaneously. Many automated regression testing techniques have been proposed in prior research. However, challenges in locating and resolving performance regressions still exist in practice. Directing regression testing to the commit level provides a way to locate the root cause, yet it hinders the development process. This thesis outlines motivations and solutions for locating the root causes of performance regressions. First, we challenge a deterministic state-of-the-art approach by expanding the testing data to find areas for improvement. The deterministic approach was found to be limited in searching for the best regression-locating rule. Thus, we present two stochastic approaches that develop models able to learn from historical commits. The first stochastic approach views the research problem as a search-based optimization problem seeking to reach the highest detection rate. We apply different multi-objective evolutionary algorithms and compare them. This thesis also investigates whether simplifying the search space by combining objectives achieves comparable results. The second stochastic approach addresses the severe class imbalance any system could have, since code changes introducing regressions are rare but costly. We formulate the identification of problematic commits that introduce performance regressions as a binary classification problem that handles class imbalance. Further, the thesis provides an exploratory study on the challenges developers face in resolving performance regressions. The study is based on questions posted on a technical forum dedicated to performance regression. We collected around 2k questions discussing regressions in software execution time, and all were manually analyzed. The study resulted in a categorization of the challenges. We also discuss the difficulty level of performance regression issues within the development community. This study provides insights that help developers avoid causes of regression during software design and implementation.
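    A minimal sketch of the second stochastic approach described above: treating the identification of regression-introducing commits as a binary classification problem under class imbalance. The commit-level features and the class-weighting treatment are assumptions; the thesis may use different metrics and imbalance-handling techniques.

```python
# Sketch: imbalanced binary classification of regression-introducing commits.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.random((2000, 10))                     # placeholder commit-level features
y = (rng.random(2000) < 0.05).astype(int)      # ~5% regression-introducing commits (assumed rate)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# class_weight="balanced" counteracts the rarity of regression-introducing commits.
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print("ROC-AUC: %.3f" % roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```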

    Data Mining and Machine Learning for Software Engineering

    Get PDF
    Software engineering is one of the research areas where data mining is most readily applicable. Developers have attempted to improve software quality by mining and analyzing software data. In every phase of the software development life cycle (SDLC), huge amounts of data are produced, and design, security, or other software problems may occur. In the early phases of software development, analyzing software data helps to handle these problems and leads to more accurate and timely delivery of software projects. Various data mining and machine learning studies have been conducted to deal with software engineering tasks such as defect prediction, effort estimation, etc. This study discusses the open issues and presents related solutions and recommendations for applying data mining and machine learning techniques in software engineering.