
    Class-Level Refactoring Prediction by Ensemble Learning with Various Feature Selection Techniques

    Background: Refactoring is changing a software system without affecting its functionality. Current research aims to identify the appropriate method(s) or class(es) that need to be refactored in object-oriented software. Ensemble learning helps to reduce prediction errors by amalgamating different classifiers and their respective performances over the original feature data. This paper further considers several ensemble learners, error measures, sampling techniques, and feature selection techniques for refactoring prediction at the class level. Objective: This work aims to develop an ensemble-based refactoring prediction model with structural identification of source code metrics using different feature selection techniques and data sampling techniques to distribute the data uniformly. Our model identifies the best classifier, i.e., the one achieving the fewest errors during refactoring prediction at the class level. Methodology: First, our proposed model extracts a total of 125 software metrics from object-oriented software systems and processes them through a robust multi-phased feature selection method encompassing the Wilcoxon significance test, the Pearson correlation test, and principal component analysis (PCA). The proposed multi-phased feature selection method retains the optimal features characterizing inheritance, size, coupling, cohesion, and complexity. After obtaining the optimal set of software metrics, a novel heterogeneous ensemble classifier is developed that uses ANN-Gradient Descent, ANN-Levenberg Marquardt, ANN-GDX, and ANN-Radial Basis Function; least-squares support vector machines with different kernel functions (LSSVM-Linear, LSSVM-Polynomial, LSSVM-RBF); the Decision Tree algorithm; the Logistic Regression algorithm; and the extreme learning machine (ELM) model as base classifiers. We calculate four different error measures, i.e., Mean Absolute Error (MAE), Mean Magnitude of Relative Error (MORE), Root Mean Square Error (RMSE), and Standard Error of the Mean (SEM). Result: In our proposed model, the maximum voting ensemble (MVE) achieves better accuracy, recall, precision, and F-measure values (99.76, 99.93, 98.96, 98.44) than the base trained ensemble (BTE) and yields lower errors (MAE = 0.0057, MORE = 0.0701, RMSE = 0.0068, and SEM = 0.0107) when used to develop the refactoring model. Conclusions: Our experimental results recommend that MVE with upsampling be implemented to improve the performance of the refactoring prediction model at the class level. Furthermore, the performance of our model with different data sampling and feature selection techniques is shown as boxplot diagrams of the accuracy, F-measure, precision, recall, and area under the curve (AUC) parameters.
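
    As a rough illustration of the pipeline sketched in this abstract, the following Python snippet wires a two-phase statistical filter (a Wilcoxon rank-sum test followed by a Pearson correlation filter) and PCA into a heterogeneous majority-voting ensemble with SciPy and scikit-learn. The synthetic data, the thresholds, and the scikit-learn estimators standing in for the ANN/LSSVM/ELM base learners are illustrative assumptions, not the authors' implementation.

        import numpy as np
        from scipy.stats import ranksums
        from sklearn.decomposition import PCA
        from sklearn.ensemble import VotingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        def select_metrics(X, y, alpha=0.05, corr_threshold=0.8):
            # Phase 1: keep metrics whose distributions differ significantly between
            # refactored and non-refactored classes (Wilcoxon rank-sum test).
            significant = [j for j in range(X.shape[1])
                           if ranksums(X[y == 1, j], X[y == 0, j]).pvalue < alpha]
            # Phase 2: drop one metric of every highly correlated pair (Pearson correlation).
            selected = []
            for j in significant:
                if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_threshold for k in selected):
                    selected.append(j)
            return selected

        # Synthetic stand-in for 40 class-level metrics and a binary refactoring label.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 40))
        y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

        X_sel = X[:, select_metrics(X, y)]

        # Phase 3: PCA on the surviving metrics, then a heterogeneous majority-voting ensemble
        # (scikit-learn stand-ins for the ANN/LSSVM/DT/LR base learners named in the abstract).
        voter = VotingClassifier(
            estimators=[
                ("ann", MLPClassifier(max_iter=2000, random_state=0)),
                ("svm_rbf", SVC(kernel="rbf", random_state=0)),
                ("tree", DecisionTreeClassifier(random_state=0)),
                ("logreg", LogisticRegression(max_iter=1000)),
            ],
            voting="hard",   # maximum/majority voting
        )
        model = make_pipeline(StandardScaler(), PCA(n_components=0.95), voter)
        print(cross_val_score(model, X_sel, y, cv=5, scoring="accuracy").mean())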

    A novel approach for code smell detection : an empirical study

    Code smell detection helps improve the understandability and maintainability of software while reducing the chances of system failure. In this study, six machine learning algorithms have been applied to predict code smells. For this purpose, four code smell datasets (God-class, Data-class, Feature-envy, and Long-method) are considered, which are generated from 74 open-source systems. To evaluate the performance of the machine learning algorithms on these code smell datasets, the 10-fold cross-validation technique is applied, which evaluates the model by partitioning the original dataset into a training set to train the model and a test set to evaluate it. Two feature selection techniques, Chi-squared and Wrapper-based, are applied to improve the accuracy of the six machine learning methods by choosing the top metrics in each dataset, and the results obtained with the two techniques are compared. To further improve the accuracy of these algorithms, a grid search-based parameter optimization technique is applied. In this study, 100% accuracy was obtained for the Long-method dataset by using the Logistic Regression algorithm with all features, while the worst performance, 95.20%, was obtained by the Naive Bayes algorithm for the Long-method dataset using the Chi-squared feature selection technique.
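
    The following Python sketch illustrates, under assumptions, how Chi-squared feature selection and grid search-based parameter tuning can be combined under 10-fold cross-validation with scikit-learn; the synthetic data, the chosen classifier, and the parameter grid are placeholders rather than the study's exact setup.

        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import MinMaxScaler

        # Stand-in for a code-smell dataset: rows are code elements, columns are software metrics.
        X, y = make_classification(n_samples=400, n_features=20, n_informative=6, random_state=42)

        pipe = Pipeline([
            ("scale", MinMaxScaler()),            # chi2 requires non-negative feature values
            ("select", SelectKBest(chi2, k=10)),  # keep the top-10 metrics per the Chi-squared test
            ("clf", LogisticRegression(max_iter=1000)),
        ])
        # Grid-search the regularization strength inside an outer 10-fold cross-validation.
        grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, cv=5)
        scores = cross_val_score(grid, X, y, cv=StratifiedKFold(10, shuffle=True, random_state=42))
        print(f"mean 10-fold accuracy: {scores.mean():.3f}")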

    Predicting Software Fault Proneness Using Machine Learning

    Context: Continuous Integration (CI) is a DevOps technique which is widely used in practice. Studies show that its adoption rates will increase even further. At the same time, it is argued that maintaining product quality requires extensive and time-consuming testing and code reviews. In this context, if not done properly, shorter sprint cycles and agile practices entail a higher risk for the quality of the product. It has been reported in the literature [68] that lack of proper test strategies, poor test quality, and team dependencies are some of the major challenges encountered in continuous integration and deployment. Objective: The objective of this thesis is to bridge the process discontinuity that exists between development teams and testing teams, due to continuous deployments and shorter sprint cycles, by providing a list of potentially buggy or high-risk files, which can be used by testers to prioritize code inspection and testing, thus reducing the time between development and release. Approach: Our approach is based on a five-step process. The first step is to select a set of systems, a set of code metrics, a set of repository metrics, and a set of machine learning techniques to consider for training and evaluation purposes. The second step is to devise appropriate client programs to extract and record information obtained from GitHub repositories and source code analyzers. The third step is to use this information to train the models using the selected machine learning techniques; this step allowed us to identify the best-performing machine learning techniques among those initially selected in the first step. The fourth step is to apply the models with a voting classifier (with equal weights) and provide answers to five research questions pertaining to the prediction capability and generality of the obtained fault-proneness prediction framework. The fifth step is to select the best-performing predictors and apply them to two systems written in a completely different language (C++) in order to evaluate the performance of the predictors in a new environment. Obtained Results: The obtained results indicate that a) the best models were the ones applied on the same system as the one they were trained on; b) the models trained using repository metrics outperformed the ones trained using code metrics; c) the models trained using code metrics proved inadequate for predicting fault-prone modules; d) the use of machine learning as a tool for building fault-proneness prediction models is promising, but there is still work to be done, as the models show weak to moderate prediction capability. Conclusion: This thesis provides insights into how machine learning can be used to predict whether a source code file contains one or more faults that may contribute to a major system failure. The proposed approach utilizes information extracted both from the system's source code, such as code metrics, and from a series of DevOps tools, such as bug repositories, version control systems, and test automation frameworks. The study involved five Java and five Python systems and indicated that machine learning techniques have potential towards building models for alerting developers about failure-prone code.
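
    The equal-weight voting step described in the fourth step above could look roughly like the following scikit-learn sketch; the synthetic stand-in for per-file repository metrics and the particular base learners are assumptions, not the thesis's actual models.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier, VotingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Stand-in for per-file repository metrics (e.g., commit count, author count, churn)
        # with a fault label mined from the issue tracker; roughly 20% of files are faulty.
        X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                                   weights=[0.8], random_state=0)

        voter = VotingClassifier(
            estimators=[
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
                ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
            ],
            voting="soft",   # with no explicit weights, every base model counts equally
        )
        print(cross_val_score(voter, X, y, cv=10, scoring="roc_auc").mean())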

    A Study on Architectural Smells Prediction

    Architectural smells can be detrimental to system maintainability and evolvability and represent a source of architectural debt. Thus, it is very important to be able to understand how they evolved in the past and to predict their future evolution. In this paper, we evaluate whether the existence of architectural smells in past versions of a project can be used to predict their presence in the future. We analyzed four Java projects across 295 GitHub releases and applied four different supervised learning models for the prediction in a repeated cross-validation setting. We found that historical architectural smell information can be used to predict the presence of architectural smells in the future. Hence, practitioners should carefully monitor the evolution of architectural smells and take preventative actions to avoid introducing them and to stave off their progressive growth.
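
    A minimal sketch of this setup, under assumptions, is shown below: smell presence in past releases is encoded as binary features, the label is presence in the next release, and a supervised model is evaluated with repeated cross-validation. The synthetic history and the choice of random forest are illustrative, not the paper's configuration.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

        # Synthetic stand-in: one row per component, binary flags for smell presence in the
        # last five releases; the label is presence in the following release.
        rng = np.random.default_rng(0)
        history = rng.integers(0, 2, size=(400, 5))
        next_release = (history.sum(axis=1) + rng.integers(0, 2, size=400) >= 3).astype(int)

        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
        scores = cross_val_score(RandomForestClassifier(random_state=0), history, next_release,
                                 cv=cv, scoring="f1")
        print(f"mean F1 over repeated cross-validation: {scores.mean():.3f}")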

    Detection of code smells using machine learning techniques combined with data-balancing methods

    Code smells are prevalent issues in software design that arise when implementation or design principles are violated. These issues manifest as symptoms or anomalies in the source code. Timely identification of code smells plays a crucial role in enhancing software quality and facilitating software maintenance. Previous studies have shown that code smell detection can be accomplished through the utilization of machine learning (ML) methods. However, despite their increasing popularity, research suggests that these methods are not always suitable due to the problem of imbalanced data, which can negatively affect the effectiveness of ML models. This study proposes a novel method for detecting code smells by employing five ML algorithms, namely decision tree (DT), k-nearest neighbors (K-NN), support vector machine (SVM), XGBoost (XGB), and multi-layer perceptron (MLP). Additionally, to tackle the challenge of imbalanced data, the proposed method incorporates the random oversampling technique. Experiments were conducted using four datasets that encompass the god-class, data-class, long-method, and feature-envy code smells. The experimental outcomes were evaluated and compared using various performance metrics. Upon comparing the outcomes of our models on both the balanced and original datasets, we found that the XGB model achieved the highest accuracy of 100% for detecting the data-class and long-method smells on the original datasets, whereas on the balanced datasets the highest accuracy of 100% for the data-class and long-method smells was obtained with the DT, SVM, and XGB models. According to the empirical findings, there is significant promise in using ML techniques for the accurate prediction of code smells.
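
    A minimal sketch of the data-balancing idea, under assumptions, follows: random oversampling is applied inside a cross-validation pipeline (so only the training folds are resampled) with XGBoost as one of the five classifiers. The synthetic imbalanced dataset stands in for the actual code-smell datasets.

        from imblearn.over_sampling import RandomOverSampler
        from imblearn.pipeline import Pipeline          # applies oversampling only to training folds
        from sklearn.datasets import make_classification
        from sklearn.model_selection import StratifiedKFold, cross_val_score
        from xgboost import XGBClassifier

        # Imbalanced stand-in for a code-smell dataset (~10% smelly instances).
        X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9], random_state=42)

        pipe = Pipeline([
            ("balance", RandomOverSampler(random_state=42)),     # duplicate minority-class samples
            ("clf", XGBClassifier(eval_metric="logloss", random_state=42)),
        ])
        scores = cross_val_score(pipe, X, y, cv=StratifiedKFold(10, shuffle=True, random_state=42),
                                 scoring="f1")
        print(f"mean F1 with oversampling: {scores.mean():.3f}")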

    Harnessing deep learning algorithms to predict software refactoring

    During software maintenance, software systems need to be modified by adding or modifying source code. These changes are required to fix errors or to adopt new requirements raised by stakeholders or the marketplace. Identifying the targeted piece of code for refactoring purposes is considered a real challenge for software developers, and the whole process of refactoring mainly relies on software developers' skills and intuition. In this paper, a deep learning algorithm is used to develop a refactoring prediction model for highlighting the classes that require refactoring. More specifically, the gated recurrent unit algorithm is used with the proposed pre-processing steps for refactoring prediction at the class level. The effectiveness of the proposed model is evaluated using a commonly used dataset of seven open-source Java projects. The experiments are conducted before and after balancing the dataset to investigate the influence of data sampling on the performance of the prediction model. The experimental analysis reveals a promising result in the field of code refactoring prediction.
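
    A minimal sketch of a GRU-based class-level predictor is given below, under the assumption that each class's metric vector is reshaped into a one-feature-per-step sequence before being fed to the recurrent layer; the layer sizes and synthetic data are illustrative and not the paper's proposed pre-processing.

        import numpy as np
        import tensorflow as tf

        # Synthetic stand-in: class-level metric vectors and a binary "needs refactoring" label.
        rng = np.random.default_rng(0)
        n_samples, n_metrics = 1000, 30
        X = rng.normal(size=(n_samples, n_metrics)).astype("float32")
        y = (X[:, 0] + X[:, 1] > 0).astype("float32")

        # Reshape each metric vector into a sequence of length n_metrics with one feature per
        # step so it can be consumed by a recurrent layer.
        X_seq = X.reshape((n_samples, n_metrics, 1))

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(n_metrics, 1)),
            tf.keras.layers.GRU(32),                          # gated recurrent unit layer
            tf.keras.layers.Dense(16, activation="relu"),
            tf.keras.layers.Dense(1, activation="sigmoid"),   # probability the class needs refactoring
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        model.fit(X_seq, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
        print(model.evaluate(X_seq, y, verbose=0))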