728 research outputs found

    Class-Level Refactoring Prediction by Ensemble Learning with Various Feature Selection Techniques

    Get PDF
    Background: Refactoring is changing a software system without affecting the software functionality. The current researchers aim i to identify the appropriate method(s) or class(s) that needs to be refactored in object-oriented software. Ensemble learning helps to reduce prediction errors by amalgamating different classifiers and their respective performances over the original feature data. Other motives are added in this paper regarding several ensemble learners, errors, sampling techniques, and feature selection techniques for refactoring prediction at the class level. Objective: This work aims to develop an ensemble-based refactoring prediction model with structural identification of source code metrics using different feature selection techniques and data sampling techniques to distribute the data uniformly. Our model finds the best classifier after achieving fewer errors during refactoring prediction at the class level. Methodology: At first, our proposed model extracts a total of 125 software metrics computed from object-oriented software systems processed for a robust multi-phased feature selection method encompassing Wilcoxon significant text, Pearson correlation test, and principal component analysis (PCA). The proposed multi-phased feature selection method retains the optimal features characterizing inheritance, size, coupling, cohesion, and complexity. After obtaining the optimal set of software metrics, a novel heterogeneous ensemble classifier is developed using techniques such as ANN-Gradient Descent, ANN-Levenberg Marquardt, ANN-GDX, ANN-Radial Basis Function; support vector machine with different kernel functions such as LSSVM-Linear, LSSVM-Polynomial, LSSVM-RBF, Decision Tree algorithm, Logistic Regression algorithm and extreme learning machine (ELM) model are used as the base classifier. In our paper, we have calculated four different errors i.e., Mean Absolute Error (MAE), Mean magnitude of Relative Error (MORE), Root Mean Square Error (RMSE), and Standard Error of Mean (SEM). Result: In our proposed model, the maximum voting ensemble (MVE) achieves better accuracy, recall, precision, and F-measure values (99.76, 99.93, 98.96, 98.44) as compared to the base trained ensemble (BTE) and it experiences less errors (MAE = 0.0057, MORE = 0.0701, RMSE = 0.0068, and SEM = 0.0107) during its implementation to develop the refactoring model. Conclusions: Our experimental result recommends that MVE with upsampling can be implemented to improve the performance of the refactoring prediction model at the class level. Furthermore, the performance of our model with different data sampling techniques and feature selection techniques has been shown in the form boxplot diagram of accuracy, F-measure, precision, recall, and area under the curve (AUC) parameters.publishedVersio

    A Multiple Criteria Decision Analysis based Approach to Remove Uncertainty in SMP Models

    Full text link
    Advanced AI technologies are serving humankind in a number of ways, from healthcare to manufacturing. Advanced automated machines are quite expensive, but the end output is supposed to be of the highest possible quality. Depending on the agility of requirements, these automation technologies can change dramatically. The likelihood of making changes to automation software is extremely high, so it must be updated regularly. If maintainability is not taken into account, it will have an impact on the entire system and increase maintenance costs. Many companies use different programming paradigms in developing advanced automated machines based on client requirements. Therefore, it is essential to estimate the maintainability of heterogeneous software. As a result of the lack of widespread consensus on software maintainability prediction (SPM) methodologies, individuals and businesses are left perplexed when it comes to determining the appropriate model for estimating the maintainability of software, which serves as the inspiration for this research. A structured methodology was designed, and the datasets were preprocessed and maintainability index (MI) range was also found for all the datasets expect for UIMS and QUES, the metric CHANGE is used for UIMS and QUES. To remove the uncertainty among the aforementioned techniques, a popular multiple criteria decision-making model, namely the technique for order preference by similarity to ideal solution (TOPSIS), is used in this work. TOPSIS revealed that GARF outperforms the other considered techniques in predicting the maintainability of heterogeneous automated software.Comment: Submitted for peer revie

    The impact of ensemble techniques on software maintenance change prediction : an empirical study

    Get PDF
    Various prediction models have been proposed by researchers to predict the change-proneness of classes based on source code metrics. However, some of these models suffer from low prediction accuracy because datasets exhibit high dimensionality or imbalanced classes. Recent studies suggest that using ensembles to integrate several models, select features, or perform sampling has the potential to resolve issues in the datasets and improve the prediction accuracy. This study aims to empirically evaluate the effectiveness of the ensemble models, feature selection, and sampling techniques on predicting change-proneness using different metrics. We conduct an empirical study to compare the performance of four machine learning models (naive Bayes, support vector machines, k-nearest neighbors, and random forests) on seven datasets for predicting change-proneness. We use two types of feature selection (relief and Pearson’s correlation coefficient) and three types of ensemble sampling techniques, which integrate different types of sampling techniques (SMOTE, spread sub-sample, and randomize). The results of this study reveal that the ensemble feature selection and sampling techniques yield improved prediction accuracy over most of the investigated models, and using sampling techniques increased the prediction accuracy of all models. Random forests provide a significant improvement over other prediction models and obtained the highest value of the average of the area under curve in all scenarios. The proposed ensemble feature selection and sampling techniques, along with the ensemble model (random forests), were found beneficial in improving the prediction accuracy of change-proneness

    11th SC@RUG 2014 proceedings:Student Colloquium 2013-2014

    Get PDF
    • …
    corecore