
    Predicting software maintainability in object-oriented systems using ensemble techniques

    Predicting the maintainability of classes in object-oriented systems is a significant factor in software success; however, it is a challenging task. To date, several machine learning models have been applied with variable results and no clear indication of which techniques are more appropriate. With the goal of achieving more consistent results, this paper presents the first set of results from an extensive empirical study designed to evaluate whether bagging models increase prediction accuracy over individual models. The study compares two major machine-learning-based approaches for predicting software maintainability: individual models (regression tree, multilayer perceptron, k-nearest neighbors and M5Rules) and an ensemble model (bagging), all applied to the QUES data set. The results indicate that the k-nearest neighbors model outperformed all other individual models. The bagging ensemble model improved prediction accuracy significantly over almost all individual models, and the bagging ensemble with k-nearest neighbors as the base model achieved the most accurate predictions. This paper also describes the planned programme of research, which aims to investigate the performance of advanced (ensemble-based) machine learning models over various datasets.
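The idea of bagging with k-nearest neighbors as the base model can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the default of 25 bootstrap rounds, k = 3, and the Euclidean distance are all assumptions chosen for illustration.

```python
import math
import random

def knn_predict(train_X, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def bagged_knn_predict(train_X, train_y, x, n_estimators=25, k=3, seed=0):
    """Average k-NN predictions over bootstrap resamples of the training set."""
    rng = random.Random(seed)
    n = len(train_X)
    predictions = []
    for _ in range(n_estimators):
        # Bootstrap: draw n training indices with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        boot_X = [train_X[i] for i in sample]
        boot_y = [train_y[i] for i in sample]
        predictions.append(knn_predict(boot_X, boot_y, x, k))
    return sum(predictions) / len(predictions)
```

Each base model sees a different resample of the classes, and the final estimate is the mean of the base predictions, which is what tends to reduce the variance of a single model.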

    A systematic literature review of machine learning techniques for software maintainability prediction

    Context: Software maintainability is one of the fundamental quality attributes of software engineering. The accurate prediction of software maintainability is a significant challenge for the effective management of the software maintenance process. Objective: The major aim of this paper is to present a systematic review of studies related to the prediction of maintainability of object-oriented software systems using machine learning techniques. This review identifies and investigates a number of research questions to comprehensively summarize, analyse and discuss various viewpoints concerning software maintainability measurements, metrics, datasets, evaluation measures, individual models and ensemble models. Method: The review uses the standard systematic literature review method applied to the most common computer science digital database libraries from January 1991 to July 2018. Results: We survey 56 relevant studies in 35 journals and 21 conference proceedings. The results indicate that there is relatively little activity in the area of software maintainability prediction compared with other software quality attributes. CHANGE maintenance effort and the maintainability index were the most commonly used software measurements (dependent variables) in the selected primary studies, and most studies used class-level product metrics as the independent variables. Several private datasets were used in the selected studies, and there is a growing demand to publish datasets publicly. Most studies focused on regression problems and performed k-fold cross-validation. Individual prediction models were employed in the majority of studies, while ensemble models were used relatively rarely. Conclusion: Based on the findings of this systematic literature review, ensemble models demonstrated increased prediction accuracy over individual models and have been shown to be useful in predicting software maintainability. However, their application is relatively rare, and there is a need to apply these and other models to an extensive variety of datasets with the aim of improving the accuracy and consistency of results.

    An Extensive Analysis of Machine Learning Based Boosting Algorithms for Software Maintainability Prediction

    Software maintainability is an indispensable factor in the quality of a software product. It describes the ease of performing maintenance activities to adapt software to a modified environment. The availability and growing popularity of a wide range of Machine Learning (ML) algorithms for data analysis further motivates predicting maintainability. However, an extensive analysis and comparison of ML-based Boosting Algorithms (BAs) for Software Maintainability Prediction (SMP) has not yet been made. Therefore, the current study analyzes and compares five BAs, i.e., AdaBoost, GBM, XGB, LightGBM, and CatBoost, for SMP using open-source datasets. The performance of the proposed prediction models was evaluated using Root Mean Square Error (RMSE), Mean Magnitude of Relative Error (MMRE), Pred(0.25), Pred(0.30), and Pred(0.75) as prediction accuracy measures, followed by a non-parametric statistical test and a post hoc analysis to account for the differences in performance among the BAs. Based on the residual errors obtained, GBM was the best performer for RMSE, followed by LightGBM, whereas for MMRE, XGB performed best on six of the seven datasets (85.71%), providing minimum MMRE values ranging from 0.90 to 3.82. Further, the statistical test and post hoc analysis showed that significant differences exist in the performance of the BAs, and that XGB and CatBoost outperformed all other BAs for MMRE. Lastly, the BAs were compared with four other ML algorithms to bring out the BAs' superiority over other algorithms. This study should help software developers make more precise predictions in good time and hence reduce overall maintenance costs.
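For readers unfamiliar with the accuracy measures named in this abstract, the following sketch shows how RMSE, MMRE and Pred(q) are commonly defined. The function names are hypothetical, and the definitions follow general usage in the effort-estimation literature rather than anything confirmed by the paper itself; actual values are assumed to be non-zero.

```python
import math

def rmse(actual, predicted):
    """Root mean square error of the residuals."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |a - p| / |a|."""
    relative = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(relative) / len(relative)

def pred(actual, predicted, q):
    """Pred(q): fraction of predictions whose relative error is at most q."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(a - p) / abs(a) <= q)
    return hits / len(actual)
```

Lower RMSE and MMRE indicate better predictions, while a higher Pred(q) is better, so rankings of models can differ depending on which measure is used.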

    Class-Level Refactoring Prediction by Ensemble Learning with Various Feature Selection Techniques

    Background: Refactoring is changing a software system without affecting its functionality. Current research aims to identify the appropriate method(s) or class(es) that need to be refactored in object-oriented software. Ensemble learning helps to reduce prediction errors by amalgamating different classifiers and their respective performances over the original feature data. This paper additionally considers several ensemble learners, errors, sampling techniques, and feature selection techniques for refactoring prediction at the class level. Objective: This work aims to develop an ensemble-based refactoring prediction model with structural identification of source code metrics, using different feature selection techniques and data sampling techniques to distribute the data uniformly. Our model finds the best classifier after achieving fewer errors during refactoring prediction at the class level. Methodology: First, the proposed model extracts a total of 125 software metrics computed from object-oriented software systems, which are then processed by a robust multi-phased feature selection method encompassing the Wilcoxon significance test, the Pearson correlation test, and principal component analysis (PCA). The proposed multi-phased feature selection method retains the optimal features characterizing inheritance, size, coupling, cohesion, and complexity. After obtaining the optimal set of software metrics, a novel heterogeneous ensemble classifier is developed, with the following used as base classifiers: ANN-Gradient Descent, ANN-Levenberg Marquardt, ANN-GDX, and ANN-Radial Basis Function; support vector machines with different kernel functions (LSSVM-Linear, LSSVM-Polynomial, LSSVM-RBF); the Decision Tree algorithm; the Logistic Regression algorithm; and the extreme learning machine (ELM) model. We calculate four different errors, i.e., Mean Absolute Error (MAE), Mean Magnitude of Relative Error (MORE), Root Mean Square Error (RMSE), and Standard Error of Mean (SEM). Result: In our proposed model, the maximum voting ensemble (MVE) achieves better accuracy, recall, precision, and F-measure values (99.76, 99.93, 98.96, 98.44) than the base trained ensemble (BTE), and it incurs fewer errors (MAE = 0.0057, MORE = 0.0701, RMSE = 0.0068, and SEM = 0.0107) when used to develop the refactoring model. Conclusions: Our experimental results suggest that MVE with upsampling can be implemented to improve the performance of the refactoring prediction model at the class level. Furthermore, the performance of our model with different data sampling techniques and feature selection techniques is shown in the form of boxplot diagrams of accuracy, F-measure, precision, recall, and area under the curve (AUC).
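A voting ensemble of the kind described above can be sketched as follows. This assumes that the maximum voting ensemble amounts to plain majority voting over the base classifiers' predicted labels, which the abstract does not spell out; the function name and input layout are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_classifier):
    """Combine label predictions from several classifiers by majority vote.

    predictions_per_classifier: one list of predicted labels per base
    classifier, all of the same length (one label per class under test).
    """
    n_samples = len(predictions_per_classifier[0])
    voted = []
    for i in range(n_samples):
        # Count the labels that the base classifiers assign to sample i.
        votes = Counter(preds[i] for preds in predictions_per_classifier)
        voted.append(votes.most_common(1)[0][0])
    return voted
```

With an even number of base classifiers ties are possible; this sketch resolves them in favour of the label encountered first, which is an arbitrary choice.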

    A Multiple Criteria Decision Analysis based Approach to Remove Uncertainty in SMP Models

    Advanced AI technologies are serving humankind in a number of ways, from healthcare to manufacturing. Advanced automated machines are quite expensive, but the end output is supposed to be of the highest possible quality. Depending on the agility of requirements, these automation technologies can change dramatically. The likelihood of making changes to automation software is extremely high, so it must be updated regularly. If maintainability is not taken into account, it will have an impact on the entire system and increase maintenance costs. Many companies use different programming paradigms in developing advanced automated machines based on client requirements. Therefore, it is essential to estimate the maintainability of heterogeneous software. Because there is no widespread consensus on software maintainability prediction (SMP) methodologies, individuals and businesses are left perplexed when it comes to determining the appropriate model for estimating the maintainability of software, which serves as the inspiration for this research. A structured methodology was designed, the datasets were preprocessed, and the maintainability index (MI) range was found for all datasets except UIMS and QUES, for which the CHANGE metric is used. To remove the uncertainty among the aforementioned techniques, a popular multiple criteria decision-making model, namely the technique for order preference by similarity to ideal solution (TOPSIS), is used in this work. TOPSIS revealed that GARF outperforms the other considered techniques in predicting the maintainability of heterogeneous automated software. Comment: Submitted for peer review.
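As a rough illustration of how TOPSIS ranks alternatives, the sketch below follows the textbook steps: vector normalization, weighting, ideal and anti-ideal points, and a closeness score. The function name, the weights, and the choice of criteria are assumptions for illustration only and are not taken from the paper.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix:  rows are alternatives, columns are criteria.
    weights: one weight per criterion.
    benefit: True where larger is better, False where smaller is better.
    Assumes no criterion column is all zeros and rows are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)    # distance to the ideal solution
        d_worst = math.dist(row, worst)  # distance to the anti-ideal solution
        scores.append(d_worst / (d_best + d_worst))
    return scores
```

For example, with hypothetical models scored on an error measure (lower is better) and Pred (higher is better), `topsis([[0.2, 0.8], [0.5, 0.5], [0.9, 0.3]], [0.5, 0.5], [False, True])` gives the highest score to the first row, since it is best on both criteria.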

    Application of ensemble techniques in predicting object-oriented software maintainability

    While prior object-oriented software maintainability literature acknowledges the role of machine learning techniques as valuable predictors of potential change, the most suitable technique that achieves consistently high accuracy remains undetermined. With the objective of obtaining more consistent results, an ensemble technique is investigated to advance the performance of the individual models and increase their accuracy in predicting the maintainability of object-oriented systems. This paper describes the research plan for predicting object-oriented software maintainability using ensemble techniques. First, we present a brief overview of the main research background and its different components. Second, we explain the research methodology. Third, we provide expected results. Finally, we conclude with a summary of the current status.

    A novel approach for code smell detection : an empirical study

    Code smell detection helps to improve the understandability and maintainability of software while reducing the chances of system failure. In this study, six machine learning algorithms are applied to predict code smells. For this purpose, four code smell datasets (God-class, Data-class, Feature-envy, and Long-method), generated from 74 open-source systems, are considered. To evaluate the performance of the machine learning algorithms on these datasets, 10-fold cross-validation is applied, which assesses the model by partitioning the original dataset into a training set to train the model and a test set to evaluate it. Two feature selection techniques, Chi-squared and Wrapper-based, are used to improve the accuracy of all six machine learning methods by choosing the top metrics in each dataset, and the results obtained with the two techniques are compared. To further improve the accuracy of these algorithms, grid-search-based parameter optimization is applied. In this study, 100% accuracy was obtained for the Long-method dataset using the Logistic Regression algorithm with all features, while the worst performance, 95.20%, was obtained by the Naive Bayes algorithm for the Long-method dataset using the Chi-squared feature selection technique.
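The partitioning step behind 10-fold cross-validation can be sketched as below. The function name, the shuffling seed, and the round-robin assignment of samples to folds are illustrative assumptions; the study's own tooling is not described in the abstract.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each sample is used for testing exactly once and for training in the other k - 1 rounds, and the reported accuracy is typically the average over the k test folds.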

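    Implementation of Machine Learning Models with Ensemble Methods and Feature Selection Techniques for Software Maintainability Prediction (original title: IMPLEMENTASI MODEL PEMBELAJARAN MESIN DENGAN METODE ENSAMBEL DAN TEKNIK SELEKSI FITUR PADA PREDIKSI TINGKAT KEMAMPUAN PEMELIHARAAN PERANGKAT LUNAK)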

    Software maintainability is one of the primary external attributes of software quality; it measures the effectiveness and efficiency with which a software maintainer can modify the software. Software maintainability is estimated using the predictions of machine learning models based on several software quality attributes, to support and assist decision-making during the software maintenance process. This study uses new datasets consisting of five Java object-oriented software systems with seventeen class-level metrics. Machine learning models are built using several individual models, namely Lasso Regression, K-Nearest Neighbors, Regression Tree, Multilayer Perceptron, M5Rules, Support Vector Machine, and Artificial Neural Network (ANN), and using ensemble methods such as Bagging and AdaBoost. In addition, feature selection techniques are considered to identify the best features and thereby improve the prediction models' performance. This research aims to investigate the performance of machine learning models across various dataset sources. The performance of these models is evaluated using three evaluation metrics, namely MMRE, MAE, and Pred. The results show that ANN is the best individual model, with an MMRE of 0.88 on the Equinox Framework dataset. The ensemble methods are shown to improve model performance, provided that the ensemble method suits the individual algorithm used. The best performance was obtained by AdaBoost with ANN on the Lucene dataset, with an MMRE of 0.78. Feature selection is also shown to improve several prediction models when the right features are removed and the algorithm used suits the data distribution.