327 research outputs found

A novel approach for code smell detection : an empirical study

Author: Dewangan Seema
Gupta Manjari
Mishra Alok
Rao Rajwant Singh
Publication date: 01/01/2021

Code smell detection helps improve the understandability and maintainability of software while reducing the chance of system failure. In this study, six machine learning algorithms are applied to predict code smells. Four code smell datasets (God-class, Data-class, Feature-envy, and Long-method), generated from 74 open-source systems, are considered. To evaluate the performance of the algorithms on these datasets, 10-fold cross-validation is applied, which repeatedly partitions the original dataset into a training set to fit the model and a test set to evaluate it. Two feature selection techniques, Chi-squared and Wrapper-based, are used to improve the accuracy of all six machine learning methods by choosing the top metrics in each dataset, and the results obtained with the two techniques are compared. Grid-search-based parameter optimization is also applied to improve the accuracy of the algorithms. In this study, 100% accuracy was obtained for the Long-method dataset using the Logistic Regression algorithm with all features, while the worst performance, 95.20%, was obtained by the Naive Bayes algorithm for the Long-method dataset using Chi-squared feature selection.
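As a rough illustration of the 10-fold cross-validation step mentioned in the abstract above, the sketch below partitions sample indices into k folds and yields one train/test split per fold. It uses only the Python standard library; the function name `k_fold_indices`, the fixed seed, and the sample count of 25 are illustrative assumptions, not details taken from the study.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical usage: 25 samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training in the remaining k - 1 rounds.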

Brage HiM

Predicting software maintainability in object-oriented systems using ensemble techniques

Author: Alsolai Hadeel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/11/2018

Predicting the maintainability of classes in object-oriented systems is a significant factor in software success; however, it is a challenging task. To date, several machine learning models have been applied with variable results and no clear indication of which techniques are more appropriate. With the goal of achieving more consistent results, this paper presents the first set of results from an extensive empirical study designed to evaluate the capability of bagging models to increase prediction accuracy over individual models. The study compares two major machine-learning-based approaches for predicting software maintainability, individual models (regression tree, multilayer perceptron, k-nearest neighbors, and M5Rules) and an ensemble model (bagging), applied to the QUES dataset. The results indicate that the k-nearest neighbors model outperformed all other individual models. The bagging ensemble model improved prediction accuracy significantly over almost all individual models, and the bagging ensemble with k-nearest neighbors as the base model achieved the most accurate predictions. This paper also describes the planned programme of research, which aims to investigate the performance of advanced (ensemble-based) machine learning models over various datasets.
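To make the idea of bagging with a k-nearest neighbors base model more concrete, here is a minimal standard-library sketch. It trains nothing in advance: each ensemble member is a kNN regressor over a bootstrap resample of the training rows, and the ensemble prediction is the average. The function names, the toy data, and the parameter values (25 members, k = 3) are assumptions made for illustration and are not taken from the paper, which used the QUES dataset.

```python
import random

def knn_predict(train, query, k=3):
    """Average the targets of the k training rows nearest to query."""
    # train is a list of (features, target) pairs; distance is squared Euclidean.
    by_distance = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)

def bagged_knn_predict(train, query, n_models=25, k=3, seed=0):
    """Average kNN predictions over bootstrap resamples of the training set."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) rows with replacement.
        sample = [rng.choice(train) for _ in train]
        predictions.append(knn_predict(sample, query, k))
    return sum(predictions) / n_models

# Toy data (made up): one feature x, target equal to 2 * x.
toy = [((float(x),), 2.0 * x) for x in range(10)]
estimate = bagged_knn_predict(toy, (4.5,))
```

Averaging over resamples reduces the variance of the base learner, which is the usual motivation for bagging.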

A Hyper-parameter Tuning based Novel Model for Prediction of Software Maintainability

Author: Singh Raghuraj
Yadav Rohit
Publication venue: Auricle Global Society of Education and Research
Publication date: 10/03/2023

Software maintainability is regarded as one of the most important characteristics of any software system. The growing significance of software maintenance is motivating the development of efficient software maintainability prediction (SMP) models using statistical and machine learning methods. This study proposes a hyper-parameter optimizable Software Maintainability Prediction (HPOSMP) model that combines data balancing with hyper-parameter optimization of Machine Learning (ML) techniques on software maintainability datasets. The training dataset was created from two object-oriented software systems, UIMS and QUES. To balance the dataset, the Synthetic Minority Oversampling Technique (SMOTE) was adopted. Decision Tree, Gaussian Naïve Bayes, K-Nearest Neighbour, Logistic Regression, and Support Vector Machine were then used as the machine learning and statistical techniques for training on the software maintainability dataset. Results demonstrate that the proposed HPOSMP model performs better than the base SMP models.
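The SMOTE balancing step named above works by creating synthetic minority-class samples on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below shows that interpolation idea with the standard library only; it is a simplified illustration, not the authors' pipeline, and the function name `smote`, the toy points, and the choice of k = 2 are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Toy minority class (made up) with two features per sample.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class.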

International Journal on Recent and Innovation Trends in Computing and Communication

Implementation of Machine Learning Models with Ensemble Methods and Feature Selection Techniques for Software Maintainability Prediction

Author: Mochamad Nurul Huda
Publication date: 20/01/2023

Software maintainability is one of the primary external attributes of software quality; it measures the effectiveness and efficiency with which a maintainer can modify the software. In this work, maintainability is estimated by machine learning models based on several software quality attributes, to support decision-making during the software maintenance process. The study uses new datasets drawn from five Java object-oriented software systems with seventeen class-level metrics. The models are built using several individual algorithms (Lasso Regression, K-Nearest Neighbors, Regression Tree, Multilayer Perceptron, M5Rules, Support Vector Machine, and Artificial Neural Network) and ensemble methods (Bagging and AdaBoost). In addition, feature selection techniques are considered to identify the best features and thereby improve the performance of the prediction models. The research aims to investigate the performance of machine learning models across the various dataset sources. Performance is evaluated using three evaluation metrics: MMRE, MAE, and Pred. The results show that ANN is the best individual algorithm, with an MMRE of 0.88 on the Equinox Framework dataset. Ensemble methods are shown to improve model performance, provided that the ensemble method suits the individual algorithm used. The best performance was obtained by AdaBoost with ANN on the Lucene dataset, with an MMRE of 0.78. Feature selection is also shown to improve several prediction models, provided that the right features are removed and the algorithm suits the data distribution.

Application of ensemble techniques in predicting object-oriented software maintainability

Author: Conte S. D.
Dagpinar M.
DeMarco T.
Genero M.
Kohavi R.
Mendes E.
Oman P.
Sammut C.
Software Engineering O. i. d. normalisation Systems
Welker K. D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/04/2019

While prior literature on object-oriented software maintainability acknowledges the role of machine learning techniques as valuable predictors of potential change, the most suitable technique for achieving consistently high accuracy remains undetermined. With the objective of obtaining more consistent results, an ensemble technique is investigated to improve on the performance of individual models and increase their accuracy in predicting the maintainability of object-oriented systems. This paper describes the research plan for predicting object-oriented software maintainability using ensemble techniques. First, we present a brief overview of the main research background and its different components. Second, we explain the research methodology. Third, we describe the expected results. Finally, we conclude with a summary of the current status.

An Extensive Analysis of Machine Learning Based Boosting Algorithms for Software Maintainability Prediction

Author: Chug Anuradha
Gupta Shikha
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 10/05/2022

Software maintainability is an indispensable factor in the quality of any software product. It describes the ease with which maintenance activities can be performed to adapt software to a modified environment. The availability and growing popularity of a wide range of Machine Learning (ML) algorithms for data analysis provide further motivation for predicting maintainability. However, an extensive analysis and comparison of ML-based Boosting Algorithms (BAs) for Software Maintainability Prediction (SMP) has not yet been made. The current study therefore analyzes and compares five BAs (AdaBoost, GBM, XGB, LightGBM, and CatBoost) for SMP using open-source datasets. The performance of the proposed prediction models is evaluated using Root Mean Square Error (RMSE), Mean Magnitude of Relative Error (MMRE), Pred(0.25), Pred(0.30), and Pred(0.75) as prediction accuracy measures, followed by a non-parametric statistical test and a post hoc analysis to account for the differences in performance among the BAs. Based on the residual errors obtained, GBM is the best performer for RMSE, followed by LightGBM, whereas for MMRE, XGB performed best on six of the seven datasets (85.71% of the total), providing minimum MMRE values ranging from 0.90 to 3.82. The statistical test and post hoc analysis showed that significant differences exist in the performance of the different BAs, and that XGB and CatBoost outperformed all other BAs for MMRE. Lastly, the BAs are compared with four other ML algorithms to demonstrate their superiority. The study is intended to help software developers make more precise and timely predictions and hence reduce overall maintenance costs.
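The accuracy measures named in the abstract above have simple definitions: the magnitude of relative error (MRE) of one prediction is |actual - predicted| / |actual|, MMRE is the mean of the MREs, Pred(q) is the fraction of predictions whose MRE is at most q, and RMSE is the square root of the mean squared error. The sketch below computes them on made-up numbers; the function names and the toy values are illustrative assumptions and do not come from the study.

```python
import math

def mre(actual, predicted):
    """Magnitude of relative error for a single prediction."""
    return abs(actual - predicted) / abs(actual)

def mmre(actuals, predictions):
    """Mean magnitude of relative error (lower is better)."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)

def pred(actuals, predictions, q=0.25):
    """Fraction of predictions with MRE <= q (higher is better)."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= q)
    return hits / len(actuals)

def rmse(actuals, predictions):
    """Root mean square error (lower is better)."""
    squared = [(a - p) ** 2 for a, p in zip(actuals, predictions)]
    return math.sqrt(sum(squared) / len(squared))

# Toy values (made up): the MREs are 0.2, 0.25 and 0.0.
actual_values = [10.0, 20.0, 40.0]
predicted_values = [12.0, 15.0, 40.0]
toy_mmre = mmre(actual_values, predicted_values)
toy_pred_25 = pred(actual_values, predicted_values, q=0.25)
toy_rmse = rmse(actual_values, predicted_values)
```

Note that MMRE is undefined when an actual value is zero, so datasets with zero-valued targets need separate handling.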

Hybrid intelligent model for software maintenance prediction

Author: Alshayeb Mohammad
Baig Zubair A
Baqais Abdulrahman Ahmed Bobakr
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2013

Maintenance is an important activity in the software life cycle, and no software product can do without it. Estimating the effort and cost of maintaining software is not an easy task, given the various factors that influence the measurement. Hence, Artificial Intelligence (AI) techniques have been used extensively to find optimized and more accurate maintenance estimates. In this paper, we propose an Evolutionary Neural Network (NN) model to predict software maintainability. The proposed model is based on a hybrid intelligent technique in which a neural network is trained for prediction and a genetic algorithm (GA) is used to evolve the neural network topology until an optimal topology is reached. The model was applied to a popular open-source program, namely Android. The results are very promising: the correlation between actual and predicted points reaches 0.9.

Research Online @ ECU

A systematic literature review of machine learning techniques for software maintainability prediction

Author: Aggarwal
Ahmed
Al Dallal
Aljamaan
Almugrin
Bandi
Basgalupp
Basri
Bhattacharya
Brereton
Briand
Burrows
Chen
Chidamber
Chidamber
Coleman
Conte
Cukic
Dagpinar
Daly
De Lucia
Dubey
Elish
Elish
Elish
Fioravanti
Genero
Gill
Granja-Alvarez
Hadeel Alsolai
Hayes
Hayes
Hegedűs
Jain
Jin
Jorgensen
Kaur
Kaur
Kaur
Kitchenham
Kitchenham
Kumar
Kumar
Kumar
Kumar
Kádár
Land
Lee
Levin
Li
Liberati
Lim
Lincke
MacDonell
Malhotra
Malhotra
Malhotra
Malhotra
Marc Roper
Menzies
Menzies
Misra
Nguyen
Oman
Oman
Opitz
Polo
Raj Kiran
Reddy
Riaz
Riaz
Sammut
Sarwar
Sehra
Shafiabady
Sheldon
Shepperd
Singh
Srivastava
Thwin
Tiwari
van Koten
Welker
Ye
Zhang
Zhang
Zhou
Zhou
Publication venue: 'Elsevier BV'
Publication date: 31/03/2020

Context: Software maintainability is one of the fundamental quality attributes of software engineering. The accurate prediction of software maintainability is a significant challenge for the effective management of the software maintenance process. Objective: The major aim of this paper is to present a systematic review of studies related to the prediction of maintainability of object-oriented software systems using machine learning techniques. This review identifies and investigates a number of research questions to comprehensively summarize, analyse and discuss various viewpoints concerning software maintainability measurements, metrics, datasets, evaluation measures, individual models and ensemble models. Method: The review uses the standard systematic literature review method applied to the most common computer science digital database libraries from January 1991 to July 2018. Results: We survey 56 relevant studies in 35 journals and 21 conference proceedings. The results indicate that there is relatively little activity in the area of software maintainability prediction compared with other software quality attributes. CHANGE maintenance effort and the maintainability index were the most commonly used software measurements (dependent variables) employed in the selected primary studies, and most made use of class-level product metrics as the independent variables. Several private datasets were used in the selected studies, and there is a growing demand to publish datasets publicly. Most studies focused on regression problems and performed k-fold cross-validation. Individual prediction models were employed in the majority of studies, while ensemble models were used relatively rarely. Conclusion: Based on the findings obtained in this systematic literature review, ensemble models demonstrated increased prediction accuracy over individual models, and have been shown to be useful models in predicting software maintainability. However, their application is relatively rare, and there is a need to apply these and other models to an extensive variety of datasets with the aim of improving the accuracy and consistency of results.

Software Quality Assessment using Ensemble Models

Author: Aljamaan Hamoud
Publication date: 30/06/2009

KFUPM ePrints

Effort Estimation For Object-oriented System Using Stochastic Gradient Boosting Technique

Author: Acharya Barada Prasanna
Publication date: 12/05/2014

The success of software development depends on accurate prediction of the effort required to develop the software. Project managers require a reliable methodology for software effort prediction, particularly during the early stages of the software development life cycle. Accurate software effort estimation is a major concern for software companies. Stochastic Gradient Boosting (SGB) is a machine learning technique that helps obtain improved estimates; it is used to improve the accuracy of estimation models built on decision trees. The basic aim of this paper is to predict the effort required to develop various software projects using both the class point and the use case point approaches. The effort parameters are then optimized using the SGB technique to obtain better prediction accuracy. Furthermore, the models obtained using SGB are compared with other machine learning techniques in order to highlight the performance achieved by each method.

ethesis@nitr