
    Software Defect Prediction Based on Optimized Machine Learning Models: A Comparative Study

    Software defect prediction is crucial for detecting possible defects in software before they manifest. While machine learning models have become more prevalent in software defect prediction, their effectiveness may vary with the dataset and the model's hyperparameters. Difficulties arise in determining the most suitable hyperparameters for a model and in identifying the prominent features that serve as input to the classifier. This research evaluates various traditional machine learning models optimized for software defect prediction on the NASA MDP (Metrics Data Program) datasets. The datasets were classified using k-nearest neighbors (k-NN), decision trees, logistic regression, linear discriminant analysis (LDA), a single-hidden-layer multilayer perceptron (SHL-MLP), and a support vector machine (SVM). The models' hyperparameters were fine-tuned using random search, and feature dimensionality was reduced using principal component analysis (PCA). The synthetic minority oversampling technique (SMOTE) was applied to oversample the minority class and correct the class imbalance. k-NN was found to be the most suitable model for software defect prediction on several datasets, while SHL-MLP and SVM were also effective on certain datasets. Notably, logistic regression and LDA did not perform as well as the other models. Moreover, the optimized models outperformed the baseline models in terms of classification accuracy. The choice of model for software defect prediction should be based on the specific characteristics of the dataset. Furthermore, hyperparameter tuning can improve the accuracy of machine learning models in predicting software defects.
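The abstract tunes hyperparameters with random search. As a rough, stdlib-only illustration (not the authors' code), the sketch below random-searches the neighbour count `k` and the distance metric of a plain k-NN classifier against a held-out validation split. The search space (`K_VALUES`, the two metrics) and all function names are hypothetical choices for this sketch.

```python
import math
import random
from collections import Counter

# Hypothetical search space for this sketch: odd k values and two distance metrics.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

METRICS = {"euclidean": euclidean, "manhattan": manhattan}
K_VALUES = [1, 3, 5, 7, 9]

def knn_predict(train_X, train_y, x, k, metric):
    """Return the majority label among the k training points nearest to x."""
    dist = METRICS[metric]
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def validation_accuracy(train_X, train_y, val_X, val_y, k, metric):
    hits = sum(knn_predict(train_X, train_y, x, k, metric) == y
               for x, y in zip(val_X, val_y))
    return hits / len(val_y)

def random_search(train_X, train_y, val_X, val_y, n_iter=10, seed=0):
    """Sample (k, metric) pairs at random; keep the pair with the best validation accuracy."""
    rng = random.Random(seed)
    best_score, best_params = -1.0, None
    for _ in range(n_iter):
        params = {"k": rng.choice(K_VALUES), "metric": rng.choice(sorted(METRICS))}
        score = validation_accuracy(train_X, train_y, val_X, val_y, **params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

`random_search(...)` returns the best validation accuracy together with a parameter dict such as `{"k": 5, "metric": "manhattan"}`. Random search samples the grid instead of enumerating it, which is what makes it cheap when the search space is large; on real data the validation split would be carved out of the training data only.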

    Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

    Software quality ensures that the applications developed are failure-free. Some modern systems are intricate because of the complexity of their information processes. Software fault prediction is an important quality assurance activity: it correctly predicts the defect-proneness of modules and classifies them, which saves resources, time and developers' effort. In this study, a model that selects relevant features for defect prediction was proposed. A literature review revealed that process metrics, which are based on the history of the source code over time, are better predictors of defects in version control systems. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted on open source software (OSS) software product lines (SPLs), hence process metrics were chosen. Datasets used in defect prediction may contain non-significant and redundant attributes that can affect the accuracy of machine learning algorithms. To improve the prediction accuracy of classification models, only the features that are significant to the defect prediction process are utilised. In machine learning, feature selection techniques are applied to identify the relevant data. Feature selection is a pre-processing step that helps reduce the dimensionality of the data. Feature selection techniques include information-theoretic methods based on the concept of entropy. This study evaluated the efficiency of these feature selection techniques and found that software defect prediction using significant attributes improves prediction accuracy.
A novel model, MICFastCR, was developed: it uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation-Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. MICFastCR achieved the highest prediction accuracy across the various performance measures reported.
School of Computing, Ph.D. (Computer Science)
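MICFastCR pairs MIC-based relevance ranking with FCBF redundancy removal. Computing MIC is fairly involved, so the stdlib sketch below shows only the classic FCBF procedure, which uses symmetric uncertainty (SU) for both relevance and redundancy, on already-discretized features. The threshold `delta`, the function names and the data layout (a dict of feature name to value list) are assumptions; in a MICFastCR-style variant the relevance scores in `su_class` would presumably come from MIC instead.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU = 2 * I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)

def fcbf(features, labels, delta=0.0):
    """features: dict name -> list of discrete values. Returns the selected names."""
    # 1) Relevance: keep features whose SU with the class exceeds delta, best first.
    su_class = {f: symmetric_uncertainty(v, labels) for f, v in features.items()}
    ranked = sorted((f for f in features if su_class[f] > delta),
                    key=lambda f: su_class[f], reverse=True)
    # 2) Redundancy: drop f if an already-kept (more relevant) feature g
    #    is at least as correlated with f as f is with the class.
    selected = []
    for f in ranked:
        redundant = any(symmetric_uncertainty(features[g], features[f]) >= su_class[f]
                        for g in selected)
        if not redundant:
            selected.append(f)
    return selected
```

For example, if two features are exact copies of each other, only the first one in ranking order survives, and a constant feature is never selected because its SU with the class is zero.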

    An implementation research on software defect prediction using machine learning techniques

    Software defect prediction is the process of improving the software testing process by identifying defects in the software. It is carried out using supervised machine learning, with software metrics and defect data as variables. While the theory behind software defect prediction has been validated in previous studies, it has not been widely implemented in practice. In this thesis, a software defect prediction framework is implemented to improve testing resource allocation and optimize software release timing at RELEX Solutions. For this purpose, code and change metrics are collected from RELEX software. The metrics are selected based on how frequently they are used in other software defect prediction studies and on their availability in metric collection tools. In addition to metric data, defect data is collected from the issue tracker. A framework for classifying the collected data is then implemented and evaluated experimentally. The framework leverages existing machine learning libraries for classification, using classifiers found to perform well in similar software defect prediction experiments. The classification results are validated using commonly used classifier performance metrics, and the suitability of the predictions is additionally verified from a use-case point of view. Software defect prediction is found to work in practice: the implementation achieves results comparable to other similar studies as measured by classifier performance metrics. When validated against the defined use cases, performance is found acceptable, although it varies between data sets. It is thus concluded that while the results are tentatively positive, further monitoring with future software versions is needed to verify the performance and reliability of the framework.

    Software defect prediction framework based on hybrid metaheuristic optimization methods

    A software defect is an error, failure, or fault in software that produces an incorrect or unexpected result. Software defects are expensive in terms of both quality and cost. Accurate prediction of defect-prone software modules certainly assists testing effort, reduces costs and improves software quality. Classification is a popular machine learning approach for software defect prediction. Unfortunately, software defect prediction remains a largely unsolved problem. First, comparison and benchmarking results for defect prediction using machine learning classifiers indicate that poor accuracy is common and that no single classifier performs best on all datasets. Two main problems affect classification performance in software defect prediction: noisy attributes and imbalanced class distributions in the datasets, and the difficulty of selecting optimal classifier parameters. This study proposes a software defect prediction framework that combines metaheuristic optimization methods for feature selection and parameter optimization with meta-learning methods for handling class imbalance, with the aim of improving the accuracy of classification models. The framework and models considered the specific research contributions of this thesis are: 1) a comparison framework of classification models for software defect prediction, known as CF-SDP; 2) a hybrid genetic algorithm-based feature selection and bagging technique for software defect prediction, known as GAFS+B; 3) a hybrid particle swarm optimization-based feature selection and bagging technique for software defect prediction, known as PSOFS+B; and 4) a hybrid genetic algorithm-based neural network parameter optimization and bagging technique for software defect prediction, known as NN-GAPO+B. Ten classification algorithms were selected for this study.
The selection aims to achieve a balance among established classification algorithms used in software defect prediction. The proposed framework and methods are evaluated on state-of-the-art datasets from the NASA metrics data repository. The results indicate that the proposed methods (GAFS+B, PSOFS+B and NN-GAPO+B) substantially improve the performance of software defect prediction. GAFS+B and PSOFS+B significantly improved the performance of classifiers that suffer from class imbalance, such as C4.5 and CART. GAFS+B and PSOFS+B also outperformed existing software defect prediction frameworks on most datasets. In the conducted experiments, logistic regression performed best on most of the NASA MDP datasets, with or without feature selection. The proposed methods also identified the relevant features for software defect prediction. The ten most relevant features include branch count metrics, decision density, the Halstead level metric of a module, the number of operands contained in a module, maintenance severity, the number of blank LOC, Halstead volume, the number of unique operands contained in a module, the total number of LOC, and design density.
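GAFS+B wraps a genetic algorithm around feature selection. The sketch below is a generic, stdlib-only GA over feature bitmasks (tournament selection, one-point crossover, bit-flip mutation, elitism); the operators and default rates are assumptions, not the thesis's configuration. The `fitness` callable is a hypothetical stand-in for whatever score the wrapper optimizes.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Search feature bitmasks (tuples of 0/1) that maximize a caller-supplied fitness."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament(size=3):
        # Pick the fittest of a few randomly drawn individuals.
        return max(rng.sample(pop, min(size, len(pop))), key=fitness)

    for _ in range(generations):
        children = [best]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = tuple(1 - g if rng.random() < mutation_rate else g for g in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best
```

In a GAFS+B-style setup, `fitness(mask)` would presumably train a bagged classifier on the columns where `mask` is 1 and return a validation score; that wiring is assumed here. Because of elitism, the best fitness never decreases from one generation to the next.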

    Dynamic Detection of Software Defects Using Supervised Learning Techniques

    Software testing is the main step in detecting faults in software by executing it. It is therefore important to predict the faults that may occur while the software executes, in order to maintain the software. Different artificial intelligence techniques are used to predict future defects, and machine learning is one of the most significant techniques used to build prediction models. This paper conducts a systematic review of the supervised machine learning techniques used for software defect prediction and evaluates their performance. Five state-of-the-art supervised machine learning classifiers are evaluated on several datasets to predict software faults, and their performance is compared across various parameters. Several experiments are then carried out to improve the efficiency of defect prediction by modifying the classifiers' default parameters. The results show that supervised machine learning algorithms can classify classes as buggy or not buggy, and that supervised machine learning models predict software bugs better than traditional statistical models. Applying PCA has no noticeable impact on the performance of the prediction systems, whereas modifying the default parameters improves classifier performance, especially for the Artificial Neural Network (ANN). The main finding of this paper comes from applying ensemble learning methods: Bagging achieves 95.1% accuracy on the Mozilla dataset and Voting achieves 93.79% accuracy on the kc1 dataset.
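Bagging trains each base learner on a bootstrap resample of the training data and combines their predictions by majority vote; a voting ensemble combines different learners in the same way. The stdlib sketch below uses a one-feature threshold rule (a decision stump) as a hypothetical base learner and assumes binary 0/1 labels; the paper's base classifiers, datasets and settings are not reproduced.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Pick the (feature, threshold, polarity) rule with the fewest training errors (labels 0/1)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (0, 1):
                preds = [polarity if row[j] > t else 1 - polarity for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return lambda row: polarity if row[j] > t else 1 - polarity

def bagging_fit(X, y, n_estimators=15, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement, same size as X)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def majority_vote(models, row):
    """Most common prediction across the ensemble (use an odd count to avoid ties)."""
    return Counter(m(row) for m in models).most_common(1)[0][0]
```

Each stump sees a slightly different resample, so their errors partly cancel under the vote; this variance reduction is the usual motivation for bagging unstable learners.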

    Explainable Software Defect Prediction from Cross Company Project Metrics Using Machine Learning

    Predicting the number of defects in a project is critical for project test managers to allocate budget, resources and schedule for testing, support and maintenance efforts. Software defect prediction models predict the number of defects in a given project after being trained on historical defect-related information. Most defect prediction studies have focused on predicting defect-prone modules from method- and class-level static information, whereas this study predicts defects from project-level information based on a cross-company project dataset. It uses software sizing metrics, effort metrics and defect density information, and focuses on developing defect prediction models with various machine learning algorithms. One notable issue in existing defect prediction studies is the lack of transparency of the developed models. Consequently, the explainability of the developed models is demonstrated using a state-of-the-art post-hoc, model-agnostic method called Shapley Additive exPlanations (SHAP). Finally, important features for predicting defects from cross-company project information were identified.
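SHAP attributes a model's prediction to its input features using Shapley values. To illustrate the underlying quantity (this is not the `shap` library, which relies on faster approximations), the sketch below computes exact Shapley values for a small model by enumerating feature subsets, filling "absent" features from a baseline vector. The baseline-replacement convention and the function name are assumptions for this sketch.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition S take their value from `baseline`.
    Cost grows as O(2^n) model calls, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        row = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(row)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For a linear model `2*r[0] + 3*r[1] - r[2]` with `x = [1, 2, 3]` and an all-zero baseline, each value is simply coefficient times feature value, and in general the values sum to `model(x) - model(baseline)` (the efficiency property that SHAP explanations also satisfy).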

    Some Approaches for Software Defect Prediction

    The main aim of this thesis is to give a general overview of the processes within software defect prediction models that use machine learning classifiers, and to analyze some of the results of the evaluation experiments conducted in the research papers covered in this work. Additionally, the algorithms used in the software defect prediction models covered in this work are briefly explained, and some of the evaluation measures used to assess the prediction accuracy of software defect prediction models are listed and explained. A general overview of the processes within a handful of specific software defect prediction models is also provided.
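Defect prediction studies commonly report confusion-matrix measures. The sketch below computes several of them from raw counts, treating "defective" as the positive class: precision, recall (also called probability of detection, pd), probability of false alarm (pf), F1, accuracy, and the balance measure used in some defect prediction work. The function name and the convention of returning 0.0 when a ratio is undefined are choices made for this sketch.

```python
import math

def defect_metrics(tp, fp, tn, fn):
    """Confusion-matrix measures with 'defective' as the positive class."""
    def ratio(num, den):
        return num / den if den else 0.0  # sketch convention: 0.0 when undefined

    precision = ratio(tp, tp + fp)
    pd = ratio(tp, tp + fn)          # recall, probability of detection
    pf = ratio(fp, fp + tn)          # probability of false alarm
    f1 = ratio(2 * precision * pd, precision + pd)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    # Balance: 1 minus the normalized distance from the ideal point (pf = 0, pd = 1).
    balance = 1 - math.sqrt(pf ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return {"precision": precision, "recall": pd, "pf": pf,
            "f1": f1, "accuracy": accuracy, "balance": balance}
```

Accuracy alone can look high on imbalanced defect data (predicting "clean" everywhere scores well), which is why recall, pf and balance are usually reported alongside it.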

    Software Defect Prediction Using Neural Network Based SMOTE

    Software defect prediction is a practical approach to improving software quality and the time and cost efficiency of software testing by focusing on defective modules. Software defect prediction datasets naturally suffer from class imbalance, with very few defective modules compared to non-defective ones. Class imbalance can reduce classification performance. In this study, we applied a Neural Network based Synthetic Minority Over-sampling Technique (SMOTE) to overcome class imbalance in six NASA datasets. Neural Network based SMOTE combines a Neural Network with SMOTE, with the hyperparameters of each optimized using random search. Results from nested 5-fold cross-validation show that it increases Bal by 25.48% and Recall by 45.99% compared to the original Neural Network. We also compare the performance of Neural Network based SMOTE with SMOTE combined with traditional machine learning algorithms. Neural Network based SMOTE ranks first in average rank.
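SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The stdlib sketch below shows only that oversampling step; the paper's Neural Network combination, random-search ranges and nested cross-validation are not reproduced, and the parameter names are assumptions.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points on segments between minority samples and their neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbours (excluding itself).
    neighbours = []
    for i, a in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(a, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # position along the segment from sample i to neighbour j
        a, b = minority[i], minority[j]
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic
```

In practice the synthetic points are appended to the training fold only, never to the validation or test fold, so that oversampling does not leak into the evaluation; this matters especially under nested cross-validation.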
