278 research outputs found

    Software Defect Prediction Using Neural Network Based SMOTE

    Software defect prediction is a practical way to improve the quality of software testing and reduce its time and cost by focusing on defective modules. Defect prediction datasets naturally suffer from class imbalance, with very few defective modules compared to non-defective ones. Class imbalance can reduce classification performance. In this study, we applied Neural Network based Synthetic Minority Over-sampling Technique (SMOTE) to overcome class imbalance in six NASA datasets. Neural Network based SMOTE combines a Neural Network with SMOTE, with the hyperparameters of each optimized using random search. Results from nested 5-fold cross-validation show that Bal increases by 25.48% and Recall by 45.99% compared to the original Neural Network. We also compare the performance of Neural Network based SMOTE with SMOTE + Traditional Machine Learning Algorithm. Neural Network based SMOTE takes first place in the average rank.
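The oversampling step named in this abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is a random interpolation between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the neural network and random search are left out.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only, not the paper's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): three defective modules described by two metrics.
defective = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
new_points = smote(defective, n_new=6, k=2)
balanced_minority = defective + new_points
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the defective class. In an evaluation such as nested cross-validation, oversampling would normally be applied to the training folds only.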

    Preliminary Comparison of Techniques for Dealing with Imbalance in Software Defect Prediction

    Imbalanced data is a common problem in data mining when dealing with classification problems, where samples of a class vastly outnumber other classes. In this situation, many data mining algorithms generate poor models as they try to optimize the overall accuracy and perform badly in classes with very few samples. Software Engineering data in general and defect prediction datasets are not an exception and in this paper, we compare different approaches, namely sampling, cost-sensitive, ensemble and hybrid approaches to the problem of defect prediction with different datasets preprocessed differently. We have used the well-known NASA datasets curated by Shepperd et al. There are differences in the results depending on the characteristics of the dataset and the evaluation metrics, especially if duplicates and inconsistencies are removed as a preprocessing step. Funding: Unión Europea ICEBERG 324356; MICYT TIN2007-68084-C02-02; MICYT TIN2013-46928-C3-2-

    Predictive Analytics and Software Defect Severity: A Systematic Review and Future Directions

    Software testing identifies defects in software products with varying multiplying effects based on their severity levels and sequel to instant rectifications, hence the rate of a research study in the software engineering domain. In this paper, a systematic literature review (SLR) on machine learning-based software defect severity prediction was conducted in the last decade. The SLR was aimed at detecting germane areas central to efficient predictive analytics, which are seldom captured in existing software defect severity prediction reviews. The germane areas include the analysis of techniques or approaches which have a significant influence on the threats to the validity of proposed models, and the bias-variance tradeoff considerations techniques in data science-based approaches. A population, intervention, and outcome model was adopted for better search terms during the literature selection process, and subsequent quality assurance scrutiny yielded fifty-two primary studies. A subsequent thorough systematic review was conducted on the final selected studies to answer eleven main research questions, which uncovered approaches that speak to the aforementioned germane areas of interest. The results indicate that while the machine learning approach is ubiquitous for predicting software defect severity, germane techniques central to better predictive analytics are infrequent in the literature. This study concludes by summarizing prominent study trends in a mind map to stimulate future research in the software engineering industry.

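    Integrasi Teknik Smote Bagging Dengan Information Gain Pada Naive Bayes Untuk Prediksi Cacat Software (Integration of the SMOTE Bagging Technique with Information Gain in Naive Bayes for Software Defect Prediction)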

    Accurate prediction of defects in code can help direct the test effort, reduce costs, and improve software quality. Many researchers have applied machine learning and statistical algorithms to build software defect prediction models, and classification with machine learning is a popular approach. Naive Bayes is a simple classifier with good performance, producing an average probability of 71 percent. It also learns faster than other machine learning methods and has a good reputation for prediction accuracy. NASA MDP is a very popular dataset among previous researchers developing software defect prediction models, because it is public and freely usable. However, the data have deficiencies, including class imbalance and attribute noise. In this research, SMOTE (Synthetic Minority Over-Sampling Technique) as the sampling technique and Bagging as the ensemble method are used to deal with class imbalance, and information gain is used to select the relevant attributes and so deal with attribute noise. The experiments show that the model combining SMOTE, Bagging, and Information Gain handles class imbalance and attribute noise well in software defect prediction, and can increase the accuracy of the prediction results.
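The attribute-selection step mentioned in this abstract can be sketched as follows: information gain measures how much knowing an attribute's value reduces the entropy of the defect label, and attributes with low gain are candidates for removal. The sketch assumes attributes have already been discretized; the function names and the toy data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (assumed): 1 = defective module, 0 = non-defective module.
labels = [1, 1, 0, 0]
attributes = {
    "complexity": ["high", "high", "low", "low"],  # separates the classes perfectly
    "team": ["x", "y", "x", "y"],                  # carries no information about the label
}
gains = {name: information_gain(vals, labels) for name, vals in attributes.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
```

Ranking attributes this way and keeping only the top ones is one common way to use information gain as a filter before training a classifier such as Naive Bayes.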

    A Cross-project Defect Prediction Model Using Feature Transfer and Ensemble Learning

    Cross-project defect prediction (CPDP) trains prediction models on existing data from other projects (the source projects) and uses the trained model to predict defects in the target projects. To solve two major problems in CPDP, namely variability in data distribution and class imbalance, this paper proposes a CPDP model combining feature transfer and ensemble learning, with two stages: feature transfer and classification. The feature transfer method is based on the Pearson correlation coefficient, which reduces the dimension of the feature space and the difference in feature distribution between items. Class imbalance is addressed by SMOTE and Voting at both the algorithm and data levels. The experimental results on 20 source-target projects show that our method can yield significant improvement on CPDP.
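The abstract does not say exactly how the Pearson correlation coefficient drives feature transfer, so the following is only a sketch of one common use: dropping features that are highly correlated with a feature already kept, which reduces the dimension of the feature space. The threshold, the feature names, and the helper `drop_redundant` are assumptions for illustration, not the paper's method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / (spread_x * spread_y)


def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy metrics (assumed) for four modules of a source project.
features = {
    "loc": [10, 20, 30, 40],
    "statements": [11, 19, 31, 41],  # nearly duplicates "loc"
    "complexity": [3, 1, 4, 1],
}
kept = drop_redundant(features)
```

Here `statements` is dropped because it tracks `loc` almost exactly, while `complexity` is kept. The result depends on the order in which features are visited, which is a limitation of this simple greedy filter.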

    Predictive Framework for Imbalance Dataset

    The purpose of this research is to propose a new predictive maintenance framework which can be used to generate a prediction model for the deterioration of process materials. Real yield data obtained from Fuji Electric Malaysia has been used in this research. Existing data pre-processing and classification methodologies have been adapted. Properties of the proposed framework include: developing an approach to correlate material defects, developing an approach to represent data attribute features, analyzing various ratios and types of data re-sampling, analyzing the impact of data dimension reduction for various data sizes, and evaluating data partition sizes and algorithmic schemes against prediction performance. Experimental results suggested that the class probability distribution function of a prediction model has to be close to that of the training dataset; that a less skewed environment enables learning schemes to discover a better function F in a bigger F_all space within a higher dimensional feature space; and that data sampling and partition size appear to proportionally improve precision and recall if class distribution ratios are balanced. A comparative study was also conducted and showed that the proposed approaches performed better. This research was conducted on a limited number of datasets, test sets and variables. Thus, the obtained results are applicable only to the study domain with the selected datasets. This research has introduced a new predictive maintenance framework which can be used in manufacturing industries to generate a prediction model based on the deterioration of process materials. Consequently, this may allow manufacturers to conduct predictive maintenance not only for equipment but also for process materials. The major contribution of this research is a step-by-step guideline consisting of methods and approaches for generating a prediction for process materials.