4,505 research outputs found

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, a self-tuning version of SMOTE). This approach leads to dramatic improvements in software defect prediction. When applied in a 5×5 cross-validation study of 3,681 Java classes (containing over a million lines of code) from open-source systems, SMOTUNED increased AUC and recall by 60% and 20%, respectively. These improvements are independent of the classifier used to predict quality. The same pattern of improvement was observed when SMOTE and SMOTUNED were compared against the most recent class-imbalance technique. In conclusion, for software analytics tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.
    Comment: 10 pages + 2 references. Accepted to the International Conference on Software Engineering (ICSE), 201

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    Full text link
    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions. When applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for quality. Same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of Software Engineering (ICSE), 201
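    The sketch below is a minimal illustration of the tuning idea in Python, assuming scikit-learn and imbalanced-learn. The paper's SMOTUNED searches SMOTE's parameters (the number of neighbors k, the amount of oversampling, and the distance metric) with differential evolution; the simple grid below is a stand-in for that search, and X and y are synthetic placeholders rather than the paper's defect data.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for static-code-metric features and defect labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (rng.random(500) < 0.1).astype(int)  # ~10% defective: imbalanced

best_params, best_auc = None, -np.inf
# Grid over SMOTE's parameters as a simple stand-in for the paper's
# differential-evolution tuner.
for k in (1, 3, 5, 7):
    for amount in (0.3, 0.5, 1.0):  # minority:majority ratio after SMOTE
        pipe = Pipeline([
            # Oversampling inside the pipeline keeps it out of the test folds.
            ("smote", SMOTE(k_neighbors=k, sampling_strategy=amount,
                            random_state=0)),
            ("clf", RandomForestClassifier(random_state=0)),
        ])
        auc = cross_val_score(
            pipe, X, y, scoring="roc_auc",
            cv=StratifiedKFold(5, shuffle=True, random_state=0)).mean()
        if auc > best_auc:
            best_params, best_auc = (k, amount), auc

print("best (k, oversampling ratio):", best_params, "AUC:", round(best_auc, 3))
```

    The key design point, as in the paper, is that the pre-processor's parameters are tuned against the same evaluation criterion later used to rank the classifiers.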

    A Review of Metrics and Modeling Techniques in Software Fault Prediction Model Development

    This paper surveys software fault prediction models developed with different data analytic techniques reported in the software engineering literature. The study is split into three broad areas: (a) a description of the software metrics suites reported and validated in the literature; (b) a brief outline of previous research on the development of software fault prediction models based on various analytic techniques, organized by a taxonomy of those techniques; and (c) a review of the advantages of using combinations of metrics, an area that is comparatively new and needs more research effort.
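
    As a rough illustration of point (c), the Python sketch below (scikit-learn, with purely synthetic stand-ins for two metric suites) compares a model trained on each suite alone against one trained on their concatenation; the combined feature set is the case the surveyed work argues for.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for two per-module metric suites:
# size/complexity metrics (e.g. LOC, cyclomatic complexity) and
# object-oriented CK metrics (e.g. WMC, CBO, DIT).
rng = np.random.default_rng(1)
size_metrics = rng.normal(size=(300, 2))
ck_metrics = rng.normal(size=(300, 3))
y = (rng.random(300) < 0.2).astype(int)  # fault-prone labels

# "Combination of metrics" here simply means concatenating the suites
# into one feature matrix before fitting a single classifier.
X_combined = np.hstack([size_metrics, ck_metrics])

for name, X in [("size suite only", size_metrics),
                ("CK suite only", ck_metrics),
                ("combined suites", X_combined)]:
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          scoring="roc_auc", cv=5).mean()
    # On random data all AUCs hover around 0.5; the point is the shape
    # of the comparison, not the numbers.
    print(name, round(auc, 3))
```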

    Applying FAHP to Improve the Performance Evaluation Reliability and Validity of Software Defect Classifiers

    Today’s software complexity makes developing defect-free software almost impossible. On average, billions of dollars are lost every year because of software defects in the United States alone, while the global loss is much higher. Consequently, developing classifiers that classify software modules into defective and non-defective before software release has attracted great interest in academia and the software industry alike. Although many classifiers have been proposed, none has been proven superior to the others. The major reason is that while one study shows that classifier A is better than classifier B, other research may come to the diametrically opposite conclusion. These conflicts are usually triggered when researchers report results using their preferred performance measures, such as recall and precision. Although this approach is valid, it does not examine all possible facets of a classifier's performance characteristics, and the evaluation might improve or deteriorate if other performance measures were chosen. As a result, software developers often struggle to select the most suitable classifier for their projects. The goal of this dissertation is to apply the Fuzzy Analytical Hierarchy Process (FAHP), a popular multi-criteria decision-making technique, to overcome these inconsistencies in research outcomes. This evaluation framework incorporates a wider spectrum of performance measures to evaluate classifiers' performance, rather than relying on a few preferred measures. The results show that this approach increases software developers' confidence in research outcomes, helps them avoid false conclusions, and indicates reasonable boundaries for classifier performance. We used 22 popular performance measures and 11 software defect classifiers. The analysis was carried out on the KNIME data mining platform using 12 software defect data sets from the NASA Metrics Data Program (MDP) repository.
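
    For concreteness, here is a minimal Python/NumPy sketch of one common FAHP variant, Chang's extent analysis. The pairwise-comparison matrix, the three criteria, and the classifier scores are hypothetical placeholders; the dissertation's own framework (22 measures and 11 classifiers on KNIME) is far richer than this toy.

```python
import numpy as np

# Triangular fuzzy numbers (TFNs) stored as (l, m, u) triples.
# Hypothetical fuzzy pairwise comparisons for three criteria
# (recall, precision, AUC); the diagonal is exact equality (1, 1, 1).
M = np.array([
    [(1, 1, 1),     (1, 2, 3),     (1, 2, 3)],    # recall vs. others
    [(1/3, 1/2, 1), (1, 1, 1),     (1, 2, 3)],    # precision vs. others
    [(1/3, 1/2, 1), (1/3, 1/2, 1), (1, 1, 1)],    # AUC vs. others
])

def fuzzy_synthetic_extents(M):
    """Chang's fuzzy synthetic extent S_i for each criterion."""
    row_sums = M.sum(axis=1)       # componentwise TFN addition per row
    total = M.sum(axis=(0, 1))     # grand total as a TFN (l, m, u)
    inv_total = 1.0 / total[::-1]  # TFN inverse swaps l and u
    return row_sums * inv_total    # componentwise TFN product

def possibility(a, b):
    """Degree of possibility V(a >= b) for TFNs a and b."""
    if a[1] >= b[1]:
        return 1.0
    if b[0] >= a[2]:
        return 0.0
    return (b[0] - a[2]) / ((a[1] - a[2]) - (b[1] - b[0]))

def fahp_weights(M):
    S = fuzzy_synthetic_extents(M)
    n = len(S)
    d = np.array([min(possibility(S[i], S[j]) for j in range(n) if j != i)
                  for i in range(n)])
    return d / d.sum()             # normalized crisp criteria weights

weights = fahp_weights(M)

# Hypothetical normalized scores: rows are classifiers, columns the criteria.
scores = np.array([
    [0.70, 0.55, 0.80],  # e.g. a naive Bayes classifier
    [0.60, 0.65, 0.75],  # e.g. a random forest
])
print("criteria weights:", np.round(weights, 3))
print("classifier ranking scores:", np.round(scores @ weights, 3))
```

    Ranking classifiers by a weighted sum over many measures, rather than by one preferred measure, is what removes the recall-versus-precision style conflicts described above.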