39 research outputs found

    Software defect prediction based on association rule classification.

    Get PDF
    In software defect prediction, predictive models are estimated based on various code attributes to assess the likelihood of software modules containing errors. Many classification methods have been suggested to accomplish this task. However, association based classification methods have not been investigated so far in this context. This paper assesses the use of such a classification method, CBA2, and compares it to other rule based classification methods. Furthermore, we investigate whether rule sets generated on data from one software project can be used to predict defective software modules in other, similar software projects. It is found that applying the CBA2 algorithm results in both accurate and comprehensible rule sets.Software defect prediction; Association rule classification; CBA2; AUC;

    Predicting Fault-prone Software Module Using Data Mining Technique and Fuzzy Logic

    Get PDF
    This paper discusses a new model towards reliability and quality improvement of software systems by predicting fault-prone module before testing. Model utilizes the classification capability of data mining techniques and knowledge stored in software metrics to classify the software module as fault-prone or not fault-prone. A decision tree is constructed using ID3 algorithm for existing project data in order to gain information for the purpose of decision making whether a particular module id fault-prone or not. The gained information is converted into fuzzy rules and integrated with fuzzy inference system to predict fault-prone or not fault-prone software module for target data. The model is also able to predict fault-proneness degree of faulty module. The goal is to help software manager to concentrate their testing efforts to fault-prone modules in order to improve the reliability and quality of the software system. We used NASA projects data set from the PROMOSE repository to validate the predictive accuracy of the model

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    Full text link
    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions. When applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for quality. Same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of Software Engineering (ICSE), 201

    Less is more: Temporal fault predictive performance over multiple Hadoop releases

    Get PDF
    We investigate search based fault prediction over time based on 8 consecutive Hadoop versions, aiming to analyse the impact of chronology on fault prediction performance. Our results confound the assumption, implicit in previous work, that additional information from historical versions improves prediction; though G-mean tends to improve, Recall can be reduced

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    Full text link
    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions. When applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for quality. Same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of Software Engineering (ICSE), 201

    Software Defect Prediction Based on Classication Rule Mining

    Get PDF
    There has been rapid growth of software development. Due to various causes, the software comes with many defects. In Software development process, testing of software is the main phase which reduces the defects of the software. If a developer or a tester can predict the software defects properly then, it reduces the cost, time and eort. In this paper, we show a comparative analysis of software defect prediction based on classifcation rule mining. We propose a scheme for this process and we choose different classication algorithms. Showing the comparison of predictions in software defects analysis. This evaluation analyzes the prediction performance of competing learning schemes for given historical data sets(NASA MDP Data Set). The result of this scheme evaluation shows that we have to choose different classifer rule for different data set

    Predicting Test Case Verdicts Using TextualAnalysis of Commited Code Churns

    Get PDF
    Background: Continuous Integration (CI) is an agile software development practice that involves producing several clean builds of the software per day. The creation of these builds involve running excessive executions of automated tests, which is hampered by high hardware cost and reduced development velocity. Goal: The goal of our research is to develop a method that reduces the number of executed test cases at each CI cycle.Method: We adopt a design research approach with an infrastructure provider company to develop a method that exploits Ma-chine Learning (ML) to predict test case verdicts for committed sourcecode. We train five different ML models on two data sets and evaluate their performance using two simple retrieval measures: precision and recall. Results: While the results from training the ML models on the first data-set of test executions revealed low performance, the curated data-set for training showed an improvement on performance with respect to precision and recall. Conclusion: Our results indicate that the method is applicable when training the ML model on churns of small size

    Adaptive Genetic Algorithm Based Artificial Neural Network for Software Defect Prediction

    Get PDF
    To meet the requirement of an efficient software defect prediction,in this paper an evolutionary computing based neural network learning scheme has been developed that alleviates the existing Artificial Neural Network (ANN) limitations such as local minima and convergence issues. To achieve optimal software defect prediction, in this paper, Adaptive-Genetic Algorithm (A-GA) based ANN learning and weightestimation scheme has been developed. Unlike conventional GA, in this paper we have used adaptive crossover and mutation probability parameter that alleviates the issue of disruption towards optimal solution. We have used object oriented software metrics, CK metrics for fault prediction and the proposed Evolutionary Computing Based Hybrid Neural Network (HENN)algorithm has been examined for performance in terms of accuracy, precision, recall, F-measure, completeness etc, where it has performed better as compared to major existing schemes. The proposed scheme exhibited 97.99% prediction accuracy while ensuring optimal precision, Fmeasure and recall
    corecore