
    Software defect prediction framework based on hybrid metaheuristic optimization methods

    A software defect is an error, failure, or fault in software that produces an incorrect or unexpected result. Software defects are expensive in both quality and cost. Accurate prediction of defect-prone software modules assists in directing testing effort, reduces costs, and improves software quality. Classification is a popular machine learning approach for software defect prediction. Unfortunately, software defect prediction remains a largely unsolved problem: comparison and benchmarking of machine learning classifiers for defect prediction show that poor accuracy levels are dominant and that no particular classifier performs best across all datasets. Two main problems affect classification performance in software defect prediction: (1) noisy attributes and imbalanced class distributions in the datasets, and (2) the difficulty of selecting optimal classifier parameters. In this study, a software defect prediction framework is proposed that combines metaheuristic optimization methods for feature selection and parameter optimization with meta-learning methods for handling the imbalanced class problem, with the aim of improving the accuracy of classification models. The proposed framework and models that constitute the specific research contributions of this thesis are: 1) a comparison framework of classification models for software defect prediction, known as CF-SDP; 2) a hybrid genetic-algorithm-based feature selection and bagging technique, known as GAFS+B; 3) a hybrid particle-swarm-optimization-based feature selection and bagging technique, known as PSOFS+B; and 4) a hybrid genetic-algorithm-based neural network parameter optimization and bagging technique, known as NN-GAPO+B. For the purpose of this study, ten classification algorithms have been selected.
The selection aims to cover a balanced range of classification algorithms established in software defect prediction. The proposed framework and methods are evaluated using state-of-the-art datasets from the NASA metric data repository. The results indicate that the proposed methods (GAFS+B, PSOFS+B, and NN-GAPO+B) deliver an impressive improvement in the performance of software defect prediction. GAFS+B and PSOFS+B significantly improved the performance of classifiers that suffer from class imbalance, such as C4.5 and CART, and also outperformed existing software defect prediction frameworks on most datasets. Based on the conducted experiments, logistic regression performs best on most of the NASA MDP datasets, with or without a feature selection method. The proposed methods also identified the most relevant features for software defect prediction. The top ten include branch count metrics, decision density, the Halstead level metric of a module, the number of operands contained in a module, maintenance severity, the number of blank LOC, Halstead volume, the number of unique operands contained in a module, the total number of LOC, and design density.
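The GAFS+B idea described above, a genetic algorithm searching over feature subsets combined with bagging of the base learner, can be sketched roughly as follows. The toy dataset, the nearest-centroid base learner, and all GA settings here are illustrative assumptions for the sketch, not the thesis's actual configuration:

```python
import random

random.seed(0)

# Toy dataset (hypothetical): features 0-1 are informative, 2-5 are noise.
def make_data(n=200):
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        x = [label + random.gauss(0, 0.5),            # informative
             label + random.gauss(0, 0.5),            # informative
             *(random.gauss(0, 1) for _ in range(4))]  # noise
        data.append((x, label))
    return data

def centroid_classifier(train, mask):
    """Nearest-centroid base learner restricted to the selected features."""
    sums, counts = {0: None, 1: None}, {0: 0, 1: 0}
    for x, y in train:
        sel = [v for v, m in zip(x, mask) if m]
        sums[y] = sel if sums[y] is None else [a + b for a, b in zip(sums[y], sel)]
        counts[y] += 1
    cents = {y: [s / counts[y] for s in sums[y]] for y in (0, 1)}
    def predict(x):
        sel = [v for v, m in zip(x, mask) if m]
        dist = {y: sum((a - b) ** 2 for a, b in zip(sel, cents[y])) for y in (0, 1)}
        return min(dist, key=dist.get)
    return predict

def accuracy(pred, data):
    return sum(pred(x) == y for x, y in data) / len(data)

def ga_select(train, val, n_feat=6, pop=12, gens=15):
    """Genetic algorithm over feature bitmasks; fitness = validation accuracy."""
    popn = [[random.randint(0, 1) for _ in range(n_feat)] for _ in range(pop)]
    def fit(mask):
        return accuracy(centroid_classifier(train, mask), val) if any(mask) else 0.0
    for _ in range(gens):
        popn.sort(key=fit, reverse=True)
        next_gen = popn[:2]                      # elitism
        while len(next_gen) < pop:
            a, b = random.sample(popn[:6], 2)    # select among the fitter half
            cut = random.randrange(1, n_feat)
            child = a[:cut] + b[cut:]            # one-point crossover
            if random.random() < 0.2:            # bit-flip mutation
                child[random.randrange(n_feat)] ^= 1
            next_gen.append(child)
        popn = next_gen
    return max(popn, key=fit)

data = make_data()
train, val = data[:150], data[150:]
mask = ga_select(train, val)

# Bagging on top of the selected subset: bootstrap resamples, majority vote.
learners = [centroid_classifier([random.choice(train) for _ in train], mask)
            for _ in range(7)]
ensemble = lambda x: round(sum(p(x) for p in learners) / len(learners))
print("selected mask:", mask)
print("ensemble accuracy: %.2f" % accuracy(ensemble, val))
```

The same skeleton yields a PSOFS+B analogue by swapping the crossover/mutation loop for a binary particle swarm update over the same bitmask representation.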

    Software Fault Prediction and Test Data Generation Using Artificial Intelligent Techniques

    The complexity in the requirements of present-day software, which are often very large, has led to an increase in the number of lines of code and, consequently, in the number of modules. Some of these modules may give rise to a variety of defects if testing is not done meticulously. In practice, it is not possible to carry out white-box testing of every module of any software, so software testing needs to be done selectively for the modules that are prone to faults. Identifying the probable fault-prone modules is a critical task for any software. This dissertation emphasizes the design of prediction and classification models to detect fault-prone classes in object-oriented programs. Test data are then generated for a particular task to check the functionality of the software product. In the field of object-oriented software engineering, the Chidamber and Kemerer (CK) software metrics suite is the most frequently used for fault prediction analysis, as it covers unique aspects of object-oriented programming such as complexity, data abstraction, and inheritance. One of the most important goals of fault prediction is to detect fault-prone modules as early as possible in the software development life cycle (SDLC). Numerous authors have used design and code metrics for predicting fault-prone modules; in this work, design metrics are used. To carry out the fault prediction analysis, prediction models are designed using machine learning methods such as statistical methods, artificial neural networks, radial basis function networks, functional link artificial neural networks, and probabilistic neural networks. In the first phase, fault prediction is performed using the CK metrics suite.
In the next phase, reduced feature sets of the CK metrics suite, obtained by applying principal component analysis and rough set theory, are used to perform fault prediction. A comparative approach is taken to find a suitable prediction model among the set of designed models. Prediction models designed for fault proneness need to be validated for their efficiency. To achieve this, a cost-based evaluation framework is designed to evaluate the effectiveness of the designed fault prediction models, based on the classification of classes as faulty or not faulty. In this cost-based analysis, fault prediction is found to be suitable where the normalized estimated fault removal cost (NEcost) is less than a certain threshold value; any prediction model with an NEcost greater than the threshold is not suitable for fault prediction, and the corresponding classes are instead unit tested. All the prediction and classifier models used in the fault prediction analysis are applied to a case study, the Apache Integration Framework (AIF). The metric data values are obtained from the PROMISE repository and are mined using the Chidamber and Kemerer Java Metrics (CKJM) tool. Test data are generated for an object-oriented program for the withdrawal task in a bank ATM using three meta-heuristic search algorithms: the clonal selection algorithm, binary particle swarm optimization, and the artificial bee colony algorithm. The artificial bee colony algorithm is observed to obtain near-optimal test data compared to the other two algorithms. The test data are generated for the withdrawal task based on a fitness function derived using the branch distance proposed by Bogdan Korel. The generated test data ensure the proper functionality, or correctness, of the programmed module in the software.
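A branch-distance fitness function of the kind the abstract attributes to Korel can be sketched for a simplified withdrawal predicate. The predicate, the constant K, and the random-search driver are illustrative assumptions; the dissertation's actual program and its clonal selection, binary PSO, and artificial bee colony searches are not reproduced here:

```python
import random

K = 1  # small positive offset so a just-violated branch still has distance > 0

def branch_distance(balance, amount):
    """Distance to satisfying the (hypothetical) withdrawal branch
    `0 < amount <= balance`.

    Returns 0 when the branch is already taken; otherwise a positive value
    that shrinks as the inputs approach the branch boundary, giving a
    search algorithm a gradient to follow.
    """
    d = 0
    if not (amount > 0):
        d += (0 - amount) + K          # how far amount is from exceeding 0
    if not (amount <= balance):
        d += (amount - balance) + K    # how far amount is from fitting balance
    return d

# A toy random search driven by the fitness, standing in for the
# meta-heuristic searches used in the dissertation.
random.seed(1)
best = None
for _ in range(1000):
    cand = (random.randint(-500, 500), random.randint(-500, 500))  # (balance, amount)
    f = branch_distance(*cand)
    if best is None or f < best[0]:
        best = (f, cand)
    if f == 0:
        break
print("fitness:", best[0], "test data (balance, amount):", best[1])
```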

    Too Trivial To Test? An Inverse View on Defect Prediction to Identify Methods with Low Fault Risk

    Background. Test resources are usually limited, and therefore it is often not possible to completely test an application before a release. To cope with scarce resources, development teams can apply defect prediction to identify fault-prone code regions. However, defect prediction tends to have low precision in cross-project prediction scenarios. Aims. We take an inverse view on defect prediction and aim to identify methods that can be deferred when testing because they contain hardly any faults due to their code being "trivial". We expect that characteristics of such methods might be project-independent, so that our approach could improve cross-project predictions. Method. We compute code metrics and apply association rule mining to create rules for identifying methods with low fault risk. We conduct an empirical study to assess our approach with six Java open-source projects containing precise fault data at the method level. Results. Our results show that inverse defect prediction can identify approx. 32-44% of the methods of a project as having a low fault risk; on average, they are about six times less likely to contain a fault than other methods. In cross-project predictions with larger, more diversified training sets, identified methods are even eleven times less likely to contain a fault. Conclusions. Inverse defect prediction supports the efficient allocation of test resources by identifying methods that can be treated with less priority in testing activities, and it is well applicable in cross-project prediction scenarios. Comment: Submitted to PeerJ C
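The inverse view can be sketched as metric-based rules that flag "trivial" methods to be deferred in testing. The paper derives such rules automatically with association rule mining; here the rules, thresholds, metric names, and example methods are hand-picked, purely illustrative assumptions:

```python
# Flag low-fault-risk methods via simple metric rules (hypothetical rules;
# the study mines comparable rules from real method-level fault data).

def low_fault_risk(method):
    """Return True if any low-risk rule fires for the method's metrics."""
    rules = [
        # (description, predicate over the metrics dict)
        ("short and branch-free", lambda m: m["sloc"] <= 3 and m["cyclomatic"] == 1),
        ("trivial getter/setter", lambda m: m["is_accessor"] and m["sloc"] <= 2),
    ]
    return any(pred(method) for _, pred in rules)

# Illustrative method metrics, not data from the studied projects.
methods = [
    {"name": "getId",    "sloc": 1,  "cyclomatic": 1, "is_accessor": True},
    {"name": "parseCsv", "sloc": 42, "cyclomatic": 9, "is_accessor": False},
    {"name": "toString", "sloc": 3,  "cyclomatic": 1, "is_accessor": False},
]

deferred = [m["name"] for m in methods if low_fault_risk(m)]
print("defer in testing:", deferred)
```

Because such rules reference only local method characteristics rather than project-specific distributions, they are plausible candidates for cross-project transfer, which is the paper's motivation.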

    Connecting Software Metrics across Versions to Predict Defects

    Accurate software defect prediction could help software practitioners allocate test resources to defect-prone modules effectively and efficiently. In recent decades, much effort has been devoted to building accurate defect prediction models, including developing quality defect predictors and modeling techniques. However, currently widely used defect predictors such as code metrics and process metrics do not describe well how software modules change over the project's evolution, which we believe is important for defect prediction. To deal with this problem, in this paper we propose to use the Historical Version Sequence of Metrics (HVSM) over continuous software versions as defect predictors. Furthermore, we leverage a Recurrent Neural Network (RNN), a popular modeling technique, taking HVSM as the input to build software prediction models. The experimental results show that, in most cases, the proposed HVSM-based RNN model achieves significantly better effort-aware ranking effectiveness than the commonly used baseline models.
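The HVSM construction can be sketched as stacking each module's metric vector across consecutive versions into one sequence, the shape a sequence model such as an RNN consumes. The version labels, module names, metric choices, and padding scheme here are illustrative assumptions, not the paper's actual data or preprocessing:

```python
# Assemble a Historical Version Sequence of Metrics (HVSM) per module
# (illustrative data; metric vector = (LOC, cyclomatic complexity)).

versions = ["1.0", "1.1", "2.0"]  # oldest to newest

# metrics[version][module] -> metric vector for that module in that version
metrics = {
    "1.0": {"a.java": (120, 8)},
    "1.1": {"a.java": (150, 11), "b.java": (30, 2)},
    "2.0": {"a.java": (160, 12), "b.java": (45, 4)},
}

def hvsm(module, pad=(0, 0)):
    """Metric sequence for one module across all versions, padding versions
    where the module did not yet exist so every sequence has equal length
    (one assumed way to get fixed-shape RNN input)."""
    return [metrics[v].get(module, pad) for v in versions]

for mod in ("a.java", "b.java"):
    print(mod, hvsm(mod))
```

Each sequence is then one training example of shape (timesteps, features) for the recurrent model, with the defect label taken from the newest version.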

    Are Smell-Based Metrics Actually Useful in Effort-Aware Structural Change-Proneness Prediction? An Empirical Study

    Bad code smells (also known as code smells) are symptoms of poor design choices in implementation. Existing studies have empirically confirmed that the presence of code smells increases the likelihood of subsequent changes (i.e., change-proneness). However, to the best of our knowledge, no prior study has leveraged smell-based metrics to predict a particular change type (i.e., structural changes). Moreover, when evaluating the effectiveness of smell-based metrics in structural change-proneness prediction, none of the existing studies takes into account the effort of inspecting the change-prone source code. In this paper, we consider five smell-based metrics for effort-aware structural change-proneness prediction and compare these metrics with a baseline of well-known CK metrics in predicting particular categories of change types. Specifically, we first employ univariate logistic regression to analyze the correlation between each smell-based metric and structural change-proneness. Then, we build multivariate prediction models to examine the effectiveness of smell-based metrics in effort-aware structural change-proneness prediction when used alone and when used together with the baseline metrics. Our experiments are conducted on six Java open-source projects with up to 60 versions, and the results indicate that: (1) all smell-based metrics are significantly related to structural change-proneness, except the metric ANS in hive and SCM in camel after removing the confounding effect of file size; (2) in most cases, smell-based metrics outperform the baseline metrics in predicting structural change-proneness; and (3) when used together with the baseline metrics, the smell-based metrics are more effective at predicting change-prone files when inspection effort is taken into account.
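The effort-aware evaluation idea common to studies like this one can be sketched as ranking files by predicted risk per unit of inspection effort, so the highest payoff per line reviewed comes first. The file names, predicted probabilities, LOC values, and the density-ranking choice are illustrative assumptions, not data or the exact metric from the study:

```python
# Effort-aware ranking sketch: order files by predicted change/fault
# probability per line of code (inspection effort). Illustrative data only.

files = [
    # (file, predicted probability, lines of code as inspection effort)
    ("Big.java",   0.90, 3000),
    ("Small.java", 0.60, 200),
    ("Tiny.java",  0.30, 50),
]

def effort_aware_rank(files):
    """Sort descending by risk density = probability / effort."""
    return sorted(files, key=lambda f: f[1] / f[2], reverse=True)

for name, p, loc in effort_aware_rank(files):
    print("%-10s density=%.4f" % (name, p / loc))
```

Note how the large file with the highest raw probability drops to last place once effort is considered, which is exactly why effort-unaware and effort-aware evaluations can disagree about a model's usefulness.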