14,392 research outputs found

    A Hybrid Multi-Filter Wrapper Feature Selection Method for Software Defect Predictors

    Get PDF
    Software Defect Prediction (SDP) is an approach used for identifying defect-prone software modules or components. It helps software engineer to optimally, allocate limited resources to defective software modules or components in the testing or maintenance phases of software development life cycle (SDLC). Nonetheless, the predictive performance of SDP models reckons largely on the quality of dataset utilized for training the predictive models. The high dimensionality of software metric features has been noted as a data quality problem which negatively affects the predictive performance of SDP models. Feature Selection (FS) is a well-known method for solving high dimensionality problem and can be divided into filter-based and wrapper-based methods. Filter-based FS has low computational cost, but the predictive performance of its classification algorithm on the filtered data cannot be guaranteed. On the contrary, wrapper-based FS have good predictive performance but with high computational cost and lack of generalizability. Therefore, this study proposes a hybrid multi-filter wrapper method for feature selection of relevant and irredundant features in software defect prediction. The proposed hybrid feature selection will be developed to take advantage of filter-filter and filter-wrapper relationships to give optimal feature subsets, reduce its evaluation cycle and subsequently improve SDP models overall predictive performance in terms of Accuracy, Precision and Recall values

    Choosing software metrics for defect prediction: an investigation on feature selection techniques

    Full text link
    The selection of software metrics for building software quality prediction models is a search-based software engineering problem. An exhaustive search for such metrics is usually not feasible due to limited project resources, especially if the number of available metrics is large. Defect prediction models are necessary in aiding project managers for better utilizing valuable project resources for software quality improvement. The efficacy and usefulness of a fault-proneness prediction model is only as good as the quality of the software measurement data. This study focuses on the problem of attribute selection in the context of software quality estimation. A comparative investigation is presented for evaluating our proposed hybrid attribute selection approach, in which feature ranking is first used to reduce the search space, followed by a feature subset selection. A total of seven different feature ranking techniques are evaluated, while four different feature subset selection approaches are considered. The models are trained using five commonly used classification algorithms. The case study is based on software metrics and defect data collected from multiple releases of a large real-world software system. The results demonstrate that while some feature ranking techniques performed similarly, the automatic hybrid search algorithm performed the best among the feature subset selection methods. Moreover, performances of the defect prediction models either improved or remained unchanged when over 85were eliminated. Copyright © 2011 John Wiley & Sons, Ltd.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/83475/1/1043_ftp.pd

    Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

    Get PDF
    Software quality ensures that applications that are developed are failure free. Some modern systems are intricate, due to the complexity of their information processes. Software fault prediction is an important quality assurance activity, since it is a mechanism that correctly predicts the defect proneness of modules and classifies modules that saves resources, time and developers’ efforts. In this study, a model that selects relevant features that can be used in defect prediction was proposed. The literature was reviewed and it revealed that process metrics are better predictors of defects in version systems and are based on historic source code over time. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product line(s) (SPL), hence process metrics were chosen. Data sets that are used in defect prediction may contain non-significant and redundant attributes that may affect the accuracy of machine-learning algorithms. In order to improve the prediction accuracy of classification models, features that are significant in the defect prediction process are utilised. In machine learning, feature selection techniques are applied in the identification of the relevant data. Feature selection is a pre-processing step that helps to reduce the dimensionality of data in machine learning. Feature selection techniques include information theoretic methods that are based on the entropy concept. This study experimented the efficiency of the feature selection techniques. It was realised that software defect prediction using significant attributes improves the prediction accuracy. A novel MICFastCR model, which is based on the Maximal Information Coefficient (MIC) was developed to select significant attributes and Fast Correlation Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. The MICFastCR achieved the highest prediction accuracy as reported by various performance measures.School of ComputingPh. D. (Computer Science

    DBBRBF- Convalesce optimization for software defect prediction problem using hybrid distribution base balance instance selection and radial basis Function classifier

    Full text link
    Software is becoming an indigenous part of human life with the rapid development of software engineering, demands the software to be most reliable. The reliability check can be done by efficient software testing methods using historical software prediction data for development of a quality software system. Machine Learning plays a vital role in optimizing the prediction of defect-prone modules in real life software for its effectiveness. The software defect prediction data has class imbalance problem with a low ratio of defective class to non-defective class, urges an efficient machine learning classification technique which otherwise degrades the performance of the classification. To alleviate this problem, this paper introduces a novel hybrid instance-based classification by combining distribution base balance based instance selection and radial basis function neural network classifier model (DBBRBF) to obtain the best prediction in comparison to the existing research. Class imbalanced data sets of NASA, Promise and Softlab were used for the experimental analysis. The experimental results in terms of Accuracy, F-measure, AUC, Recall, Precision, and Balance show the effectiveness of the proposed approach. Finally, Statistical significance tests are carried out to understand the suitability of the proposed model.Comment: 32 pages, 24 Tables, 8 Figures

    Mining Data from Multiple Software Development Projects

    Get PDF
    A large system often goes through multiple software project development cycles, in part due to changes in operation and development environments. For example, rapid turnover of the development team between releases can influence software quality, making it important to mine software project data over multiple system releases when building defect predictors. Data collection of software attributes are often conducted independent of the quality improvement goals, leading to the availability of a large number of attributes for analysis. Given the problems associated with variations in development process, data collection, and quality goals from one release to another emphasizes the importance of selecting a best-set of software attributes for software quality prediction. Moreover, it is intuitive to remove attributes that do not add to, or have an adverse effect on, the knowledge of the consequent model. Based on real-world software projects’ data, we present a large case study that compares wrapper-based feature ranking techniques (WRT) and our proposed hybrid feature selection (HFS) technique. The comparison is done using both threefold cross-validation (CV) and three-fold cross-validation with risk impact (CVR). It is shown that HFS is better than WRT, while CV is superior to CVR
    • …
    corecore