
    Software Fault Prediction Using Filtering Feature Selection in Cluster-Based Classification

    Accurate software fault prediction can reduce testing effort and improve software quality. Previous research proposed combining Entropy-Based Discretization (EBD) with Cluster-Based Classification (CBC). However, irrelevant and redundant features in software fault datasets tend to decrease prediction accuracy. This study proposes improving CBC outcomes by integrating filter-based feature selection methods, namely Information Gain (IG), Gain Ratio (GR), and One-R (OR). Experiments on two NASA public MDP datasets (PC2 and PC3) show that the combination of CBC and IG yields the best average accuracy compared to GR and OR: it generates a 67.52% average probability of detection (pd) and a 37.42% average probability of false alarm (pf), whereas CBC without feature selection yields 65.38% average pd and 49.95% average pf. It can be concluded that IG improves CBC outcomes, increasing average pd by 2.14% and reducing average pf by 12.53%.
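    A minimal sketch of the filtering idea described above, assuming Python with scikit-learn: features are ranked by estimated information gain and the reduced set feeds a clustering step standing in for CBC. The synthetic data, the top-k cutoff, and the KMeans majority-vote labeling are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: Information Gain (IG) feature ranking ahead of a cluster-based
# classifier. Data shapes and parameters are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.cluster import KMeans

def select_by_information_gain(X, y, k=10):
    """Rank features by estimated information gain and keep the top k."""
    ig = mutual_info_classif(X, y, discrete_features=True)
    top = np.argsort(ig)[::-1][:k]
    return X[:, top], top

# Hypothetical discretized NASA MDP-style data: rows = modules,
# columns = EBD-discretized metrics, y = faulty (1) / clean (0).
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(200, 20))
y = rng.integers(0, 2, size=200)

X_sel, kept = select_by_information_gain(X, y, k=5)

# Stand-in for CBC: cluster the reduced data, then label each cluster
# by the majority class of its members.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_sel)
cluster_class = {c: int(round(y[labels == c].mean())) for c in set(labels)}
y_pred = np.array([cluster_class[c] for c in labels])
```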

    Predicting Faultiness of Program Modules Using Mamdani Model by Fuzzy Profile Development of Software Metrics

    This research proposed and implemented a new approach to measuring the reliability and quality of software systems by building a fault prediction model and estimating the degree of faultiness before the testing phase begins. The main goals of the model are to support decision making about the testing phase, thereby reducing testing effort, and to assign the resources needed for testing activities optimally. The research used the KC2 dataset, originating from a National Aeronautics and Space Administration (NASA) project, to evaluate the predictive accuracy of the proposed model. Since the software metrics in this dataset are fuzzy in nature, the work used MATLAB to build a Mamdani fuzzy inference model. It then applied and validated a published methodology for developing fuzzy profiles from data, an essential requirement for building the model. Moreover, the proposed model used the k-means clustering algorithm as a machine learning technique to extract the fuzzy inference rules that the model also requires. Finally, suitable approaches were used to validate and evaluate the model; the results show that it provides significant capability in fault prediction and estimation.
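    A minimal sketch of a Mamdani fuzzy inference system for module faultiness, assuming Python with scikit-fuzzy rather than the MATLAB system used in the paper. The metric names, universes, triangular profiles, and hand-written rules are illustrative stand-ins for the data-derived fuzzy profiles and k-means-extracted rules described above.

```python
# Sketch: Mamdani fuzzy inference for module faultiness via scikit-fuzzy.
# Membership breakpoints and rules are illustrative guesses, not the
# paper's data-derived profiles.
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

loc = ctrl.Antecedent(np.arange(0, 1001, 1), 'loc')         # lines of code
cplx = ctrl.Antecedent(np.arange(0, 51, 1), 'complexity')   # cyclomatic complexity
fault = ctrl.Consequent(np.arange(0, 1.01, 0.01), 'faultiness')

# Triangular fuzzy profiles; the paper derives these from the data.
loc['low'] = fuzz.trimf(loc.universe, [0, 0, 400])
loc['high'] = fuzz.trimf(loc.universe, [300, 1000, 1000])
cplx['low'] = fuzz.trimf(cplx.universe, [0, 0, 20])
cplx['high'] = fuzz.trimf(cplx.universe, [15, 50, 50])
fault['low'] = fuzz.trimf(fault.universe, [0, 0, 0.5])
fault['high'] = fuzz.trimf(fault.universe, [0.4, 1, 1])

# Two hand-written rules stand in for the k-means-extracted rule base.
rules = [
    ctrl.Rule(loc['high'] & cplx['high'], fault['high']),
    ctrl.Rule(loc['low'] & cplx['low'], fault['low']),
]

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input['loc'] = 650
sim.input['complexity'] = 30
sim.compute()
print(sim.output['faultiness'])  # crisp (defuzzified) faultiness degree in [0, 1]
```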

    A simplified predictive framework for cost evaluation to fault assessment using machine learning

    Software engineering is an integral part of any software development scheme and frequently encounters bugs, errors, and faults. Predictive evaluation of software faults mitigates this challenge to a large extent; however, no benchmarked framework has been reported for it yet. Therefore, this paper introduces a computational framework of the cost evaluation method to facilitate a better form of predictive assessment of software faults. Based on lines of code, the proposed scheme adopts a machine-learning approach to perform predictive analysis of faults. The scheme presents an analytical framework of a correlation-based cost model integrated with multiple standard machine learning (ML) models, e.g., linear regression, support vector regression, and artificial neural networks (ANN). These learning models are trained to predict software faults with higher accuracy. The study assesses the outcomes in detail using error-based performance metrics to determine how well each learning model performs and how accurately it learns. It also examines the factors contributing to the training loss of the neural network. The validation results demonstrate that, compared to linear regression and support vector regression, the neural network achieves a significantly lower error score for software fault prediction.
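    A minimal sketch of the comparative setup described above, assuming Python with scikit-learn: three regressors trained on a LOC-derived feature to predict fault counts, scored with error-based metrics. The synthetic data and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Sketch: compare linear regression, SVR, and an ANN on a LOC-based
# fault-count task using error metrics. Data is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(1)
loc = rng.uniform(50, 5000, size=(500, 1))             # lines of code per module
faults = 0.002 * loc.ravel() + rng.normal(0, 2, 500)   # fault count correlated with LOC

X_tr, X_te, y_tr, y_te = train_test_split(loc, faults, random_state=1)

models = {
    'linear regression': LinearRegression(),
    'SVR': make_pipeline(StandardScaler(), SVR(kernel='rbf', C=10.0)),
    'ANN': make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32, 16),
                                      max_iter=2000, random_state=1)),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: MAE={mean_absolute_error(y_te, pred):.3f}, "
          f"RMSE={mean_squared_error(y_te, pred) ** 0.5:.3f}")
```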

    Fault Driven Supervised Tie Breaking for Test Case Prioritization

    Regression test suites are an excellent tool for validating the existing functionality of an application during development. However, they can be large and time-consuming to execute, making them inefficient at finding faults. Test Case Prioritization is an area of study that seeks to improve the fault detection rates of these test suites by re-ordering the execution sequence of the test cases, attempting to execute first the test cases with the highest probability of detecting faults. Most prioritization techniques base their decisions on the coverage information gathered from running the test cases. These coverage-based techniques, however, frequently encounter coverage ties between two or more test cases. Most studies employ random selection to break these ties, despite it being considered a lower-bound method. This thesis designs and develops a framework that supervises tie breaking in coverage-based Test Case Prioritization using fault predictor models. Fault predictor models can help identify the modules in an application that are most prone to containing faults; by selecting test cases that cover those modules, the fault detection rate can be improved. The framework employs an ensemble learner that aggregates results from multiple predictors. To date, no single predictor has been found to perform consistently across all datasets, and numerous predictors require expert knowledge to make them performant. An ensemble learner is a reliable technique to mitigate the problems and bias faced by single predictors and to disregard results from poorly performing predictors. To evaluate the supervised tie breaking, empirical studies were conducted on two large-scale applications, Cassandra and Tomcat. The evaluation used real faults that existed in the applications during development, rather than the hand-seeded or mutation faults used in many other studies, and the data used for fault prediction were not groomed or marked by experts. Results from both case studies showed significant improvements in fault detection rates when using fault-driven supervision for tie breaking.
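    A minimal sketch of coverage-based prioritization with supervised tie breaking, assuming Python: the greedy "additional" strategy orders tests by new coverage, and ties are broken by an ensemble-style fault-proneness score instead of random choice. Test names, coverage maps, and predictor votes are hypothetical.

```python
# Sketch: greedy "additional" coverage prioritization; coverage ties are
# broken by summed fault-proneness of the newly covered modules. All
# names and scores below are hypothetical.
from statistics import mean

# module -> fault-proneness, as an ensemble of predictors might emit
# (here: a plain average of three hypothetical predictor outputs).
predictor_votes = {
    'auth.py':  [0.9, 0.8, 0.7],
    'db.py':    [0.2, 0.3, 0.1],
    'utils.py': [0.5, 0.6, 0.4],
}
fault_score = {m: mean(v) for m, v in predictor_votes.items()}

coverage = {                      # test -> modules it covers
    't1': {'auth.py', 'utils.py'},
    't2': {'db.py', 'utils.py'},
    't3': {'auth.py'},
}

def prioritize(coverage, fault_score):
    """Order tests by new coverage; break ties with fault-proneness."""
    remaining, covered, order = dict(coverage), set(), []
    while remaining:
        def key(test):
            new = remaining[test] - covered
            # primary: amount of new coverage; secondary: fault-proneness
            return (len(new), sum(fault_score.get(m, 0.0) for m in new))
        best = max(remaining, key=key)
        covered |= remaining.pop(best)
        order.append(best)
    return order

# t1 and t2 tie on new coverage (2 modules each); t1 wins because its
# modules carry higher predicted fault-proneness.
print(prioritize(coverage, fault_score))  # ['t1', 't2', 't3']
```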