
    Data quality: Some comments on the NASA software defect datasets

    Background: Self-evidently, empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and using the same rather than similar versions of datasets. In recent years, there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been extensively used as part of this research. Objective: This short note investigates the extent to which published analyses based on the NASA defect datasets are meaningful and comparable. Method: We analyze the five studies published in the IEEE Transactions on Software Engineering since 2007 that have utilized these datasets and compare the two versions of the datasets currently in use. Results: We find important differences between the two versions of the datasets, implausible values in one dataset, and generally insufficient detail documented on dataset preprocessing. Conclusions: It is recommended that researchers 1) indicate the provenance of the datasets they use, 2) report any preprocessing in sufficient detail to enable meaningful replication, and 3) invest effort in understanding the data prior to applying machine learners.
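
    The final recommendation, understanding the data before applying machine learners, can be illustrated with a short sanity check. The following is a minimal sketch assuming a CSV export of one NASA defect dataset; the file name and column names (LOC_TOTAL, CYCLOMATIC_COMPLEXITY) are hypothetical placeholders, not the datasets' actual schema.

    import pandas as pd

    # Load one NASA defect dataset; file and column names are placeholders.
    df = pd.read_csv("kc1.csv")

    # Duplicate rows can leak between training and test splits.
    n_duplicates = df.duplicated().sum()

    # Implausible values, e.g. modules with non-positive lines of code or
    # cyclomatic complexity exceeding total lines of code.
    implausible = df[(df["LOC_TOTAL"] <= 0) |
                     (df["CYCLOMATIC_COMPLEXITY"] > df["LOC_TOTAL"])]

    # Missing cells that some learners silently drop or impute.
    n_missing = df.isna().sum().sum()

    print(f"duplicates: {n_duplicates}, implausible rows: {len(implausible)}, "
          f"missing cells: {n_missing}")

    Reporting the output of checks like these alongside any preprocessing steps would make analyses based on these datasets far easier to replicate and compare.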

    Analysis of Data mining based Software Defect Prediction Techniques

    A software bug repository is the main resource for identifying fault-prone modules, and different data mining algorithms are used to extract fault-prone modules from these repositories. A software development team tries to increase software quality by decreasing the number of defects as much as possible. In this paper, different data mining techniques for identifying fault-prone modules are discussed, and the algorithms are compared to find the best one for defect prediction.

    Software Defect Prediction Based on Classification Rule Mining

    Software development has grown rapidly, and, due to various causes, software comes with many defects. In the software development process, testing is the main phase that reduces the defects of the software. If a developer or tester can predict software defects properly, this reduces cost, time, and effort. In this paper, we show a comparative analysis of software defect prediction based on classification rule mining. We propose a scheme for this process, choose different classification algorithms, and compare their defect predictions. The evaluation analyzes the prediction performance of competing learning schemes on given historical data sets (the NASA MDP data sets). The result of this scheme evaluation shows that a different classifier rule has to be chosen for each data set.
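
    To make this kind of scheme evaluation concrete, here is a minimal sketch, not the paper's actual protocol, that compares two common classification learners on a defect data set using scikit-learn; the file name and the 'defective' label column are assumed placeholders.

    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder data set: numeric module metrics plus a binary label.
    df = pd.read_csv("nasa_mdp_modules.csv")
    X = df.drop(columns=["defective"])
    y = df["defective"]

    # Evaluate each learner with 10-fold cross-validation on the same data.
    for name, clf in [("decision tree", DecisionTreeClassifier(random_state=0)),
                      ("naive Bayes", GaussianNB())]:
        scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
        print(f"{name}: mean F1 = {scores.mean():.3f}")

    Repeating this loop over several data sets is exactly the kind of experiment that reveals why no single classifier dominates and a different rule may be needed per data set.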

    Fuzzy Logic Based Software Reliability Quantification Framework: Early Stage Perspective (FLSRQF)

    Today, the influence of information technology has been spreading exponentially, from high-level research in the world's top labs to home appliances. Such huge demand is compelling developers to produce more software to meet user expectations. As a result, reliability has emerged as a critical quality factor that cannot be compromised, and researchers are continuously making efforts to meet this challenge. In this spirit, the authors of the paper propose a highly structured framework that guides the process of quantifying software reliability before coding of the software starts. Before presenting the framework, to establish its need and significance, the paper presents the state of the art in software reliability quantification. The strength of fuzzy set theory is utilized to overcome the subjectivity of requirements-stage measures. Salient features of the framework are also highlighted at the end of the paper.
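
    As a toy illustration of the general fuzzy quantification idea, not of FLSRQF itself, the sketch below fuzzifies a subjective requirements-stage rating with triangular membership functions and defuzzifies it into a single reliability estimate; the rating scale, membership functions, and level weights are all invented for the example.

    def triangular(x, a, b, c):
        """Triangular membership function with feet at a and c, peak at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    # Hypothetical expert rating of a requirements-stage factor on a 0-10 scale.
    rating = 6.5

    # Fuzzify the rating into three linguistic reliability levels.
    low = triangular(rating, 0.0, 2.0, 5.0)
    medium = triangular(rating, 3.0, 5.0, 8.0)
    high = triangular(rating, 6.0, 9.0, 10.0)

    # Defuzzify: weighted average of a representative value for each level.
    levels = [(low, 0.2), (medium, 0.5), (high, 0.9)]
    reliability = (sum(m * v for m, v in levels) /
                   sum(m for m, _ in levels))
    print(f"estimated reliability: {reliability:.2f}")

    The point of the fuzzy step is that an imprecise expert judgment ("fairly reliable") is turned into a number in a transparent, repeatable way rather than by an ad hoc guess.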

    Evaluating the effectiveness of data quality framework in software engineering

    The quality of data is important in research working with data sets because poor data quality may lead to invalid results. Data sets contain measurements that are associated with metrics and entities; however, in some data sets it is not always clear which entities have been measured and exactly which metrics have been used, which means that measurements could be misinterpreted. In this study, we develop a framework for data quality assessment that determines whether a data set has sufficient information to support the correct interpretation of data for analysis in empirical research. The framework incorporates a dataset metamodel and a quality assessment process to evaluate data set quality. To evaluate the effectiveness of our framework, we conducted a user study, using observations, a questionnaire, and a think-aloud approach to gain insight into the framework through participants' thought processes while applying it. The results of our study provide evidence that most participants successfully applied the definitions of dataset category elements and the formal definitions of data quality issues to the data sets. Further work is needed to reproduce our results with more participants and to determine whether the data quality framework generalizes to other types of data sets.
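
    The following is a minimal sketch of the framework's core idea: a data set should document which entities were measured and with which metrics, so the measurements can be interpreted correctly. The metadata fields below are illustrative placeholders, not the paper's actual metamodel.

    # Illustrative metadata record for a data set; the required fields echo the
    # idea of a dataset metamodel, not the paper's actual schema.
    dataset_metadata = {
        "name": "example-defect-data",
        "entity": "software module",                      # what was measured
        "metrics": {"LOC_TOTAL": "total lines of code"},  # how it was measured
        "provenance": None,                               # where the data came from
    }

    REQUIRED_FIELDS = ("name", "entity", "metrics", "provenance")

    # Flag fields that are absent or empty, i.e. measurements whose context is
    # undocumented and therefore open to misinterpretation.
    issues = [f for f in REQUIRED_FIELDS if not dataset_metadata.get(f)]

    print("quality issues:", issues if issues else "none; metadata is sufficient")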