
    Mining developer communication data streams

    This paper explores modelling a software development project as a process that produces a continuous stream of data. In the Jazz repository used in this research, one aspect of that stream is developer communication. Such data can be used to build an evolving social network characterized by a range of metrics. The paper applies data stream mining techniques to identify the metrics most useful for predicting build outcomes, presenting results from the Hoeffding Tree classification method used in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift. The results indicate that only a small number of the available metrics have any significance for predicting the outcome of a build.
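
    The drift-detection side of the approach above can be illustrated in a few lines. The following is a minimal, illustrative sketch of an ADWIN-style detector (the published ADWIN uses exponential histograms for efficiency; this version checks every split point directly), fed a stream of 0/1 build outcomes:

```python
import math
from collections import deque

class SimpleADWIN:
    """Simplified sketch of an ADWIN-style drift detector: keep a window
    of recent values and shrink it whenever two sub-windows differ by
    more than a Hoeffding-style bound. Illustrative only."""

    def __init__(self, delta=0.002):
        self.delta = delta      # confidence parameter
        self.window = deque()   # recent values, oldest first

    def _cut_threshold(self, n0, n1):
        # Hoeffding-style bound on the allowed difference of sub-window means
        m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic mean of the two sizes
        return math.sqrt((1.0 / (2.0 * m)) * math.log(4.0 / self.delta))

    def add(self, value):
        """Append a value; return True if drift was detected."""
        self.window.append(value)
        drift = False
        changed = True
        while changed and len(self.window) > 1:
            changed = False
            total = sum(self.window)
            head_sum = 0.0
            # Try every split of the window into an old and a recent part.
            for i in range(1, len(self.window)):
                head_sum += self.window[i - 1]
                n0, n1 = i, len(self.window) - i
                mean0 = head_sum / n0
                mean1 = (total - head_sum) / n1
                if abs(mean0 - mean1) > self._cut_threshold(n0, n1):
                    for _ in range(n0):        # discard the stale prefix
                        self.window.popleft()
                    drift = changed = True
                    break
        return drift
```

    Feeding the detector a stream of build outcomes (e.g. 0 for success, 1 for failure) causes it to signal drift shortly after the failure rate shifts, which is the mechanism the paper pairs with the Hoeffding Tree classifier.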

    Measuring Defect Datasets Sensitivity to Attributes Variation

    The correlation between software project and product attributes and the quality status of modules (faulty or not) is the subject of several research papers in the software testing and maintenance fields. In this paper, a tool is built to change the values of software datasets' attributes and study the impact of those changes on the modules' defect status. The goal is to find the specific attributes that correlate highly with the module defect attribute. An algorithm is developed to automatically predict the module defect status based on the values of the module attributes and on their change from reference or initial values. For each attribute of those software projects, the results show whether that attribute is, if at all, a major factor in deciding the defect status of the project or of a specific module. The results were consistent with, and in some cases better than, most surveyed defect prediction algorithms. They also show that this can be a powerful method for understanding each attribute's individual impact, if any, on module quality and how it can be improved.
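
    The core idea of perturbing attributes away from reference values and observing the effect on the predicted defect status can be sketched briefly. The attribute names and the threshold predictor below are hypothetical stand-ins, not taken from the paper:

```python
def sensitivity(modules, predict, attribute, scale=1.1):
    """Fraction of modules whose predicted defect status flips when one
    attribute is scaled away from its reference value.  `modules` is a
    list of attribute dicts; `predict` maps a module to True
    (fault-prone) or False.  Illustrative sketch only."""
    flips = 0
    for m in modules:
        before = predict(m)
        perturbed = dict(m)
        perturbed[attribute] = m[attribute] * scale  # vary one attribute
        if predict(perturbed) != before:
            flips += 1
    return flips / len(modules)

# Hypothetical modules and a toy threshold predictor (illustrative names).
modules = [
    {"loc": 290, "complexity": 5},
    {"loc": 100, "complexity": 5},
    {"loc": 400, "complexity": 12},
]

def predict(m):
    return m["loc"] > 300 or m["complexity"] > 10
```

    Attributes whose perturbation flips many predictions are the "major players" in the module's defect status; attributes with near-zero sensitivity have little individual impact.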

    Enhance Rule Based Detection for Software Fault Prone Modules

    Software quality assurance is necessary to increase the level of confidence in the developed software and to reduce the overall cost of software projects. The problem addressed in this research is the prediction of fault-prone modules using data mining techniques. Predicting fault-prone modules allows software managers to allocate more testing and resources to such modules; it can also motivate investment in better design in future systems to avoid building error-prone modules. Software quality models based on data mined from previous projects can identify fault-prone modules in a similar current development project, once similarity between the projects is established. In this paper, we applied different rule-based data mining classification techniques to several publicly available datasets from the NASA software repository (e.g. PC1, PC2, etc.). The goal was to classify software modules as either fault-prone or not fault-prone. The paper proposes a modification to the RIDOR algorithm, and the results show that the enhanced RIDOR algorithm outperforms other classification techniques in terms of the number of extracted rules and accuracy. The implemented algorithm learns defect prediction by mining static code attributes, which are then used to build a new defect predictor with high accuracy and a low error rate.
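
    The rule-based classification style the paper builds on can be sketched as a first-match rule list over static code attributes (RIDOR itself learns a default rule plus exceptions, which this sketch does not reproduce). The rules and NASA-MDP-style attribute names below are hypothetical:

```python
def classify(module, rules, default="not-fault-prone"):
    """First-match rule-list classifier: return the label of the first
    rule whose condition holds, else the default class.  A minimal
    sketch of rule-based fault-proneness prediction."""
    for condition, label in rules:
        if condition(module):
            return label
    return default

# Hypothetical extracted rules over illustrative static code metrics:
# cyclomatic complexity v(g), lines of code, and branch count.
rules = [
    (lambda m: m["v(g)"] > 10 and m["loc"] > 200, "fault-prone"),
    (lambda m: m["branch_count"] > 30, "fault-prone"),
]
```

    A small, accurate rule set like this is what the enhanced RIDOR variant is evaluated on: fewer extracted rules at equal or better accuracy.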

    A Review of Metrics and Modeling Techniques in Software Fault Prediction Model Development

    This paper surveys software fault prediction work carried out with the different data analytic techniques reported in the software engineering literature. The study is split into three broad areas: (a) a description of the software metrics suites reported and validated in the literature; (b) a brief outline of previous research on the development of software fault prediction models based on various analytic techniques, organized by a taxonomy of those techniques; and (c) a review of the advantages of using combinations of metrics, an area that is comparatively new and needs more research effort.

    Use Data Mining Cleansing to Prepare Data for Strategic Decisions

    Pre-processing the data in a dataset is often neglected, but it is an important step in the data mining process. Analyzing data that has not been carefully screened for such problems can produce misleading results, so the representation and quality of the data come first, before any analysis is run. In this paper, the sources of data-collection errors are identified and presented, and data mining cleansing and its methods are discussed. Data preparation has become a ubiquitous function in production organizations, supporting record-keeping, strategic decision-making, and the various data analysis tasks critical to the organizational mission. Despite the importance of data collection, data quality remains a pervasive and thorny challenge in almost any production organization: incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of data-driven decision-making. The tool presented here removes errors, duplications, and inconsistent records from the datasets.
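
    The three cleansing steps the abstract names, removing duplications, incomplete records, and inconsistent records, can be sketched as a single pass over the data. The field names and the consistency check below are hypothetical examples, not the paper's tool:

```python
def cleanse(records, required, validators=()):
    """Remove duplicate, incomplete, and inconsistent records.
    `records` is a list of dicts; `required` lists fields that must be
    present and non-empty; each validator returns True for a consistent
    record.  Minimal sketch of the cleansing steps described above."""
    seen = set()
    cleaned = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key in seen:
            continue                                   # duplicate record
        if any(r.get(f) in (None, "") for f in required):
            continue                                   # incomplete record
        if not all(v(r) for v in validators):
            continue                                   # inconsistent record
        seen.add(key)
        cleaned.append(r)
    return cleaned

# Hypothetical input: one duplicate, one missing name, one negative age.
records = [
    {"name": "a", "age": 30},
    {"name": "a", "age": 30},
    {"name": "", "age": 25},
    {"name": "b", "age": -1},
    {"name": "c", "age": 40},
]
```

    Running `cleanse(records, required=["name"], validators=[lambda r: r["age"] >= 0])` keeps only the two valid, distinct records, leaving a dataset fit for analysis.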