65,850 research outputs found

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    Full text link
    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions. When applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for quality. Same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of Software Engineering (ICSE), 201

    Workflow resource pattern modelling and visualization

    Get PDF
    Workflow patterns have been recognized as the theoretical basis to modeling recurring problems in workflow systems. A form of workflow patterns, known as the resource patterns, characterise the behaviour of resources in workflow systems. Despite the fact that many resource patterns have been discovered, people still preclude them from many workflow system implementations. One of reasons could be obscurityin the behaviour of and interaction between resources and a workflow management system. Thus, we provide a modelling and visualization approach for the resource patterns, enabling a resource behaviour modeller to intuitively see the specific resource patterns involved in the lifecycle of a workitem. We believe this research can be extended to benefit not only workflow modelling, but also other applications, such as model validation, human resource behaviour modelling, and workflow model visualization

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003

    Get PDF

    A Unified Checklist for Observational and Experimental Research in Software Engineering (Version 1)

    Get PDF
    Current checklists for empirical software engineering cover either experimental research or case study research but ignore the many commonalities that exist across all kinds of empirical research. Identifying these commonalities, and explaining why they exist, would enhance our understanding of empirical research in general and of the differences between experimental and case study research in particular. In this report we design a unified checklist for empirical research, and identify commonalities and differences between experimental and case study research. We design the unified checklist as a specialization of the general engineering cycle, which itself is a special case of the rational choice cycle. We then compare the resulting empirical research cycle with two checklists for experimental research, and with one checklist for case study research. The resulting checklist identifies important questions to be answered in experimental and case study research design and reports. The checklist provides insights in two different types of empirical research design and their relationships. Its limitations are that it ignores other research methods such as meta-research or surveys. It has been tested so far only in our own research designs and in teaching empirical methods. Future work includes expanding the comparison with other methods and application in more cases, by others than ourselves
    corecore