
    Arhivske vijesti o pučkoj drami u srednjoj Dalmaciji [Archival Reports on Folk Drama in Central Dalmatia]

    Context: The research literature on software development projects usually assumes that effort is a good proxy for cost. Practice, however, suggests that there are circumstances in which cost and effort should be distinguished. Objectives: We determine similarities and differences between the size, effort, cost, duration, and number of defects of software projects. Method: We compare two established repositories (ISBSG and EBSPM) comprising almost 700 projects from industry. Results: We demonstrate a (log-)linear relation between cost on the one hand, and size, duration, and number of defects on the other, which justifies conducting linear regression for cost. We establish that ISBSG differs substantially from EBSPM in terms of cost (ISBSG projects are cheaper) and duration (they are faster), as well as in the relation between cost and effort. We show that while effort is the most important cost factor in ISBSG, this is not the case in other repositories, such as EBSPM, in which size is the dominant factor. Conclusion: Practitioners and researchers alike should be cautious when drawing conclusions from a single repository.
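
    To make the (log-)linear cost relation concrete, the sketch below fits an ordinary least-squares model to log-transformed project variables. It is a minimal illustration on synthetic data: the variable names, units, and coefficient values are assumptions for the example, not the actual ISBSG or EBSPM data.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 200
        size = rng.lognormal(4.0, 1.0, n)      # e.g. function points (illustrative)
        duration = rng.lognormal(2.0, 0.5, n)  # e.g. calendar months (illustrative)
        defects = rng.lognormal(1.5, 0.8, n)
        # Generate synthetic cost that is log-linear in the predictors, plus noise.
        log_cost = (1.0 + 0.7 * np.log(size) + 0.4 * np.log(duration)
                    + 0.2 * np.log(defects) + rng.normal(0.0, 0.3, n))

        # Ordinary least squares on the log-transformed variables; the fitted
        # coefficients are elasticities of cost with respect to each factor.
        X = np.column_stack([np.ones(n), np.log(size),
                             np.log(duration), np.log(defects)])
        coef, *_ = np.linalg.lstsq(X, log_cost, rcond=None)
        print("intercept and elasticities:", coef.round(2))

    Exponentiating the fitted model turns it back into a multiplicative cost model, cost ≈ exp(b0) * size^b1 * duration^b2 * defects^b3, which is the usual reading of a log-linear relation.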

    confstream: automated algorithm selection and configuration of stream clustering algorithms

    Machine learning has become one of the most important tools in data analysis. However, selecting the most appropriate machine learning algorithm and tuning its hyperparameters to their optimal values remains a difficult task. This is even more difficult for streaming applications, where automated approaches to algorithm selection and configuration are often not available. This paper proposes the first approach for automated algorithm selection and configuration of stream clustering algorithms. We train an ensemble of different stream clustering algorithms and configurations in parallel and use the best-performing configuration to obtain a clustering solution. By drawing new configurations from better-performing ones, we are able to improve the ensemble's performance over time. In large experiments on real and artificial data, we show how our ensemble approach can improve upon default configurations and can also compete with a-posteriori algorithm configuration. Our approach is considerably faster than a-posteriori approaches and applicable in real time. In addition, it is not limited to stream clustering and can be generalised to all streaming applications, including stream classification and regression.
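
    The ensemble scheme can be sketched roughly as follows: several configurations learn from each batch in parallel, are scored periodically on a sliding window of recent points, and the worst member is replaced by a configuration drawn near the best one. This is a minimal illustration of the general idea, not the confstream implementation; scikit-learn's MiniBatchKMeans stands in for a stream clustering algorithm, and the silhouette score stands in for whatever evaluation measure is used in practice.

        import numpy as np
        from sklearn.cluster import MiniBatchKMeans
        from sklearn.metrics import silhouette_score

        rng = np.random.default_rng(1)

        def new_config(k):
            # One ensemble member: a clustering algorithm plus its configuration.
            return {"k": k, "model": MiniBatchKMeans(n_clusters=k, random_state=0)}

        ensemble = [new_config(k) for k in (2, 4, 8)]
        window = []  # recent points used for evaluation

        for step in range(50):  # simulated stream batches
            # Each simulated batch is centred on one of three locations.
            batch = rng.normal(size=(100, 2)) + 3 * rng.integers(0, 3)
            window = (window + batch.tolist())[-500:]
            for cfg in ensemble:
                cfg["model"].partial_fit(batch)  # all members learn in parallel

            if step % 10 == 9:  # periodic evaluation on the sliding window
                W = np.asarray(window)
                scores = [silhouette_score(W, cfg["model"].predict(W))
                          for cfg in ensemble]
                best, worst = int(np.argmax(scores)), int(np.argmin(scores))
                # Draw a new configuration near the best-performing one.
                k_new = int(max(2, ensemble[best]["k"] + rng.integers(-1, 2)))
                ensemble[worst] = new_config(k_new)
                print(f"step {step}: best k={ensemble[best]['k']}, "
                      f"silhouette={scores[best]:.2f}")

    Evaluating on a sliding window rather than on all past data keeps the selection responsive to drift, which is one reason an online scheme like this can stay applicable in real time.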

    Data mining for software engineering and humans in the loop

    The field of data mining for software engineering has been growing over the last decade. This field is concerned with using data mining to provide useful insights into how to improve software engineering processes and software itself, supporting decision-making. For that, data produced by software engineering processes and products during and after software development are used. Despite promising results, there is frequently a lack of discussion on the role of software engineering practitioners amidst the data mining approaches, which makes the adoption of data mining by software engineering practitioners difficult. Moreover, the fact that experts' knowledge is frequently ignored by data mining approaches, together with the lack of transparency of such approaches, can hinder the acceptability of data mining by software engineering practitioners. To overcome these problems, this position paper provides a discussion of the role of software engineering experts when adopting data mining approaches. It also argues that this role can be extended to increase experts' involvement in the process of building data mining models. We believe that such extended involvement is not only likely to increase software engineers' acceptance of the resulting models, but also to improve the models themselves. We also provide some recommendations aimed at increasing the success of expert involvement and model acceptability.

    A Brief Survey on Concept Drift


    Multi-Source Transfer Learning for Non-Stationary Environments

    In data stream mining, predictive models typically suffer drops in predictive performance due to concept drift. As enough data representing the new concept must be collected for the new concept to be well learnt, the predictive performance of existing models usually takes some time to recover from concept drift. To speed up recovery from concept drift and improve predictive performance in data stream mining, this work proposes a novel approach called Multi-sourcE onLine TrAnsfer learning for Non-statIonary Environments (Melanie). Melanie is the first approach able to transfer knowledge between multiple data streaming sources in non-stationary environments. It creates several sub-classifiers to learn different aspects from different source and target concepts over time. The sub-classifiers that match the current target concept well are identified and used to compose an ensemble for predicting examples from the target concept. We evaluate Melanie on several synthetic data streams containing different types of concept drift and on real-world data streams. The results indicate that Melanie can deal with a variety of drifts and improve predictive performance over existing data stream learning algorithms by making use of multiple sources.
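
    The multi-source ensemble idea can be sketched as follows: one online sub-classifier is kept per source stream plus one for the target, each is weighted by its faded recent accuracy on target examples, and predictions are made by weighted vote so that sub-classifiers matching the current target concept dominate. This is a simplified illustration under those assumptions, not the Melanie algorithm itself; the streams, concepts, and drift point are synthetic.

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        rng = np.random.default_rng(2)
        classes = np.array([0, 1])

        def make_clf():
            return SGDClassifier(loss="log_loss", random_state=0)

        def stream(concept, n=50):
            # A batch whose labels follow a linear concept.
            X = rng.normal(size=(n, 2))
            return X, (X @ concept > 0).astype(int)

        subs = {"source_a": make_clf(), "source_b": make_clf(), "target": make_clf()}
        weights = {name: 1.0 for name in subs}

        for t in range(40):
            Xa, ya = stream(np.array([1.0, 0.0]))   # source stream A
            Xb, yb = stream(np.array([0.0, 1.0]))   # source stream B
            # The target concept drifts halfway through the run.
            Xt, yt = stream(np.array([1.0, 0.0]) if t < 20 else np.array([0.0, 1.0]))

            subs["source_a"].partial_fit(Xa, ya, classes=classes)
            subs["source_b"].partial_fit(Xb, yb, classes=classes)

            # Weight each fitted sub-classifier by its faded accuracy on the
            # current target batch, so matching concepts dominate after drift.
            for name, clf in subs.items():
                if hasattr(clf, "coef_"):
                    acc = (clf.predict(Xt) == yt).mean()
                    weights[name] = 0.8 * weights[name] + 0.2 * acc

            subs["target"].partial_fit(Xt, yt, classes=classes)

            # Weighted-vote prediction from all fitted sub-classifiers.
            votes = sum(weights[name] * subs[name].predict_proba(Xt)
                        for name in subs if hasattr(subs[name], "coef_"))
            y_pred = classes[np.argmax(votes, axis=1)]

            if t % 10 == 9:
                print(t, {name: round(w, 2) for name, w in weights.items()})

    In this toy run, the weight of source_b should overtake source_a soon after the drift at t = 20, mirroring how knowledge transferred from a matching source can speed up recovery from concept drift.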