7 research outputs found

    Concept drift for big data

    The term “concept drift” refers to a change over time in the statistical distribution of the data. In machine learning and predictive analytics, a fundamental assumption exists that the data are realizations of a random variable generated independently from an underlying stationary distribution. In this chapter we present discussions on concept drifts that are inherent in the context of big data. We discuss the different forms of concept drift evident in streaming data and outline techniques for handling them. Handling concept drift is important for big data, where the data flow occurs continuously, causing existing learned models to lose their predictive accuracy. This chapter will serve as a reference for academics and industry practitioners who are interested in the niche area of handling concept drift for big data applications.
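    The abstract does not describe a specific detector, but the idea of a distribution shift in a stream can be illustrated with a minimal window-based check: compare a recent window of the stream against a reference window and flag drift when the mean moves too far. This is a deliberately simple z-score sketch with illustrative names, not a method from the chapter; practical detectors (e.g. DDM, ADWIN) are more sophisticated.

    ```python
    import statistics

    def detect_drift(reference, current, threshold=3.0):
        """Flag drift when the current window's mean lies more than
        `threshold` standard errors from the reference window's mean."""
        mu = statistics.mean(reference)
        sigma = statistics.pstdev(reference) or 1e-12  # guard a constant window
        stderr = sigma / len(current) ** 0.5
        z = abs(statistics.mean(current) - mu) / stderr
        return z > threshold

    # Stationary stream: both windows drawn from the same distribution.
    stable = detect_drift([5.0, 5.1, 4.9, 5.0, 5.2], [5.1, 4.8, 5.0, 5.1, 4.9])
    # Drifted stream: the underlying mean has shifted upward.
    drifted = detect_drift([5.0, 5.1, 4.9, 5.0, 5.2], [7.9, 8.1, 8.0, 8.2, 7.8])
    ```

    On a continuous stream, the reference window would be refreshed after each detected drift so the model is retrained against the new distribution.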

    Classification of multi-class imbalanced data streams using a dynamic data-balancing technique

    The performance of classification algorithms on imbalanced streaming data depends upon an efficient re-balancing strategy for learning tasks. The difficulty is further elevated with highly imbalanced multi-class streaming data. In this paper, we investigate the multi-class imbalance problem in data streams and develop an adaptive framework to cope with imbalanced data scenarios. The proposed One-Vs-All Adaptive Window re-Balancing with Retain Knowledge (OVA-AWBReK) classification framework combines OVA binarization with an Automated Re-balancing Strategy (ARS) using a Racing Algorithm (RA). We conducted experiments on highly imbalanced datasets to demonstrate the use of the proposed OVA-AWBReK framework. The results show that the OVA-AWBReK framework can enhance classification performance on highly imbalanced multi-class data.
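    The OVA binarization step the abstract mentions can be sketched as follows: each window of the multi-class stream is decomposed into one class-vs-rest binary problem, and the positive (usually minority) side of each problem is oversampled to balance it. This is a minimal illustration with assumed names; it does not reproduce the paper's ARS or Racing-Algorithm components.

    ```python
    import random
    from collections import Counter

    def ova_rebalance(window, n_classes, seed=0):
        """Split a window of (features, label) pairs into one binary
        problem per class, oversampling the positive class of each
        problem until the two sides are the same size."""
        rng = random.Random(seed)
        problems = {}
        for c in range(n_classes):
            pos = [(x, 1) for x, y in window if y == c]
            neg = [(x, 0) for x, y in window if y != c]
            if pos and len(pos) < len(neg):
                # duplicate minority examples at random to balance the problem
                pos = pos + rng.choices(pos, k=len(neg) - len(pos))
            problems[c] = pos + neg
        return problems

    # Class 1 is the minority (1 of 4 examples) in this toy window.
    window = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.9], 1)]
    problems = ova_rebalance(window, n_classes=2)
    counts = Counter(lbl for _, lbl in problems[1])  # class-1-vs-rest, now balanced
    ```

    One binary classifier would then be trained per problem, with the final prediction taken from the most confident of the binary outputs.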

    Prototype-based classifiers in the presence of concept drift: A modelling framework

    We present a modelling framework for the investigation of prototype-based classifiers in non-stationary environments. Specifically, we study Learning Vector Quantization (LVQ) systems trained from a stream of high-dimensional, clustered data. We consider the standard winner-takes-all updates known as LVQ1. Statistical properties of the input data change on the time scale defined by the training process. We apply analytical methods borrowed from statistical physics which have been used earlier for the exact description of learning in stationary environments. The suggested framework facilitates the computation of learning curves in the presence of virtual and real concept drift. Here we focus on time-dependent class bias in the training data. First results demonstrate that, while basic LVQ algorithms are suitable for training in non-stationary environments, weight decay as an explicit mechanism of forgetting does not improve performance under the considered drift processes.
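    The winner-takes-all LVQ1 update the abstract refers to is standard: the prototype nearest to an incoming example moves toward it when their class labels agree and away when they disagree. A minimal sketch of one such update step, with illustrative variable names (the paper's analysis concerns the statistical physics of this rule, not its implementation):

    ```python
    def lvq1_step(prototypes, labels, x, y, lr=0.1):
        """One LVQ1 update: the prototype closest to x (squared Euclidean
        distance) moves toward x if its label matches y, else away from x.
        Mutates `prototypes` in place and returns the winner's index."""
        dists = [sum((p_i - x_i) ** 2 for p_i, x_i in zip(p, x))
                 for p in prototypes]
        w = min(range(len(prototypes)), key=dists.__getitem__)
        sign = 1.0 if labels[w] == y else -1.0
        prototypes[w] = [p_i + sign * lr * (x_i - p_i)
                         for p_i, x_i in zip(prototypes[w], x)]
        return w

    protos = [[0.0, 0.0], [1.0, 1.0]]       # one prototype per class
    winner = lvq1_step(protos, labels=[0, 1], x=[0.2, 0.0], y=0)
    ```

    Under drift, the constant learning rate itself provides an implicit forgetting mechanism, since recent examples dominate the prototype positions; the abstract's finding is that adding explicit weight decay on top of this does not help for the drift processes studied.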