1,213 research outputs found

    A Broad Ensemble Learning System for Drifting Stream Classification

    Full text link
    In a data stream environment, classification models must handle concept drift efficiently and effectively. Ensemble methods are widely used for this purpose; however, the ones available in the literature either use a large data chunk to update the model or learn the data one by one. In the former, the model may miss the changes in the data distribution, and in the latter, the model may suffer from inefficiency and instability. To address these issues, we introduce a novel ensemble approach based on the Broad Learning System (BLS), where mini chunks are used at each update. BLS is an effective lightweight neural architecture recently developed for incremental learning. Although it is fast, it requires huge data chunks for effective updates, and is unable to handle dynamic changes observed in data streams. Our proposed approach named Broad Ensemble Learning System (BELS) uses a novel updating method that significantly improves best-in-class model accuracy. It employs an ensemble of output layers to address the limitations of BLS and handle drifts. Our model tracks the changes in the accuracy of the ensemble components and react to these changes. We present the mathematical derivation of BELS, perform comprehensive experiments with 20 datasets that demonstrate the adaptability of our model to various drift types, and provide hyperparameter and ablation analysis of our proposed model. Our experiments show that the proposed approach outperforms nine state-of-the-art baselines and supplies an overall improvement of 13.28% in terms of average prequential accuracy.Comment: Submitted to IEEE Acces

    A survey on online active learning

    Full text link
    Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in an unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in the last decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in the context of online active learning. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research. Our review aims to provide a comprehensive and up-to-date overview of the field and to highlight directions for future work

    Incremental Learning on Non-stationary Data Stream using Ensemble Approach

    Get PDF
    Incremental Learning on non stationary distribution has been shown to be a very challenging problem in machine learning and data mining, because the joint probability distribution between the data and classes changes over time. Many real time problems suffer concept drift as they changes with time. For example, an advertisement recommendation system, in which customer’s behavior may change depending on the season of the year, on the inflation and on new products made available. An extra challenge arises when the classes to be learned are not represented equally in the training data i.e. classes are imbalanced, as most machine learning algorithms work well only when the training data  is balanced. The objective of this paper is to develop an ensemble based classification algorithm for non-stationary data stream (ENSDS) with focus on two-class problems. In addition, we are presenting here an exhaustive comparison of purposed algorithms with state-of-the-art classification approaches using different evaluation measures like recall, f-measure and g-mea

    A survey on machine learning for recurring concept drifting data streams

    Get PDF
    The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationary as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the relevant literature to deal with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent works on passive or active mechanisms to adapt or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and non-supervised approaches such as online ensembles, meta-learning and model-based clustering that can be used to deal with seasonalities in a data stream are covered. The aim is to point out new research trends and give future research directions on the usage of machine learning techniques for data streams which can help in the event of shifts and recurrences in continuous learning scenarios in near real-time

    An ensemble based on neural networks with random weights for online data stream regression

    Get PDF
    Most information sources in the current technological world are generating data sequentially and rapidly, in the form of data streams. The evolving nature of processes may often cause changes in data distribution, also known as concept drift, which is difficult to detect and causes loss of accuracy in supervised learning algorithms. As a consequence, online machine learning algorithms that are able to update actively according to possible changes in the data distribution are required. Although many strategies have been developed to tackle this problem, most of them are designed for classification problems. Therefore, in the domain of regression problems, there is a need for the development of accurate algorithms with dynamic updating mechanisms that can operate in a computational time compatible with today’s demanding market. In this article, the authors propose a new bagging ensemble approach based on Neural Network with Random Weights for online data stream regression. The proposed method improves the data prediction accuracy as well as minimises the required computational time compared to a recent algorithm for online data stream regression from literature. The experiments are carried out using four synthetic datasets to evaluate the algorithm's response to concept drift, along with four benchmark datasets from different industries. The results indicate improvement in data prediction accuracy, effectiveness in handling concept drift and much faster updating times compared to the existing available approach. Additionally, the use of Design of Experiments as an effective tool for hyperparameter tuning is demonstrated

    Efficient Asymmetric Co-Tracking using Uncertainty Sampling

    Full text link
    Adaptive tracking-by-detection approaches are popular for tracking arbitrary objects. They treat the tracking problem as a classification task and use online learning techniques to update the object model. However, these approaches are heavily invested in the efficiency and effectiveness of their detectors. Evaluating a massive number of samples for each frame (e.g., obtained by a sliding window) forces the detector to trade the accuracy in favor of speed. Furthermore, misclassification of borderline samples in the detector introduce accumulating errors in tracking. In this study, we propose a co-tracking based on the efficient cooperation of two detectors: a rapid adaptive exemplar-based detector and another more sophisticated but slower detector with a long-term memory. The sampling labeling and co-learning of the detectors are conducted by an uncertainty sampling unit, which improves the speed and accuracy of the system. We also introduce a budgeting mechanism which prevents the unbounded growth in the number of examples in the first detector to maintain its rapid response. Experiments demonstrate the efficiency and effectiveness of the proposed tracker against its baselines and its superior performance against state-of-the-art trackers on various benchmark videos.Comment: Submitted to IEEE ICSIPA'201
    corecore