1,213 research outputs found
A Broad Ensemble Learning System for Drifting Stream Classification
In a data stream environment, classification models must handle concept drift
efficiently and effectively. Ensemble methods are widely used for this purpose;
however, the ones available in the literature either use a large data chunk to
update the model or learn the data one by one. In the former, the model may
miss the changes in the data distribution, and in the latter, the model may
suffer from inefficiency and instability. To address these issues, we introduce
a novel ensemble approach based on the Broad Learning System (BLS), where mini
chunks are used at each update. BLS is an effective lightweight neural
architecture recently developed for incremental learning. Although it is fast,
it requires huge data chunks for effective updates, and is unable to handle
dynamic changes observed in data streams. Our proposed approach named Broad
Ensemble Learning System (BELS) uses a novel updating method that significantly
improves best-in-class model accuracy. It employs an ensemble of output layers
to address the limitations of BLS and handle drifts. Our model tracks the
changes in the accuracy of the ensemble components and reacts to these changes.
We present the mathematical derivation of BELS, perform comprehensive
experiments with 20 datasets that demonstrate the adaptability of our model to
various drift types, and provide hyperparameter and ablation analysis of our
proposed model. Our experiments show that the proposed approach outperforms
nine state-of-the-art baselines and supplies an overall improvement of 13.28%
in terms of average prequential accuracy.
Comment: Submitted to IEEE Access
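The evaluation metric reported above, prequential accuracy, is the standard test-then-train measure for streams: each incoming item (or, as in BELS, each mini-chunk) is first used to test the model and only then to train it. A minimal sketch of that loop, with an illustrative chunk size and a toy baseline model (both hypothetical, not the authors' implementation):

```python
# Sketch of prequential (test-then-train) evaluation over mini-chunks,
# the update granularity BELS is described as using. The chunk size and
# the toy model are illustrative assumptions, not the authors' code.

def prequential_accuracy(model, stream, chunk_size=10):
    """Test each mini-chunk before training on it; return running accuracy."""
    correct = total = 0
    chunk = []
    for x, y in stream:
        chunk.append((x, y))
        if len(chunk) == chunk_size:
            for xi, yi in chunk:                 # 1) test first
                correct += int(model.predict(xi) == yi)
                total += 1
            model.partial_fit(chunk)             # 2) then train on the chunk
            chunk = []
    return correct / total if total else 0.0


class MajorityClassModel:
    """Toy incremental baseline: always predicts the most frequent label seen."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

    def partial_fit(self, chunk):
        for _, y in chunk:
            self.counts[y] = self.counts.get(y, 0) + 1
```

Because testing always precedes training, the metric penalizes slow adaptation after a drift, which is why it is the usual yardstick for drifting-stream classifiers.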
A survey on online active learning
Online active learning is a paradigm in machine learning that aims to select
the most informative data points to label from a data stream. The problem of
minimizing the cost associated with collecting labeled observations has gained
a lot of attention in recent years, particularly in real-world applications
where data is only available in an unlabeled form. Annotating each observation
can be time-consuming and costly, making it difficult to obtain large amounts
of labeled data. To overcome this issue, many active learning strategies have
been proposed in the last decades, aiming to select the most informative
observations for labeling in order to improve the performance of machine
learning models. These approaches can be broadly divided into two categories:
static pool-based and stream-based active learning. Pool-based active learning
involves selecting a subset of observations from a closed pool of unlabeled
data, and it has been the focus of many surveys and literature reviews.
However, the growing availability of data streams has led to an increase in the
number of approaches that focus on online active learning, which involves
continuously selecting and labeling observations as they arrive in a stream.
This work aims to provide an overview of the most recently proposed approaches
for selecting the most informative observations from data streams in the
context of online active learning. We review the various techniques that have
been proposed and discuss their strengths and limitations, as well as the
challenges and opportunities that exist in this area of research. Our review
aims to provide a comprehensive and up-to-date overview of the field and to
highlight directions for future work.
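One of the classic stream-based strategies this kind of survey covers is fixed-threshold uncertainty sampling: query the annotator only when the model's confidence margin on an incoming point is small. A minimal sketch, where the threshold value and the use of class-probability vectors are illustrative assumptions:

```python
# Sketch of stream-based active learning via margin-based uncertainty
# sampling. The threshold and the probability inputs are hypothetical
# choices for illustration, not a specific surveyed method.

def margin_uncertainty(probs):
    """Uncertainty = 1 - margin between the two most confident classes."""
    top_two = sorted(probs, reverse=True)[:2]
    return 1.0 - (top_two[0] - top_two[1])

def select_for_labeling(prob_stream, threshold=0.8):
    """Return indices of stream items uncertain enough to query a label for."""
    queried = []
    for i, probs in enumerate(prob_stream):
        if margin_uncertainty(probs) >= threshold:
            queried.append(i)          # would be sent to the annotator
    return queried
```

In a real online setting the threshold is often adapted over time (e.g., randomized variable-uncertainty schemes) so that the labeling budget is spent evenly across the stream.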
Incremental Learning on Non-stationary Data Stream using Ensemble Approach
Incremental learning on a non-stationary distribution has been shown to be a very challenging problem in machine learning and data mining, because the joint probability distribution between the data and classes changes over time. Many real-time problems suffer from concept drift, as they change with time. For example, in an advertisement recommendation system, a customer's behavior may change depending on the season of the year, on inflation, and on new products made available. An extra challenge arises when the classes to be learned are not represented equally in the training data, i.e., the classes are imbalanced, as most machine learning algorithms work well only when the training data is balanced. The objective of this paper is to develop an ensemble-based classification algorithm for non-stationary data streams (ENSDS) with a focus on two-class problems. In addition, we present an exhaustive comparison of the proposed algorithm with state-of-the-art classification approaches using different evaluation measures such as recall, f-measure and g-mean.
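The evaluation measures named above matter precisely because plain accuracy is misleading under class imbalance. A sketch of those metric definitions for the two-class case (metric formulas only, not the ENSDS algorithm itself):

```python
import math

# Sketch of the evaluation measures mentioned above (recall, F-measure,
# G-mean) for a two-class imbalanced problem. G-mean is the geometric
# mean of recall on each class, so it collapses to 0 if either class
# is ignored by the classifier.

def binary_metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)           # robust to imbalance
    return {"recall": recall, "f-measure": f_measure, "g-mean": g_mean}
```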
A survey on machine learning for recurring concept drifting data streams
The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationary as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the relevant literature on dealing with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent works on passive or active mechanisms to adapt to or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and unsupervised approaches such as online ensembles, meta-learning and model-based clustering that can be used to deal with seasonalities in a data stream are covered. The aim is to point out new research trends and give future research directions on the usage of machine learning techniques for data streams, which can help in the event of shifts and recurrences in continuous learning scenarios in near real-time.
An ensemble based on neural networks with random weights for online data stream regression
Most information sources in the current technological world are generating data sequentially and
rapidly, in the form of data streams. The evolving nature of processes may often cause changes in
data distribution, also known as concept drift, which is difficult to detect and causes loss of
accuracy in supervised learning algorithms. As a consequence, online machine learning algorithms
that are able to update actively according to possible changes in the data distribution are required.
Although many strategies have been developed to tackle this problem, most of them are designed
for classification problems. Therefore, in the domain of regression problems, there is a need for the
development of accurate algorithms with dynamic updating mechanisms that can operate in a
computational time compatible with today’s demanding market. In this article, the authors propose
a new bagging ensemble approach based on Neural Network with Random Weights for online data
stream regression. The proposed method improves the data prediction accuracy as well as
minimises the required computational time compared to a recent algorithm for online data stream
regression from literature. The experiments are carried out using four synthetic datasets to evaluate
the algorithm's response to concept drift, along with four benchmark datasets from different
industries. The results indicate improvement in data prediction accuracy, effectiveness in handling
concept drift and much faster updating times compared to the existing available approach.
Additionally, the use of Design of Experiments as an effective tool for hyperparameter tuning is
demonstrated
Efficient Asymmetric Co-Tracking using Uncertainty Sampling
Adaptive tracking-by-detection approaches are popular for tracking arbitrary
objects. They treat the tracking problem as a classification task and use
online learning techniques to update the object model. However, these
approaches are heavily invested in the efficiency and effectiveness of their
detectors. Evaluating a massive number of samples for each frame (e.g.,
obtained by a sliding window) forces the detector to trade the accuracy in
favor of speed. Furthermore, misclassification of borderline samples in the
detector introduce accumulating errors in tracking. In this study, we propose a
co-tracking based on the efficient cooperation of two detectors: a rapid
adaptive exemplar-based detector and another more sophisticated but slower
detector with a long-term memory. The sampling labeling and co-learning of the
detectors are conducted by an uncertainty sampling unit, which improves the
speed and accuracy of the system. We also introduce a budgeting mechanism which
prevents the unbounded growth in the number of examples in the first detector
to maintain its rapid response. Experiments demonstrate the efficiency and
effectiveness of the proposed tracker against its baselines and its superior
performance against state-of-the-art trackers on various benchmark videos.Comment: Submitted to IEEE ICSIPA'201
- …