Boosting Classifiers for Drifting Concepts
This paper proposes a boosting-like method to train a classifier ensemble from data streams. It naturally adapts to concept drift and allows the drift to be quantified in terms of its base learners. The algorithm is shown empirically to outperform learning algorithms that ignore concept drift. It performs no worse than advanced adaptive time-window and example-selection strategies, which store all the data and are therefore unsuited to mining massive streams.
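The abstract does not spell out the boosting-like update, so the sketch below does not reproduce it. As a hedged illustration of the broader idea of an ensemble that adapts to drift, here is a generic chunk-based weighted ensemble, a swapped-in technique and not the paper's algorithm. Each chunk of the stream trains a new base learner, every existing member is re-weighted by how much better than chance it does on the newest chunk, and the weakest member is dropped when the ensemble is full. The `Stump` base learner, `ChunkEnsemble`, the `max(0, accuracy - 0.5)` weighting, the chunk size, and the synthetic thresholds 0.3 and 0.7 are all hypothetical choices.

```python
import random
from dataclasses import dataclass


@dataclass
class Stump:
    """Hypothetical base learner: a 1-D threshold classifier."""
    threshold: float
    polarity: int  # +1: predict 1 when x >= threshold; -1: predict 1 when x < threshold

    def predict(self, x: float) -> int:
        above = x >= self.threshold
        return int(above) if self.polarity == 1 else int(not above)


def fit_stump(chunk):
    """Exhaustively pick the threshold/polarity with the fewest errors on one chunk."""
    best, best_err = None, float("inf")
    for t, _ in chunk:
        for pol in (1, -1):
            stump = Stump(t, pol)
            err = sum(stump.predict(x) != y for x, y in chunk)
            if err < best_err:
                best, best_err = stump, err
    return best


def accuracy(model, chunk):
    return sum(model.predict(x) == y for x, y in chunk) / len(chunk)


class ChunkEnsemble:
    """Weighted-vote ensemble that is re-weighted on every new chunk of the stream."""

    def __init__(self, max_members=5):
        self.max_members = max_members
        self.members = []  # list of [stump, weight]

    def update(self, chunk):
        # Members that no longer fit the current concept lose weight here;
        # the weights double as a rough per-member indicator of drift.
        for member in self.members:
            member[1] = max(0.0, accuracy(member[0], chunk) - 0.5)
        new = fit_stump(chunk)
        self.members.append([new, max(0.0, accuracy(new, chunk) - 0.5)])
        if len(self.members) > self.max_members:
            self.members.remove(min(self.members, key=lambda m: m[1]))

    def predict(self, x):
        # Weighted vote: each member adds +w for class 1 and -w for class 0.
        score = sum(w * (1 if s.predict(x) else -1) for s, w in self.members)
        return int(score >= 0)


def make_chunk(rng, threshold, n=200):
    """Synthetic chunk: x ~ Uniform[0, 1], label 1 iff x >= threshold."""
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x >= threshold)) for x in xs]


rng = random.Random(0)
ensemble = ChunkEnsemble(max_members=5)
for _ in range(5):
    ensemble.update(make_chunk(rng, 0.3))  # concept A
for _ in range(3):
    ensemble.update(make_chunk(rng, 0.7))  # concept B after an abrupt drift
test_chunk = make_chunk(rng, 0.7)
print(accuracy(ensemble, test_chunk))
print([round(w, 2) for _, w in ensemble.members])
```

Because each member's weight is recomputed on the newest chunk, members trained before the drift end up with small weights, which gives a rough, per-learner view of how much the concept has moved. The newest member is scored on its own training chunk, so its weight is optimistic; a held-out split would avoid that.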
Learning with a Drifting Target Concept
We study the problem of learning in the presence of a drifting target concept. Specifically, we provide bounds on the error rate at a given time for a learner with access to a history of independent samples labeled according to a target concept that can change on each round. One of our main contributions is a refinement of the best previous results for polynomial-time algorithms for the space of linear separators under a uniform distribution. We also provide general results for an algorithm capable of adapting to a variable rate of drift of the target concept. Some of the results also describe an active learning variant of this setting and provide bounds on the number of label queries for points in the sequence sufficient to obtain the stated error-rate bounds.
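To make the setting concrete, the following sketch shows one simple way a learner can cope with a drifting target: run empirical risk minimization only on the most recent samples, since older samples may be labeled by an outdated concept. This is a generic sliding-window heuristic, not the algorithm or the bounds from the paper. The one-dimensional threshold class, the uniform distribution on [0, 1], the linear drift schedule, and the window size of 100 are hypothetical choices for illustration.

```python
import random


def erm_threshold(samples):
    """Return t minimizing the empirical error of h_t(x) = 1[x >= t] on [0, 1].

    One sorted sweep: start with t = 0.0 (predict 1 everywhere), then move the
    threshold past one point at a time and track the error count incrementally.
    """
    pts = sorted(samples)
    err = sum(1 for _, y in pts if y == 0)  # every negative is misclassified at t = 0
    best_err, best_t = err, 0.0
    for i, (_, y) in enumerate(pts):
        err += 1 if y == 1 else -1  # pts[i] is now predicted 0
        t = pts[i + 1][0] if i + 1 < len(pts) else float("inf")
        if err < best_err:
            best_err, best_t = err, t
    return best_t


def error_vs_target(t, c):
    """With x ~ Uniform[0, 1], h_t and the target 1[x >= c] disagree on mass |t - c|."""
    return abs(min(t, 1.0) - c)


rng = random.Random(1)
T, drift_per_round, window = 1000, 0.0006, 100
history = []
for i in range(T):
    c_i = 0.2 + drift_per_round * i  # target threshold moves a little every round
    x = rng.random()
    history.append((x, int(x >= c_i)))

c_now = 0.2 + drift_per_round * T  # concept at the prediction time
err_window = error_vs_target(erm_threshold(history[-window:]), c_now)
err_full = error_vs_target(erm_threshold(history), c_now)
print(f"recent-window error {err_window:.3f}, full-history error {err_full:.3f}")
```

Under this drift the full history mixes labels from many past concepts, so its minimizer lands near the middle of the range the concept swept, while the short window tracks the current concept. Choosing the window size trades estimation error (too few samples) against staleness (samples labeled by an older concept).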
Improving adaptive bagging methods for evolving data streams
We propose two new improvements for bagging methods on evolving data streams. Two variants of Bagging were recently proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of different sizes, and ADWIN Bagging uses ADWIN as a change detector to decide when to discard underperforming ensemble members. We improve ADWIN Bagging by using Hoeffding Adaptive Trees, which can learn adaptively from data streams that change over time. To make ASHT Bagging adapt to change more quickly, we add an error change detector for each classifier. We evaluate our improvements on synthetic and real-world datasets comprising up to ten million examples.
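ADWIN keeps a variable-length window of recent values and shrinks it when two sub-windows have means that differ by more than a statistical threshold. The sketch below is a simplified, unoptimized ADWIN-style detector applied to one ensemble member's 0/1 error stream, the kind of per-classifier error change detector the abstract describes. How an alarm would trigger resetting or replacing a member is not shown and would be our assumption. The cut threshold eps_cut = sqrt(ln(4n/δ) / (2m)) with m = 1/(1/n0 + 1/n1), the value δ = 0.002, and the synthetic error rates are assumptions for illustration, not taken from the paper.

```python
import math
import random
from collections import deque


class NaiveAdwin:
    """Simplified ADWIN-style change detector over a stream of values in [0, 1].

    Keeps the whole window explicitly and checks every split point, so each
    update costs O(window length); the original ADWIN compresses the window
    into buckets to save time and memory.
    """

    def __init__(self, delta=0.002):
        self.delta = delta  # confidence parameter (assumed value)
        self.window = deque()

    def _cut_found(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        log_term = math.log(4 * n / self.delta)  # ln(4 / delta') with delta' = delta / n
        head = 0.0
        for n0, value in enumerate(self.window, start=1):
            if n0 == n:
                break
            head += value
            n1 = n - n0
            mu0, mu1 = head / n0, (total - head) / n1  # older vs newer sub-window means
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(mu0 - mu1) >= eps_cut:
                return True
        return False

    def update(self, value):
        """Add a value; drop the oldest values while some split shows a change."""
        self.window.append(value)
        changed = False
        while self._cut_found():
            self.window.popleft()
            changed = True
        return changed


# Hypothetical error stream of one ensemble member: 1 = misclassified.
# Its error rate jumps from 0.1 to 0.5 at t = 500, e.g. after a concept drift.
rng = random.Random(7)
errors = [int(rng.random() < (0.1 if t < 500 else 0.5)) for t in range(1000)]
detector = NaiveAdwin(delta=0.002)
alarms = [t for t, e in enumerate(errors) if detector.update(e)]
print("first alarm at t =", alarms[0] if alarms else None)
print("window length at the end:", len(detector.window))
```

After the alarm, the window has discarded most of the pre-drift errors, so its mean reflects the member's current error rate; a bagging ensemble could, for example, replace a member whose detector fires, though that policy is an assumption here.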