Using Diversity Ensembles with Time Limits to Handle Concept Drift

Abstract

While traditional supervised learning focuses on static datasets, an increasing amount of data arrives as streams, where data is continuous and typically processed only once. A common problem with data streams is that the underlying concept to be learned can evolve continually. This phenomenon, known as concept drift, has attracted research interest in recent years, and there is a need for improved machine learning algorithms that can cope with it. A promising approach uses an ensemble of diverse classifiers whose constituent classifiers are re-trained when a concept drift is detected. The number of classifiers to maintain and the frequency of re-training are critical factors that determine classification accuracy in the presence of concept drift. This dissertation systematically investigated these issues in order to develop an improved classifier for online ensemble learning. The impact of reducing the length of time for which additional ensembles are kept in memory was studied using artificial and real-world datasets. Findings from these studies revealed that in many cases the number of time steps for which additional ensembles are held in memory can be reduced without sacrificing prequential accuracy. It was also found that this new ensemble approach performed well in the presence of false concept drift.
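
Prequential accuracy, the measure referred to above, is conventionally computed in a test-then-train fashion: each arriving example is first used to test the current model and only afterwards to update it. The following is a minimal sketch of that evaluation loop and not the dissertation's own code. The names `MajorityClassLearner` and `prequential_accuracy` and the toy stream are assumptions introduced purely for illustration; the learner is a deliberately naive stand-in for a real ensemble.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner (hypothetical stand-in for a real classifier)."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Predict the most frequent label seen so far; None before any training.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        # Incremental update: count the observed label.
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test-then-train: each example is tested on first, then trained on."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn(x, y)
        total += 1
    return correct / total if total else 0.0


# Tiny artificial stream with an abrupt drift: the label flips from 0 to 1 halfway.
stream = [((i,), 0) for i in range(5)] + [((i,), 1) for i in range(5, 10)]
accuracy = prequential_accuracy(stream, MajorityClassLearner())
print(accuracy)
```

On this toy stream the learner never recovers after the label flips, because it has no mechanism for detecting the drift or re-training. That gap is the one drift-aware ensembles are meant to close.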
