Subsampling for efficient and effective unsupervised outlier detection ensembles

Abstract

Outlier detection and ensemble learning are well-established research directions in data mining, yet the application of ensemble techniques to outlier detection has rarely been studied. Here, we propose and study subsampling as a technique to induce diversity among individual outlier detectors. We show analytically and experimentally that an outlier detector based on a subsample per se, besides inducing diversity, can, under certain conditions, already improve upon the results of the same outlier detector on the complete dataset. Building an ensemble on top of several subsamples further improves the results. While prior work has simply carried over from classification the intuition that ensembles improve over single outlier detectors, here we also justify analytically why ensembles are expected to work in the unsupervised setting of outlier detection. As a side effect, running an ensemble of several outlier detectors on subsamples of the dataset is more efficient than ensembles based on other means of introducing diversity and, depending on the sample rate and the size of the ensemble, can be even more efficient than the single outlier detector on the complete data.
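
A minimal sketch of the idea, assuming a k-nearest-neighbor distance as the base outlier detector and z-score averaging as the combination rule (both are illustrative choices, not necessarily the paper's exact setup). Every point in the full dataset is scored against a random subsample drawn at rate `rate`, and the scores of `n_members` such detectors are averaged. The function names `knn_scores_against_sample` and `subsampling_ensemble` and their parameters are hypothetical.

```python
import numpy as np


def knn_scores_against_sample(X, sample_idx, k):
    """Score every point in X by its distance to the k-th nearest neighbor
    within the subsample X[sample_idx] (brute force, Euclidean)."""
    S = X[sample_idx]
    # Pairwise distances between all n points and the m sampled points: shape (n, m).
    d = np.sqrt(((X[:, None, :] - S[None, :, :]) ** 2).sum(axis=2))
    # A point that belongs to the sample must not count itself as a neighbor.
    d[sample_idx, np.arange(len(sample_idx))] = np.inf
    # k-th smallest distance in each row is the outlier score.
    return np.partition(d, k - 1, axis=1)[:, k - 1]


def subsampling_ensemble(X, n_members=10, rate=0.1, k=5, seed=0):
    """Average standardized kNN scores over n_members random subsamples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Each sample needs at least k+1 points so sampled points still have k neighbors.
    m = max(k + 1, int(round(rate * n)))
    total = np.zeros(n)
    for _ in range(n_members):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        s = knn_scores_against_sample(X, idx, k)
        # Standardize so members contribute on a comparable scale.
        total += (s - s.mean()) / (s.std() + 1e-12)
    return total / n_members


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    inliers = rng.normal(0.0, 1.0, size=(500, 2))
    planted = np.array([[8.0, 8.0]])  # a clear outlier at index 500
    X = np.vstack([inliers, planted])
    scores = subsampling_ensemble(X, n_members=20, rate=0.1, k=5)
    print("top-ranked index:", int(np.argmax(scores)))
```

On this synthetic data the planted point should rank first. Under the brute-force assumption used here, one member computes about rate·n² distances (all n points against a sample of size rate·n), so an ensemble of t members costs about t·rate·n² versus n² for the same detector on the full data; it is cheaper whenever t·rate < 1, for example 5 members at a rate of 0.1.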

Last updated on 29/10/2017

This paper was published in CiteSeerX.
