3 research outputs found

Sensitivity-Based Optimization of Unsupervised Drift Detection for Categorical Data Streams

Authors: Janek Bender, Jivka Ovtcharova, Martin Trat
Publication venue: Karlsruher Institut für Technologie
Publication date: 26/01/2023

Real-world data streams are rarely characterized by stationary data distributions. Instead, the phenomenon commonly termed concept drift threatens the performance of estimators conducting inference on such data. Our contribution builds on the unsupervised concept drift detector CDCStream, which specializes in processing categorical data directly. We propose a cooldown mechanism that aims to reduce its excessive sensitivity in order to curb false-alarm detections. Using practical classification and regression problems, we evaluate the impact of the mechanism on estimation performance and highlight the transferability of our mechanism to other detection methods. Additionally, we provide an intuitive means for tuning the sensitivity of drift detectors. While only marginally improving on the unaltered form of the detector on publicly available benchmark data, our mechanism does so consistently in almost all configurations. In contrast, within the context of another real-world scenario, almost none of the tested drift-detection-based approaches could outperform a baseline approach. However, potential false-alarm detections are reduced drastically in all scenarios. Because this cuts back the signals for refitting estimators while maintaining better or at least comparable performance to vanilla CDCStream, compute infrastructure utilization could be economized further.
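
The abstract does not spell out how the cooldown works, so the following is only a minimal sketch of one plausible reading: after an alarm is accepted, further alarms are ignored for a fixed number of steps. The function name `apply_cooldown` and the parameter `cooldown` are illustrative assumptions, not names from the paper or from CDCStream.

```python
from typing import Iterable, List


def apply_cooldown(raw_alarms: Iterable[bool], cooldown: int) -> List[bool]:
    """Suppress alarms raised within `cooldown` steps after an accepted alarm.

    Hypothetical helper: `raw_alarms` stands in for the per-step output of
    any drift detector; the real mechanism in the paper may differ.
    """
    accepted: List[bool] = []
    remaining = 0  # number of upcoming steps in which alarms are ignored
    for alarm in raw_alarms:
        if remaining > 0:
            # Still cooling down: drop whatever the detector reports.
            accepted.append(False)
            remaining -= 1
        elif alarm:
            # Accept this alarm and start a new cooldown period.
            accepted.append(True)
            remaining = cooldown
        else:
            accepted.append(False)
    return accepted


raw = [False, True, True, False, True, False, True]
filtered = apply_cooldown(raw, cooldown=2)
```

Under this reading, a larger `cooldown` means fewer refit signals, which is one simple way to picture a sensitivity setting.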

KITopen

Multiple changepoint detection in categorical data streams

Authors: N. Adams, J. Plasse
Publication venue: Springer Science and Business Media LLC
Publication date: 02/02/2019

The need for efficient tools is pressing in the era of big data, particularly in streaming data applications. As data streams are ubiquitous, the ability to accurately detect multiple changepoints without affecting the continuous flow of data is an important issue. Change detection for categorical data streams is understudied, and existing work commonly introduces fixed control parameters while providing little insight into how they may be chosen. This is ill-suited to the streaming paradigm, motivating the need for an approach that introduces few parameters, which may be set without requiring any prior knowledge of the stream. This paper introduces such a method, which can accurately detect changepoints in categorical data streams with fixed storage and computational requirements. The detector relies on the ability to adaptively monitor the category probabilities of a multinomial distribution, where temporal adaptivity is introduced using forgetting factors. A novel adaptive threshold is also developed, which can be computed given a desired false positive rate. This method is then compared to sequential and nonsequential change detectors in a large simulation study, which verifies the usefulness of our approach. A real data set consisting of nearly 40 million events from a computer network is also investigated.
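
As a rough illustration of monitoring category probabilities with a forgetting factor, the sketch below down-weights older observations exponentially. It assumes a fixed factor `lam`, whereas the abstract describes adaptive forgetting factors and an adaptive threshold that are not reproduced here; the function name and parameters are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def forgetting_factor_probs(
    stream: Iterable[Hashable],
    categories: Iterable[Hashable],
    lam: float = 0.9,
) -> Dict[Hashable, float]:
    """Estimate category probabilities with exponentially decayed counts.

    Assumption: a fixed forgetting factor `lam` in (0, 1]; lam = 1 gives
    plain relative frequencies. Storage is constant in the stream length.
    """
    counts = {c: 0.0 for c in categories}
    weight = 0.0  # decayed number of observations seen so far
    for x in stream:
        for c in counts:
            counts[c] *= lam  # forget a fraction of the past
        counts[x] += 1.0      # add the newest observation
        weight = lam * weight + 1.0
    if weight == 0.0:
        raise ValueError("stream is empty")
    return {c: counts[c] / weight for c in counts}


# The stream switches from "a" to "b" halfway through.
data = ["a"] * 50 + ["b"] * 50
probs = forgetting_factor_probs(data, categories=["a", "b"], lam=0.9)
plain = forgetting_factor_probs(data, categories=["a", "b"], lam=1.0)
```

With `lam=0.9` the estimate is dominated by the recent "b" observations, while `lam=1.0` still reports equal frequencies, which shows why forgetting helps an estimate track a change.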

Spiral - Imperial College Digital Repository

Multiple changepoint detection in categorical data streams

Authors: Joshua Plasse, Niall M. Adams
Publication venue: Springer Science and Business Media LLC

Crossref