Search CORE

3 research outputs found

Efficient estimation of AUC in a sliding window

Author: A Bifet
C Ferri
D Brzezinski
DJ Hand
I Žliobaitė
J Gama
J Gama
J Gama
Remco R. Bouckaert
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2019
Field of study

In many applications, monitoring area under the ROC curve (AUC) in a sliding window over a data stream is a natural way of detecting changes in the system. The drawback is that computing AUC in a sliding window is expensive, especially if the window size is large and the data flow is significant. In this paper we propose a scheme for maintaining an approximate AUC in a sliding window of length

k

. More specifically, we propose an algorithm that, given

\epsilon

, estimates AUC within

\epsilon / 2

, and can maintain this estimate in

O((\log k) / \epsilon)

time, per update, as the window slides. This provides a speed-up over the exact computation of AUC, which requires

O(k)

time, per update. The speed-up becomes more significant as the size of the window increases. Our estimate is based on grouping the data points together, and using these groups to calculate AUC. The grouping is designed carefully such that (

i

) the groups are small enough, so that the error stays small, (

ii

) the number of groups is small, so that enumerating them is not expensive, and (

iii

) the definition is flexible enough so that we can maintain the groups efficiently. Our experimental evaluation demonstrates that the average approximation error in practice is much smaller than the approximation guarantee

\epsilon / 2

, and that we can achieve significant speed-ups with only a modest sacrifice in accuracy

arXiv.org e-Print Archive

Crossref

Maintaining AUC and H-measure over time

Author: Tatti Nikolaj
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2022
Field of study

Measuring the performance of a classifier is a vital task in machine learning. The running time of an algorithm that computes the measure plays a very small role in an offline setting, for example, when the classifier is being developed by a researcher. However, the running time becomes more crucial if our goal is to monitor the performance of a classifier over time. In this paper we study three algorithms for maintaining two measures. The first algorithm maintains area under the ROC curve (AUC) under addition and deletion of data points in O(log n) time. This is done by maintaining the data points sorted in a self-balanced search tree. In addition, we augment the search tree that allows us to query the ROC coordinates of a data point in O(log n) time. In doing so we are able to maintain AUC in O(log n) time. Our next two algorithms involve in maintaining H-measure, an alternative measure based on the ROC curve. Computing the measure is a two-step process: first we need to compute a convex hull of the ROC curve, followed by a sum over the convex hull. We demonstrate that we can maintain the convex hull using a minor modification of the classic convex hull maintenance algorithm. We then show that under certain conditions, we can compute the H-measure exactly in O (log(2) n) time, and if the conditions are not met, then we can estimate the H-measure in O((log n + epsilon(-1)) log n) time. We show empirically that our methods are significantly faster than the baselines.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Maintaining AUC and H-measure over time

Author: Tatti Nikolaj
Publication venue
Publication date: 12/12/2021
Field of study

arXiv.org e-Print Archive

Helsingin yliopiston digitaalinen arkisto