Efficient estimation of AUC in a sliding window

Author: A Bifet
C Ferri
D Brzezinski
DJ Hand
I Žliobaitė
J Gama
Remco R. Bouckaert
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2019

In many applications, monitoring area under the ROC curve (AUC) in a sliding window over a data stream is a natural way of detecting changes in the system. The drawback is that computing AUC in a sliding window is expensive, especially if the window size is large and the data flow is significant. In this paper we propose a scheme for maintaining an approximate AUC in a sliding window of length $k$. More specifically, we propose an algorithm that, given $\epsilon$, estimates AUC within $\epsilon / 2$, and can maintain this estimate in $O((\log k) / \epsilon)$ time, per update, as the window slides. This provides a speed-up over the exact computation of AUC, which requires $O(k)$ time, per update. The speed-up becomes more significant as the size of the window increases. Our estimate is based on grouping the data points together, and using these groups to calculate AUC. The grouping is designed carefully such that (i) the groups are small enough, so that the error stays small, (ii) the number of groups is small, so that enumerating them is not expensive, and (iii) the definition is flexible enough so that we can maintain the groups efficiently. Our experimental evaluation demonstrates that the average approximation error in practice is much smaller than the approximation guarantee $\epsilon / 2$, and that we can achieve significant speed-ups with only a modest sacrifice in accuracy.
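For orientation, the exact quantity that the abstract approximates can be sketched as below. This is only a naive baseline, not the paper's grouping algorithm: it recomputes AUC from scratch on every update by sorting the window, so each update costs $O(k \log k)$ here. The names `auc` and `SlidingAUC` are hypothetical, and labels are assumed to be 1 (positive) or 0 (negative).

```python
from collections import deque


def auc(pairs):
    """Exact AUC of (score, label) pairs; label is 1 (positive) or 0 (negative).

    Equals the probability that a random positive scores higher than a
    random negative, counting ties as one half.
    """
    pos = sum(label for _, label in pairs)
    neg = len(pairs) - pos
    if pos == 0 or neg == 0:
        return None  # AUC is undefined without both classes
    ordered = sorted(pairs)  # ascending by score
    wins = 0.0
    neg_below = 0  # negatives with a strictly smaller score
    i = 0
    while i < len(ordered):
        # Process one group of tied scores at a time.
        j = i
        while j < len(ordered) and ordered[j][0] == ordered[i][0]:
            j += 1
        tied_pos = sum(label for _, label in ordered[i:j])
        tied_neg = (j - i) - tied_pos
        wins += tied_pos * (neg_below + 0.5 * tied_neg)
        neg_below += tied_neg
        i = j
    return wins / (pos * neg)


class SlidingAUC:
    """Naive baseline (assumption, not the paper's method): keep the last k
    points and recompute the exact AUC on every update."""

    def __init__(self, k):
        self.window = deque(maxlen=k)  # oldest point is dropped automatically

    def update(self, score, label):
        self.window.append((score, label))
        return auc(list(self.window))
```

A monitor would call `update(score, label)` once per arriving data point and watch the returned value for drift; the approximate scheme described in the abstract targets the same value at a lower per-update cost.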

arXiv.org e-Print Archive

Crossref

Multivariate Correlation Discovery in Streaming Data

Author: d'Hondt Jens
Publication date: 22/09/2021

Pure OAI Repository

KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping [Extended Version]

Author: Pan Ningting
Wang Chen
Wang Jianmin
Wang Peng
Wang Wei
Wu Jiaye
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/09/2018

The volume of time series data has exploded due to the popularity of new applications, such as data center management and IoT. Subsequence matching is a fundamental task in mining time series data. All index-based approaches consider only raw subsequence matching (RSM) and do not support subsequence normalization. UCR Suite can handle the normalized subsequence matching (NSM) problem, but it needs to scan the full time series. In this paper, we propose a novel problem, named the constrained normalized subsequence matching problem (cNSM), which adds some constraints to the NSM problem. The cNSM problem provides a knob to flexibly control the degree of offset shifting and amplitude scaling, which enables users to build the index to process the query. We propose a new index structure, KV-index, and the matching algorithm, KV-match. With a single index, our approach can support both RSM and cNSM problems under either ED or DTW distance. KV-index is a key-value structure, which can be easily implemented on local files or HBase tables. To support queries of arbitrary lengths, we extend KV-match to KV-match$_{DP}$, which utilizes multiple varied-length indexes to process the query. We conduct extensive experiments on synthetic and real-world datasets. The results verify the effectiveness and efficiency of our approach. Comment: 13 pages
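To make the NSM problem concrete, here is a minimal brute-force sketch that z-normalizes every window and compares it with the z-normalized query under Euclidean distance (ED). It is the full-scan approach that the abstract contrasts with index-based matching, not KV-match itself. The function names and the threshold parameter `eps` are hypothetical, and the cNSM constraints and DTW are not modeled.

```python
import math


def z_normalize(xs):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    if std == 0:
        return [0.0] * len(xs)  # constant window: no shape to compare
    return [(x - mean) / std for x in xs]


def nsm_scan(series, query, eps):
    """Return start offsets whose normalized window lies within eps of the
    normalized query. Scans the whole series, with no index."""
    m = len(query)
    q = z_normalize(query)
    matches = []
    for start in range(len(series) - m + 1):
        window = z_normalize(series[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, q)))
        if dist <= eps:
            matches.append(start)
    return matches
```

Because both sides are normalized, a window that is a shifted or scaled copy of the query matches regardless of its offset or amplitude, which is the unbounded freedom that the cNSM constraints are described as limiting.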

arXiv.org e-Print Archive

Crossref

Efficient Summing over Sliding Windows

Author: Basat Ran Ben
Einziger Gil
Friedman Roy
Kassner Yaron
Publication date: 01/01/2016

This paper considers the problem of maintaining statistic aggregates over the last $W$ elements of a data stream. First, the problem of counting the number of 1's in the last $W$ bits of a binary stream is considered. A lower bound of $\Omega(1/\epsilon + \log W)$ memory bits for $W\epsilon$-additive approximations is derived. This is followed by an algorithm whose memory consumption is $O(1/\epsilon + \log W)$ bits, indicating that the algorithm is optimal and that the bound is tight. Next, the more general problem of maintaining a sum of the last $W$ integers, each in the range of $\{0, 1, \ldots, R\}$, is addressed. The paper shows that approximating the sum within an additive error of $RW\epsilon$ can also be done using $\Theta(1/\epsilon + \log W)$ bits for $\epsilon = \Omega(1/W)$. For $\epsilon = o(1/W)$, we present a succinct algorithm which uses $B(1 + o(1))$ bits, where $B = \Theta(W \log(1/(W\epsilon)))$ is the derived lower bound. We show that all lower bounds generalize to randomized algorithms as well. All algorithms process new elements and answer queries in $O(1)$ worst-case time. Comment: A shorter version appears in SWAT 201
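As a point of comparison for the memory bounds above, the exact version of the problem can be sketched with a queue that stores all of the last $W$ elements. This is a plain baseline under the assumption that storing the whole window is acceptable, not one of the paper's algorithms; the class name `ExactWindowSum` is hypothetical.

```python
from collections import deque


class ExactWindowSum:
    """Exact sum of the last `window_size` stream elements.

    Stores every element in the window, so memory grows linearly with the
    window size, unlike the approximate schemes in the abstract.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.items = deque()
        self.total = 0

    def add(self, value):
        # Add the new element, then evict the oldest one if the window is full.
        self.items.append(value)
        self.total += value
        if len(self.items) > self.window_size:
            self.total -= self.items.popleft()

    def query(self):
        return self.total
```

Both `add` and `query` take constant time; what the approximation in the abstract trades away is exactness, in exchange for far fewer memory bits than this baseline needs.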

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Quasi-monotonic segmentation of state variable behavior for reactive control

Author: Brooks Martin
Fitzgerald Will
Lemire Daniel
Publication date: 01/01/2005

Real-world agents must react to changing conditions as they execute planned tasks. Conditions are typically monitored through time series representing state variables. While some predicates on these time series only consider one measure at a time, other predicates, sometimes called episodic predicates, consider sets of measures. We consider a special class of episodic predicates based on segmentation of the measures into quasi-monotonic intervals where each interval is either quasi-increasing, quasi-decreasing, or quasi-flat. While being scale-based, this approach is also computationally efficient and results can be computed exactly without need for approximation algorithms. Our approach is compared to linear spline and regression analysis.

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive