Change detection in streaming data analytics: a comparison of Bayesian online and martingale approaches
Online change detection is a key activity in streaming analytics: given the sequence of data observed so far, it aims to determine whether the current observation in a time series marks a change point in some important characteristic of the data. It can be a challenging task when monitoring complex systems that generate streaming data of significant volume and velocity. While applicable to diverse problem domains, it is highly relevant to monitoring high-value and critical engineering assets. This paper presents an empirical evaluation of two algorithmic approaches to streaming-data change detection: a modified martingale algorithm and a Bayesian online detection algorithm. Results obtained on both synthetic and real-world data sets are presented, and the relevant advantages and limitations of each approach are discussed.
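The streaming setting this abstract describes can be made concrete with a classical CUSUM loop for an upward mean shift. This is a generic textbook baseline, not either of the two algorithms the paper evaluates, and the reference drift `k` and alarm threshold `h` are arbitrary illustrative values:

```python
def cusum_detector(stream, k=0.5, h=5.0):
    """Yield (index, statistic, alarm) for each observation.

    Classical one-sided CUSUM for an upward mean shift; `k` is the
    reference drift subtracted at each step, `h` the alarm threshold.
    """
    s = 0.0
    for i, x in enumerate(stream):
        s = max(0.0, s + x - k)  # accumulate evidence of an upward shift
        yield i, s, s > h        # a real detector would reset s after an alarm

# Deterministic toy stream: mean 0 for 50 steps, then mean 2.
data = [0.0] * 50 + [2.0] * 50
alarms = [i for i, s, fired in cusum_detector(data) if fired]
```

On this toy stream the statistic stays at zero before the change at index 50 and crosses the threshold a few steps afterwards.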
Testing randomness online
The hypothesis of randomness is fundamental in statistical machine learning
and in many areas of nonparametric statistics; it says that the observations
are independent and come from the same unknown probability
distribution. This hypothesis is close, in certain respects, to the hypothesis
of exchangeability, which postulates that the distribution of the observations
is invariant with respect to their permutations. This paper reviews known
methods of testing the two hypotheses, concentrating on the online mode of
testing, in which the observations arrive sequentially. All known online methods
for testing these hypotheses are based on conformal martingales, which are
defined and studied in detail. The paper emphasizes conceptual and practical
aspects and states two kinds of results. Validity results limit the probability
of a false alarm or the frequency of false alarms for various procedures based
on conformal martingales, including conformal versions of the CUSUM and
Shiryaev-Roberts procedures. Efficiency results establish connections between
randomness, exchangeability, and conformal martingales.
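A conformal test martingale of the kind reviewed here can be sketched in a few lines. The nonconformity score (the raw observation) and the single power betting function p -> epsilon * p**(epsilon - 1) are deliberate simplifications of what the paper studies; the validity side is Ville's inequality, which bounds the probability that the martingale ever reaches 1/alpha by alpha under the randomness hypothesis:

```python
import math
import random

def conformal_power_martingale(stream, epsilon=0.8, seed=0):
    """Yield the conformal power martingale M_n after each observation.

    Illustrative simplifications: the nonconformity score is the raw
    observation itself (large values look strange), and a single power
    betting function p -> epsilon * p**(epsilon - 1) is used instead of
    a mixture over epsilon.
    """
    rng = random.Random(seed)
    scores, log_m = [], 0.0  # track log M_n for numerical stability
    for x in stream:
        scores.append(x)
        n = len(scores)
        gt = sum(1 for a in scores if a > x)
        eq = sum(1 for a in scores if a == x)
        # Smoothed conformal p-value (uniform tie-breaking); floor guards log(0).
        p = max((gt + rng.random() * eq) / n, 1e-12)
        log_m += math.log(epsilon) + (epsilon - 1) * math.log(p)
        yield math.exp(log_m)

# I.i.d. for 50 steps, then a shift: the martingale grows only after the change.
values = list(conformal_power_martingale([0.0] * 50 + [1.0] * 50))
```

Before the change the p-values are approximately uniform and the martingale hovers near 1; after the change the p-values become small and the martingale grows, which is exactly the behaviour a CUSUM- or Shiryaev-Roberts-style stopping rule would act on.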
Inductive Conformal Martingales for Change-Point Detection
We consider the problem of quickest change-point detection in data streams.
Classical change-point detection procedures, such as CUSUM, Shiryaev-Roberts
and Posterior Probability statistics, are optimal only if the change-point
model is known, which is an unrealistic assumption in typical applied problems.
Instead we propose a new method for change-point detection based on Inductive
Conformal Martingales, which requires only the independence and identical
distribution of observations. We compare the proposed approach to standard
methods, as well as to change-point detection oracles, which model a typical
practical situation when we have only imprecise (albeit parametric) information
about pre- and post-change data distributions. The results of this comparison
provide evidence that change-point detection based on Inductive Conformal
Martingales is an efficient tool, capable of working under quite general
conditions, unlike traditional approaches.
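A minimal sketch of the inductive ingredient: the nonconformity score is computed against a fixed training summary, so nothing is retrained inside the streaming loop. The median-distance score, power betting function, and Ville-type alarm rule below are illustrative stand-ins, not the paper's actual choices:

```python
import math
import random

def icm_changepoint(train, stream, epsilon=0.8, alpha=0.01, seed=0):
    """Return the 1-based index of the first alarm, or None.

    The 'inductive' part: nonconformity is measured against a FIXED
    training summary (here, the training median), so the score function
    never changes during deployment.  Score, betting function, and the
    Ville-type alarm threshold 1/alpha are illustrative choices.
    """
    rng = random.Random(seed)
    center = sorted(train)[len(train) // 2]  # fixed training summary
    seen, log_m = [], 0.0
    for n, x in enumerate(stream, 1):
        a = abs(x - center)                  # nonconformity vs. training set
        seen.append(a)
        gt = sum(1 for b in seen if b > a)
        eq = sum(1 for b in seen if b == a)
        p = max((gt + rng.random() * eq) / n, 1e-12)  # smoothed p-value
        log_m += math.log(epsilon) + (epsilon - 1) * math.log(p)
        if log_m > math.log(1.0 / alpha):    # Ville: false-alarm prob <= alpha
            return n
    return None

# Change after 50 observations: the alarm follows with some detection delay.
detected = icm_changepoint([0.0] * 20, [0.0] * 50 + [5.0] * 50)
```

Because the martingale starts at 1 and is nonnegative under the i.i.d. hypothesis, stopping at 1/alpha caps the probability of a false alarm at alpha, mirroring the validity guarantees discussed in the abstract above.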
Online Distribution Shift Detection via Recency Prediction
When deploying modern machine learning-enabled robotic systems in high-stakes
applications, detecting distribution shift is critical. However, most existing
methods for detecting distribution shift are not well-suited to robotics
settings, where data often arrives in a streaming fashion and may be very
high-dimensional. In this work, we present an online method for detecting
distribution shift with guarantees on the false positive rate: when there is
no distribution shift, our system is very unlikely to falsely issue an alert;
any alerts that are issued should
therefore be heeded. Our method is specifically designed for efficient
detection even with high dimensional data, and it empirically achieves up to
11x faster detection in realistic robotics settings compared to prior work
while maintaining a low false negative rate in practice (whenever there is a
distribution shift in our experiments, our method indeed emits an alert). We
demonstrate our approach in both simulation and hardware for a visual servoing
task, and show that our method indeed issues an alert before a failure occurs.
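The abstract leaves the detector itself unspecified, but one generic way to read "recency prediction" is: hold out part of a reference buffer and part of a recent buffer, fit a classifier to predict which buffer a sample came from, and alarm when held-out accuracy beats chance. The one-dimensional threshold rule and crude binomial bound below are purely illustrative, not the paper's method or its guarantee:

```python
def recency_detector(reference, recent):
    """Flag a shift when a trivial threshold classifier can predict
    whether a held-out sample is 'recent' better than chance.

    Purely illustrative reading of 'recency prediction': the paper's
    actual detector and its false-positive guarantee are more involved.
    """
    # Split each buffer into interleaved train/test halves.
    ref_tr, ref_te = reference[::2], reference[1::2]
    rec_tr, rec_te = recent[::2], recent[1::2]
    ref_mean = sum(ref_tr) / len(ref_tr)
    rec_mean = sum(rec_tr) / len(rec_tr)
    thresh = (ref_mean + rec_mean) / 2       # 1-D threshold "classifier"
    above_is_recent = rec_mean >= ref_mean
    correct = sum((x > thresh) == above_is_recent for x in rec_te) \
            + sum((x > thresh) != above_is_recent for x in ref_te)
    n = len(ref_te) + len(rec_te)
    # Alarm only if held-out accuracy beats chance by ~2 binomial sds.
    return correct / n > 0.5 + 2 * (0.25 / n) ** 0.5
```

With no shift, held-out accuracy sits near 0.5 and no alert fires; under a clear shift the classifier separates the buffers and the detector alarms.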
Transcend: Detecting Concept Drift in Malware Classification Models
Building machine learning models of malware behavior is widely accepted as a panacea for effective malware classification. A crucial requirement for building sustainable learning models, though, is to train on a wide variety of malware samples. Unfortunately, malware evolves rapidly and it thus becomes hard, if not impossible, to generalize learning models to reflect future, previously unseen behaviors. Consequently, most malware classifiers become unsustainable in the long run, rapidly growing antiquated as malware continues to evolve. In this work, we propose Transcend, a framework to identify aging classification models in vivo during deployment, well before the machine learning model's performance starts to degrade. This is a significant departure from conventional approaches that retrain aging models retrospectively, once poor performance is observed. Our approach uses a statistical comparison of samples seen during deployment with those used to train the model, thereby building metrics for prediction quality. We show how Transcend can be used to identify concept drift in two separate case studies on Android and Windows malware, raising a red flag before the model starts making consistently poor decisions due to out-of-date training.
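The "statistical comparison of samples seen during deployment with those used to train the model" can be sketched with conformal credibility p-values. The nonconformity measure is left abstract here, and the decision rule is an illustrative stand-in, not Transcend's actual metrics or thresholds:

```python
def credibility(cal_scores, test_score):
    """Conformal credibility p-value: the (add-one smoothed) fraction of
    calibration nonconformity scores at least as large as the test
    sample's score.  High nonconformity -> low credibility.
    """
    n = len(cal_scores)
    return (sum(1 for s in cal_scores if s >= test_score) + 1) / (n + 1.0)

def drift_flag(cal_scores, deployment_scores, p_threshold=0.1, frac=0.5):
    """Illustrative decision rule: flag concept drift when more than
    `frac` of deployment samples have credibility below `p_threshold`.
    """
    ps = [credibility(cal_scores, s) for s in deployment_scores]
    return sum(p < p_threshold for p in ps) / len(ps) > frac
```

A deployment window whose samples look like the calibration data yields credibilities spread over (0, 1] and no flag; a window of consistently high nonconformity scores yields uniformly low credibilities and raises the flag before accuracy visibly degrades.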