Change Detection in Streaming Data
Change detection is the process of identifying differences in the state of an object or
phenomenon by observing it at different times or at different locations in space. In the
streaming context, it is the process of segmenting a data stream into different segments
by identifying the points where the stream dynamics change. The ability to detect,
react to, and adapt to changes in streaming data plays an important role in many
application areas, such as activity monitoring, data stream mining and machine
learning, and the management of data volume and data quality. Decentralized
change detection can be used in many interesting and important applications, such
as environmental observation systems and healthcare monitoring systems. Although
there is a great deal of work on distributed detection and data fusion, most of it
focuses on one-time change detection solutions. A one-time change detection method
processes the data only once, in response to the change occurring. A continuous
distributed change detection scheme must instead trade off detection accuracy,
space efficiency, detection delay, and communication efficiency.
To address these trade-offs, a wildfire warning system is used as the motivating
scenario. Based on the challenges and requirements of the wildfire warning system,
change detection algorithms for streaming data are proposed as part of the solution
to the wildfire warning system. By selecting among different models of local change
detection, different schemes for distributed change detection, and different data
exchange protocols, a variety of system designs can be achieved.
Based on this approach, the contributions of this dissertation are as follows.
A general two-window framework for detecting changes in a single data stream is
presented. A general synopsis-based change detection framework is proposed. Theoretical
and empirical analysis shows that the detection performance of a synopsis-based
detector is similar to that of a non-synopsis change detector if a distance function
quantifying the changes is preserved under the process of constructing the synopsis.
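The dissertation does not spell out its two-window algorithm in this abstract, so the following is only a minimal sketch of the general idea: a fixed reference window is compared against a sliding current window with a distance function (here, a simple difference of means; the window size, threshold, and distance are illustrative choices, not the dissertation's).

```python
from collections import deque

def two_window_detector(stream, window_size=50, threshold=1.0):
    """Minimal two-window change detector sketch.

    Keeps a reference window of past data and a sliding current
    window; flags a change point whenever the distance between the
    two windows (here: absolute difference of means) exceeds a
    threshold, then re-anchors the reference on the recent data.
    Yields the stream indices at which changes are detected.
    """
    reference = deque(maxlen=window_size)
    current = deque(maxlen=window_size)
    for i, x in enumerate(stream):
        if len(reference) < window_size:
            reference.append(x)          # fill the reference window first
            continue
        current.append(x)
        if len(current) == window_size:
            dist = abs(sum(current) / window_size - sum(reference) / window_size)
            if dist > threshold:
                yield i                  # change detected at index i
                reference = deque(current, maxlen=window_size)
                current.clear()
```

A synopsis-based variant would replace the raw windows with compact summaries, which is sound as long as the distance between summaries tracks the distance between the raw windows, as the abstract's preservation condition states.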
A clustering-based change detection and clustering maintenance method over a
sliding window is presented. The clustering-based detector can automatically detect
changes in multivariate streaming data. A framework for decentralized change
detection in wireless sensor networks is proposed. Finally, a distributed framework
for clustering streaming data is proposed by extending the two-phase stream
clustering approach that is widely used to cluster a single data stream.
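The clustering-based detector itself is not detailed in the abstract; as a hedged illustration of the general idea, the sketch below maintains clusters over a sliding window of multivariate points and declares a change when most of the window no longer fits any cluster. Greedy leader clustering and all parameter values are stand-ins, not the dissertation's method.

```python
import math
from collections import deque

def clustering_change_detector(stream, radius=1.5, window=30, outlier_frac=0.5):
    """Sketch: cluster the first full window of points, then flag a
    change whenever at least `outlier_frac` of the sliding window lies
    farther than `radius` from every maintained cluster centroid.
    Yields the indices at which changes are detected."""

    def build_clusters(points):
        # Greedy leader clustering: each point joins the first centroid
        # within `radius`, otherwise it starts a new cluster.
        centroids = []
        for p in points:
            for c in centroids:
                if math.dist(p, c) <= radius:
                    break
            else:
                centroids.append(p)
        return centroids

    recent = deque(maxlen=window)
    centroids = []
    for i, p in enumerate(stream):
        recent.append(p)
        if len(recent) < window:
            continue
        if not centroids:
            centroids = build_clusters(recent)   # initial model
            continue
        outliers = sum(
            1 for q in recent
            if all(math.dist(q, c) > radius for c in centroids)
        )
        if outliers / window >= outlier_frac:
            yield i                              # change detected
            centroids = build_clusters(recent)   # rebuild on recent data
```

In a decentralized setting, each sensor node could run such a local detector and exchange only cluster summaries or detection events, which is the kind of design-space choice (local model, distribution scheme, exchange protocol) the abstract describes.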
Adaptive estimation and change detection of correlation and quantiles for evolving data streams
Streaming data processing is increasingly playing a central role in enterprise data architectures due to an abundance of available measurement data from a wide variety of sources and advances in data capture and infrastructure technology. Data streams arrive, with high frequency, as never-ending sequences of events, where the underlying data generating process always has the potential to evolve. Business operations often demand real-time processing of data streams for keeping models up-to-date and timely decision-making. For example in cybersecurity contexts, analysing streams of network data can aid the detection of potentially malicious behaviour. Many tools for statistical inference cannot meet the challenging demands of streaming data, where the computational cost of updates to models must be constant to ensure continuous processing as data scales. Moreover, these tools are often not capable of adapting to changes, or drift, in the data. Thus, new tools for modelling data streams with efficient data processing and model updating capabilities, referred to as streaming analytics, are required. Regular intervention for control parameter configuration is prohibitive to the truly continuous processing constraints of streaming data. There is a notable absence of such tools designed with both temporal-adaptivity to accommodate drift and the autonomy to not rely on control parameter tuning. Streaming analytics with these properties can be developed using an Adaptive Forgetting (AF) framework, with roots in adaptive filtering. The fundamental contributions of this thesis are to extend the streaming toolkit by using the AF framework to develop autonomous and temporally-adaptive streaming analytics.
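The Adaptive Forgetting idea can be made concrete with a small sketch: an exponentially weighted mean whose forgetting factor is itself tuned online by a stochastic gradient step on the one-step-ahead squared prediction error. This is only an illustration of the AF principle under simple assumptions (the thesis's actual estimators and tuning rules are not given in the abstract); the step size and bounds are arbitrary choices.

```python
def adaptive_forgetting_mean(stream, lam=0.95, eta=0.01, bounds=(0.5, 0.9999)):
    """Sketch of an adaptive-forgetting (AF) mean estimator.

    The forgetting factor `lam` is adapted online by a gradient step
    on the one-step-ahead squared prediction error, so the estimator
    discounts history faster after a change and forgets slowly when
    the stream is stable. Yields (estimate, lam) per observation.
    """
    m = 0.0   # running estimate of the mean
    w = 0.0   # effective sample size
    d = 0.0   # derivative d m / d lam (tracked recursively)
    v = 0.0   # derivative d w / d lam
    for x in stream:
        e = x - m                                      # prediction error
        # gradient step on e^2 w.r.t. lam (constant absorbed into eta)
        lam = min(max(lam + eta * e * d, bounds[0]), bounds[1])
        v = w + lam * v
        w = lam * w + 1.0
        d = d * (1.0 - 1.0 / w) - e * v / (w * w)
        m = m + e / w
        yield m, lam
```

The appeal for streaming analytics is that each update is O(1) and no control parameter needs manual re-tuning: after an abrupt level shift, large errors drive `lam` down, the effective window shrinks, and the estimate re-converges quickly.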
The first contribution uses the AF framework to demonstrate the development of a model, and validation procedure, for estimating time-varying parameters of bivariate data streams from cyber-physical systems. This is accompanied by a novel continuous monitoring change detection system that compares adaptive and non-adaptive estimates. The second contribution is the development of a streaming analytic for the correlation coefficient and an associated change detector to monitor changes to correlation structures across streams. This is demonstrated on cybersecurity network data. The third contribution is a procedure for estimating time-varying binomial data with thorough exploration of the nuanced behaviour of this estimator. The final contribution is a framework to enhance extant streaming quantile estimators with autonomous, temporally-adaptive properties. In addition, a novel streaming quantile procedure is developed and demonstrated, in an extensive simulation study, to show appealing performance.
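A streaming correlation estimate of the kind described can be sketched with exponentially weighted sufficient statistics. The sketch below uses a fixed forgetting factor for simplicity (the thesis's contribution is precisely to adapt it online); the parameter value is illustrative.

```python
def ewma_correlation(stream_pairs, lam=0.95):
    """Sketch: exponentially weighted streaming estimate of the
    Pearson correlation between two streams.

    Maintains forgetting-weighted sums of x, y, x^2, y^2, and x*y,
    from which a time-varying correlation coefficient is computed in
    O(1) per observation. Yields the current estimate per pair.
    """
    sx = sy = sxx = syy = sxy = w = 0.0
    for x, y in stream_pairs:
        w = lam * w + 1.0
        sx = lam * sx + x
        sy = lam * sy + y
        sxx = lam * sxx + x * x
        syy = lam * syy + y * y
        sxy = lam * sxy + x * y
        mx, my = sx / w, sy / w
        vx = sxx / w - mx * mx          # weighted variance of x
        vy = syy / w - my * my          # weighted variance of y
        cov = sxy / w - mx * my         # weighted covariance
        if vx > 0 and vy > 0:
            yield cov / (vx * vy) ** 0.5
        else:
            yield 0.0                   # undefined until variance appears
```

A change detector for correlation structure could then monitor this estimate, for example by comparing adaptive and non-adaptive versions of it, as the first contribution's continuous monitoring system does for other parameters.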
Change detection in streaming data analytics: a comparison of Bayesian online and martingale approaches
Online change detection is a key activity in streaming analytics, which aims to determine whether the current observation in a time series marks a change point in some important characteristic of the data, given the sequence of data observed so far. It can be a challenging task when monitoring complex systems, which generate streaming data of significant volume and velocity. While applicable to diverse problem domains, it is highly relevant to monitoring high-value and critical engineering assets. This paper presents an empirical evaluation of two algorithmic approaches for streaming data change detection. These are a modified martingale and a Bayesian online detection algorithm. Results obtained with both synthetic and real-world data sets are presented, and relevant advantages and limitations are discussed.
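The martingale side of such a comparison can be sketched as follows. This is not the paper's exact modified martingale: it uses a conformal-style mid-rank p-value, distance to the running mean as the strangeness measure, and a CUSUM-style floor at 1 to keep the detector responsive; `eps` and `threshold` are illustrative.

```python
def martingale_change_detector(stream, eps=0.92, threshold=10.0):
    """Sketch of a power-martingale change detector.

    Each point's strangeness is its distance to the running mean; a
    conformal-style p-value is the mid-rank of that strangeness among
    all past ones; the power-martingale update m *= eps * p**(eps-1)
    grows only while unusually strange points keep arriving, which
    under no change happens rarely. An alarm fires and the detector
    restarts when m exceeds `threshold`."""
    history, total, m = [], 0.0, 1.0
    for i, x in enumerate(stream):
        mean = total / len(history) if history else 0.0
        s = abs(x - mean)                        # strangeness score
        greater = sum(1 for h in history if h > s)
        equal = sum(1 for h in history if h == s)
        p = (greater + 0.5 * (equal + 1)) / (len(history) + 1)
        m = max(1.0, m * eps * p ** (eps - 1.0))  # CUSUM-style floor at 1
        history.append(s)
        total += x
        if m > threshold:
            yield i                              # change point declared
            history, total, m = [], 0.0, 1.0     # restart after alarm
```

The Bayesian online alternative would instead maintain a posterior over the current "run length" since the last change; the martingale approach trades that probabilistic output for a simpler, distribution-free test statistic.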
ECHAD: Embedding-Based Change Detection from Multivariate Time Series in Smart Grids
Smart grids are power grids where clients may actively participate in energy production, storage and distribution. Smart grid management raises several challenges, including the possible changes and evolutions in terms of energy consumption and production, that must be taken into account in order to properly regulate the energy distribution. In this context, machine learning methods can be fruitfully adopted to support the analysis and to predict the behavior of smart grids, by exploiting the large amount of streaming data generated by sensor networks. In this article, we propose a novel change detection method, called ECHAD (Embedding-based CHAnge Detection), that leverages embedding techniques, one-class learning, and a dynamic detection approach that incrementally updates the learned model to reflect the new data distribution. Our experiments show that ECHAD achieves optimal performance on synthetic data representing challenging scenarios. Moreover, a qualitative analysis of the results obtained on data from a real power grid reveals the quality of the change detection of ECHAD. Specifically, a comparison with state-of-the-art approaches shows the ability of ECHAD to identify additional relevant changes, not detected by competitors, while avoiding false positive detections.
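The combination of embedding, one-class learning, and incremental model update described above can be illustrated schematically. Everything here is a stand-in for ECHAD's actual components: the "embedding" is just per-feature window means, the one-class model is a centroid plus an adaptive distance scale, and the update rule is an exponential moving average.

```python
import math

def echad_style_detector(windows, threshold=3.0, alpha=0.1):
    """Schematic one-class change detector over multivariate windows.

    Each window of multivariate observations is mapped to an embedding
    (stand-in: per-feature means); a one-class model keeps a centroid
    of normal embeddings and a running distance scale; embeddings far
    from the centroid are flagged, and the model is updated
    incrementally so it tracks the current data distribution.
    Yields indices of windows flagged as changes."""
    centroid = None
    spread = 1.0                                 # running distance scale
    for i, w in enumerate(windows):
        # stand-in "embedding": per-feature mean over the window
        emb = [sum(col) / len(col) for col in zip(*w)]
        if centroid is None:
            centroid = emb                       # initialize the model
            continue
        dist = math.dist(emb, centroid)
        if dist > threshold * spread:
            yield i                              # change flagged
        # incremental one-class model update (exponential moving average)
        centroid = [(1 - alpha) * c + alpha * e for c, e in zip(centroid, emb)]
        spread = (1 - alpha) * spread + alpha * max(dist, 1e-6)
```

Because the model keeps absorbing recent embeddings, an abrupt distribution shift is flagged for a few windows and then becomes the new "normal", which mirrors the dynamic detection behavior the abstract describes.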
RePAD: Real-time Proactive Anomaly Detection for Time Series
During the past decade, many anomaly detection approaches have been
introduced in different fields such as network monitoring, fraud detection, and
intrusion detection. However, they require an understanding of the data pattern and
often need a long off-line period to build a model or network for the target
data. Providing real-time and proactive anomaly detection for streaming time
series without human intervention and domain knowledge is highly valuable since
it greatly reduces human effort and enables appropriate countermeasures to be
undertaken before disastrous damage, a failure, or another harmful event occurs.
However, this issue has not been well studied yet. To address it, this paper
proposes RePAD, which is a Real-time Proactive Anomaly Detection algorithm for
streaming time series based on Long Short-Term Memory (LSTM). RePAD utilizes
short-term historic data points to predict and determine whether or not the
upcoming data point is a sign that an anomaly is likely to happen in the near
future. By dynamically adjusting the detection threshold over time, RePAD is
able to tolerate minor pattern changes in time series and detect anomalies
either proactively or on time. Experiments based on two time series datasets
collected from the Numenta Anomaly Benchmark demonstrate that RePAD is able to
proactively detect anomalies and provide early warnings in real time without
human intervention and domain knowledge.
Comment: 12 pages, 8 figures, the 34th International Conference on Advanced
Information Networking and Applications (AINA 2020).
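RePAD's detection logic (predict the next point from short-term history, then compare the prediction error against a threshold that adapts over time) can be sketched compactly. The sketch substitutes a naive moving-average predictor for the paper's LSTM, and uses mean plus three standard deviations of recent errors as the adaptive threshold; the lookback and window sizes are illustrative, not the paper's settings.

```python
from collections import deque

def repad_style_detector(stream, lookback=5, window=20):
    """Sketch of RePAD-style proactive anomaly detection.

    Predicts each point from the last `lookback` values (moving
    average standing in for an LSTM), and flags the point when its
    prediction error exceeds an adaptive threshold of mean + 3*std
    over the last `window` errors, so gradual pattern drift raises
    the threshold instead of triggering alarms. Yields anomaly
    indices."""
    recent = deque(maxlen=lookback)   # short-term history for prediction
    errors = deque(maxlen=window)     # recent prediction errors
    for i, x in enumerate(stream):
        if len(recent) == lookback:
            pred = sum(recent) / lookback          # stand-in predictor
            err = abs(x - pred)
            if len(errors) >= window:
                mu = sum(errors) / len(errors)
                var = sum((e - mu) ** 2 for e in errors) / len(errors)
                if err > mu + 3 * var ** 0.5:
                    yield i                        # anomaly warning
            errors.append(err)
        recent.append(x)
```

Because the threshold is recomputed from recent errors at every step, the detector needs no offline training period or manually tuned alarm level, which is the property the abstract emphasizes.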