
    Change Detection in Streaming Data

    Change detection is the process of identifying differences in the state of an object or phenomenon by observing it at different times or at different locations in space. In the streaming context, it is the process of segmenting a data stream into segments by identifying the points where the stream dynamics change. The ability to detect, react to, and adapt to changes in streaming data plays an important role in many application areas, such as activity monitoring, data stream mining and machine learning, and data management with respect to data volume and data quality. Decentralized change detection can be used in many interesting and important applications, such as environmental observation systems and medical monitoring systems. Although there is a great deal of work on distributed detection and data fusion, most of it focuses on one-time change detection solutions, which process the data only once in response to an occurring change. A continuous distributed detection of changes must trade off detection accuracy, space efficiency, detection delay, and communication efficiency. To address these goals, a wildfire warning system is used as a motivating scenario. Based on the challenges and requirements of the wildfire warning system, change detection algorithms for streaming data are proposed as part of its solution. By selecting different models for local change detection, different schemes for distributed change detection, and different data exchange protocols, different designs can be achieved. Following this approach, the contributions of this dissertation are as follows. A general two-window framework for detecting changes in a single data stream is presented. A general synopsis-based change detection framework is proposed; theoretical and empirical analysis shows that the detection performance of a synopsis-based detector is similar to that of a non-synopsis change detector if a distance function quantifying the changes is preserved under the process of constructing the synopsis. A clustering-based change detection and clustering maintenance method over a sliding window is presented; the clustering-based detector can automatically detect changes in multivariate streaming data. A framework for decentralized change detection in wireless sensor networks is proposed. Finally, a distributed framework for clustering streaming data is proposed by extending the two-phase stream clustering approach that is widely used to cluster a single data stream.
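    The two-window idea above can be illustrated with a small sketch: keep a reference window for the current segment, compare it against the most recent window with a distance function, and declare a change when the distance is significant. The Kolmogorov–Smirnov statistic, window size, and p-value threshold below are illustrative assumptions, not the dissertation's specific choices.

```python
from collections import deque
from scipy.stats import ks_2samp

class TwoWindowDetector:
    """Compare a reference window against the most recent window of a stream."""

    def __init__(self, window_size=200, p_threshold=0.01):
        self.window_size = window_size
        self.p_threshold = p_threshold
        self.reference = deque(maxlen=window_size)   # baseline segment (older data)
        self.current = deque(maxlen=window_size)     # most recent data

    def update(self, x):
        """Feed one observation; return True if a change point is declared."""
        if len(self.reference) < self.window_size:
            self.reference.append(x)                 # still filling the baseline
            return False
        self.current.append(x)
        if len(self.current) < self.window_size:
            return False
        # Distance between the two windows: two-sample KS test (illustrative choice).
        _, p_value = ks_2samp(list(self.reference), list(self.current))
        if p_value < self.p_threshold:
            # Change detected: the current window becomes the new baseline.
            self.reference = deque(self.current, maxlen=self.window_size)
            self.current = deque(maxlen=self.window_size)
            return True
        return False
```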

    Adaptive estimation and change detection of correlation and quantiles for evolving data streams

    Streaming data processing is increasingly playing a central role in enterprise data architectures due to an abundance of available measurement data from a wide variety of sources and advances in data capture and infrastructure technology. Data streams arrive, with high frequency, as never-ending sequences of events, where the underlying data generating process always has the potential to evolve. Business operations often demand real-time processing of data streams for keeping models up-to-date and timely decision-making. For example, in cybersecurity contexts, analysing streams of network data can aid the detection of potentially malicious behaviour. Many tools for statistical inference cannot meet the challenging demands of streaming data, where the computational cost of updates to models must be constant to ensure continuous processing as data scales. Moreover, these tools are often not capable of adapting to changes, or drift, in the data. Thus, new tools for modelling data streams with efficient data processing and model updating capabilities, referred to as streaming analytics, are required. Regular intervention for control parameter configuration is prohibitive to the truly continuous processing constraints of streaming data. There is a notable absence of such tools designed with both temporal-adaptivity to accommodate drift and the autonomy to not rely on control parameter tuning. Streaming analytics with these properties can be developed using an Adaptive Forgetting (AF) framework, with roots in adaptive filtering. The fundamental contributions of this thesis are to extend the streaming toolkit by using the AF framework to develop autonomous and temporally-adaptive streaming analytics. The first contribution uses the AF framework to demonstrate the development of a model, and validation procedure, for estimating time-varying parameters of bivariate data streams from cyber-physical systems. This is accompanied by a novel continuous monitoring change detection system that compares adaptive and non-adaptive estimates. The second contribution is the development of a streaming analytic for the correlation coefficient and an associated change detector to monitor changes to correlation structures across streams. This is demonstrated on cybersecurity network data. The third contribution is a procedure for estimating time-varying binomial data with thorough exploration of the nuanced behaviour of this estimator. The final contribution is a framework to enhance extant streaming quantile estimators with autonomous, temporally-adaptive properties. In addition, a novel streaming quantile procedure is developed and demonstrated, in an extensive simulation study, to show appealing performance.
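    The adaptive forgetting idea can be illustrated with an exponentially weighted mean whose forgetting factor is itself tuned online by gradient steps on the one-step-ahead squared error. This is a sketch of the general AF scheme, not the thesis's estimators; the step size and the clamp on the forgetting factor are assumptions.

```python
class AdaptiveForgettingMean:
    """Exponentially weighted mean with a self-tuning forgetting factor."""

    def __init__(self, step_size=0.01, lam=0.95):
        self.eta = step_size      # gradient step for the forgetting factor
        self.lam = lam            # forgetting factor, kept in (0, 1]
        self.m = 0.0              # weighted sum of observations
        self.w = 0.0              # sum of weights
        self.dm = 0.0             # derivative of m with respect to lambda
        self.dw = 0.0             # derivative of w with respect to lambda
        self.xbar = 0.0           # current estimate

    def update(self, x):
        """Feed one observation; return the updated mean estimate."""
        if self.w > 0:
            # Gradient of the one-step-ahead squared error (xbar - x)^2 w.r.t. lambda.
            dxbar = (self.dm * self.w - self.m * self.dw) / (self.w ** 2)
            grad = 2.0 * (self.xbar - x) * dxbar
            # Clamp bounds are illustrative, not prescribed by the thesis.
            self.lam = min(1.0, max(0.6, self.lam - self.eta * grad))
        # Exponentially weighted updates (and their derivatives) with the new factor.
        self.dm = self.lam * self.dm + self.m
        self.dw = self.lam * self.dw + self.w
        self.m = self.lam * self.m + x
        self.w = self.lam * self.w + 1.0
        self.xbar = self.m / self.w
        return self.xbar
```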

    Change detection in streaming data analytics: a comparison of Bayesian online and martingale approaches

    Online change detection, a key activity in streaming analytics, aims to determine whether the current observation in a time series marks a change point in some important characteristic of the data, given the sequence of data observed so far. It can be a challenging task when monitoring complex systems, which generate streaming data of significant volume and velocity. While applicable to diverse problem domains, it is highly relevant to monitoring high-value and critical engineering assets. This paper presents an empirical evaluation of two algorithmic approaches for streaming data change detection: a modified martingale algorithm and a Bayesian online detection algorithm. Results obtained with both synthetic and real-world data sets are presented, and relevant advantages and limitations are discussed.
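    As an illustration of the martingale side of such a comparison, the sketch below implements a randomized power martingale over conformal p-values, raising an alarm when the martingale value exceeds a threshold. The distance-to-mean strangeness measure, the epsilon parameter, and the alarm threshold are illustrative assumptions rather than the paper's exact configuration.

```python
import random

class MartingaleDetector:
    """Randomized power martingale change detector over a univariate stream."""

    def __init__(self, epsilon=0.92, threshold=20.0):
        self.eps = epsilon          # power-martingale parameter in (0, 1)
        self.threshold = threshold  # alarm when the martingale exceeds this value
        self.history = []           # (value, strangeness) pairs since the last change
        self.martingale = 1.0

    def _strangeness(self, x, data):
        # Simple strangeness: distance from the running mean of the current segment.
        mean = sum(data) / len(data) if data else 0.0
        return abs(x - mean)

    def update(self, x):
        """Feed one observation; return True if a change is declared."""
        data = [d for d, _ in self.history]
        alpha = self._strangeness(x, data)
        alphas = [a for _, a in self.history] + [alpha]
        theta = random.random()
        # Randomized conformal p-value of the new strangeness score.
        p = (sum(a > alpha for a in alphas)
             + theta * sum(a == alpha for a in alphas)) / len(alphas)
        self.martingale *= self.eps * (p ** (self.eps - 1.0))
        self.history.append((x, alpha))
        if self.martingale > self.threshold:
            # Change declared: restart the martingale for the next segment.
            self.history, self.martingale = [], 1.0
            return True
        return False
```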

    ECHAD: Embedding-Based Change Detection from Multivariate Time Series in Smart Grids

    Smart grids are power grids where clients may actively participate in energy production, storage and distribution. Smart grid management raises several challenges, including possible changes and evolutions in energy consumption and production, which must be taken into account in order to properly regulate the energy distribution. In this context, machine learning methods can be fruitfully adopted to support the analysis and to predict the behavior of smart grids, by exploiting the large amount of streaming data generated by sensor networks. In this article, we propose a novel change detection method, called ECHAD (Embedding-based CHAnge Detection), that leverages embedding techniques, one-class learning, and a dynamic detection approach that incrementally updates the learned model to reflect the new data distribution. Our experiments show that ECHAD achieves optimal performance on synthetic data representing challenging scenarios. Moreover, a qualitative analysis of the results obtained on real data from a power grid reveals the quality of ECHAD's change detection. Specifically, a comparison with state-of-the-art approaches shows the ability of ECHAD to identify additional relevant changes not detected by competitors, while avoiding false positive detections.
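    The general recipe the abstract describes can be sketched as follows (this is not ECHAD itself): embed windows of the multivariate stream, fit a one-class model on a reference period, and declare a change when the fraction of recent points rejected by the model exceeds a threshold, then refit on the post-change data. PCA as the embedding, OneClassSVM as the one-class learner, and the window and threshold values are stand-in assumptions, and the sketch scans a finite array instead of updating incrementally.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

def detect_changes(X, window=50, embed_dim=3, outlier_fraction=0.6):
    """X: array of shape (n_samples, n_features) from a multivariate stream,
    with n_features >= embed_dim. Returns indices where a change is declared."""
    changes = []
    start = 0
    while start + 2 * window <= len(X):
        reference = X[start:start + window]
        # Embed the reference window and fit a one-class model on the embedding.
        pca = PCA(n_components=embed_dim).fit(reference)
        occ = OneClassSVM(nu=0.1, gamma="scale").fit(pca.transform(reference))
        pos = start + window
        while pos + window <= len(X):
            recent = pca.transform(X[pos:pos + window])
            # Fraction of recent points the one-class model rejects as outliers.
            rejected = np.mean(occ.predict(recent) == -1)
            if rejected > outlier_fraction:
                changes.append(pos)
                start = pos          # refit on data after the detected change
                break
            pos += window
        else:
            break
    return changes
```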

    RePAD: Real-time Proactive Anomaly Detection for Time Series

    During the past decade, many anomaly detection approaches have been introduced in different fields such as network monitoring, fraud detection, and intrusion detection. However, they require an understanding of the data patterns and often need a long off-line period to build a model or network for the target data. Providing real-time and proactive anomaly detection for streaming time series without human intervention and domain knowledge is highly valuable, since it greatly reduces human effort and enables appropriate countermeasures to be undertaken before disastrous damage, failure, or another harmful event occurs. However, this issue has not been well studied yet. To address it, this paper proposes RePAD, a Real-time Proactive Anomaly Detection algorithm for streaming time series based on Long Short-Term Memory (LSTM). RePAD utilizes short-term historic data points to predict and determine whether the upcoming data point is a sign that an anomaly is likely to happen in the near future. By dynamically adjusting the detection threshold over time, RePAD is able to tolerate minor pattern changes in time series and detect anomalies either proactively or on time. Experiments based on two time series datasets collected from the Numenta Anomaly Benchmark demonstrate that RePAD is able to proactively detect anomalies and provide early warnings in real time without human intervention and domain knowledge. Comment: 12 pages, 8 figures, the 34th International Conference on Advanced Information Networking and Applications (AINA 2020).
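    A compact sketch of the detection logic described above, with the LSTM replaced by a pluggable one-step-ahead predictor so the example stays self-contained. The mean-plus-three-standard-deviations threshold over recent average absolute relative errors follows the paper's description of dynamic thresholding, while the toy predictor, window length, and bookkeeping details are assumptions.

```python
import statistics

class ProactiveDetector:
    """Predict each upcoming point, track the average absolute relative error
    (AARE) of recent predictions, and flag an anomaly when the AARE exceeds a
    self-adjusting threshold (mean + 3 * std of past AAREs)."""

    def __init__(self, predictor, history=10):
        self.predictor = predictor   # callable: recent values -> next-value prediction
        self.history = history       # points per prediction and per AARE window
        self.values = []             # observed stream values
        self.ares = []               # absolute relative errors of past predictions
        self.aares = []              # past AARE values

    def update(self, y):
        """Feed the observed value y; return True if it looks anomalous."""
        anomalous = False
        if len(self.values) >= self.history:
            y_hat = self.predictor(self.values[-self.history:])
            self.ares.append(abs(y - y_hat) / max(abs(y), 1e-9))
            aare = statistics.mean(self.ares[-self.history:])
            if len(self.aares) >= 2:
                threshold = statistics.mean(self.aares) + 3 * statistics.stdev(self.aares)
                anomalous = aare > threshold
            self.aares.append(aare)
        self.values.append(y)
        return anomalous

# Example usage with a naive "repeat the last value" predictor, standing in for
# the LSTM predictor used in the paper.
detector = ProactiveDetector(predictor=lambda recent: recent[-1])
for t, value in enumerate([1.0] * 30 + [8.0]):
    if detector.update(value):
        print("possible anomaly at index", t)
```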