Long signal change-point detection
The detection of change-points in a spatially or time ordered data sequence
is an important problem in many fields such as genetics and finance. We derive
the asymptotic distribution of a statistic recently suggested for detecting
change-points. Simulation of its estimated limit distribution leads to a new
and computationally efficient change-point detection algorithm, which can be
used on very long signals. We assess the algorithm via simulations and on
previously benchmarked real-world data sets
Consistent change-point detection with kernels
In this paper we study the kernel change-point algorithm (KCP) proposed by Arlot, Celisse and Harchaoui (2012), which aims at locating an unknown number of change-points in the distribution of a sequence of independent data taking values in an arbitrary set. The change-points are selected by model selection with a penalized kernel empirical criterion. We provide a non-asymptotic result showing that, with high probability, the KCP procedure retrieves the correct number of change-points, provided that the constant in the penalty is well-chosen; in addition, KCP estimates the change-point locations at the optimal rate. As a consequence, when using a characteristic kernel, KCP detects all kinds of change in the distribution (not only changes in the mean or the variance), and it is able to do so for complex structured data (not necessarily in $\mathbb{R}^d$). Most of the analysis is conducted assuming that the kernel is bounded; part of the results can be extended when we only assume a finite second-order moment.
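The kernel criterion mentioned in this abstract can be illustrated with a minimal sketch. It is not the KCP procedure itself: it assumes exactly one change-point, so no penalty or dynamic programming is needed, and it assumes scalar data with a Gaussian kernel. The names `gaussian_kernel`, `segment_cost`, and `best_single_split` are hypothetical and chosen for this illustration only.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on scalars (an assumed choice; it is bounded and characteristic)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def segment_cost(data, start, end, kernel=gaussian_kernel):
    """Kernel least-squares cost of data[start:end].

    Equals sum_i k(x_i, x_i) - (1/n) * sum_{i,j} k(x_i, x_j), i.e. the squared
    distance of the embedded points to their mean in feature space.
    """
    n = end - start
    diag = sum(kernel(data[i], data[i]) for i in range(start, end))
    gram = sum(
        kernel(data[i], data[j])
        for i in range(start, end)
        for j in range(start, end)
    )
    return diag - gram / n


def best_single_split(data, kernel=gaussian_kernel):
    """Return the index t that minimizes cost(data[:t]) + cost(data[t:])."""
    n = len(data)
    return min(
        range(1, n),
        key=lambda t: segment_cost(data, 0, t, kernel)
        + segment_cost(data, t, n, kernel),
    )


# Toy data with a clear distribution change after the fourth observation.
data = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
split = best_single_split(data)
print(split)
```

With an unknown number of change-points, the same segment cost would be minimized over all segmentations together with a penalty on the number of segments, which is the model-selection step the abstract refers to.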
Using Labeled Data to Evaluate Change Detectors in a Multivariate Streaming Environment
We consider the problem of detecting changes in a multivariate data stream. A change detector is defined by a detection algorithm and an alarm threshold. A detection algorithm maps the stream of input vectors into a univariate detection stream. The detector signals a change when the detection stream exceeds the chosen alarm threshold. We consider two aspects of the problem: (1) setting the alarm threshold and (2) measuring/comparing the performance of detection algorithms. We assume we are given a segment of the stream where changes of interest are marked. We present evidence that, without such marked training data, it might not be possible to accurately estimate the false alarm rate for a given alarm threshold. Commonly used approaches assume the data stream consists of independent observations, an implausible assumption given the time series nature of the data. Lack of independence can lead to estimates that are badly biased. Marked training data can also be used for realistic comparison of detection algorithms. We define a version of the receiver operating characteristic curve adapted to the change detection problem and propose a block bootstrap for comparing such curves. We illustrate the proposed methodology using multivariate data derived from an image stream. Key words: Block bootstrap; Change point detection; Time series analysis
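The detector definition in this abstract (a detection stream compared against an alarm threshold) can be sketched as follows, together with one possible way to count false alarms on a segment with marked changes. This is an illustrative sketch, not the paper's evaluation procedure: the function names, the `tolerance` parameter, and the rule that an alarm counts as correct if it falls within `tolerance` steps after a marked change are all assumptions.

```python
from typing import List, Sequence


def alarms(detection_stream: Sequence[float], threshold: float) -> List[int]:
    """Return the indices at which the detection stream crosses above the threshold."""
    hits = []
    above = False
    for t, value in enumerate(detection_stream):
        if value > threshold and not above:
            hits.append(t)  # record only the upward crossing, not every point above
        above = value > threshold
    return hits


def false_alarm_count(
    alarm_times: Sequence[int], change_times: Sequence[int], tolerance: int
) -> int:
    """Count alarms that do not fall within `tolerance` steps after a marked change."""
    return sum(
        1
        for a in alarm_times
        if not any(0 <= a - c <= tolerance for c in change_times)
    )


# Hypothetical univariate detection stream with one marked change at index 2.
stream = [0.1, 0.2, 0.9, 1.1, 0.3, 0.2, 1.5, 0.4]
alarm_times = alarms(stream, threshold=1.0)
n_false = false_alarm_count(alarm_times, change_times=[2], tolerance=2)
print(alarm_times, n_false)
```

Sweeping `threshold` over a grid and recording detections against false alarms at each value would trace out a curve in the spirit of the adapted receiver operating characteristic curve the abstract describes.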
Change Detection in Streaming Data
Change detection is the process of identifying differences in the state of an object or
phenomenon by observing it at different times or different locations in space. In the
streaming context, it is the process of segmenting a data stream into different segments
by identifying the points where the stream dynamics change. Decentralized
change detection can be used in many interesting and important applications, such
as environmental observing systems and medical monitoring systems. Although there is
a great deal of work on distributed detection and data fusion, most of this work focuses
on one-time change detection solutions. A one-time change detection method
processes the data once in response to the change occurring. The trade-offs of
continuous distributed detection of changes include detection accuracy, space efficiency,
detection delay, and communication efficiency.
To achieve these goals, the wildfire warning system is used as a motivating scenario.
From the challenges and requirements of the wildfire warning system, the
change detection algorithms for streaming data are proposed as part of the solution
to the wildfire warning system. By selecting various models of local change detection,
different schemes for distributed change detections, and the data exchange
protocols, different designs can be achieved.
Based on this approach, the contributions of this dissertation are as follows.
A general two-window framework for detecting changes in a single data stream is
presented. A general synopsis-based change detection framework is proposed. Theoretical
and empirical analysis shows that the detection performance of a synopsis-based
detector is similar to that of a non-synopsis change detector if a distance function
quantifying the changes is preserved under the process of constructing the synopsis.
A clustering-based change detection and clustering maintenance method over
sliding windows is presented. The clustering-based detector can automatically detect
changes in multivariate streaming data. A framework for decentralized change
detection in wireless sensor networks is proposed. A distributed framework for
clustering streaming data is proposed by extending the two-phased stream clustering
approach which is widely used to cluster a single data stream.
Change detection is understood as the process of identifying differences in the
state of an object or phenomenon when it is observed at different times or at
different locations. In the context of data stream processing, this process
amounts to segmenting a data stream at the identified points where the stream
dynamics change. The ability to detect changes in stream data, to react to them,
and to adapt to them plays an important role in many application areas, such as
activity monitoring, data stream mining and machine learning, and data
management with respect to data volume and data quality. Decentralized change
detection can be used in many interesting and important application areas, such
as environmental monitoring systems or medical monitoring systems. Although
there is a large body of work on distributed change detection and data fusion,
most of it focuses only on detecting one-time changes. The one-time change
detection method requires processing the data once in response to the change
that occurs. The trade-off of continuous, distributed change detection involves
detection accuracy, memory efficiency, and computational efficiency. To achieve
this goal, the wildfire warning system is used as a motivating scenario. Based
on the challenges and requirements of this warning system, an algorithm for
detecting changes in stream data is presented as part of an overall solution for
the wildfire warning system. By selecting different models for local and
distributed change detection as well as different data exchange protocols,
different system designs can be developed. Based on this approach, this
dissertation makes the contributions listed below. A general two-window
framework for detecting changes in a single data stream is presented.
Furthermore, a general synopsis-based framework for change detection is
described. Theoretical and empirical analyses show that the detection
performance of the synopsis-based change detector is similar to that of a
non-synopsis-based one, as long as a distance function that quantifies the
changes is preserved while the synopsis is constructed. Clustering-based change
detection and cluster maintenance over sliding windows are presented.
Furthermore, a framework for distributed change detection in wireless sensor
networks is described. Based on the two-phase stream clustering approach, which
is widely used to cluster a single data stream, a distributed framework for
clustering stream data is presented.
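The two-window idea named among the contributions above can be illustrated with a minimal sketch. The abstract does not specify the windows, the distance function, or the reset policy, so everything here is an assumption: a fixed reference window, a sliding current window of the same size, the absolute difference of window means as the distance, and a fresh reference window collected after each detection. The name `two_window_detect` is hypothetical.

```python
from collections import deque
from statistics import mean


def two_window_detect(stream, window_size, threshold):
    """Return indices at which the current window differs from the reference window."""
    reference = []                       # fixed reference window
    current = deque(maxlen=window_size)  # sliding window over the newest items
    changes = []
    for t, x in enumerate(stream):
        if len(reference) < window_size:
            reference.append(x)          # still filling the reference window
            continue
        current.append(x)
        if len(current) == window_size:
            # Assumed distance function: absolute difference of the window means.
            distance = abs(mean(current) - mean(reference))
            if distance > threshold:
                changes.append(t)
                reference = []           # start a new reference after the change
                current.clear()
    return changes


# Toy stream whose level shifts from 0 to 3 at index 10.
stream = [0.0] * 10 + [3.0] * 10
changes = two_window_detect(stream, window_size=5, threshold=1.0)
print(changes)
```

Replacing the raw windows with compact summaries and computing the distance on those summaries would give a synopsis-style variant, which, according to the abstract, performs similarly as long as the distance function is preserved.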