326 research outputs found

    Implications of Z-normalization in the matrix profile

    Companies increasingly measure their products and services, producing a growing volume of time series data and a corresponding need for techniques that extract usable information from it. One state-of-the-art technique for time series is the Matrix Profile, which has been used for applications including motif and discord discovery, visualization, and semantic segmentation. Internally, the Matrix Profile uses the z-normalized Euclidean distance to compare the shape of subsequences between two series. However, when the subsequences being compared are relatively flat and contain noise, the resulting distance is high despite their visual similarity. This property violates some of the assumptions made by Matrix Profile based techniques, which degrades their performance on series that contain flat and noisy subsequences. By studying the properties of the z-normalized Euclidean distance, we derived a method that eliminates this effect and requires only an estimate of the standard deviation of the noise. In this paper we describe several practical properties of the z-normalized Euclidean distance and show how they can be used to correct the performance of Matrix Profile related techniques. We demonstrate our technique on anomaly detection with a Yahoo! Webscope anomaly dataset, semantic segmentation on the PAMAP2 activity dataset, and data visualization on a UCI activity dataset, all containing real-world data, and obtain overall better results after applying it. Our technique is a straightforward extension of the distance calculation in the Matrix Profile and will benefit any derived technique that deals with time series containing flat and noisy subsequences.
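    The effect described in this abstract can be reproduced with a small sketch. The code below is not the authors' implementation and does not include their correction. It only illustrates, on synthetic data, that two flat, noisy subsequences end up far apart under the z-normalized Euclidean distance, while two noisy sine waves with different offset and scale end up close. The subsequence length `m`, the noise level `noise_std`, and all function names are assumptions chosen for illustration.

```python
import math
import random
import statistics


def z_normalize(seq):
    """Rescale a subsequence to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(seq)
    std = statistics.pstdev(seq)
    if std == 0:
        raise ValueError("a constant subsequence cannot be z-normalized")
    return [(x - mean) / std for x in seq]


def znorm_euclidean(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    za, zb = z_normalize(a), z_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


# Illustrative settings (assumptions, not values taken from the paper).
rng = random.Random(0)
m = 100           # subsequence length
noise_std = 0.05  # standard deviation of the additive Gaussian noise


def noisy(values):
    """Add independent Gaussian noise to every sample."""
    return [v + rng.gauss(0.0, noise_std) for v in values]


sine = [math.sin(2 * math.pi * i / m) for i in range(m)]
flat = [0.0] * m

# Same shape with different offset and scale: z-normalization removes both.
shaped_a = noisy(sine)
shaped_b = noisy([3 * v + 10 for v in sine])

# Visually identical flat lines: z-normalization blows the noise up to unit variance.
flat_a = noisy(flat)
flat_b = noisy(flat)

d_shaped = znorm_euclidean(shaped_a, shaped_b)
d_flat = znorm_euclidean(flat_a, flat_b)

print(f"noisy sine vs. scaled noisy sine: {d_shaped:.2f}")
print(f"flat noisy vs. flat noisy:        {d_flat:.2f}")
print(f"largest possible distance:        {2 * math.sqrt(m):.2f}")
```

    With population standard deviations, the distance equals sqrt(2m(1 - r)), where r is the Pearson correlation of the two subsequences. Independent noise has a correlation near zero, so the flat pair lands near sqrt(2m) regardless of how small the noise is, which is why flat regions look dissimilar to everything.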

    A study of time series: anomaly detection and trend prediction.

    Leung Tat Wing.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 2006.
    Includes bibliographical references (leaves 94-98).
    Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgement
    Chapter 1. Introduction
        1.1 Unusual Pattern Discovery
        1.2 Trend Prediction
        1.3 Thesis Organization
    Chapter 2. Unusual Pattern Discovery
        2.1 Introduction
        2.2 Related Work
            2.2.1 Time Series Discords
            2.2.2 Brute Force Algorithm
            2.2.3 Keogh et al.'s Algorithm
            2.2.4 Performance Analysis
        2.3 Proposed Approach
            2.3.1 Haar Transform
            2.3.2 Discretization
            2.3.3 Augmented Trie
            2.3.4 Approximating the Magic Outer Loop
            2.3.5 Approximating the Magic Inner Loop
            2.3.6 Experimental Result
        2.4 More on discord length
            2.4.1 Modified Haar Transform
            2.4.2 Fast Haar Transform Algorithm
            2.4.3 Relation between discord length and discord location
        2.5 Further Optimization
            2.5.1 Improved Inner Loop Heuristic
            2.5.2 Experimental Result
        2.6 Top K discords
            2.6.1 Utility of top K discords
            2.6.2 Algorithm
            2.6.3 Experimental Result
        2.7 Conclusion
    Chapter 3. Trend Prediction
        3.1 Introduction
        3.2 Technical Analysis
            3.2.1 Relative Strength Index
            3.2.2 Chart Analysis
            3.2.3 Dow Theory
            3.2.4 Moving Average
        3.3 Proposed Algorithm
            3.3.1 Piecewise Linear Representation
            3.3.2 Prediction Tree
            3.3.3 Trend Prediction
        3.4 Experimental Results
            3.4.1 Experimental setup
            3.4.2 Experiment on accuracy
            3.4.3 Experiment on performance
        3.5 Conclusion
    Chapter 4. Conclusion
    Bibliography

    FLAGS: a methodology for adaptive anomaly detection and root cause analysis on sensor data streams by fusing expert knowledge with machine learning

    Anomalies and faults can be detected, and their causes verified, using both data-driven and knowledge-driven techniques. Data-driven techniques can adapt their internal functioning based on the raw input data but fail to explain the manifestation of any detection. Knowledge-driven techniques inherently deliver the cause of the faults that were detected but require too much human effort to set up. In this paper, we introduce FLAGS, the Fused-AI interpretabLe Anomaly Generation System, which combines both techniques in one methodology to overcome their limitations and optimizes them based on limited user feedback. Semantic knowledge is incorporated in a machine learning technique to enhance expressivity. At the same time, feedback about the faults and anomalies that occurred is provided as input to increase adaptiveness using semantic rule mining methods. This new methodology is evaluated on a predictive maintenance case for trains. We show that our method reduces their downtime and provides more insight into frequently occurring problems. © 2020 The Authors. Published by Elsevier B.V.

    Contributions to time series data mining towards the detection of outliers/anomalies

    148 p. Recent technological advances have brought great progress in data collection, making it possible to gather large amounts of data over time. These data commonly take the form of time series, in which observations are recorded chronologically and are correlated in time. These temporal dependencies often contain meaningful and useful information, so in recent years great interest has arisen in extracting it. The research area that focuses on this task is called time series data mining. The research community in this area has worked on tasks such as classification, forecasting, clustering, and outlier/anomaly detection. Outliers or anomalies are observations that do not follow the expected behavior of a time series. They usually represent unwanted measurements or events of interest, so detecting them is often relevant, because they can degrade data quality or reflect phenomena of interest to the analyst. This thesis presents several contributions to the field of time series data mining, specifically to outlier or anomaly detection. The contributions fall into two parts. On the one hand, the thesis presents contributions to outlier or anomaly detection in time series: it offers a review of the techniques in the literature and presents a new anomaly detection technique for univariate time series, based on self-supervised learning, for detecting water leaks. On the other hand, the thesis introduces contributions related to the handling of time series with missing values and demonstrates their applicability to anomaly detection.