Implications of Z-normalization in the matrix profile
Companies are increasingly measuring their products and services, resulting in a rising amount of available time series data and creating a need for techniques that extract usable information from it. One state-of-the-art technique for time series is the Matrix Profile, which has been used for various applications including motif/discord discovery, visualization and semantic segmentation. Internally, the Matrix Profile uses the z-normalized Euclidean distance to compare the shapes of subsequences between two series. However, when comparing subsequences that are relatively flat and contain noise, the resulting distance is high despite the visual similarity of these subsequences. This property violates some of the assumptions made by Matrix Profile based techniques, resulting in worse performance when series contain flat and noisy subsequences. By studying the properties of the z-normalized Euclidean distance, we derived a method to eliminate this effect that requires only an estimate of the standard deviation of the noise. In this paper we describe various practical properties of the z-normalized Euclidean distance and show how these can be used to improve the performance of Matrix Profile related techniques. We demonstrate our techniques on anomaly detection using a Yahoo! Webscope anomaly dataset, semantic segmentation on the PAMAP2 activity dataset and data visualization on a UCI activity dataset, all containing real-world data, and obtain better overall results after applying our technique. Our technique is a straightforward extension of the distance calculation in the Matrix Profile and will benefit any derived technique dealing with time series containing flat and noisy subsequences.
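To make the flat-and-noisy effect concrete, here is a minimal Python/NumPy sketch of the z-normalized Euclidean distance. The function names are our own and the sketch does not implement the authors' noise correction. It also checks the standard identity d² = 2m(1 − ρ), where m is the subsequence length and ρ the Pearson correlation of the two subsequences, which explains why two flat subsequences dominated by uncorrelated noise (ρ ≈ 0) land near √(2m) instead of near 0.

```python
import numpy as np


def znorm(x: np.ndarray) -> np.ndarray:
    """Z-normalize a subsequence (population std); assumes x is not constant."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def znorm_euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between the z-normalized versions of a and b."""
    return float(np.linalg.norm(znorm(a) - znorm(b)))


def znorm_euclidean_via_corr(a: np.ndarray, b: np.ndarray) -> float:
    """Same distance computed from the identity d^2 = 2m(1 - rho)."""
    m = len(a)
    rho = np.corrcoef(a, b)[0, 1]
    return float(np.sqrt(max(2.0 * m * (1.0 - rho), 0.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 100
    # Two visually identical, nearly flat subsequences: a constant level plus tiny noise.
    flat_a = 5.0 + 0.01 * rng.standard_normal(m)
    flat_b = 5.0 + 0.01 * rng.standard_normal(m)
    print("raw Euclidean:   ", np.linalg.norm(flat_a - flat_b))  # small
    print("z-norm Euclidean:", znorm_euclidean(flat_a, flat_b))  # large, close to sqrt(2m)
    print("sqrt(2m):        ", np.sqrt(2 * m))
```

Because z-normalization divides by each subsequence's own standard deviation, tiny noise on a flat segment is stretched to unit variance, so the distance reflects the noise rather than the (absent) shape.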
A generalized matrix profile framework with support for contextual series analysis
The Matrix Profile is a state-of-the-art time series analysis technique that can be used for motif discovery, anomaly detection, segmentation and other tasks, in domains such as healthcare, robotics, and audio. Whereas recent techniques use the Matrix Profile as a preprocessing or modeling step, we believe there is unexplored potential in generalizing the approach. We derived a framework that focuses on the implicit distance matrix calculation and present it as the Series Distance Matrix (SDM). In this framework, distance measures (SDM-generators) and distance processors (SDM-consumers) can be freely combined, allowing for more flexibility and easier experimentation. In SDM, the Matrix Profile is but one specific configuration. We also introduce the Contextual Matrix Profile (CMP) as a new SDM-consumer capable of discovering repeating patterns. The CMP provides intuitive visualizations for data analysis and can find anomalies that are not discords. We demonstrate this using two real-world cases. The CMP is the first of a wide variety of new techniques for series analysis that fit within SDM and can complement the Matrix Profile.
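The generator/consumer split can be sketched in a few lines of Python/NumPy. This is a minimal, hypothetical illustration, not the authors' implementation: the class names, the exclusion-zone width and the `ContextConsumer` are ours. `ContextConsumer` keeps, for each pair of user-chosen contexts, the smallest distance between their subsequences, which is our assumption about what a CMP-style consumer tracks. Each distance-matrix column is computed once and handed to every consumer, so the distance measure and the processing can be swapped independently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def exclude_trivial(col: np.ndarray, j: int, excl: int) -> np.ndarray:
    """Copy of column j with the trivial-match zone around j set to +inf."""
    col = col.copy()
    col[max(0, j - excl): j + excl + 1] = np.inf
    return col


class ZNormEuclideanGenerator:
    """Hypothetical SDM-generator: produces one distance-matrix column at a time."""

    def __init__(self, series, m: int):
        windows = sliding_window_view(np.asarray(series, dtype=float), m)
        mu = windows.mean(axis=1, keepdims=True)
        sd = windows.std(axis=1, keepdims=True)  # assumes no constant subsequence
        self.z = (windows - mu) / sd
        self.n = len(self.z)

    def column(self, j: int) -> np.ndarray:
        # Distances from every subsequence i to subsequence j.
        return np.linalg.norm(self.z - self.z[j], axis=1)


class MatrixProfileConsumer:
    """SDM-consumer keeping the column-wise minimum, i.e. the classic Matrix Profile."""

    def __init__(self, n: int, excl: int):
        self.excl = excl
        self.mp = np.full(n, np.inf)
        self.index = np.full(n, -1)

    def consume(self, j: int, col: np.ndarray) -> None:
        col = exclude_trivial(col, j, self.excl)
        i = int(np.argmin(col))
        self.mp[j], self.index[j] = col[i], i


class ContextConsumer:
    """Hypothetical CMP-like consumer: minimum distance per (context a, context b) block."""

    def __init__(self, contexts, excl: int):
        self.contexts = contexts  # list of (start, stop) subsequence-index ranges
        self.excl = excl
        self.cmp = np.full((len(contexts), len(contexts)), np.inf)

    def consume(self, j: int, col: np.ndarray) -> None:
        col = exclude_trivial(col, j, self.excl)
        for b, (bs, be) in enumerate(self.contexts):
            if bs <= j < be:
                for a, (s, e) in enumerate(self.contexts):
                    self.cmp[a, b] = min(self.cmp[a, b], col[s:e].min())


def run(generator, consumers) -> None:
    """Stream every column from one generator into every consumer."""
    for j in range(generator.n):
        col = generator.column(j)
        for consumer in consumers:
            consumer.consume(j, col)


if __name__ == "__main__":
    t = np.arange(500)
    series = np.sin(2 * np.pi * t / 50)  # perfectly periodic, so every motif repeats
    gen = ZNormEuclideanGenerator(series, m=50)
    mp = MatrixProfileConsumer(gen.n, excl=12)
    ctx = ContextConsumer([(0, 150), (150, 300), (300, gen.n)], excl=12)
    run(gen, [mp, ctx])
    print("max MP value:", mp.mp.max())
    print(ctx.cmp)
```

Adding another consumer (for example one that records all distances below a threshold) only requires a new class with a `consume` method; the generator is untouched.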
Discord Monitoring for Streaming Time-Series
Many applications generate and analyze time series. Anomaly detection is one of the most important time-series analysis tools, and discord discovery aims at finding an anomalous subsequence in a time series. Time series are inherently dynamic, so monitoring the discord of a streaming time series is an important problem. This paper addresses this problem and proposes SDM (Streaming Discord Monitoring), an algorithm that efficiently updates the discord of a streaming time series over a sliding window. We show that SDM is approximation-friendly, i.e., its computation can be further accelerated by monitoring an approximate discord with a theoretical bound. Our experiments on real datasets demonstrate the efficiency of SDM and its approximate version. This version of the contribution has been accepted for publication, after peer review (when applicable), but is not the Version of Record and does not reflect post-acceptance improvements or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-030-27615-7_6. Kato S., Amagata D., Nishio S., et al. Discord Monitoring for Streaming Time-Series. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11706 LNCS, 79 (2019).
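For intuition about what a streaming discord monitor has to maintain, the following Python/NumPy sketch tracks the discord of a stream in the most naive way: it keeps the last `w` points and recomputes the discord by brute force whenever the window advances. This is explicitly not SDM's incremental or approximate algorithm; it is the kind of repeated recomputation an efficient monitor is meant to avoid. The class and function names, the plain Euclidean distance and the exclusion-zone parameter `excl` are assumptions for illustration.

```python
from collections import deque

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def batch_discord(window, m: int, excl: int):
    """Return (start, nn_distance) of the discord in `window` by brute force.

    The discord is the subsequence whose nearest non-trivial neighbor is the
    farthest away. Plain Euclidean distance is used (an assumption).
    """
    subs = sliding_window_view(np.asarray(window, dtype=float), m)
    n = len(subs)
    if n <= 2 * excl + 1:
        raise ValueError("window too short for the exclusion zone")
    # Pairwise distances between all subsequences, shape (n, n).
    dist = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
    idx = np.arange(n)
    dist[np.abs(idx[:, None] - idx[None, :]) <= excl] = np.inf  # drop trivial matches
    nn = dist.min(axis=1)
    start = int(np.argmax(nn))
    return start, float(nn[start])


class NaiveDiscordMonitor:
    """Hypothetical baseline: recomputes the discord each time the window slides."""

    def __init__(self, w: int, m: int, excl: int):
        self.buffer = deque(maxlen=w)  # oldest point is evicted automatically
        self.m, self.excl = m, excl

    def push(self, value: float):
        """Append one point; return (start, distance) once the window is full, else None."""
        self.buffer.append(value)
        if len(self.buffer) < self.buffer.maxlen:
            return None
        return batch_discord(list(self.buffer), self.m, self.excl)


if __name__ == "__main__":
    t = np.arange(400)
    x = np.sin(2 * np.pi * t / 25)
    x[300] += 5.0  # inject a spike
    monitor = NaiveDiscordMonitor(w=200, m=20, excl=20)
    for v in x:
        result = monitor.push(v)
    print("discord start within window, distance:", result)
```

Each push costs a full quadratic recomputation over the window, which is exactly the cost an incremental monitor with a bound on the approximate discord aims to cut.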
Calculating the matrix profile from noisy data
The matrix profile (MP) is a data structure computed from a time series which
encodes the data required to locate motifs and discords, corresponding to
recurring patterns and outliers respectively. When the time series contains
noisy data, the conventional approach is to pre-filter it to remove noise, but
this cannot be applied in unsupervised settings where patterns and outliers are
not annotated. The resilience of the algorithm used to generate the MP when
faced with noisy data remains unknown. We measure the similarity between the MP
of the original time series data and MPs generated from the same data with
noise added under a range of parameter settings, including adding duplicates
and adding irrelevant data. We use three real-world data sets drawn from
diverse domains for these experiments. Based on the dissimilarities between the
MPs, our results suggest that MP generation is resilient to a small amount of
noise being introduced into the data, but that this resilience disappears as
the amount of noise increases. Comment: 16 pages
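A miniature version of this kind of experiment is sketched below in Python/NumPy: compute a brute-force z-normalized MP of a series, add Gaussian noise at several levels, and compare the resulting MPs with the original. The paper's exact dissimilarity measure, noise model and parameter settings are not stated here, so the mean absolute difference, the noise scaled by the series' standard deviation and the exclusion-zone width are all assumptions; the duplicate and irrelevant-data settings are not covered.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def matrix_profile(series, m: int) -> np.ndarray:
    """Brute-force z-normalized matrix profile (only suitable for short series)."""
    subs = sliding_window_view(np.asarray(series, dtype=float), m)
    z = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    n = len(z)
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    idx = np.arange(n)
    excl = max(1, m // 4)  # assumed exclusion-zone width
    dist[np.abs(idx[:, None] - idx[None, :]) <= excl] = np.inf
    return dist.min(axis=1)


def mp_dissimilarity(series, noisy_series, m: int) -> float:
    """Mean absolute difference between two MPs (an assumed dissimilarity measure)."""
    diff = matrix_profile(series, m) - matrix_profile(noisy_series, m)
    return float(np.mean(np.abs(diff)))


def noise_sweep(series, m: int, noise_levels, seed: int = 0) -> dict:
    """Add Gaussian noise at each relative level and report the MP dissimilarity."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    scale = series.std()
    out = {}
    for level in noise_levels:
        noisy = series + level * scale * rng.standard_normal(len(series))
        out[level] = mp_dissimilarity(series, noisy, m)
    return out


if __name__ == "__main__":
    t = np.arange(300)
    base = np.sin(2 * np.pi * t / 30)
    for level, d in noise_sweep(base, m=30, noise_levels=[0.0, 0.05, 0.2, 1.0]).items():
        print(f"noise {level:>4}: MP dissimilarity {d:.3f}")
```

Plotting the dissimilarity against the noise level is one way to see where the MP stops being resilient.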
ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles
Data-driven building energy prediction is an integral part of the process for
measurement and verification, building benchmarking, and building-to-grid
interaction. The ASHRAE Great Energy Predictor III (GEPIII) machine learning
competition used an extensive meter data set to crowdsource the most accurate
machine learning workflow for whole building energy prediction. A significant
component of the winning solutions was the pre-processing phase to remove
anomalous training data. Contemporary pre-processing methods rely on either
statistical thresholds or deep learning methods that require training data
and multiple hyper-parameters. A recent method named ALDI (Automated Load
profile Discord Identification) managed to identify these discords using matrix
profile, but the technique still requires user-defined parameters. We develop
ALDI++, a method based on the previous work that bypasses user-defined
parameters and takes advantage of discord similarity. We evaluate ALDI++
against a statistical threshold, variational auto-encoder, and the original
ALDI as baselines in discord classification and energy forecasting scenarios.
Our results demonstrate that while the classification performance improvement
over the original method is marginal, ALDI++ helps achieve the lowest
forecasting error, a 6% improvement over the winning team's approach, with six
times less computation time. Comment: 10 pages, 5 figures, 3 tables
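The matrix-profile idea behind discord detection on load profiles can be illustrated with the following Python/NumPy sketch: cut the meter series into daily profiles and score each day by its z-normalized distance to its most similar other day. This is an illustrative assumption, not ALDI's or ALDI++'s actual procedure; in particular it stops at a ranking and does not reproduce how ALDI++ turns scores into discord labels without user-defined parameters. The function name and the hourly sampling are ours.

```python
import numpy as np


def daily_discord_scores(load, hours_per_day: int = 24) -> np.ndarray:
    """Score each day by the z-normalized distance to its most similar other day.

    Assumes `load` holds whole days of evenly sampled readings and that no day
    is constant. Higher scores mean the day's shape resembles no other day.
    """
    days = np.asarray(load, dtype=float).reshape(-1, hours_per_day)
    z = (days - days.mean(axis=1, keepdims=True)) / days.std(axis=1, keepdims=True)
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a day is not its own neighbor
    return dist.min(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    hours = np.arange(24)
    profile = 10 + 5 * np.sin(2 * np.pi * hours / 24)
    days = np.tile(profile, (30, 1)) + 0.1 * rng.standard_normal((30, 24))
    # Hypothetical anomalous day with an inverted load shape.
    days[17] = 10 - 5 * np.sin(2 * np.pi * hours / 24) + 0.1 * rng.standard_normal(24)
    scores = daily_discord_scores(days.ravel())
    print("most discordant day:", int(np.argmax(scores)))  # expected: 17
```

Because each day is z-normalized, the score reacts to changes in the daily shape rather than to overall consumption level.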
FLAGS: a methodology for adaptive anomaly detection and root cause analysis on sensor data streams by fusing expert knowledge with machine learning
Anomalies and faults can be detected, and their causes verified, using both data-driven and knowledge-driven techniques. Data-driven techniques can adapt their internal functioning based on the raw input data but cannot explain why a detection occurs. Knowledge-driven techniques inherently deliver the cause of the faults that were detected but require too much human effort to set up. In this paper, we introduce FLAGS, the Fused-AI interpretabLe Anomaly Generation System, which combines both techniques in one methodology to overcome their limitations and optimizes them based on limited user feedback. Semantic knowledge is incorporated into a machine learning technique to enhance expressivity. At the same time, feedback about the faults and anomalies that occurred is provided as input to increase adaptiveness using semantic rule mining methods. This new methodology is evaluated on a predictive maintenance case for trains. We show that our method reduces train downtime and provides more insight into frequently occurring problems. (C) 2020 The Authors. Published by Elsevier B.V.
- …