408 research outputs found

    A Review on Outlier/Anomaly Detection in Time Series Data

    Recent advances in technology have brought major breakthroughs in data collection, enabling a large amount of data to be gathered over time and thus generating time series. Mining this data has become an important task for researchers and practitioners in the past few years, including the detection of outliers or anomalies that may represent errors or events of interest. This review aims to provide a structured and comprehensive overview of the state of the art in outlier detection techniques for time series. To this end, a taxonomy is presented based on the main aspects that characterize an outlier detection technique. KK/2019-00095, IT1244-19, TIN2016-78365-R, PID2019-104966GB-I0

    Detecting anomalies in sequential data augmented with new features

    Contributions to time series data mining towards the detection of outliers/anomalies

    148 p. Recent technological advances have brought great progress in data collection, making it possible to gather large amounts of data over time. These data commonly take the form of time series, in which observations are recorded chronologically and are correlated in time. These temporal dependencies often contain meaningful and useful information, so in recent years there has been great interest in extracting it. The research area devoted to this task is called time series data mining. The research community in this area has addressed tasks such as classification, forecasting, clustering, and outlier/anomaly detection. Outliers or anomalies are observations that do not follow the expected behavior of a time series. They usually represent unwanted measurements or events of interest, so detecting them is often relevant, since they can degrade data quality or reflect phenomena of interest to the analyst. This thesis presents several contributions to time series data mining, specifically on outlier or anomaly detection. These contributions fall into two parts. First, the thesis contributes to outlier or anomaly detection in time series: it offers a review of the techniques in the literature and presents a new anomaly detection technique for univariate time series, based on self-supervised learning, for detecting water leaks. Second, the thesis introduces contributions on handling time series with missing values and demonstrates their applicability to anomaly detection.

    A Review of Subsequence Time Series Clustering

    Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies

    Algorithms for the automated correction of vertical drift in eye-tracking data

    A common problem in eye tracking research is vertical drift: the progressive displacement of fixation registrations on the vertical axis that results from a gradual loss of eye tracker calibration over time. This is particularly problematic in experiments that involve the reading of multiline passages, where it is critical that fixations on one line are not erroneously recorded on an adjacent line. Correction is often performed manually by the researcher, but this process is tedious, time-consuming, and prone to error and inconsistency. Various methods have previously been proposed for the automated, post-hoc correction of vertical drift in reading data, but these methods vary greatly, not just in terms of the algorithmic principles on which they are based, but also in terms of their availability, documentation, implementation languages, and so forth. Furthermore, these methods have largely been developed in isolation with little attempt to systematically evaluate them, meaning that drift correction techniques are moving forward blindly. We document ten major algorithms, including two that are novel to this paper, and evaluate them using both simulated and natural eye tracking data. Our results suggest that a method based on dynamic time warping offers great promise, but we also find that some algorithms are better suited than others to particular types of drift phenomena and reading behavior, allowing us to offer evidence-based advice on algorithm selection.
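
    The abstract above names dynamic time warping (DTW) without describing it. As a rough illustration only, the textbook DTW distance between two numeric sequences can be sketched as below. This is not the paper's drift-correction algorithm; the function name `dtw_distance` and the sample values are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = step + min(cost[i - 1][j],      # a advances alone
                                    cost[i][j - 1],      # b advances alone
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical sequences: the second repeats one value, which warping absorbs.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A constant offset of 1 between two length-2 sequences cannot be warped away.
offset = dtw_distance([0, 0], [1, 1])
```

    Because warping lets one element match several elements of the other sequence, two sequences with the same shape but different pacing get a small distance, which is the property that makes DTW attractive for aligning recorded and expected positions.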

    Rank Based Anomaly Detection Algorithms

    Anomaly or outlier detection problems are of considerable importance, arising frequently in diverse real-world applications such as finance and cyber-security. Several algorithms have been formulated for such problems, usually based on formulating a problem-dependent heuristic or distance metric. This dissertation proposes anomaly detection algorithms that exploit the notion of "rank", expressing relative outlierness of different points in the relevant space, and exploiting asymmetry in nearest neighbor relations between points: a data point is "more anomalous" if it is not the nearest neighbor of its nearest neighbors. Although rank is computed using distance, it is a more robust and higher level abstraction that is particularly helpful in problems characterized by significant variations of data point density, when distance alone is inadequate. We begin by proposing a rank-based outlier detection algorithm, and then discuss how this may be extended by also considering clustering-based approaches. We show that the use of rank significantly improves anomaly detection performance in a broad range of problems. We then consider the problem of identifying the most anomalous among a set of time series, e.g., the stock price of a company that exhibits significantly different behavior than its peer group of other companies. In such problems, different characteristics of time series are captured by different metrics, and we show that the best performance is obtained by combining several such metrics, along with the use of rank-based algorithms for anomaly detection. In practical scenarios, it is of interest to identify when a time series begins to diverge from the behavior of its peer group. We address this problem as well, using an online version of the anomaly detection algorithm developed earlier. Finally, we address the task of detecting the occurrence of anomalous sub-sequences within a single time series. This is accomplished by refining the multiple-distance combination approach, which succeeds when other algorithms (based on a single distance measure) fail. The algorithms developed in this dissertation can be applied in a large variety of application areas, and can assist in solving many practical problems.
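
    The rank idea in the abstract above (a point is more anomalous if it is not the nearest neighbor of its own nearest neighbors) can be illustrated with a small sketch. It is one plausible reading of that sentence, not necessarily the dissertation's exact formulation; the function name `rank_based_scores`, the choice of `k`, the averaging of ranks, and the sample points are all assumptions.

```python
import math


def rank_based_scores(points, k=2):
    """Score each point by the average rank it holds among the neighbors
    of its own k nearest neighbors (higher means more anomalous)."""
    n = len(points)
    # order[i] lists every other index, sorted by distance from point i
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j, i=i: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    scores = []
    for p in range(n):
        neighbors = order[p][:k]
        # rank of p as seen from q: 1 means p is q's nearest neighbor
        ranks = [order[q].index(p) + 1 for q in neighbors]
        scores.append(sum(ranks) / k)
    return scores


# Hypothetical data: four corners of a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = rank_based_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

    The distant point still has nearest neighbors, but it ranks last in their neighbor orderings. That asymmetry, rather than raw distance, is what drives its score up.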

    A study of time series: anomaly detection and trend prediction.

    Leung Tat Wing. Thesis (M.Phil.)--Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 94-98). Abstracts in English and Chinese.
    Contents:
    Abstract
    Acknowledgement
    Chapter 1. Introduction
        1.1 Unusual Pattern Discovery
        1.2 Trend Prediction
        1.3 Thesis Organization
    Chapter 2. Unusual Pattern Discovery
        2.1 Introduction
        2.2 Related Work
            2.2.1 Time Series Discords
            2.2.2 Brute Force Algorithm
            2.2.3 Keogh et al.'s Algorithm
            2.2.4 Performance Analysis
        2.3 Proposed Approach
            2.3.1 Haar Transform
            2.3.2 Discretization
            2.3.3 Augmented Trie
            2.3.4 Approximating the Magic Outer Loop
            2.3.5 Approximating the Magic Inner Loop
            2.3.6 Experimental Result
        2.4 More on discord length
            2.4.1 Modified Haar Transform
            2.4.2 Fast Haar Transform Algorithm
            2.4.3 Relation between discord length and discord location
        2.5 Further Optimization
            2.5.1 Improved Inner Loop Heuristic
            2.5.2 Experimental Result
        2.6 Top K discords
            2.6.1 Utility of top K discords
            2.6.2 Algorithm
            2.6.3 Experimental Result
        2.7 Conclusion
    Chapter 3. Trend Prediction
        3.1 Introduction
        3.2 Technical Analysis
            3.2.1 Relative Strength Index
            3.2.2 Chart Analysis
            3.2.3 Dow Theory
            3.2.4 Moving Average
        3.3 Proposed Algorithm
            3.3.1 Piecewise Linear Representation
            3.3.2 Prediction Tree
            3.3.3 Trend Prediction
        3.4 Experimental Results
            3.4.1 Experimental setup
            3.4.2 Experiment on accuracy
            3.4.3 Experiment on performance
        3.5 Conclusion
    Chapter 4. Conclusion
    Bibliography