332 research outputs found

    HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

    The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order. Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020)

    Rank Based Anomaly Detection Algorithms

    Anomaly or outlier detection problems are of considerable importance, arising frequently in diverse real-world applications such as finance and cyber-security. Several algorithms have been formulated for such problems, usually based on formulating a problem-dependent heuristic or distance metric. This dissertation proposes anomaly detection algorithms that exploit the notion of "rank", expressing relative outlierness of different points in the relevant space, and exploiting asymmetry in nearest neighbor relations between points: a data point is "more anomalous" if it is not the nearest neighbor of its nearest neighbors. Although rank is computed using distance, it is a more robust and higher level abstraction that is particularly helpful in problems characterized by significant variations of data point density, when distance alone is inadequate. We begin by proposing a rank-based outlier detection algorithm, and then discuss how this may be extended by also considering clustering-based approaches. We show that the use of rank significantly improves anomaly detection performance in a broad range of problems. We then consider the problem of identifying the most anomalous among a set of time series, e.g., the stock price of a company that exhibits significantly different behavior than its peer group of other companies. In such problems, different characteristics of time series are captured by different metrics, and we show that the best performance is obtained by combining several such metrics, along with the use of rank-based algorithms for anomaly detection. In practical scenarios, it is of interest to identify when a time series begins to diverge from the behavior of its peer group. We address this problem as well, using an online version of the anomaly detection algorithm developed earlier. Finally, we address the task of detecting the occurrence of anomalous sub-sequences within a single time series. This is accomplished by refining the multiple-distance combination approach, which succeeds when other algorithms (based on a single distance measure) fail. The algorithms developed in this dissertation can be applied in a large variety of application areas, and can assist in solving many practical problems.

    A computational framework for unsupervised analysis of everyday human activities

    In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner. A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event sub-sequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed and variable length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality and noise sensitivity. Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic as well as a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance to one of the discovered activity-classes, and to automatically detect if it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments. Ph.D. Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: David Hogg; Committee Member: Irfan Essa; Committee Member: James Reh

    A Review on Outlier/Anomaly Detection in Time Series Data

    Recent advances in technology have brought major breakthroughs in data collection, enabling a large amount of data to be gathered over time and thus generating time series. Mining this data has become an important task for researchers and practitioners in the past few years, including the detection of outliers or anomalies that may represent errors or events of interest. This review aims to provide a structured and comprehensive state-of-the-art on outlier detection techniques in the context of time series. To this end, a taxonomy is presented based on the main aspects that characterize an outlier detection technique. KK/2019-00095 IT1244-19 TIN2016-78365-R PID2019-104966GB-I0