Search CORE

1,726 research outputs found

Finding Motif Sets in Time Series

Author: Bagnall Anthony
Hills Jon
Lines Jason
Publication venue
Publication date: 01/01/2014
Field of study

Time-series motifs are representative subsequences that occur frequently in a time series; a motif set is the set of subsequences deemed to be instances of a given motif. We focus on finding motif sets. Our motivation is to detect motif sets in household electricity-usage profiles, representing repeated patterns of household usage. We propose three algorithms for finding motif sets. Two are greedy algorithms based on pairwise comparison, and the third uses a heuristic measure of set quality to find the motif set directly. We compare these algorithms on simulated datasets and on electricity-usage data. We show that Scan MK, the simplest way of using the best-matching pair to find motif sets, is less accurate on our synthetic data than Set Finder and Cluster MK, although the latter is very sensitive to parameter settings. We qualitatively analyse the outputs for the electricity-usage data and demonstrate that both Scan MK and Set Finder can discover useful motif sets in such data

arXiv.org e-Print Archive

University of East Anglia digital repository

Multiresolution motif discovery in time series

Author: Azevedo Paulo J.
Castro Nuno Constantino
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2010
Field of study

Time series motif discovery is an important problem with applications in a variety of areas that range from telecommunications to medicine. Several algorithms have been proposed to solve the problem. However, these algorithms heavily use expensive random disk accesses or assume the data can't into main memory. They only consider motifs at a single resolution and are not suited to interactivity. In this work, we tackle the motif discovery problem as an approximate Top-K frequent subsequence discovery problem. We fully exploit state of the art iSAX representation multiresolution capability to obtain motifs at diferent resolutions. This property yields interactivity, allowing the user to navigate along the Top-K motifs structure. This permits a deeper understanding of the time series database. Further, we apply the Top-K space saving algorithm to our frequent subsequences approach. A scalable algorithm is obtained that is suitable for data stream like applications where small memory devices such as sensors are used. Our approach is scalable and disk-eficient since it only needs one single pass over the time series database. We provide empirical evidence of the validity of the algorithm in datasets from diferent areas that aim to represent practical applications.(undefined

CiteSeerX

Universidade do Minho: RepositoriUM

Crossref

Recommended from our members

Breaking Computational Barriers to Perform Time Series Pattern Mining at Scale and at the Edge

Author: Zimmerman Zachary Pierce
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Uncovering repeated behavior in time series is an important problem in many domains such as medicine, geophysics, meteorology, and many more. With the continuing surge of smart/embedded devices generating time series data, there is an ever growing need to perform analysis on datasets of increasing size. Additionally, there is an increasing need for analysis at low power edge devices due to latency problems inherent to the speed of light and the sheer amount of data being recorded. The matrix profile has proven to be a tool highly suitable for pattern mining in time series; however, a naive approach to computing the matrix profile makes it impossible to use effectively in both the cloud and at the edge. This dissertation shows how, through the use of GPUs and machine learning, the matrix profile is computed more feasibly, both at cloud-scale and at sensor-scale. In addition, it illustrates why both of these types of computation are important and what new insights they can provide to practitioners working with time series data

eScholarship - University of California

Correlation Set Discovery on Time-Series Data

Author: Amagata Daichi
Hara Takahiro
Publication venue: Springer
Publication date
Field of study

Time-series data analysis is essential in many modern applications, such as financial markets, sensor networks, and data centers, and correlation discovery is a core technique for the analysis. In this paper, we address a novel problem that computes a k-sized time-series dataset where the minimum Pearson correlation of any two time-series in the set is maximized. This problem discovers a group of time-series, which are highly correlated with each other, from a given time-series dataset without any prior knowledge, thus helps many analytical applications. We show that this problem is NP-hard, and design an approximate heuristic solution that provides a high quality result with fast response time. Extensive experiments on real and synthetic datasets verify the efficiency, effectiveness, and scalability of our solution.This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-030-27618-8_21

Osaka University Knowledge Archive

Peregrine: A Pattern-Aware Graph Mining System

Author: Ahmed Nesreen K.
Bearman Peter S.
Chen Hongzhi
Daniel
Dias Vinicius
Elseidy Mohammed
Gonzalez Joseph E.
Gonzalez Joseph E.
Hall Bronwyn
Han Wook-Shin
Hoang Loc
Iyer Anand Padmanabha
Jinghan
Joshua
Julian
Kankanamge Chathura
Kim Jinha
Korshunov Anton
Lai Longbin
Malewicz Grzegorz
Mawhirter Daniel
McSherry Frank
Meysman Pieter
Mugilan
Nguyen Donald
Pradeep
Semih
Serafini Marco
Song Qi
Teixeira Carlos H. C.
Ullmann Julian
Vora Keval
Vora Keval
Vora Keval
Wang Kai
Yuyi
Zhang Gensheng
Zhu Xiaowei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/04/2020
Field of study

Graph mining workloads aim to extract structural properties of a graph by exploring its subgraph structures. General purpose graph mining systems provide a generic runtime to explore subgraph structures of interest with the help of user-defined functions that guide the overall exploration process. However, the state-of-the-art graph mining systems remain largely oblivious to the shape (or pattern) of the subgraphs that they mine. This causes them to: (a) explore unnecessary subgraphs; (b) perform expensive computations on the explored subgraphs; and, (c) hold intermediate partial subgraphs in memory; all of which affect their overall performance. Furthermore, their programming models are often tied to their underlying exploration strategies, which makes it difficult for domain users to express complex mining tasks. In this paper, we develop Peregrine, a pattern-aware graph mining system that directly explores the subgraphs of interest while avoiding exploration of unnecessary subgraphs, and simultaneously bypassing expensive computations throughout the mining process. We design a pattern-based programming model that treats "graph patterns" as first class constructs and enables Peregrine to extract the semantics of patterns, which it uses to guide its exploration. Our evaluation shows that Peregrine outperforms state-of-the-art distributed and single machine graph mining systems, and scales to complex mining tasks on larger graphs, while retaining simplicity and expressivity with its "pattern-first" programming approach.Comment: This is the full version of the paper appearing in the European Conference on Computer Systems (EuroSys), 202

arXiv.org e-Print Archive

Crossref

Loom: Query-aware Partitioning of Online Graphs

Author: Aiston Jack
Firth Hugo
Missier Paolo
Publication venue
Publication date: 17/11/2017
Field of study

As with general graph processing systems, partitioning data over a cluster of machines improves the scalability of graph database management systems. However, these systems will incur additional network cost during the execution of a query workload, due to inter-partition traversals. Workload-agnostic partitioning algorithms typically minimise the likelihood of any edge crossing partition boundaries. However, these partitioners are sub-optimal with respect to many workloads, especially queries, which may require more frequent traversal of specific subsets of inter-partition edges. Furthermore, they largely unsuited to operating incrementally on dynamic, growing graphs. We present a new graph partitioning algorithm, Loom, that operates on a stream of graph updates and continuously allocates the new vertices and edges to partitions, taking into account a query workload of graph pattern expressions along with their relative frequencies. First we capture the most common patterns of edge traversals which occur when executing queries. We then compare sub-graphs, which present themselves incrementally in the graph update stream, against these common patterns. Finally we attempt to allocate each match to single partitions, reducing the number of inter-partition edges within frequently traversed sub-graphs and improving average query performance. Loom is extensively evaluated over several large test graphs with realistic query workloads and various orderings of the graph updates. We demonstrate that, given a workload, our prototype produces partitionings of significantly better quality than existing streaming graph partitioning algorithms Fennel and LDG

arXiv.org e-Print Archive

University of Birmingham Research Portal

Discord Monitoring for Streaming Time-Series

Author: Amagata Daichi
Hara Takahiro
Kato Shinya
Nishio Shunya
Publication venue: Springer
Publication date
Field of study

Many applications generate time-series and analyze it. One of the most important time-series analysis tools is anomaly detection, and discord discovery aims at finding an anomaly subsequence in a time-series. Time-series is essentially dynamic, so monitoring the discord of a streaming time-series is an important problem. This paper addresses this problem and proposes SDM (Streaming Discord Monitoring), an algorithm that efficiently updates the discord of a streaming time-series over a sliding window. We show that SDM is approximation-friendly, i.e., the computational efficiency is accelerated by monitoring an approximate discord with theoretical bound. Our experiments on real datasets demonstrate the efficiency of SDM and its approximate version.This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-030-27615-7_6. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms.Kato S., Amagata D., Nishio S., et al. Discord Monitoring for Streaming Time-Series. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11706 LNCS, 79 (2019

Osaka University Knowledge Archive