Breaking Computational Barriers to Perform Time Series Pattern Mining at Scale and at the Edge
Uncovering repeated behavior in time series is an important problem in many domains such as medicine, geophysics, meteorology, and many more. With the continuing surge of smart/embedded devices generating time series data, there is an ever-growing need to perform analysis on datasets of increasing size. Additionally, there is an increasing need for analysis at low-power edge devices due to latency problems inherent to the speed of light and the sheer amount of data being recorded. The matrix profile has proven to be a tool highly suitable for pattern mining in time series; however, a naive approach to computing the matrix profile makes it impossible to use effectively both in the cloud and at the edge. This dissertation shows how, through the use of GPUs and machine learning, the matrix profile can be computed more feasibly, both at cloud scale and at sensor scale. In addition, it illustrates why both of these types of computation are important and what new insights they can provide to practitioners working with time series data.
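To make concrete why the naive approach is costly, here is a minimal brute-force sketch of the matrix profile: for every length-m subsequence it stores the z-normalized Euclidean distance to its nearest non-trivial neighbor and that neighbor's position. It is not the dissertation's GPU or machine-learning method. The function names and the exclusion-zone half-width of m/2 are assumptions made for illustration; other widths are also used in practice.

```python
import numpy as np

def znorm_dist(a, b):
    """z-normalized Euclidean distance between two equal-length subsequences."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.linalg.norm(a - b))

def naive_matrix_profile(ts, m):
    """Brute-force matrix profile in O(n^2 * m) time (illustrative sketch).

    Returns (profile, index): profile[i] is the distance from subsequence i
    to its nearest neighbor outside the exclusion zone, index[i] its position.
    """
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    excl = m // 2  # assumed exclusion-zone half-width to skip trivial self-matches
    profile = np.full(n_sub, np.inf)
    index = np.full(n_sub, -1, dtype=int)
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue  # neighbors this close overlap i and are trivial matches
            d = znorm_dist(ts[i:i + m], ts[j:j + m])
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index
```

The double loop performs roughly n² distance computations of cost m each, which is what makes the naive approach impractical for long series or on constrained edge hardware.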
PMP: Privacy-Aware Matrix Profile against Sensitive Pattern Inference
Recent rapid development of sensor technology has allowed massive fine-grained time series (TS) data to be collected and has set the foundation for the development of data-driven services and applications. During this process, data sharing is often involved to allow third-party modelers to perform specific time series data mining (TSDM) tasks based on the data owner's needs. The high resolution of TS brings new challenges in protecting privacy. While meaningful information in high-resolution TS shifts from concrete point values to local shape-based segments, numerous studies have found that long shape-based patterns could contain more sensitive information and may potentially be extracted and misused by a malicious third party. However, the privacy issue for TS patterns is surprisingly seldom explored in the privacy-preserving literature. In this work, we consider a new privacy-preserving problem: preventing malicious inference on long shape-based patterns while preserving short segment information for utility task performance. To address this challenge, we investigate an alternative approach of sharing the Matrix Profile (MP), which is a non-linear transformation of the original data and a versatile data structure that supports many data mining tasks. We found that while MP can prevent concrete shape leakage, the canonical correlation in the MP index can still reveal the location of sensitive long patterns. Based on this observation, we design two attacks, named Location Attack and Entropy Attack, to extract the pattern location from MP. To further protect MP from these two attacks, we propose a Privacy-Aware Matrix Profile (PMP) that perturbs the local correlation and breaks the canonical correlation in the MP index vector. We evaluate our proposed PMP against baseline noise-adding methods through quantitative analysis and real-world case studies to show the effectiveness of the proposed method.
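The abstract states that correlation in the MP index can leak where a long pattern sits. One plausible reading, offered here as an assumption: inside a long repeated pattern, consecutive subsequences tend to have consecutive nearest neighbors, so the index vector increases by exactly one step at a time. The hypothetical helper below finds the longest such run. It is an illustration of that leakage idea only, not the paper's Location Attack or Entropy Attack, whose details are not given here.

```python
import numpy as np

def longest_consecutive_run(mp_index):
    """Return (start, length) of the longest run with mp_index[i+1] == mp_index[i] + 1.

    Hypothetical leakage heuristic: a long run suggests a long repeated
    pattern starting near `start` (and near mp_index[start] elsewhere).
    """
    idx = np.asarray(mp_index)
    if len(idx) == 0:
        return 0, 0
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(idx)):
        if idx[i] != idx[i - 1] + 1:
            start = i  # lockstep broken; a new run begins at i
        length = i - start + 1
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len
```

For example, the index vector `[7, 3, 50, 51, 52, 53, 9, 10]` yields `(2, 4)`: positions 2 to 5 point to 50 to 53 in lockstep, hinting at a repeated segment.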
Exploring time-series motifs through DTW-SOM
Motif discovery is a fundamental step in data mining tasks for time-series data such as clustering, classification and anomaly detection. Even though many papers have addressed the problem of how to find motifs in time series by proposing new motif discovery algorithms, not much work has been done on exploring the motifs extracted by these algorithms. In this paper, we argue that visually exploring time-series motifs computed by motif discovery algorithms can be useful to understand and debug results. To explore the output of motif discovery algorithms, we propose the use of an adapted Self-Organizing Map, the DTW-SOM, on the list of motif centers. In short, DTW-SOM is a vanilla Self-Organizing Map with three main differences, namely (1) the use of the Dynamic Time Warping distance instead of the Euclidean distance, (2) the adoption of two new network initialization routines (a random sample initialization and an anchor initialization) and (3) the adjustment of the Adaptation phase of the training to work with variable-length time-series sequences. We test DTW-SOM on a synthetic motif dataset and two real time-series datasets from the UCR Time Series Classification Archive. After an exploration of the results, we conclude that DTW-SOM is capable of extracting relevant information from a set of motifs and displaying it in a visualization that is space-efficient.
Comment: 8 pages, 12 figures, Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 202
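The first DTW-SOM change swaps Euclidean distance for Dynamic Time Warping, which also lets the map compare motif centers of different lengths. Below is a minimal sketch of textbook unconstrained DTW with a squared-difference cost, plus a hypothetical `best_matching_unit` helper showing where it would plug into the SOM's winner-selection step. The paper's exact DTW variant (warping window, cost function) is not stated in the abstract, so these choices are assumptions.

```python
import math

def dtw_distance(a, b):
    """Unconstrained DTW with squared-difference cost; sequences may differ in length."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return math.sqrt(cost[n][m])

def best_matching_unit(sample, units):
    """Hypothetical SOM step: index of the unit prototype closest to `sample` under DTW."""
    return min(range(len(units)), key=lambda k: dtw_distance(sample, units[k]))
```

Because warping can align `[1, 2, 3]` with `[1, 2, 2, 3]` at zero cost, a 3-point motif center can match a 4-point unit prototype exactly, which Euclidean distance cannot express.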
Regulatory motif discovery using a population clustering evolutionary algorithm
This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging the discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and to discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
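The motif model mentioned above is a position frequency matrix (PFM): per-column counts of each base across aligned binding sites. The sketch below builds a PFM, converts it to log2-odds scores against a uniform background, and scans a sequence for the best-scoring window. The pseudocount of 0.5, the uniform 0.25 background and the function names are illustrative assumptions; the abstract does not say how the evolutionary algorithm scores candidate models.

```python
import math

BASES = "ACGT"

def position_frequency_matrix(sites):
    """Count each base at each column of equal-length, aligned sites (ACGT only)."""
    width = len(sites[0])
    pfm = [{b: 0 for b in BASES} for _ in range(width)]
    for s in sites:
        if len(s) != width:
            raise ValueError("sites must be aligned to the same length")
        for pos, base in enumerate(s.upper()):
            if base not in BASES:
                raise ValueError(f"unsupported base {base!r}")
            pfm[pos][base] += 1
    return pfm

def log_odds_scores(pfm, pseudocount=0.5, background=0.25):
    """Per-position log2(P(base) / background), smoothing counts with a pseudocount."""
    pwm = []
    for column in pfm:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def best_hit(pwm, sequence):
    """Return (offset, score) of the highest-scoring window in an ACGT sequence."""
    width = len(pwm)
    best = (-1, -math.inf)
    for off in range(len(sequence) - width + 1):
        window = sequence[off:off + width].upper()
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score > best[1]:
            best = (off, score)
    return best
```

With sites `["TATAAT", "TATAAT", "TAAAAT", "TATATT"]`, scanning `"GGGCGCTATAATGCGC"` returns offset 6, where the consensus `TATAAT` occurs.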