PMP: Privacy-Aware Matrix Profile against Sensitive Pattern Inference
The recent rapid development of sensor technology has allowed massive amounts of fine-grained time series (TS) data to be collected, laying the foundation for data-driven services and applications. In the process, data is often shared so that third-party modelers can perform specific time series data mining (TSDM) tasks based on the data owner's needs. The high resolution of TS brings new challenges in protecting privacy. As the meaningful information in high-resolution TS shifts from concrete point values to local shape-based segments, numerous studies have found that long shape-based patterns can contain more sensitive information and may be extracted and misused by a malicious third party. However, the privacy of TS patterns is surprisingly seldom explored in the privacy-preserving literature. In this work, we consider a new privacy-preserving problem: preventing malicious inference on long shape-based patterns while preserving short segment information for the performance of the utility task. To address this challenge, we investigate an alternative approach: sharing the Matrix Profile (MP), which is a non-linear transformation of the original data and a versatile data structure that supports many data mining tasks. We found that while MP can prevent concrete shape leakage, the canonical correlation in the MP index can still reveal the location of a sensitive long pattern. Based on this observation, we design two attacks, named Location Attack and Entropy Attack, to extract the pattern location from MP. To further protect MP from these two attacks, we propose a Privacy-Aware Matrix Profile (PMP) that perturbs the local correlation and breaks the canonical correlation in the MP index vector. We evaluate the proposed PMP against baseline noise-adding methods through quantitative analysis and real-world case studies to show the effectiveness of the proposed method.
Exploring time-series motifs through DTW-SOM
Motif discovery is a fundamental step in data mining tasks for time-series
data such as clustering, classification and anomaly detection. Even though many
papers have addressed the problem of how to find motifs in time-series by
proposing new motif discovery algorithms, not much work has been done on the
exploration of the motifs extracted by these algorithms. In this paper, we
argue that visually exploring time-series motifs computed by motif discovery
algorithms can be useful to understand and debug results. To explore the output
of motif discovery algorithms, we propose the use of an adapted Self-Organizing
Map, the DTW-SOM, on the list of motif centers. In short, DTW-SOM is a
vanilla Self-Organizing Map with three main differences, namely (1) the use of the
Dynamic Time Warping distance instead of the Euclidean distance, (2) the
adoption of two new network initialization routines (a random sample
initialization and an anchor initialization) and (3) the adjustment of the
Adaptation phase of the training to work with variable-length time-series
sequences. We test DTW-SOM in a synthetic motif dataset and two real
time-series datasets from the UCR Time Series Classification Archive. After an
exploration of results, we conclude that DTW-SOM is capable of extracting
relevant information from a set of motifs and displaying it in a visualization
that is space-efficient.
Comment: 8 pages, 12 figures, accepted for presentation at the International
Joint Conference on Neural Networks (IJCNN) 202
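The abstract above replaces the Euclidean distance with Dynamic Time Warping so that sequences of different lengths can be compared. As a rough illustration only, the sketch below implements the textbook unconstrained DTW recurrence with an absolute-difference local cost. The function name `dtw_distance`, the local cost, and the lack of a warping window are assumptions made for this example, not details taken from the DTW-SOM paper.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two numeric sequences of any lengths.

    Assumption: local cost is |a[i] - b[j]| and no warping-window
    constraint is applied; the paper may use a different variant.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] matched again with the current b element
                cost[i][j - 1],      # b[j-1] matched again with the current a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Sequences of different lengths can still align perfectly.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because one element may be matched to several elements of the other sequence, the first pair above aligns at zero cost even though the lengths differ, which is the property that makes DTW usable on variable-length motif centers.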
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
Exploring multiprocessor approaches to time series analysis
A time series is a chronologically ordered set of samples of a real-valued variable that can have millions of observations. Time series analysis seeks to extract models in a wide variety of domains [31], such as epidemiology, DNA analysis, economics, geophysics, and speech recognition. In particular, motif [4] (similarity) and discord [13] (anomaly) discovery has become one of the most frequently used primitives in time series data mining [20], [2], [32], [7], [34], [1]. It requires solving the all-pairs similarity search problem (also known as similarity join). Specifically, given a time series broken down into subsequences, the task is to retrieve the most similar subsequences (motifs) and the most different ones (discords).
One of the state-of-the-art methods for motif and discord discovery is the Matrix Profile [35]. It solves the similarity join problem and allows very large time series to be processed in manageable time. In this work, we focus on this technique, which can be used to detect similarities and anomalies and to predict outcomes. It provides full joins without the need to specify a similarity threshold, which is a very challenging task in this domain. The matrix profile is itself a time series that records, for each subsequence of the original series, the distance to its most similar subsequence; its minimum values highlight the most similar subsequences (motifs). Maximum distance values of the profile highlight the most dissimilar subsequences (discords).
Funding for open access charge: Universidad de Málaga / CBUA.
This work has been supported by the Spanish Government under projects PID2019-105396RB-I00 and PID2022-136575OB-I00.
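To make the matrix profile description above concrete, here is a minimal brute-force sketch, not the optimized algorithms the cited work studies. It assumes z-normalized Euclidean distance and an exclusion zone of half the window length around each subsequence to skip trivial self-matches. Both are common conventions but are assumptions here, as are the names `znorm` and `naive_matrix_profile` and the synthetic test series.

```python
import math
import random


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    sd = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if sd == 0:
        return [0.0] * len(window)
    return [(v - mean) / sd for v in window]


def naive_matrix_profile(ts, m):
    """Brute-force O(n^2 * m) matrix profile for window length m.

    profile[i] is the distance from subsequence i to its nearest
    non-trivial neighbor; index[i] is where that neighbor starts.
    """
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    excl = max(1, m // 2)  # assumed exclusion zone around i
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= excl:
                continue  # skip trivial matches with overlapping neighbors
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


# Synthetic series: Gaussian noise with the same pattern planted twice.
random.seed(0)
ts = [random.gauss(0, 1) for _ in range(100)]
pattern = [math.sin(2 * math.pi * k / 7) for k in range(8)]
ts[10:18] = pattern
ts[60:68] = pattern

profile, index = naive_matrix_profile(ts, m=8)
motif = profile.index(min(profile))    # start of the best-matching subsequence
discord = profile.index(max(profile))  # start of the most isolated subsequence
```

The two planted copies point at each other through `index`, and the lowest profile value marks the motif while the highest marks the discord. The quadratic cost of this loop is the reason scalable and parallel formulations are of interest.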
Breaking Computational Barriers to Perform Time Series Pattern Mining at Scale and at the Edge
Uncovering repeated behavior in time series is an important problem in many domains, such as medicine, geophysics, and meteorology. With the continuing surge of smart and embedded devices generating time series data, there is an ever-growing need to analyze datasets of increasing size. Additionally, there is an increasing need for analysis on low-power edge devices, due to latency problems inherent to the speed of light and the sheer amount of data being recorded. The matrix profile has proven to be a tool highly suitable for pattern mining in time series; however, a naive approach to computing the matrix profile makes it impossible to use effectively either in the cloud or at the edge. This dissertation shows how, through the use of GPUs and machine learning, the matrix profile can be computed feasibly at both cloud scale and sensor scale. In addition, it illustrates why both of these types of computation are important and what new insights they can provide to practitioners working with time series data.