520 research outputs found
PMP: Privacy-Aware Matrix Profile against Sensitive Pattern Inference
Recent rapid development of sensor technology has allowed massive fine-grained time series (TS) data to be collected, setting the foundation for data-driven services and applications. In the process, data sharing is often required so that third-party modelers can perform specific time series data mining (TSDM) tasks based on the needs of the data owner. The high resolution of TS brings new challenges in protecting privacy. As meaningful information in high-resolution TS shifts from concrete point values to local shape-based segments, numerous studies have found that long shape-based patterns can contain more sensitive information and may be extracted and misused by a malicious third party. However, the privacy issue for TS patterns is surprisingly seldom explored in the privacy-preserving literature. In this work, we consider a new privacy-preserving problem: preventing malicious inference on long shape-based patterns while preserving short segment information for utility task performance. To address this challenge, we investigate an alternative approach: sharing the Matrix Profile (MP), a non-linear transformation of the original data and a versatile data structure that supports many data mining tasks. We found that while MP can prevent concrete shape leakage, the canonical correlation in the MP index can still reveal the location of sensitive long patterns. Based on this observation, we design two attacks, named Location Attack and Entropy Attack, to extract the pattern location from MP. To further protect MP from these two attacks, we propose a Privacy-Aware Matrix Profile (PMP) that perturbs the local correlation and breaks the canonical correlation in the MP index vector. We evaluate the proposed PMP against baseline noise-adding methods through quantitative analysis and real-world case studies to show its effectiveness.
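For readers unfamiliar with the Matrix Profile mentioned in this abstract, the following is a minimal brute-force sketch of the plain (non-private) structure, not the PMP method itself. It assumes the usual formulation: for each length-`m` subsequence, store the z-normalized Euclidean distance to its nearest non-overlapping neighbor (the profile) and that neighbor's position (the index). The function names and the exclusion-zone width of `m // 2` are illustrative assumptions.

```python
import numpy as np

def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def matrix_profile(ts, m):
    """Brute-force O(n^2 * m) matrix profile and matrix profile index."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1                      # number of subsequences
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = max(1, m // 2)                    # exclusion zone width (assumption)
    profile = np.full(n, np.inf)
    index = np.full(n, -1, dtype=int)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        # Mask trivial matches: the subsequence itself and its close overlaps.
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index
```

In this sketch the index vector is the part that points from each subsequence to its best match, which is the kind of location information the abstract says an attacker could exploit.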
Robust Time Series Chain Discovery with Incremental Nearest Neighbors
Time series motif discovery has been a fundamental task to identify meaningful repeated patterns in time series. Recently, time series chains were introduced as an expansion of time series motifs to identify continuously evolving patterns in time series data. Informally, a time series chain (TSC) is a temporally ordered set of time series subsequences, in which every subsequence is similar to the one that precedes it, but the last and the first can be arbitrarily dissimilar. TSCs have been shown to reveal latent, continuously evolving trends in time series and to identify precursors of unusual events in complex systems. Despite their promising interpretability, we have observed that existing TSC definitions lack the ability to accurately cover the evolving part of a time series: the discovered chains can be easily cut by noise and can include non-evolving patterns, making them impractical in real-world applications. Inspired by a recent work that tracks how the nearest neighbor of a time series subsequence changes over time, we introduce a new TSC definition which is much more robust to noise in the data, in the sense that it can better locate the evolving patterns while excluding the non-evolving ones. We further propose two new quality metrics to rank the discovered chains. With extensive empirical evaluations, we demonstrate that the proposed TSC definition is significantly more robust to noise than the state of the art, and the top-ranked chains discovered can reveal meaningful regularities in a variety of real-world datasets.
Comment: Accepted to ICDM 2022. This is an extended version of the pape
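As a rough illustration of the chain idea described in this abstract, the sketch below follows links between subsequences using precomputed left and right nearest-neighbor indices. It assumes the earlier bidirectional formulation, in which `i` links to `j` only when `j` is the right nearest neighbor of `i` and `i` is the left nearest neighbor of `j`; it is not the robust definition this paper proposes. The function name, the input arrays, and the use of `-1` for "no neighbor" are hypothetical conventions chosen for the example.

```python
def longest_chain(left_nn, right_nn):
    """Return the longest chain of subsequence indices.

    left_nn[j]  : index of the nearest neighbor of j among earlier subsequences
    right_nn[i] : index of the nearest neighbor of i among later subsequences
    A value of -1 means "no such neighbor" (assumed convention).
    """
    best = []
    for start in range(len(right_nn)):
        chain = [start]
        i = start
        while True:
            j = right_nn[i]
            # Stop if there is no later neighbor, or the link is not mutual.
            if j <= i or left_nn[j] != i:
                break
            chain.append(j)
            i = j
        if len(chain) > len(best):
            best = chain
    return best
```

Because each link must be confirmed in both directions, a single noisy subsequence can break the walk, which is the fragility the abstract attributes to existing definitions.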
Infollution (Information Pollution) Management, Filtering Strategy, Scalable Workforce, and Organizational Learning: A Conceptual Study
Information generation is increasing rapidly on a global scale. The exponential advancement in information and communication technology has accentuated the problem of effective information management. Yet employees' cognitive ability to process information has not increased in parallel with information generation. As the volume of information rises exponentially, so does information pollution (infollution), which is among the greatest challenges of the 21st century. Drawing on information processing theory and dynamic capability, researchers have conceptualised that agile organisations can cope with information pollution by promoting a scalable workforce and organisational learning. Filtering, as a coping strategy, is hypothesised to moderate the association of scalable workforce and organisational learning with infollution management. This research will extend the literature in the domain of information management and agile organisations. It will be particularly useful for information processors in identifying quality information for improved decision-making.
A Fast Algorithm to Compute Maximum k-Plexes in Social Network Analysis
The clique model is one of the most important techniques for cohesive subgraph detection; however, its applications are rather limited due to the restrictive conditions of the model. Hence much research resorts to the k-plex (a graph in which any vertex is adjacent to all but at most k vertices), which is a relaxation of the clique. In this paper, we study the maximum k-plex problem and propose a fast algorithm to compute maximum k-plexes by exploiting structural properties of the problem. In an n-vertex graph, the algorithm computes optimal solutions in $c^n n^{O(1)}$ time for a constant $c < 2$ depending only on k. To the best of our knowledge, this is the first algorithm that breaks the trivial theoretical bound of $2^n$ for each k ≥ 3. We also provide experimental results over multiple real-world social network instances in support
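To make the k-plex definition in this abstract concrete, here is a minimal sketch that checks the property and finds a maximum k-plex by exhaustive search. This is the trivial subset-enumeration baseline (on the order of $2^n$ subsets), not the paper's faster algorithm. It assumes the common convention that a vertex counts itself among the at most k vertices it may miss, so every vertex of a k-plex S needs at least |S| - k neighbors inside S; the function names and the adjacency-dict representation are illustrative.

```python
from itertools import combinations

def is_k_plex(adj, S, k):
    """True if every vertex in S has at least |S| - k neighbors inside S."""
    S = set(S)
    return all(len(adj[v] & S) >= len(S) - k for v in S)

def max_k_plex_bruteforce(adj, k):
    """Try subsets from largest to smallest; return the first k-plex found."""
    vertices = list(adj)
    for size in range(len(vertices), 0, -1):
        for cand in combinations(vertices, size):
            if is_k_plex(adj, cand, k):
                return set(cand)
    return set()
```

With k = 1 the check reduces to the clique condition, which shows how the k-plex relaxes the clique model.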