Finding Motif Sets in Time Series
Time-series motifs are representative subsequences that occur frequently in a time series; a motif set is the set of subsequences deemed to be instances of a given motif. We focus on finding motif sets. Our motivation is to detect motif sets in household electricity-usage profiles, representing repeated patterns of household usage. We propose three algorithms for finding motif sets. Two are greedy algorithms based on pairwise comparison, and the third uses a heuristic measure of set quality to find the motif set directly. We compare these algorithms on simulated datasets and on electricity-usage data. We show that Scan MK, the simplest way of using the best-matching pair to find motif sets, is less accurate on our synthetic data than Set Finder and Cluster MK, although the latter is very sensitive to parameter settings. We qualitatively analyse the outputs for the electricity-usage data and demonstrate that both Scan MK and Set Finder can discover useful motif sets in such data.
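The idea behind Scan MK, taking the best-matching pair and then scanning for further matches, can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes z-normalised Euclidean distance, finds the pair by brute force rather than with the lower-bound pruning of the MK algorithm, and uses a hypothetical `radius` parameter to decide which other subsequences join the set. All function names are hypothetical.

```python
import math

def znorm(x):
    """Z-normalise a subsequence (zero mean, unit variance); constant input maps to zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mu) / sd for v in x]

def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def best_pair(series, m):
    """Brute-force best-matching pair of non-overlapping length-m subsequences."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, None, None)
    for i in range(len(subs)):
        for j in range(i + m, len(subs)):  # j >= i + m skips trivial (overlapping) matches
            d = dist(subs[i], subs[j])
            if d < best[0]:
                best = (d, i, j)
    return best, subs

def scan_motif_set(series, m, radius):
    """Grow a motif set from the best pair: add every subsequence that does not
    overlap a current member and lies within `radius` (hypothetical) of either pair member."""
    (_, i, j), subs = best_pair(series, m)
    if i is None:
        return []
    members = [i, j]
    for k in range(len(subs)):
        if any(abs(k - s) < m for s in members):
            continue  # overlaps a member already in the set
        if min(dist(subs[k], subs[i]), dist(subs[k], subs[j])) <= radius:
            members.append(k)
    return sorted(members)
```

For example, a series containing two identical copies of a short pattern separated by random noise returns a set that includes both copies' start indices. The brute-force search costs O(n²m), which is why exact motif-discovery algorithms such as MK prune candidate pairs with distance lower bounds.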
Multivariate time series classification with temporal abstractions
The increase in the number of complex temporal datasets collected today has prompted the development of methods that extend classical machine learning and data mining methods to time-series data. This work focuses on methods for multivariate time-series classification. Time-series classification is a challenging problem, mostly because the number of temporal features that describe the data and are potentially useful for classification is enormous. We study and develop a temporal abstraction framework for generating multivariate time-series features suitable for classification tasks. We propose the STF-Mine algorithm, which automatically mines discriminative temporal abstraction patterns from the time-series data and uses them to learn a classification model. Our experimental evaluations, carried out on both synthetic and real-world medical data, demonstrate the benefit of our approach in learning accurate classifiers for time-series datasets. Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
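Temporal abstraction turns raw numeric samples into qualitative states that hold over time intervals, which can then be combined into temporal patterns. The sketch below is a minimal illustration under stated assumptions, not the STF-Mine algorithm: the `low`/`high` cut-offs, the L/N/H (value) and I/D/S (trend) state labels, and the single `occurs_before` pattern test are all hypothetical stand-ins for the richer pattern space the paper mines.

```python
from itertools import groupby

def value_abstraction(series, low, high):
    """Map each sample to a qualitative state using hypothetical cut-offs."""
    def label(v):
        if v < low:
            return "L"  # low
        if v > high:
            return "H"  # high
        return "N"      # normal
    return [label(v) for v in series]

def trend_abstraction(series, eps=0.0):
    """Label each consecutive step as Increasing, Decreasing or Steady."""
    out = []
    for a, b in zip(series, series[1:]):
        d = b - a
        out.append("I" if d > eps else "D" if d < -eps else "S")
    return out

def to_intervals(labels):
    """Merge runs of identical labels into (state, start, end) intervals, end inclusive."""
    intervals, pos = [], 0
    for state, run in groupby(labels):
        n = len(list(run))
        intervals.append((state, pos, pos + n - 1))
        pos += n
    return intervals

def occurs_before(intervals, first, second):
    """True if some `first` interval ends before some `second` interval starts."""
    return any(
        first_end < second_start
        for s1, _first_start, first_end in intervals if s1 == first
        for s2, second_start, _second_end in intervals if s2 == second
    )
```

Boolean results of such pattern tests, computed once per series, could then serve as features for any standard classifier.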
Clustering Time Series from Mixture Polynomial Models with Discretised Data
Clustering time series is an active research area with applications in many fields. One common feature of time series is the likely presence of outliers. These uncharacteristic data can significantly affect the quality of the clusters formed. This paper evaluates a method of overcoming the detrimental effects of outliers. We describe some of the alternative approaches to clustering time series, then specify a particular class of model for experimentation with k-means clustering and a correlation-based distance metric. For data derived from this class of model, we demonstrate that discretising the data into a binary series of above and below the median improves the clustering when the data has outliers. More specifically, we show, first, that discretisation does not significantly affect the accuracy of the clusters when there are no outliers and, second, that it significantly increases the accuracy in the presence of outliers, even when the probability of an outlier is very low.
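The median discretisation step is simple enough to show directly, together with a correlation-based distance. This sketch assumes the distance takes the common form 1 minus the Pearson correlation, and maps values equal to the median to 0; both choices are assumptions, the function names are hypothetical, and the k-means loop itself is omitted.

```python
import statistics

def discretise_median(x):
    """Binary series: 1 where a value is strictly above the series median, else 0."""
    med = statistics.median(x)
    return [1 if v > med else 0 for v in x]

def correlation_distance(a, b):
    """Assumed distance form: 1 - Pearson correlation, in the range [0, 2]."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 1.0  # correlation undefined for a constant series; treat as uncorrelated
    return 1 - cov / (va * vb) ** 0.5
```

With `x = [1, 2, 3, 4, 5, 6, 7, 8]` and a copy `y` whose fourth value is replaced by an outlier of -50, the raw correlation distance is well above 0.5, whereas both series discretise to `[0, 0, 0, 0, 1, 1, 1, 1]` and their distance is 0. The outlier moves the median only slightly, so it cannot flip more than its own bit.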
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (e.g., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation-maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios.
Comment: This revised version fixes two small typos in the published version.
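The dynamic-programming subproblem, assigning each time step to a cluster, can be illustrated as a Viterbi-style pass. This sketch assumes the per-step costs `cost[t][k]` (for example, negative log-likelihoods of each observation window under cluster k's MRF) are already computed, and that temporal consistency is encouraged by a non-negative switching penalty `beta` charged whenever the assigned cluster changes. The function name is hypothetical, and the ADMM step that fits the Toeplitz inverse covariances is not shown.

```python
def assign_with_switch_penalty(cost, beta):
    """Minimise sum_t cost[t][k_t] + beta * (number of t with k_t != k_{t-1}).
    Assumes beta >= 0 and that every row of `cost` has the same length K."""
    T = len(cost)
    if T == 0:
        return []
    K = len(cost[0])
    best = list(cost[0])  # best[k]: minimum total cost of a path ending in cluster k
    back = []             # back[t-1][k]: predecessor of cluster k at step t
    for t in range(1, T):
        # Switching into any cluster only needs the cheapest state at the previous step.
        prev_k = min(range(K), key=lambda j: best[j])
        new_best, ptr = [], []
        for k in range(K):
            stay = best[k]
            switch = best[prev_k] + beta
            if stay <= switch:
                new_best.append(stay + cost[t][k])
                ptr.append(k)
            else:
                new_best.append(switch + cost[t][k])
                ptr.append(prev_k)
        best = new_best
        back.append(ptr)
    # Backtrack from the cheapest final cluster.
    k = min(range(K), key=lambda j: best[j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1]
```

A large `beta` yields long, stable segments, while `beta = 0` reduces to assigning each step to its cheapest cluster independently. Because only the cheapest previous state matters for a switch, the pass runs in O(TK) time rather than O(TK²).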
Multidimensional Time Series Fuzzy Association Rules Mining
In this paper, we present a new solution to the problem of multidimensional time series fuzzy association rule mining that takes into account the fuzziness of both the subsequences and the intervals between them. To address this new formulation, the paper puts forward several key algorithms for the solution. Finally, an application example of multidimensional time series fuzzy association rule mining is presented. The results show that rules with fuzzy intervals can be mined only by the proposed method.
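One way to make the interval between subsequences fuzzy is to score each observed time gap with a membership function. The sketch below is an assumption for illustration, not one of the paper's algorithms: it uses a hypothetical triangular membership for "about `b` steps later", defined by `a < b < c`, and averages, over antecedent occurrences, the best membership achieved by any consequent occurrence.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at or outside [a, c], 1 at b. Requires a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_rule_confidence(antecedent_times, consequent_times, a, b, c):
    """Degree to which each antecedent occurrence is followed by a consequent
    occurrence 'about b steps later', averaged over antecedent occurrences.
    Gaps are consequent time minus antecedent time; with a >= 0, earlier
    consequent occurrences score 0."""
    if not antecedent_times:
        return 0.0
    total = 0.0
    for t in antecedent_times:
        total += max((triangular(u - t, a, b, c) for u in consequent_times), default=0.0)
    return total / len(antecedent_times)
```

For antecedent occurrences at times 0 and 10, consequent occurrences at 5 and 14, and the fuzzy gap (2, 5, 8), the gaps 5 and 4 receive memberships 1 and 2/3, giving a confidence of 5/6. A crisp interval of exactly 5 steps would score only 1/2 on the same data, which illustrates why fuzzy intervals can surface rules that crisp intervals miss.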
- …