298 research outputs found
Unsupervised Multivariate Time Series Clustering
Clustering is widely used in unsupervised machine learning to partition a given set of data into non-overlapping groups. Many real-world applications require processing more complex multivariate time series data characterized by more than one dependent variable. A few works in the literature have reported multivariate classification using Shapelet learning. However, the clustering of multivariate time series signals using Shapelet learning has not been explored yet. Shapelet learning is the process of discovering the Shapelets that contain the most informative features of the time series signal. Discovering suitable Shapelets from many candidate Shapelets has been broadly studied for classification and clustering of univariate time series signals. Shapelet learning has shown promising results in the case of univariate time series analysis. The analysis of multivariate time series signals is not widely explored because of the dimensionality issue. This work proposes a generalized Shapelet learning method for unsupervised multivariate time series clustering. The proposed method utilizes spectral clustering and Shapelet similarity minimization with least-squares regularization to obtain the optimal Shapelets for unsupervised clustering. The proposed method is evaluated using an in-house multivariate time series dataset on detection of radio frequency (RF) faults at Jefferson Lab's Continuous Electron Beam Accelerator Facility (CEBAF). The dataset consists of three-dimensional time series recordings of three RF fault types. The proposed method shows successful clustering performance, with an average precision of 0.732, recall of 0.717, F-score of 0.732, Rand index (RI) of 0.812, and normalized mutual information (NMI) of 0.56, with an overall standard deviation of less than 3% in a five-fold cross-validation evaluation.
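As a reading aid for the Rand index (RI) reported above, here is a minimal sketch of the standard pairwise definition: the fraction of sample pairs on which two labelings agree about "same group" versus "different group". The function name `rand_index` and the toy labels are illustrative assumptions and are not taken from the abstract or its evaluation code.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two labelings agree.

    A pair counts as an agreement when both labelings put the two
    samples in the same group, or both put them in different groups.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agreements = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agreements += 1
    return agreements / len(pairs)


# Hypothetical toy labelings: cluster ids differ, but the grouping is identical.
perfect = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because only pair membership matters, the score does not depend on how the cluster ids are numbered, which is why it suits unsupervised clustering where predicted ids carry no meaning.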
Mining time-series data using discriminative subsequences
Time-series data is abundant, and must be analysed to extract usable knowledge. Local-shape-based methods offer improved performance for many problems, and a
comprehensible method of understanding both data and models.
For time-series classification, we transform the data into a local-shape space using a shapelet transform. A shapelet is a time-series subsequence that is discriminative
of the class of the original series. We use a heterogeneous ensemble classifier on the transformed data. The accuracy of our method is significantly better than the time-series classification benchmark (1-nearest-neighbour with dynamic time-warping distance), and significantly better than the previous best shapelet-based classifiers.
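A shapelet transform of the kind described above can be sketched as follows, assuming the common formulation in which each series is mapped to its minimum sliding-window Euclidean distance to each shapelet. The function names, the plain (non-normalized) Euclidean distance, and the toy data are assumptions made for illustration; the abstract does not specify these details.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    window of the series that has the same length as the shapelet."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


def shapelet_transform(dataset, shapelets):
    """Map each series to a feature vector with one distance per shapelet."""
    return [[shapelet_distance(series, sh) for sh in shapelets] for series in dataset]


# Hypothetical toy data: the first series contains the peak shapelet exactly.
dataset = [[0, 1, 2, 3, 2, 1, 0], [0, 0, 0, 0, 0, 0, 0]]
shapelets = [[2, 3, 2]]
features = shapelet_transform(dataset, shapelets)
```

The resulting feature matrix has one row per series and one column per shapelet, so any standard classifier can be trained on it. Many implementations z-normalize each window before comparing; that step is omitted here for brevity.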
We use two methods to increase interpretability: First, we cluster the shapelets using a novel, parameterless clustering method based on Minimum Description Length,
reducing dimensionality and removing duplicate shapelets. Second, we transform the shapelet data into binary data reflecting the presence or absence of particular
shapelets, a representation that is straightforward to interpret and understand.
We supplement the ensemble classifier with partial classification. We generate rule sets on the binary-shapelet data, improving performance on certain classes, and revealing the relationship between the shapelets and the class label. To aid interpretability, we use a novel algorithm, BruteSuppression, that can substantially reduce
the size of a rule set without negatively affecting performance, leading to a more compact, comprehensible model.
Finally, we propose three novel algorithms for unsupervised mining of approximately repeated patterns in time-series data, testing their performance in terms of
speed and accuracy on synthetic data, and on a real-world electricity-consumption device-disambiguation problem. We show that individual devices can be found automatically
and in an unsupervised manner using a local-shape-based approach.
Multi-Sensor Event Detection using Shape Histograms
Vehicular sensor data consists of multiple time-series arising from a number
of sensors. Using such multi-sensor data we would like to detect occurrences of
specific events that vehicles encounter, e.g., corresponding to particular
maneuvers that a vehicle makes or conditions that it encounters. Events are
characterized by similar waveform patterns re-appearing within one or more
sensors. Further, such patterns can be of variable duration. In this work, we
propose a method for detecting such events in time-series data using a novel
feature descriptor motivated by similar ideas in image processing. We define
the shape histogram: a constant dimension descriptor that nevertheless captures
patterns of variable duration. We demonstrate the efficacy of using shape
histograms as features to detect events in an SVM-based, multi-sensor,
supervised learning scenario, i.e., multiple time-series are used to detect an
event. We present results on real-life vehicular sensor data and show that our
technique performs better than available pattern detection implementations on
our data, and that it can also be used to combine features from multiple
sensors resulting in better accuracy than using any single sensor. Since
previous work on pattern detection in time-series has been in the single series
context, we also present results using our technique on multiple standard
time-series datasets and show that it is the most versatile in terms of how it
ranks compared to other published results.
SE-shapelets: Semi-supervised Clustering of Time Series Using Representative Shapelets
Shapelets that discriminate time series using local features (subsequences)
are promising for time series clustering. Existing time series clustering
methods may fail to capture representative shapelets because they discover
shapelets from a large pool of uninformative subsequences, and thus result in
low clustering accuracy. This paper proposes a Semi-supervised Clustering of
Time Series Using Representative Shapelets (SE-Shapelets) method, which
utilizes a small number of labeled and propagated pseudo-labeled time series to
help discover representative shapelets, thereby improving the clustering
accuracy. In SE-Shapelets, we propose two techniques to discover representative
shapelets for the effective clustering of time series. 1) A salient
subsequence chain that can extract salient subsequences (as candidate
shapelets) of a labeled/pseudo-labeled time series, which helps remove massive
uninformative subsequences from the pool. 2) A linear discriminant
selection algorithm to identify shapelets that can capture
representative local features of time series in different classes, for
convenient clustering. Experiments on UCR time series datasets demonstrate that
SE-Shapelets discovers representative shapelets and achieves higher clustering
accuracy than counterpart semi-supervised time series clustering methods.
- …