Classification of time series by shapelet transformation
Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A shapelet is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible, and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree, and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best shapelets, which are used to produce a transformed dataset, where each feature represents the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce equivalent results to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 we provide ourselves.
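The transform described in this abstract, where each feature is the distance between a time series and a shapelet, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `shapelet_distance` and `shapelet_transform` are hypothetical, the shapelets are assumed to be given already (the search and quality assessment are not shown), and per-window z-normalisation with a Euclidean distance is an assumption about how "distance" is measured.

```python
import numpy as np


def _znorm(x):
    """Z-normalise along the last axis; constant inputs map to zeros."""
    std = x.std(axis=-1, keepdims=True)
    return (x - x.mean(axis=-1, keepdims=True)) / np.where(std == 0, 1.0, std)


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    equal-length window of the series (assumed distance definition)."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    # All contiguous windows of the series with the shapelet's length.
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    diffs = _znorm(windows) - _znorm(shapelet)
    return float(np.sqrt((diffs ** 2).sum(axis=1)).min())


def shapelet_transform(dataset, shapelets):
    """Map each series to a feature vector with one distance per shapelet."""
    return np.array(
        [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]
    )


# Toy data: the first series contains the bump [1, 2, 1], the second is flat.
dataset = [[0, 0, 1, 2, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]
shapelets = [[1, 2, 1]]
features = shapelet_transform(dataset, shapelets)
```

The resulting `features` array has one row per series and one column per shapelet, so it can be passed to any standard classifier, which is the advantage the abstract highlights over the embedded tree approach.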
Contrastive Shapelet Learning for Unsupervised Multivariate Time Series Representation Learning
Recent studies have shown great promise in unsupervised representation
learning (URL) for multivariate time series, because URL is capable of
learning generalizable representations for many downstream tasks without using
inaccessible labels. However, existing approaches usually adopt the models
originally designed for other domains (e.g., computer vision) to encode the
time series data and rely on strong assumptions to design learning objectives,
which limits their ability to perform well. To deal with these problems, we
propose a novel URL framework for multivariate time series by learning
time-series-specific shapelet-based representations through a popular
contrastive learning paradigm. To the best of our knowledge, this is the first
work that explores shapelet-based embedding for unsupervised
general-purpose representation learning. A unified shapelet-based encoder and a
novel learning objective with multi-grained contrasting and multi-scale
alignment are specifically designed to achieve this goal, and a data
augmentation library is employed to improve the generalization. We conduct
extensive experiments using tens of real-world datasets to assess the
representation quality on many downstream tasks, including classification,
clustering, and anomaly detection. The results demonstrate the superiority of
our method over not only URL competitors, but also techniques specially
designed for downstream tasks. Our code has been made publicly available at
https://github.com/real2fish/CSL
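For readers unfamiliar with the contrastive learning paradigm mentioned in this abstract, the following is a generic sketch of an InfoNCE-style loss over two augmented views of the same batch. It is not the CSL objective: the multi-grained contrasting and multi-scale alignment terms are not reproduced, and the function name `info_nce`, the `temperature` value, and the random embeddings standing in for encoder outputs are all assumptions made for illustration.

```python
import numpy as np


def info_nce(view_a, view_b, temperature=0.1):
    """Generic InfoNCE loss: row i of view_a should match row i of view_b
    and be dissimilar to every other row of view_b."""
    # Cosine similarity via L2-normalised embeddings.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


# Stand-in embeddings (hypothetical encoder outputs for 8 series).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
aligned_loss = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
mismatched_loss = info_nce(z, z[::-1])
```

The loss is low when the two views of each series embed close together and high when the pairing is scrambled, which is the signal a contrastive encoder is trained on.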