Shapelet Transforms for Univariate and Multivariate Time Series Classification
Time Series Classification (TSC) is a growing field of machine learning research. One particular algorithm from the TSC literature is the Shapelet Transform (ST). Shapelets are phase-independent subsequences that are extracted from time series to form discriminatory features. It has been shown that using the shapelets to transform the datasets into a new space can improve performance. One of the major problems with ST is that the algorithm is O(n^2 m^4), where n is the number of time series and m is the length of the series. As a problem increases in size, or additional dimensions are added, the algorithm quickly becomes computationally infeasible.
The research question addressed is whether the shapelet transform can be improved in terms of accuracy and speed. Making algorithmic improvements to shapelets will enable the development of multivariate shapelet algorithms that can attempt to solve much larger problems in realistic time frames.
In support of this thesis, a new distance early-abandon method is proposed. A class balancing algorithm is implemented, which uses a one-vs-all multi-class information gain that enables heuristics which were developed for two-class problems. To support these improvements, a large-scale analysis of the best shapelet algorithms is conducted as part of a larger experimental evaluation. ST is proven to be one of the most accurate algorithms in TSC on the UCR-UEA datasets. Contract classification is proposed for shapelets, where a fixed run time is set and the number of shapelets is bounded. Four search algorithms are evaluated with fixed run times of one hour and one day, three of which are not significantly worse than a full enumeration. Finally, three multivariate shapelet algorithms are developed and compared to benchmark results and multivariate dynamic time warping.
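To make the idea of a shapelet distance with early abandoning concrete, the following is a minimal Python sketch. It is not the early-abandon method proposed in the thesis, whose details the abstract does not give; it only illustrates the common textbook form, where the running sum of squared differences for a window is dropped as soon as it reaches the best distance found so far. The function names `shapelet_distance` and `shapelet_transform` are hypothetical, and the sketch assumes univariate, equal-sampled series and omits the subsequence normalisation that practical implementations usually apply.

```python
import numpy as np


def shapelet_distance(series, shapelet):
    """Smallest squared Euclidean distance between `shapelet` and any
    equal-length window of `series` (illustrative, unnormalised)."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    length = len(shapelet)
    best = np.inf
    for start in range(len(series) - length + 1):
        total = 0.0
        for offset in range(length):
            diff = series[start + offset] - shapelet[offset]
            total += diff * diff
            if total >= best:
                # Early abandon: this window cannot beat the best so far.
                break
        else:
            # Loop finished without abandoning, so this window is the new best.
            best = total
    return best


def shapelet_transform(dataset, shapelets):
    """Map each series to a vector of its distances to every shapelet."""
    return np.array(
        [[shapelet_distance(series, s) for s in shapelets] for series in dataset]
    )


# Two toy series and two toy shapelets (made-up values for illustration).
dataset = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0]]
shapelets = [[2, 3], [1, 0]]
features = shapelet_transform(dataset, shapelets)
```

Each row of `features` is the new representation of one series, and any standard classifier can then be trained on these rows. The nested loops also suggest where the cost comes from: every candidate subsequence has to be compared against every window of every series.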
Generalised Interpretable Shapelets for Irregular Time Series
The shapelet transform is a form of feature extraction for time series, in
which a time series is described by its similarity to each of a collection of
'shapelets'. However, it has previously suffered from a number of limitations,
such as being limited to regularly-spaced fully-observed time series, and
having to choose between efficient training and interpretability. Here, we
extend the method to continuous time, and in doing so handle the general case
of irregularly-sampled partially-observed multivariate time series.
Furthermore, we show that a simple regularisation penalty may be used to train
efficiently without sacrificing interpretability. The continuous-time
formulation additionally allows for learning the length of each shapelet
(previously a discrete object) in a differentiable manner. Finally, we
demonstrate that the measure of similarity between time series may be
generalised to a learnt pseudometric. We validate our method by demonstrating
its performance and interpretability on several datasets; for example we
discover (purely from data) that the digits 5 and 6 may be distinguished by the
chirality of their bottom loop, and that a kind of spectral gap exists in
spoken audio classification.
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.
Comment: 28 pages, 9 figures
Multi-Sensor Event Detection using Shape Histograms
Vehicular sensor data consists of multiple time-series arising from a number
of sensors. Using such multi-sensor data we would like to detect occurrences of
specific events that vehicles encounter, e.g., corresponding to particular
maneuvers that a vehicle makes or conditions that it encounters. Events are
characterized by similar waveform patterns re-appearing within one or more
sensors. Further, such patterns can be of variable duration. In this work, we
propose a method for detecting such events in time-series data using a novel
feature descriptor motivated by similar ideas in image processing. We define
the shape histogram: a constant dimension descriptor that nevertheless captures
patterns of variable duration. We demonstrate the efficacy of using shape
histograms as features to detect events in an SVM-based, multi-sensor,
supervised learning scenario, i.e., multiple time-series are used to detect an
event. We present results on real-life vehicular sensor data and show that our
technique performs better than available pattern detection implementations on
our data, and that it can also be used to combine features from multiple
sensors resulting in better accuracy than using any single sensor. Since
previous work on pattern detection in time-series has been in the single series
context, we also present results using our technique on multiple standard
time-series datasets and show that it is the most versatile in terms of how it
ranks compared to other published results.