Ensembles of Randomized Time Series Shapelets Provide Improved Accuracy while Reducing Computational Costs
Shapelets are discriminative time series subsequences that allow generation
of interpretable classification models, which provide faster and generally
better classification than the nearest neighbor approach. However, the shapelet
discovery process requires the evaluation of all possible subsequences of all
time series in the training set, making it extremely computationally intensive.
Consequently, shapelet discovery for large time series datasets quickly becomes
intractable. A number of improvements have been proposed to reduce the training
time. These techniques use approximation or discretization and often lead to
reduced classification accuracy compared to the exact method.
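The cost of exact discovery comes from scoring every subsequence of every training series against every series. The minimal sketch below assumes raw (not z-normalised) Euclidean distance, and its function names are hypothetical. It shows the subsequence distance used to compare a candidate shapelet with a series, and how many candidates an exhaustive search has to evaluate.

```python
import numpy as np

def subsequence_distance(series, shapelet):
    """Minimum Euclidean distance between `shapelet` and every same-length
    window of `series` (no z-normalisation, for brevity)."""
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))

def count_candidates(n_series, series_length, min_len, max_len):
    """Number of subsequences an exhaustive search must score, one per
    (series, start position, length) triple."""
    return n_series * sum(series_length - l + 1 for l in range(min_len, max_len + 1))

x = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])
print(subsequence_distance(x, np.array([1.0, 2.0, 1.0])))  # 0.0 (exact match)
print(count_candidates(100, 500, 3, 500))                   # 12425100
```

Each of those roughly 12 million candidates must then be compared with every training series, which is why sampling only a fraction of them pays off.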
We propose the use of ensembles of shapelet-based classifiers obtained
using random sampling of the shapelet candidates. Random sampling reduces
the number of evaluated candidates and consequently the computational
cost, while the classification accuracy of the resulting models is not
significantly different from that of the exact algorithm. Combining
randomized classifiers compensates for the inaccuracies of individual models
because of the diversity of their solutions. The experiments performed show
that the proposed ensemble of inexpensive classifiers provides better
classification accuracy than the exact method at a significantly lower
computational cost.
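The randomized-ensemble idea can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each member is a hypothetical single-shapelet decision stump chosen from a small random sample of candidates by information gain, and members are combined by majority vote. All function names and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def subsequence_distance(series, shapelet):
    # Minimum Euclidean distance over all same-length windows of `series`.
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))

def entropy(labels):
    counts = np.bincount(labels)
    p = counts[counts > 0] / len(labels)
    return float(-(p * np.log2(p)).sum())

def best_split(dists, labels):
    # Information gain and threshold of the best binary split of `dists`.
    order = np.argsort(dists)
    d, y = dists[order], labels[order]
    n, base = len(y), entropy(y)
    best_gain, best_thr = -1.0, float(d[-1])
    for i in range(1, n):
        if d[i] == d[i - 1]:
            continue  # no threshold separates equal distances
        gain = base - (i * entropy(y[:i]) + (n - i) * entropy(y[i:])) / n
        if gain > best_gain:
            best_gain, best_thr = gain, float((d[i - 1] + d[i]) / 2)
    return best_gain, best_thr

def majority(labels, fallback):
    return int(np.bincount(labels).argmax()) if len(labels) else fallback

def fit_random_stump(X, y, n_candidates, lengths, rng):
    # Score only `n_candidates` randomly sampled subsequences, not all of them.
    best = None
    for _ in range(n_candidates):
        i = rng.integers(len(X))
        l = int(rng.choice(lengths))
        s = rng.integers(X.shape[1] - l + 1)
        shapelet = X[i, s:s + l].copy()
        dists = np.array([subsequence_distance(x, shapelet) for x in X])
        gain, thr = best_split(dists, y)
        if best is None or gain > best["gain"]:
            default = majority(y, 0)
            best = {"gain": gain, "shapelet": shapelet, "thr": thr,
                    "near": majority(y[dists <= thr], default),
                    "far": majority(y[dists > thr], default)}
    return best

def predict_stump(stump, x):
    d = subsequence_distance(x, stump["shapelet"])
    return stump["near"] if d <= stump["thr"] else stump["far"]

def fit_ensemble(X, y, n_members=5, n_candidates=10, lengths=(5, 8, 10), seed=0):
    rng = np.random.default_rng(seed)
    return [fit_random_stump(X, y, n_candidates, lengths, rng) for _ in range(n_members)]

def predict_ensemble(members, x):
    # Majority vote over the independently randomized members.
    votes = [predict_stump(m, x) for m in members]
    return Counter(votes).most_common(1)[0][0]

# Toy data: class 0 = noisy sine waves, class 1 = noise only.
rng = np.random.default_rng(42)
t = np.linspace(0, 4 * np.pi, 40)
X = np.vstack([np.sin(t) + 0.05 * rng.standard_normal(40) for _ in range(10)]
              + [0.05 * rng.standard_normal(40) for _ in range(10)])
y = np.array([0] * 10 + [1] * 10)
members = fit_ensemble(X, y)
accuracy = np.mean([predict_ensemble(members, x) == label for x, label in zip(X, y)])
print(f"training accuracy: {accuracy:.2f}")
```

Each member sees only 10 candidates instead of the thousands an exhaustive search would score; diversity comes from each member drawing a different random sample.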
Benchmarking Multivariate Time Series Classification Algorithms
Time Series Classification (TSC) involves building predictive models for a
discrete target variable from ordered, real-valued attributes. Over recent
years, a new set of TSC algorithms has been developed that has made
significant improvements over the previous state of the art. The main focus has
been on univariate TSC, i.e. the problem where each case has a single series
and a class label. In reality, it is more common to encounter multivariate TSC
(MTSC) problems where multiple series are associated with a single label.
Despite this, much less consideration has been given to MTSC than the
univariate case. The UEA archive of 30 MTSC problems released in 2018 has made
comparison of algorithms easier. We review recently proposed bespoke MTSC
algorithms based on deep learning, shapelets and bag of words approaches. The
simplest approach to MTSC is to ensemble univariate classifiers over the
multivariate dimensions. We compare the bespoke algorithms to these
dimension-independent approaches on 26 of the 30 MTSC archive problems where the data
are all of equal length. We demonstrate that the independent ensemble of
HIVE-COTE classifiers is the most accurate, but that, unlike with univariate
classification, dynamic time warping is still competitive at MTSC.
Comment: Data Min Knowl Disc (2020)
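The dimension-independent approach can be sketched with the simplest univariate base classifier, 1-NN with full-window dynamic time warping, standing in for the HIVE-COTE components the paper actually ensembles. This is a minimal sketch: the function names, the squared-difference DTW cost, the absence of a warping window, and the plain majority vote across dimensions are assumptions, not the paper's exact protocol.

```python
import numpy as np
from collections import Counter

def dtw(a, b):
    # Classic O(n*m) dynamic time warping with squared-difference cost, no window.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def one_nn(train_series, train_labels, query):
    # Univariate 1-NN: label of the training series with the smallest DTW distance.
    dists = [dtw(s, query) for s in train_series]
    return int(train_labels[int(np.argmin(dists))])

def independent_ensemble_predict(train_X, train_y, query):
    """train_X: (cases, dimensions, length); query: (dimensions, length).
    One univariate classifier per dimension, majority vote across dimensions."""
    votes = [one_nn(train_X[:, d, :], train_y, query[d]) for d in range(train_X.shape[1])]
    return Counter(votes).most_common(1)[0][0]

train_X = np.array([np.zeros((3, 5)), np.ones((3, 5))])  # 2 cases, 3 dimensions, length 5
train_y = np.array([0, 1])
query = np.vstack([np.zeros(5), np.zeros(5), np.ones(5)])
print(independent_ensemble_predict(train_X, train_y, query))  # 0 (two of three dimensions vote 0)
```

Because each dimension is classified separately, any univariate classifier can be dropped in for `one_nn` without changing the ensemble logic.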
Comparative Study of Machine Learning Models on Solar Flare Prediction Problem
Solar flare events are explosions of energy and radiation from the Sun’s surface. These events occur due to the tangling and twisting of magnetic fields associated with sunspots. When coronal mass ejections (CMEs) accompany solar flares, solar storms could travel towards Earth at very high speeds, disrupting Earth-based technologies and posing radiation hazards to astronauts. For this reason, the prediction of solar flares has become a crucial aspect of space weather forecasting. Our thesis used time series data consisting of active region (AR) magnetic field parameters acquired from the Solar Dynamics Observatory (SDO), spanning more than eight years. The classification models take AR data from a 12-hour observation period as input to predict the occurrence of a flare in the next 24 hours. We performed preprocessing and feature selection to find an optimal feature space of 28 active region parameters, which make up our multivariate time series (MVTS) dataset. For the first time, we modeled the flare prediction task as a 4-class problem and explored a comprehensive set of machine learning models to identify the most suitable one. This research achieved a state-of-the-art true skill statistic (TSS) of 0.92 with a 99.9% recall of X- and M-class flares using our time series forest model. This was accomplished with an augmented dataset in which the minority class is over-sampled using synthetic samples generated by SMOTE and the majority classes are randomly under-sampled. This work has established a robust dataset and baseline models for future studies of this task, including experiments on remedies for the class imbalance problem such as weighted cost functions and data augmentation. In addition, the time series classifiers implemented will enable shapelet mining, which can provide interpretability to domain experts.
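The evaluation metric and the resampling step can be illustrated with a minimal sketch. It assumes TSS is computed one-vs-rest for a chosen positive class (the abstract does not say how TSS is aggregated over the 4 classes), and that each 12-hour, 28-parameter MVTS sample has been flattened into a single feature vector before SMOTE, which is one common option but not necessarily the thesis's. Function names are hypothetical; in practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import numpy as np

def true_skill_statistic(y_true, y_pred, positive):
    # One-vs-rest TSS = true positive rate (recall) - false positive rate.
    t = np.asarray(y_true) == positive
    p = np.asarray(y_pred) == positive
    tp, fn = np.sum(t & p), np.sum(t & ~p)
    fp, tn = np.sum(~t & p), np.sum(~t & ~p)
    return tp / (tp + fn) - fp / (fp + tn)

def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: each synthetic row lies on the segment between a
    random minority row and one of its k nearest minority neighbours (Euclidean)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a row is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]    # requires k < len(X)
    out = np.empty((n_new, X.shape[1]))
    for r in range(n_new):
        i = rng.integers(len(X))
        j = neighbours[i, rng.integers(k)]
        out[r] = X[i] + rng.random() * (X[j] - X[i])
    return out

def random_undersample(X, n_keep, seed=0):
    # Keep a random subset of majority-class rows, without replacement.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_keep, replace=False)
    return np.asarray(X)[idx]

y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0]
print(true_skill_statistic(y_true, y_pred, positive=1))  # recall 0.5 minus false positive rate 1/3
print(smote([[0, 0], [1, 1], [2, 2], [3, 3]], n_new=3))
```

Unlike accuracy, TSS is insensitive to the class ratio, which is why it is the usual headline metric for imbalanced flare data.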
Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets
Shapelet-based algorithms are widely used for time series classification
because of their ease of interpretation, but they are currently outperformed by
recent state-of-the-art approaches. We present a new formulation of time series
shapelets including the notion of dilation, and we introduce a new shapelet
feature to enhance their discriminative power for classification. Experiments
performed on 112 datasets show that our method improves on the state-of-the-art
shapelet algorithm, and achieves comparable accuracy to recent state-of-the-art
approaches, without sacrificing either scalability or interpretability.
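Dilation lets a shapelet of length l match points spaced d apart, so it covers a span of (l - 1)d + 1 time steps without adding parameters. The minimal sketch below assumes the three per-shapelet features we understand RDST to extract: the minimum distance, its position, and the number of dilated windows closer than a threshold. Here the threshold is passed in by hand rather than selected as in the paper, z-normalisation is omitted, and the function names are hypothetical.

```python
import numpy as np

def dilated_distance_vector(series, shapelet, dilation):
    """Euclidean distance between `shapelet` and every dilated window
    series[i], series[i + d], ..., series[i + (l - 1) * d]."""
    l = len(shapelet)
    span = (l - 1) * dilation + 1
    starts = np.arange(len(series) - span + 1)
    idx = starts[:, None] + dilation * np.arange(l)[None, :]  # (n_windows, l) indices
    return np.linalg.norm(series[idx] - shapelet, axis=1)

def dilated_shapelet_features(series, shapelet, dilation, threshold):
    # (min distance, position of the best match, count of matches below threshold)
    d = dilated_distance_vector(series, shapelet, dilation)
    return float(d.min()), int(d.argmin()), int(np.sum(d < threshold))

series = np.array([0, 9, 1, 9, 0, 9, 1, 9, 0], dtype=float)
shapelet = np.array([0.0, 1.0, 0.0])
print(dilated_shapelet_features(series, shapelet, dilation=2, threshold=0.5))  # (0.0, 0, 2)
print(dilated_shapelet_features(series, shapelet, dilation=1, threshold=0.5))
```

With dilation 2 the pattern 0, 1, 0 is found twice by skipping the interleaved 9s; with dilation 1 (an ordinary shapelet) it never matches, and the occurrence count is 0.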
- …