78 research outputs found

    An Experimental Evaluation of Nearest Neighbour Time Series Classification

    Get PDF
    Data mining research into time series classification (TSC) has focussed on alternative distance measures for nearest neighbour classifiers. It is standard practice to use 1-NN with Euclidean or dynamic time warping (DTW) distance as a straw man for comparison. As part of a wider investigation into elastic distance measures for TSC~\cite{lines14elastic}, we perform a series of experiments to test whether this standard practice is valid. Specifically, we compare 1-NN classifiers with Euclidean and DTW distance to standard classifiers, examine whether the performance of 1-NN Euclidean approaches that of 1-NN DTW as the number of cases increases, assess whether there is any benefit of setting kk for kk-NN through cross validation whether it is worth setting the warping path for DTW through cross validation and finally is it better to use a window or weighting for DTW. Based on experiments on 77 problems, we conclude that 1-NN with Euclidean distance is fairly easy to beat but 1-NN with DTW is not, if window size is set through cross validation

    Finding Motif Sets in Time Series

    Get PDF
    Time-series motifs are representative subsequences that occur frequently in a time series; a motif set is the set of subsequences deemed to be instances of a given motif. We focus on finding motif sets. Our motivation is to detect motif sets in household electricity-usage profiles, representing repeated patterns of household usage. We propose three algorithms for finding motif sets. Two are greedy algorithms based on pairwise comparison, and the third uses a heuristic measure of set quality to find the motif set directly. We compare these algorithms on simulated datasets and on electricity-usage data. We show that Scan MK, the simplest way of using the best-matching pair to find motif sets, is less accurate on our synthetic data than Set Finder and Cluster MK, although the latter is very sensitive to parameter settings. We qualitatively analyse the outputs for the electricity-usage data and demonstrate that both Scan MK and Set Finder can discover useful motif sets in such data

    HIVE-COTE: The hierarchical vote collective of transformation-based ensembles for time series classification:IEEE International Conference on Data Mining

    Get PDF
    There have been many new algorithms proposed over the last five years for solving time series classification (TSC) problems. A recent experimental comparison of the leading TSC algorithms has demonstrated that one approach is significantly more accurate than all others over 85 datasets. That approach, the Flat Collective of Transformation-based Ensembles (Flat-COTE), achieves superior accuracy through combining predictions of 35 individual classifiers built on four representations of the data into a flat hierarchy. Outside of TSC, deep learning approaches such as convolutional neural networks (CNN) have seen a recent surge in popularity and are now state of the art in many fields. An obvious question is whether CNNs could be equally transformative in the field of TSC. To test this, we implement a common CNN structure and compare performance to Flat-COTE and a recently proposed time series-specific CNN implementation.We find that Flat-COTE is significantly more accurate than both deep learning approaches on 85 datasets. These results are impressive, but Flat-COTE is not without deficiencies. We improve the collective by adding new components and proposing a modular hierarchical structure with a probabilistic voting scheme that allows us to encapsulate the classifiers built on each transformation. We add two new modules representing dictionary and interval-based classifiers, and significantly improve upon the existing frequency domain classifiers with a novel spectral ensemble. The resulting classifier, the Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is significantly more accurate than Flat-COTE and represents a new state of the art for TSC. HIVE-COTE captures more sources of possible discriminatory features in time series and has a more modular, intuitive structure

    Time Series Classification with HIVE-COTE: The Hierarchical Vote Collective of Transformation-based Ensembles

    Get PDF
    A recent experimental evaluation assessed 19 time series classification (TSC) algorithms and found that one was significantly more accurate than all others: the Flat Collective of Transformation-based Ensembles (Flat-COTE). Flat-COTE is an ensemble that combines 35 classifiers over four data representations. However, while comprehensive, the evaluation did not consider deep learning approaches. Convolutional neural networks (CNN) have seen a surge in popularity and are now state of the art in many fields and raises the question of whether CNNs could be equally transformative for TSC. We implement a benchmark CNN for TSC using a common structure and use results from a TSC-specific CNN from the literature. We compare both to Flat-COTE and find that the collective is significantly more accurate than both CNNs. These results are impressive, but Flat-COTE is not without deficiencies. We significantly improve the collective by proposing a new hierarchical structure with probabilistic voting, defining and including two novel ensemble classifiers built in existing feature spaces, and adding further modules to represent two additional transformation domains. The resulting classifier, the Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE), encapsulates classifiers built on five data representations. We demonstrate that HIVE-COTE is significantly more accurate than Flat-COTE (and all other TSC algorithms that we are aware of) over 100 resamples of 85 TSC problems and is the new state of the art for TSC. Further analysis is included through the introduction and evaluation of 3 new case studies and extensive experimentation on 1000 simulated datasets of 5 different types

    Time Series classification through transformation and ensembles

    Get PDF
    The problem of time series classification (TSC), where we consider any real-valued ordered data a time series, offers a specific challenge. Unlike traditional classification problems, the ordering of attributes is often crucial for identifying discriminatory features between classes. TSC problems arise across a diverse range of domains, and this variety has meant that no single approach outperforms all others. The general consensus is that the benchmark for TSC is nearest neighbour (NN) classifiers using Euclidean distance or Dynamic Time Warping (DTW). Though conceptually simple, many have reported that NN classifiers are very diffi�cult to beat and new work is often compared to NN classifiers. The majority of approaches have focused on classification in the time domain, typically proposing alternative elastic similarity measures for NN classification. Other work has investigated more specialised approaches, such as building support vector machines on variable intervals and creating tree-based ensembles with summary measures. We wish to answer a specific research question: given a new TSC problem without any prior, specialised knowledge, what is the best way to approach the problem? Our thesis is that the best methodology is to first transform data into alternative representations where discriminatory features are more easily detected, and then build ensemble classifiers on each representation. In support of our thesis, we propose an elastic ensemble classifier that we believe is the first ever to significantly outperform DTW on the widely used UCR datasets. Next, we propose the shapelet-transform, a new data transformation that allows complex classifiers to be coupled with shapelets, which outperforms the original algorithm and is competitive with DTW. Finally, we combine these two works with with heterogeneous ensembles built on autocorrelation and spectral-transformed data to propose a collective of transformation-based ensembles (COTE). The results of COTE are, we believe, the best ever published on the UCR datasets

    Classification of time series by shapelet transformation

    Get PDF
    Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A \emph{shapelet} is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible, and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree, and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best kk shapelets, which are used to produce a transformed dataset, where each of the kk features represent the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce equivalent results to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 we provide ourselve
    corecore