
    The Contract Random Interval Spectral Ensemble (c-RISE): The Effect of Contracting a Classifier on Accuracy

    The Random Interval Spectral Ensemble (RISE) is a recently introduced tree based time series classification algorithm, in which each tree is built on a distinct set of Fourier, autocorrelation and partial autocorrelation features. It is a component in the meta ensemble HIVE-COTE [9]. RISE has run time complexity of $O(nm^2)$, where m is the series length and n the number of train cases. This is prohibitively slow when considering long series, which are common in problems such as audio classification, where spectral approaches are likely to perform better than classifiers built in the time domain. We propose an enhancement of RISE that allows the user to specify how long the algorithm can have to run. The contract RISE (c-RISE) allows for check-pointing and adaptively estimates the time taken to build each tree in the ensemble through learning the constant terms in the run time complexity function. We show how the dynamic approach to contracting is more effective than the static approach of estimating the complexity before executing, and investigate the effect of contracting on accuracy for a range of large problems.
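The idea of learning the constant terms of the run time function can be illustrated with a minimal sketch. It is not the c-RISE implementation: the model form `seconds ≈ a * n * length**2 + b`, the function names, and the example timings are all assumptions chosen to match the stated quadratic complexity.

```python
import numpy as np

def fit_time_model(n_cases, lengths, seconds):
    """Least-squares fit of seconds ≈ a * n_cases * length**2 + b.

    Hypothetical model form, assumed from the quadratic complexity
    quoted in the abstract; a and b are the learned constants.
    """
    x = n_cases * np.asarray(lengths, dtype=float) ** 2
    design = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(design, np.asarray(seconds, dtype=float), rcond=None)
    return a, b

def max_affordable_length(a, b, n_cases, budget_seconds, series_length):
    """Largest interval length whose predicted build time fits the budget."""
    if budget_seconds <= b or a <= 0:
        return 0  # even the fixed overhead does not fit
    length = int(np.sqrt((budget_seconds - b) / (a * n_cases)))
    return min(length, series_length)

# Observed (interval length, build time) pairs from trees built so far;
# the numbers are made up for illustration.
a, b = fit_time_model(100, [10, 20, 40, 80], [0.02, 0.05, 0.17, 0.65])
print(max_affordable_length(a, b, 100, budget_seconds=1.01, series_length=1000))
```

Refitting after every tree is what would make such an estimate adaptive: the constants track the actual machine and data rather than a value fixed before execution.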

    A tale of two toolkits, report the third: on the usage and performance of HIVE-COTE v1.0

    The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification. Since it was first proposed in 2016, the algorithm has undergone some minor changes and there is now a configurable, scalable and easy to use version available in two open source repositories. We present an overview of the latest stable HIVE-COTE, version 1.0, and describe how it differs to the original. We provide a walkthrough guide of how to use the classifier, and conduct extensive experimental evaluation of its predictive performance and resource usage. We compare the performance of HIVE-COTE to three recently proposed algorithms

    Classifying dangerous species of mosquito using machine learning

    This thesis begins by presenting the performance of modern Time Series Classification (TSC) approaches, including HIVE-COTEv2 & InceptionTime, on 4 new insect wingbeat datasets. The experiments throughout this thesis endeavour to explore whether it is possible to classify flying insects into their respective species and into groups based on their sex. Furthermore, it is hypothesised that a hierarchical approach to classifying flying insects is possible via filtering “easy” cases using cheap to obtain features, reducing the number of times processing intensive approaches are utilised. Experiments are undertaken on 3 representations of the data: Harmonic Spectral Product (HSP), the raw data and spectral data. HSP is a method of extracting the fundamental frequency of a signal. It represents a logical benchmark for comparison and is easy and quick to extract. In one dataset, InsectSounds, species are separated into sex. Evaluation of the results achieved with the HSP representation showed that despite a relatively poor overall accuracy this feature produces a low type II error with respect to female mosquitoes. It is shown that classes of mosquito species that are female were more likely to be misclassified as other female mosquito classes and, where fly classes are misclassified as mosquito classes, they are typically classified as male mosquitoes. Previous work had shown that transformation into the frequency domain has a positive effect on performance. Audio data is typically recorded at a high sample rate, which results in high spectral resolution. As a result, approaches from the literature have used truncation of high and low frequency data to reduce runtime. It is hypothesised that inclusion of low frequency data will aid classification. This is because low frequency data is likely caused by the body of the mosquito and morphological differences, such as size, are strongly correlated to sex.
The results show that the performance of all approaches was improved by the use of spectral data. The results also showed that spectral data that included low frequency information resulted in a higher overall accuracy than transformations that discarded it. Formative experiments showed that HIVE-COTEv1 was the most accurate approach at classifying flying insects. HIVE-COTEv1 is a heterogeneous approach that consists of 4 modules: Random Interval Spectral Ensemble (RISE), Bag Of SFA Symbols (BOSS), Shapelet Transform Classifier (STC) and Time Series Forest (TSF). The predictive power of these modules is combined via Cross-validation Accuracy Weighted Probabilistic Ensemble (CAWPE). The RISE approach was chosen as the spectral component as it was “best in class” at the inception of HIVE-COTEv1. It is suggested that a significant improvement to the usability and accuracy of RISE would translate as an improvement in the performance of HIVE-COTEv1. The introduction of contracting provided a method through which the training time of RISE could be effectively controlled, improving its usability. A review of the interval selection procedure led to improvements that had a significant positive effect on accuracy. A review of spectral transforms and the method of combining them led to a further improvement to accuracy, and an architecture in which multiple transformations are applied. In order for smart traps to be effective they are required to work for extended periods in rural locations. Implementations of hierarchical approaches show that two expert features, HSP and time of flight (TOF), are effective in reducing test time and therefore the amount of processing required. This is achieved via first classifying the test case using simple approaches, such as BayesNet, and only if the confidence in the prediction does not meet a parameterised threshold using a more powerful approach.
In an evaluation of several methods of combination, the most efficient of these is shown to increase classification accuracy by 0.6%, increase the TPR of female mosquitoes by 48/10,000, decrease the FNR of female mosquitoes by 83/15,000 and reduce test time by 1.5 hours over 25,000 instances, when compared to the single best approach InceptionTime. Furthermore, a cumulative approach to combining the expert features with the InceptionTime approach resulted in a 4.14% increase in accuracy, an increase in the TPR of female mosquitoes of 139/10,000 and a decrease in the FNR of female mosquitoes of 45/15,000
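The confidence-thresholded hierarchy described in this abstract can be sketched as follows. This is an illustration only, not the thesis code: `cascade_predict`, the two probability callables and the default threshold of 0.9 are hypothetical stand-ins for the cheap classifier on expert features and the more powerful model.

```python
import numpy as np

def cascade_predict(case, cheap_proba, expensive_proba, threshold=0.9):
    """Return (predicted class index, whether the expensive model ran).

    cheap_proba and expensive_proba are hypothetical callables that map
    a test case to a vector of class probabilities.
    """
    probs = np.asarray(cheap_proba(case), dtype=float)
    if probs.max() >= threshold:
        # The cheap model is confident enough: stop here.
        return int(probs.argmax()), False
    # Otherwise fall back to the more powerful, slower model.
    probs = np.asarray(expensive_proba(case), dtype=float)
    return int(probs.argmax()), True

# Stub models with fixed outputs, for demonstration only.
confident_cheap = lambda case: [0.95, 0.05]
unsure_cheap = lambda case: [0.55, 0.45]
powerful = lambda case: [0.10, 0.90]

print(cascade_predict(None, confident_cheap, powerful))
print(cascade_predict(None, unsure_cheap, powerful))
```

Raising the threshold sends more cases to the expensive model, which trades test time for accuracy; the abstract treats that threshold as a parameter.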

    HIVE-COTE 2.0: a new meta ensemble for time series classification

    The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification. HIVE-COTE forms its ensemble from classifiers of multiple domains, including phase-independent shapelets, bag-of-words based dictionaries and phase-dependent intervals. Since it was first proposed in 2016, the algorithm has remained state of the art for accuracy on the UCR time series classification archive. Over time it has been incrementally updated, culminating in its current state, HIVE-COTE 1.0. During this time a number of algorithms have been proposed which match the accuracy of HIVE-COTE. We propose comprehensive changes to the HIVE-COTE algorithm which significantly improve its accuracy and usability, presenting this upgrade as HIVE-COTE 2.0. We introduce two novel classifiers, the Temporal Dictionary Ensemble and Diverse Representation Canonical Interval Forest, which replace existing ensemble members. Additionally, we introduce the Arsenal, an ensemble of ROCKET classifiers as a new HIVE-COTE 2.0 constituent. We demonstrate that HIVE-COTE 2.0 is significantly more accurate on average than the current state of the art on 112 univariate UCR archive datasets and 26 multivariate UEA archive datasets

    QUANT: A Minimalist Interval Method for Time Series Classification

    We show that it is possible to achieve the same accuracy, on average, as the most accurate existing interval methods for time series classification on a standard set of benchmark datasets using a single type of feature (quantiles), fixed intervals, and an 'off the shelf' classifier. This distillation of interval-based approaches represents a fast and accurate method for time series classification, achieving state-of-the-art accuracy on the expanded set of 142 datasets in the UCR archive with a total compute time (training and inference) of less than 15 minutes using a single CPU core. Comment: 26 pages, 20 figures
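A minimal sketch of quantile features over fixed intervals is given below. It is a simplification for illustration, not the QUANT implementation: the equal-width split, the evenly spaced quantile levels, and the name `quantile_interval_features` are assumptions.

```python
import numpy as np

def quantile_interval_features(series, n_intervals=4, n_quantiles=3):
    """Concatenate quantiles computed on fixed, equal-width intervals.

    Assumed simplification: intervals come from an even split of the
    series, and quantile levels are evenly spaced on [0, 1].
    """
    series = np.asarray(series, dtype=float)
    levels = np.linspace(0.0, 1.0, n_quantiles)
    chunks = np.array_split(series, n_intervals)
    return np.concatenate([np.quantile(chunk, levels) for chunk in chunks])

# Two intervals of four points each, three quantiles per interval.
features = quantile_interval_features(np.arange(8), n_intervals=2, n_quantiles=3)
print(features)  # [0.  1.5 3.  4.  5.5 7. ]
```

Because the intervals are fixed rather than sampled, the resulting feature matrix is deterministic and can be passed to any standard tabular classifier.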

    The Canonical Interval Forest (CIF) Classifier for Time Series Classification

    Time series classification (TSC) is home to a number of algorithm groups that utilise different kinds of discriminatory patterns. One of these groups describes classifiers that predict using phase dependent intervals. The time series forest (TSF) classifier is one of the most well known interval methods, and has demonstrated strong performance as well as relative speed in training and predictions. However, recent advances in other approaches have left TSF behind. TSF originally summarises intervals using three simple summary statistics. The 'catch22' feature set of 22 time series features was recently proposed to aid time series analysis through a concise set of diverse and informative descriptive characteristics. We propose combining TSF and catch22 to form a new classifier, the Canonical Interval Forest (CIF). We outline additional enhancements to the training procedure, and extend the classifier to include multivariate classification capabilities. We demonstrate a large and significant improvement in accuracy over both TSF and catch22, and show it to be on par with top performers from other algorithmic classes. By upgrading the interval-based component from TSF to CIF, we also demonstrate a significant improvement in the hierarchical vote collective of transformation-based ensembles (HIVE-COTE) that combines different time series representations. HIVE-COTE using CIF is significantly more accurate on the UCR archive than any other classifier we are aware of and represents a new state of the art for TSC.

    Complexity Measures and Features for Times Series classification

    Classification of time series is a growing problem in different disciplines due to the progressive digitalization of the world. Currently, the state of the art in time series classification is dominated by the Hierarchical Vote Collective of Transformation-based Ensembles. This algorithm is composed of several classifiers of different domains distributed in five large modules. The combination of the results obtained by each module, weighted based on an internal evaluation process, allows this algorithm to obtain the best results in the state of the art. One Nearest Neighbour with Dynamic Time Warping remains the base classifier in any time series classification problem for its simplicity and good results. Despite their performance, they share a weakness, which is that they are not interpretable. In the field of time series classification, there is a tradeoff between accuracy and interpretability. In this work, we propose a set of characteristics capable of extracting information on the structure of the time series to face time series classification problems. The use of these characteristics allows the use of traditional classification algorithms in time series problems. The experimental results of our proposal show no statistically significant differences from the second and third best models of the state of the art. Apart from competitive results in accuracy, our proposal is able to offer interpretable results based on the set of characteristics proposed. Spanish Government TIN2016-81113-R, PID2020-118224RB-I00, BES-2017-080137; Andalusian Regional Government, Spain P12-TIC-2958, P18-TP-5168, A-TIC-388-UGR-1

    Ensembles for multivariate time series classification

    Time Series Classification (TSC) involves learning predictive models for a discrete target variable from ordered, real-valued attributes. Over recent years, a new set of TSC algorithms has been developed which has significantly improved on the previous state of the art. The main focus has been on univariate TSC, i.e. the problem where each case has a single series and a class label. In reality, it is more common to encounter multivariate TSC (MTSC) problems where multiple series are associated with a single label. Despite this, much less consideration has been given to MTSC than the univariate case. Therefore, this work focuses on MTSC from different perspectives. First, by introducing a set of 33 problems for MTSC in different areas called the UEA MTSC archive. Second, by introducing the state-of-the-art algorithms and comparing them on those problems. That experimentation concluded that HIVE-COTE2 (HC2) is the current state of the art. Third, because of that, the remainder of this work focused on two ways to improve HC2: a) by improving one of the components (Shapelet Transform Classifier) and b) by adding a preprocessing phase for dimension selection in order to improve HC2 by removing the dimensions that do not contribute. In the first case, we were able to improve HC2 significantly for MTSC problems, and in the second case, there was no significant improvement in accuracy. Still, there were gains in decreasing the number of dimensions required and hence the run time.

    Heterogeneous ensembles and time series classification techniques for the non-invasive authentication of spirits

    Spirits are a prime target for fraudulent activity. Particular brands, production processes, and other factors such as age can carry high value, and leave space for mimicry. Further, the improper production of spirits, either maliciously or through negligence, can result in harmful substances being sold for consumption. Lastly, genuine spirits producers themselves must ensure the quality and standardisation of their products before sale. Authenticating spirits can be a time consuming and destructive process, requiring sealed bottles to be opened for access to the product. It is therefore desirable to have a fast, non-invasive means of indicating the authenticity, safety, and correctness of spirits. We advance and prototype such a system based on near infrared spectroscopy, and generate datasets for the detection of correct alcohol concentrations in synthesised spirits, for the presence of methanol in genuine spirits, and for the distinction of particular genuine products in a given bottle. The standard chemometric pipelines for the analysis of spectra involve smoothing of the signal, standardising for global intensity, possible dimensionality reduction, and some form of least squares regression. This has decades of proof behind it, and works under the assumptions of clean signal gathering, potentially the separation of sample and particular substance of interest, and the generally linear relationship of light received/blocked and the analyte’s contents. In the proposed system, at least one of these assumptions must be violated. We therefore investigate the use of modern classification techniques to overcome these challenges. In particular, we investigate and develop ensemble methods and time series classification algorithms. 
Our first hypothesis is that algorithms which consider the ordered nature of the wavelength features, as opposed to treating the spectra effectively as tabular data, can better handle the structural changes brought about by different bottle and environmental characteristics. The second is that ensembling heterogeneous classifiers is the best initial technique for a new data science problem, but should in particular be helpful for the spirit authentication problem, where different classifiers may be able to correct for different defects in the data. In initial investigations on datasets of synthesised alcohol solutions and different products, we prove the feasibility of the authentication system to make at least indicative predictions of authenticity, but find that it lacks the precision and accuracy needed for anything more than indicative results. Following this, we propose a novel heterogeneous ensembling scheme, CAWPE, and perform a large scale evaluation on public archives to prove its efficacy. We then outline improvements in the time series classification space that lead to the state of the art meta-ensemble HIVE-COTE 2.0, which makes use of CAWPE. We lastly apply the developed techniques to a final dataset on methanol concentration detection. We find that the proposed system can classify methanol concentration in arbitrary spirits and bottles from ten possible values, containing as little as 0.25%, to an accuracy of 0.921. We further conclude that while heterogeneously ensembling tabular classifiers does improve the authentication of spirits from spectra, time series classification methods confer no particular advantage beyond tabular methods.