4,796 research outputs found
Transformation Based Ensembles for Time Series Classification
Until recently, the vast majority of data mining time series classification (TSC) research has focused on alternative distance measures for 1-Nearest Neighbour (1-NN) classifiers based on either the raw data, or on compressions or smoothing of the raw data. Despite the extensive evidence in favour of 1-NN classifiers with Euclidean or Dynamic Time Warping distance, there has also been a flurry of recent research publications proposing classification algorithms for TSC. Generally, these classifiers describe different ways of incorporating summary measures in the time domain into more complex classifiers. Our hypothesis is that the easiest way to gain improvement on TSC problems is simply to transform into an alternative data space where the discriminatory features are more easily detected. To test our hypothesis, we perform a range of benchmarking experiments in the time domain, before evaluating nearest neighbour classifiers on data transformed into the power spectrum, the autocorrelation function, and the principal component space. We demonstrate that on some problems there is dramatic improvement in the accuracy of classifiers built on the transformed data over classifiers built in the time domain, but that there is also a wide variance in accuracy for a particular classifier built on different data transforms. To overcome this variability, we propose a simple transformation based ensemble, then demonstrate that it improves performance and reduces the variability of classifiers built in the time domain only. Our advice to a practitioner with a real world TSC problem is to try transforms before developing a complex classifier; it is the easiest way to get a potentially large increase in accuracy, and may provide further insights into the underlying relationships that characterise the problem
Efficient Optimization of Performance Measures by Classifier Adaptation
In practical applications, machine learning algorithms are often needed to
learn classifiers that optimize domain specific performance measures.
Previously, the research has focused on learning the needed classifier in
isolation, yet learning nonlinear classifier for nonlinear and nonsmooth
performance measures is still hard. In this paper, rather than learning the
needed classifier by optimizing specific performance measure directly, we
circumvent this problem by proposing a novel two-step approach called as CAPO,
namely to first train nonlinear auxiliary classifiers with existing learning
methods, and then to adapt auxiliary classifiers for specific performance
measures. In the first step, auxiliary classifiers can be obtained efficiently
by taking off-the-shelf learning algorithms. For the second step, we show that
the classifier adaptation problem can be reduced to a quadratic program
problem, which is similar to linear SVMperf and can be efficiently solved. By
exploiting nonlinear auxiliary classifiers, CAPO can generate nonlinear
classifier which optimizes a large variety of performance measures including
all the performance measure based on the contingency table and AUC, whilst
keeping high computational efficiency. Empirical studies show that CAPO is
effective and of high computational efficiency, and even it is more efficient
than linear SVMperf.Comment: 30 pages, 5 figures, to appear in IEEE Transactions on Pattern
Analysis and Machine Intelligence, 201
- …