k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because its traditional weakness,
poor run-time performance, is much less of a problem with the computational
power that is available today. This paper presents an overview of techniques
for Nearest Neighbour classification, focusing on: mechanisms for assessing
similarity (distance), computational issues in identifying nearest neighbours,
and mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: an updated edition of an older tutorial on kNN
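The core idea the tutorial describes can be sketched in a few lines. This is a minimal illustration, not the tutorial's own code; the function name `knn_classify` and the use of Euclidean distance and majority voting are assumptions for the sketch.

```python
import numpy as np

def knn_classify(X_train, y_train, query, k=3):
    """Classify a query point by majority vote among its k nearest
    training examples under Euclidean distance."""
    # Distance from the query to every training example.
    dists = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbours' labels.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

For example, with training points `[[0,0],[0,1]]` labelled 0 and `[[5,5],[6,5]]` labelled 1, a query at `[5,6]` is assigned class 1 because its three nearest neighbours are dominated by class-1 examples.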
Efficient Kernel-Based Subsequence Search for Enabling Health Monitoring Services in IoT-Based Home Setting
This paper presents an efficient approach for subsequence search in data streams. The problem consists of identifying coherent repetitions of a given reference time-series, also in the multivariate case, within a longer data stream. The most widely adopted metric for this problem is Dynamic Time Warping (DTW), but its computational complexity is a well-known issue. In this paper, we present an approach aimed at learning a kernel approximating DTW for efficiently analyzing streaming data collected from wearable sensors, while reducing the burden of DTW computation. Unlike kernels, DTW allows two time-series of different lengths to be compared. To enable the use of a kernel for comparing two time-series of different lengths, a feature embedding is required in order to obtain a fixed-length vector representation. Each vector component is the DTW between the given time-series and one of a set of randomly chosen "basis" series. The approach has been validated on two benchmark datasets and on a real-life application supporting self-rehabilitation in elderly subjects. A comparison with traditional DTW implementations and other state-of-the-art algorithms is provided: results show a slight decrease in accuracy, which is counterbalanced by a significant reduction in computational costs.
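The embedding step the abstract describes can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: a textbook univariate DTW is used, and the basis series are simply passed in as a list.

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance
    between two univariate series (absolute-difference cost)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def embed(series, basis):
    """Fixed-length representation of a variable-length series:
    one DTW value per basis series."""
    return np.array([dtw(series, b) for b in basis])
```

Because every series, whatever its length, maps to a vector with one entry per basis series, a standard fixed-length kernel (e.g. an RBF kernel) can then be applied to these embeddings.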
Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series
Time series classification (TSC) is a challenging task due to the diversity
of types of feature that may be relevant for different classification tasks,
including trends, variance, frequency, magnitude, and various patterns. To
address this challenge, several alternative classes of approach have been
developed, including similarity-based, features and intervals, shapelets,
dictionary, kernel, neural network, and hybrid approaches. While kernel, neural
network, and hybrid approaches perform well overall, some specialized
approaches are better suited for specific tasks. In this paper, we propose a
new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which
outperforms previous state-of-the-art similarity-based classifiers across the
UCR benchmark and outperforms state-of-the-art kernel, neural network, and
hybrid methods on specific datasets in the benchmark that are best addressed by
similarity-based methods. PF 2.0 incorporates three recent advances in time
series similarity measures -- (1) computationally efficient early abandoning
and pruning to speed up elastic similarity computations; (2) a new elastic
similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function
tuning. It rationalizes the set of similarity measures employed, reducing the
eight base measures of the original PF to three and using the first derivative
transform with all similarity measures, rather than a limited subset. We have
implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF
framework more efficient.
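The early-abandoning idea in advance (1) can be illustrated with a minimal row-wise variant. This is a sketch of the general principle, not PF 2.0's (more sophisticated) C++ pruning: since alignment costs are non-negative, once every cell in a row of the DTW matrix exceeds the best distance found so far, the final distance must exceed it too.

```python
import numpy as np

def dtw_early_abandon(a, b, cutoff=np.inf):
    """DTW with row-wise early abandoning: if every entry in the
    current row already exceeds `cutoff` (e.g. the best distance
    found so far in a nearest-neighbour search), the true distance
    must also exceed it, so computation stops early."""
    n, m = len(a), len(b)
    prev = np.full(m + 1, np.inf)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = np.full(m + 1, np.inf)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        if curr[1:].min() > cutoff:
            return np.inf  # abandon: this candidate cannot win
        prev = curr
    return prev[m]
```

In a 1-NN search, passing the running best distance as `cutoff` lets most candidate comparisons terminate after only a few rows.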
Similarity Search for Spatial Trajectories Using Online Lower Bounding DTW and Presorting Strategies
Similarity search with respect to time series has received much attention from research and industry in the last decade. Dynamic time warping is one of the most widely used distance measures in this context, due to the simplicity of its definition and the surprising quality of dynamic time warping for time series classification. However, dynamic time warping does not behave well with respect to many dimensionality reduction techniques, as it does not fulfill the triangle inequality. Additionally, most research on dynamic time warping has been performed with one-dimensional time series or in multivariate cases of varying dimensions. With this paper, we propose three extensions to LB_Rotation for two-dimensional time series (trajectories). We simplify LB_Rotation and adapt it to the online and data streaming case, and show how to tune the pruning ratio in similarity search by using presorting strategies based on simple summaries of trajectories. Finally, we provide a thorough evaluation of these aspects on a large variety of datasets of spatial trajectories.
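The lower-bounding strategy that LB_Rotation extends can be illustrated with the classical LB_Keogh bound for univariate series; the details of LB_Rotation itself are not reproduced here, and the equal-length assumption below is a simplification for the sketch.

```python
import numpy as np

def lb_keogh(query, candidate, r):
    """Classical LB_Keogh lower bound on band-constrained DTW
    (squared-difference cost, warping window r, equal lengths).
    Candidate points outside the upper/lower envelope of the query
    contribute their squared distance to the nearest envelope edge."""
    q = np.asarray(query, dtype=float)
    c = np.asarray(candidate, dtype=float)
    n = len(q)
    lb = 0.0
    for i in range(n):
        window = q[max(0, i - r):min(n, i + r + 1)]
        lo, hi = window.min(), window.max()
        if c[i] > hi:
            lb += (c[i] - hi) ** 2
        elif c[i] < lo:
            lb += (lo - c[i]) ** 2
    return lb
```

In similarity search, the cheap bound is computed first; the full DTW distance is evaluated only when the bound falls below the best distance found so far, which is what makes presorting strategies that raise the best-so-far quickly (as studied in the paper) effective.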
Times series averaging from a probabilistic interpretation of time-elastic kernel
In the light of regularized dynamic time warping kernels, this paper
reconsiders the concept of time elastic centroid (TEC) for a set of time
series. From this perspective, we first show how TEC can easily be addressed
as a preimage problem. Unfortunately this preimage problem is ill-posed, may
suffer from over-fitting (especially for long time series), and obtaining even
a sub-optimal solution involves heavy computational costs. We then derive two
new algorithms based on a probabilistic interpretation of kernel alignment
matrices, expressed in terms of probability distributions over sets of
alignment paths. The first algorithm is an iterative agglomerative heuristic
inspired by the state-of-the-art DTW barycenter averaging (DBA) algorithm,
proposed specifically for the Dynamic Time Warping measure. The second
proposed algorithm achieves a classical averaging of the aligned samples but
also averages the times of occurrence of the aligned samples, exploiting a
straightforward progressive agglomerative heuristic. An experiment comparing,
over 45 time series datasets, the classification error rates obtained by
first-nearest-neighbour classifiers that exploit a single medoid or centroid
estimate to represent each category shows that: i) centroid-based approaches
significantly outperform medoid-based approaches, ii) in the considered
experiments, the two proposed algorithms outperform the state-of-the-art DBA
algorithm, and iii) the second proposed algorithm, which averages jointly in
the sample space and along the time axis, emerges as the most significantly
robust time elastic averaging heuristic, with an interesting noise reduction
capability.
Index Terms -- Time series averaging, Time elastic kernel, Dynamic Time
Warping, Time series clustering and classification
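The DBA baseline that both proposed algorithms are compared against can be sketched as follows. This is a simplified, univariate illustration of DBA's iterate-align-average loop, not the paper's probabilistic kernel-based algorithms; the helper names `dtw_path` and `dba` are introduced here for the sketch.

```python
import numpy as np

def dtw_path(a, b):
    """DTW alignment path between two univariate series
    (absolute-difference cost), recovered by backtracking."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def dba(series_set, n_iter=10):
    """Simplified DBA: repeatedly align every series to the current
    average, then replace each average point by the mean of all
    sample values aligned to it."""
    avg = np.array(series_set[0], dtype=float)
    for _ in range(n_iter):
        buckets = [[] for _ in avg]
        for s in series_set:
            for i, j in dtw_path(avg, s):
                buckets[i].append(s[j])
        avg = np.array([np.mean(b) for b in buckets])
    return avg
```

Note that this averages only in the sample space, at fixed average time indices; the paper's second algorithm additionally averages the times of occurrence, which this sketch does not attempt.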
Parameterizing the cost function of Dynamic Time Warping with application to time series classification
Dynamic Time Warping (DTW) is a popular time series distance measure that
aligns the points in two series with one another. These alignments support
warping of the time dimension to allow for processes that unfold at differing
rates. The distance is the minimum sum of costs of the resulting alignments
over any allowable warping of the time dimension. The cost of an alignment of
two points is a function of the difference in the values of those points. The
original cost function was the absolute value of this difference. Other cost
functions have been proposed. A popular alternative is the square of the
difference. However, to our knowledge, this is the first investigation of both
the relative impacts of using different cost functions and the potential to
tune cost functions to different tasks. We do so in this paper by using a
tunable cost function λ_γ with parameter γ. We show that higher values of γ
place greater weight on larger pairwise differences, while lower values place
greater weight on smaller pairwise differences. We demonstrate that training
γ significantly improves the accuracy of both the DTW nearest neighbor and
Proximity Forest classifiers.
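The family of cost functions described can be sketched as a single exponent on the pairwise difference. This is an illustrative reading of the abstract, assuming the cost `|a_i - b_j|**γ` as the tunable function; γ = 1 recovers the original absolute-difference cost and γ = 2 the popular squared difference.

```python
import numpy as np

def dtw_gamma(a, b, gamma=2.0):
    """DTW with the tunable alignment cost |a_i - b_j|**gamma.
    Larger gamma weights big pairwise differences more heavily;
    smaller gamma emphasizes small differences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1]) ** gamma
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

In a training setting one would select γ per dataset, for instance by cross-validated 1-NN accuracy over a small grid of candidate values; the paper's exact training procedure may differ from this sketch.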