Real Time Pattern Matching with Dynamic Normalization
Pattern matching in time series data streams is an essential data mining
problem that remains challenging in many practical scenarios. Factors such as
noise, varying amplitude scale or shift, and stretching or shrinking of the
signal in time all degrade the performance of many existing pattern matching
algorithms. In this paper, we introduce a dynamic
z-normalization mechanism allowing for proper signal scaling even under
significant time and amplitude distortions. Based on that, we further propose a
Dynamic Time Warping-based real-time pattern matching method to recover hidden
patterns that can be distorted in both time and amplitude. We evaluate the
proposed method on synthetic and real-world scenarios under realistic
conditions, demonstrating its high operational performance compared to
other state-of-the-art pattern matching methods.
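
To make the two building blocks named in this abstract concrete, the following is a minimal sketch of standard (static) z-normalization and a textbook Dynamic Time Warping distance. It is not the paper's dynamic z-normalization mechanism or its real-time matching method, whose details the abstract does not give. The function names `z_normalize` and `dtw_distance`, the squared pointwise cost, and the final square root are assumptions made for illustration.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0.0:
        # A constant series carries no shape information; map it to zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]


def dtw_distance(a, b):
    """Unconstrained DTW with squared pointwise cost (illustrative choice)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # prev[j] holds the best cumulative cost for aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])
```

In this sketch, z-normalization removes a constant amplitude scale and shift, while DTW absorbs stretching in time, so a pattern and a scaled, shifted, or uniformly stretched copy of it end up at a distance of approximately zero after normalization.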
Tight lower bounds for Dynamic Time Warping
Dynamic Time Warping (DTW) is a popular similarity measure for aligning and
comparing time series. Due to DTW's high computation time, lower bounds are
often employed to screen poor matches. Many alternative lower bounds have been
proposed, providing a range of different trade-offs between tightness and
computational efficiency. LB Keogh provides a useful trade-off in many
applications. Two recent lower bounds, LB Improved and LB Enhanced, are
substantially tighter than LB Keogh. All three have the same worst-case
computational complexity: linear with respect to series length and constant
with respect to window size. We present four new DTW lower bounds in the same
complexity class. LB Petitjean is substantially tighter than LB Improved, with
only modest additional computational overhead. LB Webb is more efficient than
LB Improved, while often providing a tighter bound. LB Webb is always tighter
than LB Keogh. The parameter-free LB Webb is usually tighter than LB Enhanced.
A parameterized variant, LB Webb Enhanced, is always tighter than LB Enhanced.
A further variant, LB Webb*, is useful for some constrained distance functions.
In extensive experiments, LB Webb proves to be very effective for nearest
neighbor search.
Comment: 26 pages, 23 figures, expanded version of a paper accepted for
publication in Pattern Recognition. This revision fixed minor typos in the
two algorithm
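
As an illustration of how a lower bound can screen poor matches before the full DTW computation, here is a minimal sketch of the commonly described LB Keogh bound together with a window-constrained DTW. It does not implement LB Petitjean, LB Webb, or their variants, which the abstract names but does not define. The function names, the squared pointwise cost, and the equal-length assumption are illustrative. The envelope below is computed naively in time proportional to the series length times the window size; the constant-in-window complexity cited in the abstract needs a linear-time envelope algorithm, which is omitted here.

```python
import math


def envelope(series, window):
    """Upper and lower envelope of a series within +/- window positions."""
    n = len(series)
    upper, lower = [], []
    for i in range(n):
        segment = series[max(0, i - window):min(n, i + window + 1)]
        upper.append(max(segment))
        lower.append(min(segment))
    return upper, lower


def lb_keogh(query, candidate, window):
    """LB Keogh: distance from the query to the candidate's envelope."""
    upper, lower = envelope(candidate, window)
    total = 0.0
    for q, u, l in zip(query, upper, lower):
        if q > u:
            total += (q - u) ** 2
        elif q < l:
            total += (l - q) ** 2
    return math.sqrt(total)


def dtw_windowed(a, b, window):
    """DTW restricted to |i - j| <= window, for equal-length series."""
    n = len(a)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        for j in range(max(1, i - window), min(n, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[n])
```

In a nearest neighbor search, a candidate whose `lb_keogh` value already exceeds the best DTW distance found so far can be skipped without running `dtw_windowed`, because the bound never exceeds the windowed DTW distance computed with the same window.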
Effective Algorithms for the Closest Pair and Related Problems
The Closest Pair problem aims to identify the closest pair of points in a metric space under some similarity measure (e.g., Euclidean distance or Dynamic Time Warping distance). It is a fundamental problem with a wide range of applications in data mining, since most data can be represented as vectors in a high-dimensional space, and we would like to identify the relationships among those data points. Typical applications include, but are not limited to, social data analysis, user pattern identification, motif mining in biological data, and data clustering. This classical problem has been studied extensively over the past decades.
In this thesis, we study the Closest Pair problem and its variants, and also bring a machine learning perspective to some closely related problems. In particular, we propose two approximate algorithms to efficiently address the Closest Pair of Points (CPP) problem, and one deterministic approach to solve the Closest Pair of Subsequences (CPS) problem, using the Euclidean distance measure. In addition, to identify the closest subsequences in time series data, we propose a learnable feature extractor embedded in an artificial neural network to learn patterns under the Dynamic Time Warping metric. Finally, to speed up inference of the proposed algorithm, we also propose a neural network pruning technique to obtain a smaller network with similar capacity.
All the proposed methods are shown to achieve state-of-the-art performance on various standard benchmark datasets.
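
For reference, the Closest Pair of Points problem stated in this abstract can be solved exactly by comparing every pair of points. The sketch below is that brute-force quadratic-time baseline under Euclidean distance, not any of the approximate or deterministic algorithms proposed in the thesis. The function name `closest_pair_brute_force` is hypothetical.

```python
import itertools
import math


def closest_pair_brute_force(points):
    """Return (distance, p, q) for the closest pair; needs at least 2 points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    best = None
    for p, q in itertools.combinations(points, 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if best is None or d < best[0]:
            best = (d, p, q)
    return best
```

A baseline like this is mainly useful for checking the output of a faster method on small inputs, since its cost grows quadratically with the number of points.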