3 research outputs found

    Real Time Pattern Matching with Dynamic Normalization

    Pattern matching in time series data streams is an essential data mining problem that remains challenging in many practical scenarios. Factors such as noise, varying amplitude scale or shift, and stretching or shrinking of the signal in time all degrade the performance of many existing pattern matching algorithms. In this paper, we introduce a dynamic z-normalization mechanism that allows proper signal scaling even under significant time and amplitude distortions. Building on it, we propose a Dynamic Time Warping-based real-time pattern matching method that recovers hidden patterns distorted in both time and amplitude. We evaluate the proposed method on synthetic and real-world scenarios under realistic conditions, demonstrating its strong operating characteristics compared with other state-of-the-art pattern matching methods.
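
    As a rough illustration of the two building blocks named in the abstract, the sketch below pairs ordinary (static) z-normalization with the textbook DTW recurrence. It is not the paper's dynamic normalization mechanism or its real-time matcher, whose details the abstract does not give; the function names `z_normalize` and `dtw_distance` and the squared pointwise cost are assumptions made for this example.

```python
import math


def z_normalize(x):
    """Rescale a sequence to zero mean and unit standard deviation.

    Plain, whole-window z-normalization (an assumption for illustration),
    not the dynamic variant proposed in the paper.
    """
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        # A constant window carries no shape information.
        return [0.0] * n
    return [(v - mean) / std for v in x]


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with squared pointwise cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: insertion, deletion, match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


pattern = [0.0, 1.0, 2.0, 1.0, 0.0]
# Same shape, different amplitude scale and offset.
scaled = [10.0 + 5.0 * v for v in pattern]
# Same shape, stretched in time.
stretched = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]

amplitude_gap = dtw_distance(z_normalize(pattern), z_normalize(scaled))
time_gap = dtw_distance(pattern, stretched)
```

    Under these assumptions, normalization absorbs the amplitude scale and shift, while the warping path absorbs the stretch in time, so both gaps come out at or very near zero.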

    Tight lower bounds for Dynamic Time Warping

    Dynamic Time Warping (DTW) is a popular similarity measure for aligning and comparing time series. Because DTW is expensive to compute, lower bounds are often employed to screen out poor matches. Many alternative lower bounds have been proposed, providing a range of trade-offs between tightness and computational efficiency. LB Keogh provides a useful trade-off in many applications. Two recent lower bounds, LB Improved and LB Enhanced, are substantially tighter than LB Keogh. All three have the same worst-case computational complexity: linear with respect to series length and constant with respect to window size. We present four new DTW lower bounds in the same complexity class. LB Petitjean is substantially tighter than LB Improved, with only modest additional computational overhead. LB Webb is more efficient than LB Improved, while often providing a tighter bound. LB Webb is always tighter than LB Keogh. The parameter-free LB Webb is usually tighter than LB Enhanced. A parameterized variant, LB Webb Enhanced, is always tighter than LB Enhanced. A further variant, LB Webb*, is useful for some constrained distance functions. In extensive experiments, LB Webb proves to be very effective for nearest neighbor search.
    Comment: 26 pages, 23 figures, expanded version of a paper accepted for publication in Pattern Recognition. This revision fixed minor typos in the two algorithm
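
    To make the screening idea concrete, here is a minimal sketch of the classic LB Keogh bound for equal-length series, checked against a windowed DTW. It assumes squared pointwise cost and a Sakoe-Chiba style window `w`; the names `envelope`, `lb_keogh`, and `dtw_windowed` are chosen for this example. The envelope is built naively in O(n * w) for readability rather than with the linear-time sliding min/max that the complexity claim in the abstract relies on, and none of the four new bounds from the paper is reproduced here.

```python
def envelope(q, w):
    """Upper and lower envelope of q over a window of half-width w."""
    n = len(q)
    upper, lower = [], []
    for i in range(n):
        window = q[max(0, i - w):min(n, i + w + 1)]
        upper.append(max(window))
        lower.append(min(window))
    return upper, lower


def lb_keogh(q, c, w):
    """LB Keogh: squared distance from c to the envelope of q."""
    upper, lower = envelope(q, w)
    total = 0.0
    for ci, u, l in zip(c, upper, lower):
        if ci > u:
            total += (ci - u) ** 2
        elif ci < l:
            total += (l - ci) ** 2
        # Points inside the envelope contribute nothing.
    return total


def dtw_windowed(a, b, w):
    """DTW for equal-length series, warping limited to |i - j| <= w."""
    n = len(a)
    inf = float("inf")
    d = [[inf] * (n + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][n]


query = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
candidate = [3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 3.0]
bound = lb_keogh(query, candidate, 1)
exact = dtw_windowed(query, candidate, 1)
```

    In a nearest neighbor search, a candidate whose `bound` already exceeds the best distance found so far can be discarded without running the quadratic `dtw_windowed` computation at all.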

    Effective Algorithms for the Closest Pair and Related Problems

    The Closest Pair problem aims to identify the closest pair of points in a metric space under some similarity measure (e.g., Euclidean distance or Dynamic Time Warping distance). It is a fundamental problem with a wide range of applications in data mining, since most data can be represented as vectors in a high-dimensional space, and we would like to identify the relationships among those data points. Typical applications include, but are not limited to, social data analysis, user pattern identification, motif mining in biological data, and data clustering. This classical problem has been studied extensively over the past decades. In this thesis, we study the Closest Pair problem and its variants, and also bring a machine learning perspective to some closely related problems. In particular, we propose two approximate algorithms that efficiently address the Closest Pair of Points (CPP) problem, and one deterministic approach to the Closest Pair of Subsequences (CPS) problem, using the Euclidean distance measure. In addition, to identify the closest subsequences in time series data, we propose a learnable feature extractor embedded in an artificial neural network to learn patterns under the Dynamic Time Warping metric. Finally, to speed up inference for the proposed algorithm, we propose a neural network pruning technique that yields a smaller network with similar capacity. All the proposed methods are shown to achieve state-of-the-art performance on various standard benchmark datasets.
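
    For reference, the problem statement itself can be sketched as an exact brute-force baseline under Euclidean distance. This is only the O(n^2) definition-level approach that faster methods are measured against, not one of the thesis's approximate or deterministic algorithms; the function name `closest_pair` and the sample points are made up for the example, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def closest_pair(points):
    """Return (distance, p, q) for the closest pair by exhaustive search."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Compare every unordered pair once and keep the nearest.
    p, q = min(combinations(points, 2), key=lambda pair: math.dist(*pair))
    return math.dist(p, q), p, q


points = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.5), (9.0, 9.0)]
distance, p, q = closest_pair(points)
```

    Swapping `math.dist` for another distance function, such as a DTW distance over subsequences, changes the measure without changing the search structure, which is why the quadratic number of comparisons becomes the bottleneck on large inputs.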