
KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping [Extended Version]

Authors: Pan Ningting; Wang Chen; Wang Jianmin; Wang Peng; Wang Wei; Wu Jiaye
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/09/2018

The volume of time series data has exploded due to the popularity of new applications, such as data center management and IoT. Subsequence matching is a fundamental task in mining time series data. All index-based approaches consider only raw subsequence matching (RSM) and do not support subsequence normalization. UCR Suite can handle the normalized subsequence matching (NSM) problem, but it needs to scan the full time series. In this paper, we propose a novel problem, named the constrained normalized subsequence matching (cNSM) problem, which adds some constraints to the NSM problem. The cNSM problem provides a knob to flexibly control the degree of offset shifting and amplitude scaling, which enables users to build an index to process the query. We propose a new index structure, KV-index, and a matching algorithm, KV-match. With a single index, our approach can support both RSM and cNSM problems under either Euclidean distance (ED) or dynamic time warping (DTW) distance. KV-index is a key-value structure, which can be easily implemented on local files or HBase tables. To support queries of arbitrary length, we extend KV-match to KV-match$_{DP}$, which utilizes multiple varied-length indexes to process the query. We conduct extensive experiments on synthetic and real-world datasets. The results verify the effectiveness and efficiency of our approach.
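To make the cNSM idea concrete, the following is a minimal brute-force sketch in Python. It is not KV-match and builds no index. The exact form of the constraints is an assumption made here for illustration: `alpha` bounds the ratio of standard deviations (amplitude scaling), `beta` bounds the difference of means (offset shifting), and `eps` bounds the Euclidean distance between the z-normalized sequences. The function and parameter names are hypothetical.

```python
import math


def mean_std(xs):
    """Return the mean and population standard deviation of xs."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, math.sqrt(var)


def znorm(xs, m, s):
    """Z-normalize xs given its mean m and standard deviation s (s > 0)."""
    return [(x - m) / s for x in xs]


def cnsm_scan(series, query, eps, alpha, beta):
    """Full scan: return offsets of windows that satisfy the assumed constraints.

    A window matches when (all three conditions are illustrative assumptions):
      * |mean(window) - mean(query)| <= beta        (offset shifting)
      * 1/alpha <= std(window)/std(query) <= alpha  (amplitude scaling)
      * ED(znorm(window), znorm(query)) <= eps      (normalized distance)
    """
    n = len(query)
    mq, sq = mean_std(query)
    if sq == 0:
        raise ValueError("query must not be constant")
    qn = znorm(query, mq, sq)
    hits = []
    for i in range(len(series) - n + 1):
        window = series[i:i + n]
        mw, sw = mean_std(window)
        if sw == 0:
            continue  # constant windows cannot be z-normalized
        if abs(mw - mq) > beta:
            continue  # shifted too far from the query
        if not (1.0 / alpha <= sw / sq <= alpha):
            continue  # scaled too much relative to the query
        wn = znorm(window, mw, sw)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(wn, qn)))
        if dist <= eps:
            hits.append(i)
    return hits
```

With tight `alpha` and `beta`, only windows close to the query in raw value survive. Loosening both moves the behavior toward plain NSM, where any shifted or scaled copy of the query shape matches.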

arXiv.org e-Print Archive

Crossref

Approximating Dynamic Time Warping and Edit Distance for a Pair of Point Sequences

Authors: Agarwal Pankaj K.; Fox Kyle; Pan Jiangwei; Ying Rex
Publication date: 01/01/2016

We give the first subquadratic-time approximation schemes for dynamic time warping (DTW) and edit distance (ED) of several natural families of point sequences in $\mathbb{R}^d$, for any fixed $d \ge 1$. In particular, our algorithms compute $(1+\varepsilon)$-approximations of DTW and ED in time near-linear for point sequences drawn from $\kappa$-packed or $\kappa$-bounded curves, and subquadratic for backbone sequences. Roughly speaking, a curve is $\kappa$-packed if the length of its intersection with any ball of radius $r$ is at most $\kappa \cdot r$, and a curve is $\kappa$-bounded if the sub-curve between two curve points does not go too far from the two points compared to the distance between the two points. In backbone sequences, consecutive points are spaced at approximately equal distances apart, and no two points lie very close together. Recent results suggest that a subquadratic algorithm for DTW or ED is unlikely for an arbitrary pair of point sequences even for $d=1$. Our algorithms work by constructing a small set of rectangular regions that cover the entries of the dynamic programming table commonly used for these distance measures. The weights of entries inside each rectangle are roughly the same, so we are able to use efficient procedures to approximately compute the cheapest paths through these rectangles.
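For reference, the dynamic programming table mentioned in this abstract can be sketched as follows. This is the standard exact, quadratic-time DTW recurrence, not the approximation scheme of the paper. Representing points as tuples and using the Euclidean distance as the entry weight are assumptions made for illustration.

```python
import math


def dtw(p, q):
    """Exact DTW cost between point sequences p and q (lists of tuples in R^d).

    table[i][j] holds the cheapest cost of aligning the first i points of p
    with the first j points of q; filling it takes O(len(p) * len(q)) time.
    """
    m, n = len(p), len(q)
    table = [[math.inf] * (n + 1) for _ in range(m + 1)]
    table[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            weight = math.dist(p[i - 1], q[j - 1])  # weight of entry (i, j)
            table[i][j] = weight + min(
                table[i - 1][j],      # advance along p only
                table[i][j - 1],      # advance along q only
                table[i - 1][j - 1],  # advance along both
            )
    return table[m][n]
```

Every entry is visited once, which is where the quadratic cost comes from. The abstract describes avoiding this by grouping entries of similar weight into rectangles.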

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Products of Euclidean Metrics, Applied to Proximity Problems among Curves

Authors: Emiris Ioannis Z.; Psarros Ioannis
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/08/2020

Approximate Nearest Neighbor (ANN) search is a fundamental computational problem that has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets, whereas complex shapes have not been sufficiently addressed. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments and molecular backbones to time-series in general dimension. For $\ell_p$-products of Euclidean metrics, for any constant $p$, we propose simple and efficient data structures for ANN based on randomized projections; these data structures are of independent interest. Furthermore, they serve to solve proximity questions under a notion of distance between discretized curves, which generalizes both discrete Fréchet and Dynamic Time Warping distance functions. These are two very popular and practical approaches to comparing such curves. We offer, for both approaches, the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our methods; our algorithm is especially efficient when the length of the curves is bounded. Finally, we focus on discrete Fréchet distance when the ambient space is high dimensional and derive complexity bounds in terms of doubling dimension as well as an improved approximate near neighbor search.
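One way to see how a single distance can cover both discrete Fréchet and DTW is sketched below. It assumes, for illustration only, that the distance is the minimum over all monotone traversals of the ℓp norm of the pointwise Euclidean distances. Under that assumption, `order = 1` gives DTW and `order = math.inf` gives discrete Fréchet. The paper's exact definition may differ, the function name is hypothetical, and this is an exact quadratic-time computation, not the ANN data structure the abstract describes.

```python
import math


def traversal_distance(p, q, order=1.0):
    """Minimum over monotone traversals of the l_order norm of point distances.

    p and q are lists of tuples in R^d. order=1 yields DTW and
    order=math.inf yields discrete Frechet (assumed formulation).
    """
    m, n = len(p), len(q)
    use_max = math.isinf(order)
    table = [[math.inf] * (n + 1) for _ in range(m + 1)]
    table[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = math.dist(p[i - 1], q[j - 1])
            best = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
            if use_max:
                table[i][j] = max(d, best)       # l_inf: track the bottleneck
            else:
                table[i][j] = d ** order + best  # l_p: accumulate d^p
    result = table[m][n]
    # Minimizing the sum of p-th powers also minimizes its p-th root.
    return result if use_max else result ** (1.0 / order)
```

For the curves `[(0,), (2,)]` and `[(1,)]`, both points of the first curve pair with the single point of the second. The DTW-style sum is 2, while the Fréchet-style bottleneck is 1.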

INRIA a CCSD electronic archive server