KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping [Extended Version]
The volume of time series data has exploded due to the popularity of new
applications, such as data center management and IoT. Subsequence matching is a
fundamental task in mining time series data. All index-based approaches only
consider raw subsequence matching (RSM) and do not support subsequence
normalization. UCR Suite can handle the normalized subsequence matching (NSM)
problem, but it needs to scan the full time series. In this paper, we propose a novel
problem, named the constrained normalized subsequence matching (cNSM) problem,
which adds some constraints to the NSM problem. The cNSM problem provides a knob to
flexibly control the degree of offset shifting and amplitude scaling, which
enables users to build the index to process the query. We propose a new index
structure, KV-index, and the matching algorithm, KV-match. With a single index,
our approach can support both RSM and cNSM problems under either ED or DTW
distance. KV-index is a key-value structure, which can be easily implemented on
local files or HBase tables. To support queries of arbitrary length, we
extend KV-match to KV-match_DP, which utilizes multiple varied-length
indexes to process the query. We conduct extensive experiments on synthetic and
real-world datasets. The results verify the effectiveness and efficiency of our
approach.
Comment: 13 pages
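To make the NSM baseline concrete, here is a minimal sketch of the full-scan approach that the abstract contrasts with index-based matching: every window is z-normalized and compared to the z-normalized query under ED. This is not the KV-match algorithm or the UCR Suite implementation; the names `z_normalize`, `naive_nsm`, and the threshold `eps` are illustrative assumptions.

```python
import math


def z_normalize(xs):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        # A constant window has no shape; map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in xs]


def naive_nsm(series, query, eps):
    """Scan every window of len(query) and keep those whose z-normalized
    Euclidean distance to the z-normalized query is at most eps.

    Returns a list of (offset, distance) pairs. Cost is O(n * m): this is
    the full scan that an index is meant to avoid.
    """
    m = len(query)
    q = z_normalize(query)
    matches = []
    for i in range(len(series) - m + 1):
        w = z_normalize(series[i:i + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, q)))
        if dist <= eps:
            matches.append((i, dist))
    return matches
```

Because normalization removes offset and amplitude, the windows `[1, 2, 1]` and `[10, 20, 10]` both match the query `[0, 5, 0]` exactly. A cNSM-style constraint would additionally bound how far a window's mean and standard deviation may differ from the query's.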
Approximating Dynamic Time Warping and Edit Distance for a Pair of Point Sequences
We give the first subquadratic-time approximation schemes for dynamic time
warping (DTW) and edit distance (ED) of several natural families of point
sequences in $\mathbb{R}^d$, for any fixed $d \ge 1$. In particular, our
algorithms compute $(1+\varepsilon)$-approximations of DTW and ED in time
near-linear for point sequences drawn from $\kappa$-packed or $\kappa$-bounded curves, and
subquadratic for backbone sequences. Roughly speaking, a curve is
$\kappa$-packed if the length of its intersection with any ball of radius $r$
is at most $\kappa \cdot r$, and a curve is $\kappa$-bounded if the sub-curve
between two curve points does not go too far from the two points compared to
the distance between the two points. In backbone sequences, consecutive points
are spaced at approximately equal distances apart, and no two points lie very
close together. Recent results suggest that a subquadratic algorithm for DTW or
ED is unlikely for an arbitrary pair of point sequences even for $d = 1$. Our
algorithms work by constructing a small set of rectangular regions that cover
the entries of the dynamic programming table commonly used for these distance
measures. The weights of entries inside each rectangle are roughly the same, so
we are able to use efficient procedures to approximately compute the cheapest
paths through these rectangles.
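For reference, the dynamic programming table mentioned above can be sketched as follows. This is the standard exact quadratic-time DTW recurrence, not the paper's approximation scheme (which covers this table with rectangles instead of filling every entry); the function name `dtw` and the tuple representation of points are assumptions made for illustration.

```python
import math


def dtw(p, q):
    """Exact DTW between two point sequences, each a list of equal-dimension
    coordinate tuples, using the usual (n+1) x (m+1) table.

    table[i][j] holds the cheapest cost of aligning p[:i] with q[:j].
    """
    n, m = len(p), len(q)
    inf = float("inf")
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Weight of entry (i, j): Euclidean distance between the points.
            cost = math.dist(p[i - 1], q[j - 1])
            table[i][j] = cost + min(
                table[i - 1][j],      # advance in p only
                table[i][j - 1],      # advance in q only
                table[i - 1][j - 1],  # advance in both
            )
    return table[n][m]
```

Filling all n·m entries is what makes the exact computation quadratic; grouping entries of roughly equal weight is the idea the abstract describes for avoiding that cost.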
Products of Euclidean Metrics, Applied to Proximity Problems among Curves
Approximate Nearest Neighbor (ANN) search is a fundamental computational problem that has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets, whereas complex shapes have not been sufficiently addressed. Here, we focus on distance functions between discretized curves in Euclidean space: They appear in a wide range of applications, from road segments and molecular backbones to time-series in general dimension. For ℓp-products of Euclidean metrics, for any constant p, we propose simple and efficient data structures for ANN based on randomized projections: These data structures are of independent interest. Furthermore, they serve to solve proximity questions under a notion of distance between discretized curves, which generalizes both discrete Fréchet and Dynamic Time Warping distance functions. These are two very popular and practical approaches to comparing such curves. We offer, for both approaches, the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our methods; our algorithm is especially efficient when the length of the curves is bounded. Finally, we focus on discrete Fréchet distance when the ambient space is high dimensional and derive complexity bounds in terms of doubling dimension as well as an improved approximate near neighbor search.