
    Random projections for high-dimensional curves

    Modern time series analysis requires the ability to handle datasets that are inherently high-dimensional; examples include applications in climatology, where measurements from numerous sensors must be taken into account, or inventory tracking of large shops, where the dimension is defined by the number of tracked items. The standard way to mitigate computational issues arising from the high dimensionality of the data is to apply a dimension reduction technique that preserves the structural properties of the ambient space. The dissimilarity between two time series is often measured by ``discrete'' notions of distance, e.g. dynamic time warping, the discrete Fr\'echet distance, or simply the Euclidean distance. Since all these distance functions are computed directly on the points of a time series, they are sensitive to different sampling rates or gaps. The continuous Fr\'echet distance offers a popular alternative which aims to alleviate this by taking into account all points on the polygonal curve obtained by linearly interpolating between any two consecutive points in a sequence. We study the ability of random projections \`a la Johnson and Lindenstrauss to preserve the continuous Fr\'echet distance of polygonal curves by effectively reducing the dimension. In particular, we show that one can reduce the dimension to $O(\epsilon^{-2} \log N)$, where $N$ is the total number of input points, while preserving the continuous Fr\'echet distance between any two determined polygonal curves within a factor of $1 \pm \epsilon$. We conclude with applications on clustering.
    Comment: 22 pages
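The projection step the abstract refers to can be illustrated with a minimal sketch of a Gaussian Johnson-Lindenstrauss map applied to the vertices of a curve. This is a generic textbook construction, not the paper's specific analysis; the function name and target dimension are illustrative:

```python
import numpy as np

def jl_project(points: np.ndarray, target_dim: int, rng=None) -> np.ndarray:
    """Project d-dimensional curve vertices to target_dim dimensions
    using a random Gaussian Johnson-Lindenstrauss matrix."""
    rng = np.random.default_rng(rng)
    d = points.shape[1]
    # Entries drawn N(0, 1/target_dim) so squared distances are
    # preserved in expectation.
    G = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(target_dim, d))
    return points @ G.T
```

Projecting every vertex of a polygonal curve with the same matrix yields a lower-dimensional polygonal curve, which is the setting in which the paper analyzes distortion of the continuous Fréchet distance.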

    Approximating $(k,\ell)$-center clustering for curves

    The Euclidean $k$-center problem is a classical problem that has been extensively studied in computer science. Given a set $\mathcal{G}$ of $n$ points in Euclidean space, the problem is to determine a set $\mathcal{C}$ of $k$ centers (not necessarily part of $\mathcal{G}$) such that the maximum distance between a point in $\mathcal{G}$ and its nearest neighbor in $\mathcal{C}$ is minimized. In this paper we study the corresponding $(k,\ell)$-center problem for polygonal curves under the Fr\'echet distance, that is, given a set $\mathcal{G}$ of $n$ polygonal curves in $\mathbb{R}^d$, each of complexity $m$, determine a set $\mathcal{C}$ of $k$ polygonal curves in $\mathbb{R}^d$, each of complexity $\ell$, such that the maximum Fr\'echet distance of a curve in $\mathcal{G}$ to its closest curve in $\mathcal{C}$ is minimized. We substantially extend and improve the known approximation bounds for curves in dimension $2$ and higher. We show that, if $\ell$ is part of the input, then there is no polynomial-time approximation scheme unless $\mathsf{P}=\mathsf{NP}$. Our constructions yield different bounds for one- and two-dimensional curves and for the discrete and continuous Fr\'echet distance. In the case of the discrete Fr\'echet distance on two-dimensional curves, we show hardness of approximation within a factor close to $2.598$. This result also holds when $k=1$, and the $\mathsf{NP}$-hardness extends to the case $\ell=\infty$, i.e., to the problem of computing the minimum-enclosing ball under the Fr\'echet distance. Finally, we observe that a careful adaptation of Gonzalez' algorithm in combination with a curve simplification yields a $3$-approximation in any dimension, provided that an optimal simplification can be computed exactly. We conclude that our approximation bounds are close to being tight.
    Comment: 24 pages; results on minimum-enclosing ball added, additional author added, general revision
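Gonzalez' algorithm, which the abstract adapts to curves, is the classical farthest-first traversal for $k$-center. A minimal sketch for Euclidean point sets follows; the paper's variant replaces the Euclidean metric with the Fréchet distance and combines it with curve simplification, which this toy version does not attempt:

```python
import numpy as np

def gonzalez_k_center(points: np.ndarray, k: int) -> list:
    """Farthest-first traversal: the classical 2-approximation for
    Euclidean k-center. Returns indices of the chosen centers."""
    centers = [0]  # start from an arbitrary point
    # dist[i] = distance from point i to its nearest chosen center
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        far = int(np.argmax(dist))  # farthest point becomes the next center
        centers.append(far)
        dist = np.minimum(dist, np.linalg.norm(points - points[far], axis=1))
    return centers
```

Each iteration costs one pass over the data, giving $O(nk)$ time for $n$ points, which is why the traversal remains attractive as a subroutine even in the curve setting.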

    On the Hardness of Computing an Average Curve

    We study the complexity of clustering curves under $k$-median and $k$-center objectives in the metric space of the Fr\'echet distance and related distance measures. Building upon recent hardness results for the minimum-enclosing-ball problem under the Fr\'echet distance, we show that the $1$-median problem is also NP-hard. Furthermore, we show that the $1$-median problem is W[1]-hard with the number of curves as parameter. We show this under the discrete and continuous Fr\'echet distance and the Dynamic Time Warping (DTW) distance. This yields an independent proof of an earlier result by Bulteau et al. from 2018 for a variant of DTW that uses squared distances, where the new proof is both simpler and more general. On the positive side, we give approximation algorithms for problem variants where the center curve may have complexity at most $\ell$ under the discrete Fr\'echet distance. In particular, for fixed $k$, $\ell$ and $\varepsilon$, we give $(1+\varepsilon)$-approximation algorithms for the $(k,\ell)$-median and $(k,\ell)$-center objectives and a polynomial-time exact algorithm for the $(k,\ell)$-center objective.
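The discrete Fréchet distance that appears throughout these results is computable by a standard quadratic-time dynamic program (the textbook algorithm, not a contribution of the paper above):

```python
import math

def discrete_frechet(P, Q):
    """Discrete Frechet distance between point sequences P and Q via
    the standard O(len(P) * len(Q)) dynamic program over couplings."""
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = cost
            elif i == 0:          # can only advance along Q
                ca[i][j] = max(ca[i][j - 1], cost)
            elif j == 0:          # can only advance along P
                ca[i][j] = max(ca[i - 1][j], cost)
            else:                 # advance along P, Q, or both
                ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1],
                                   ca[i - 1][j - 1]), cost)
    return ca[n - 1][m - 1]
```

The table entry `ca[i][j]` is the cheapest maximum leash length over all monotone couplings of the prefixes `P[:i+1]` and `Q[:j+1]`.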

    Using time-series similarity measures to compare animal movement trajectories in ecology

    Identifying and understanding patterns in movement data are amongst the principal aims of movement ecology. By quantifying the similarity of movement trajectories, inferences can be made about diverse processes, ranging from individual specialisation to the ontogeny of foraging strategies. Movement analysis is not unique to ecology, however, and methods for estimating the similarity of movement trajectories have been developed in other fields but are currently under-utilised by ecologists. Here, we introduce five commonly used measures of trajectory similarity: dynamic time warping (DTW), longest common subsequence (LCSS), edit distance for real sequences (EDR), Fréchet distance and nearest neighbour distance (NND), of which only NND is routinely used by ecologists. We investigate the performance of each of these measures by simulating movement trajectories using an Ornstein-Uhlenbeck (OU) model in which we varied the following parameters: (1) the point of attraction, (2) the strength of attraction to this point and (3) the noise or volatility added to the movement process, in order to determine which measures were most responsive to such changes. In addition, we demonstrate how these measures can be applied using movement trajectories of breeding northern gannets (Morus bassanus) by performing trajectory clustering on a large ecological dataset. Simulations showed that DTW and Fréchet distance were most responsive to changes in movement parameters and were able to distinguish between all the different parameter combinations we trialled. In contrast, NND was the least sensitive measure trialled. When applied to our gannet dataset, the five similarity measures were highly correlated despite differences in their underlying calculation. Clustering of trajectories within and across individuals allowed us to easily visualise and compare patterns of space use over time across a large dataset.
    Trajectory clusters reflected the bearing on which birds departed the colony and highlighted the use of well-known bathymetric features. As both the volume of movement data and the need to quantify similarity amongst animal trajectories grow, the measures described here and the bridge they provide to other fields of research will become increasingly useful in ecology.
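Of the five measures surveyed above, DTW is perhaps the most widely implemented. A minimal sketch of the textbook dynamic program follows (generic, not the specific implementation used in the study):

```python
import math

def dtw(a, b):
    """Dynamic time warping distance between two point sequences,
    computed with the standard O(len(a) * len(b)) dynamic program."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = minimum cumulative cost aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a advances, b repeats
                                 D[i][j - 1],      # b advances, a repeats
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]
```

Because costs are summed rather than maximized, DTW rewards overall shape agreement, whereas the Fréchet distance is dominated by the single worst-aligned pair of points; this difference explains why the two measures can rank trajectory pairs differently.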

    Locality-Sensitive Hashing of Curves

    We study data structures for storing a set of polygonal curves in $\mathbb{R}^d$ such that, given a query curve, we can efficiently retrieve similar curves from the set, where similarity is measured using the discrete Fr\'echet distance or the dynamic time warping distance. To this end we devise the first locality-sensitive hashing schemes for these distance measures. A major challenge is posed by the fact that these distance measures internally optimize the alignment between the curves. We give solutions for different types of alignments, including constrained and unconstrained versions. For unconstrained alignments, we improve over a result by Indyk from 2002 for short curves. Let $n$ be the number of input curves and let $m$ be the maximum complexity of a curve in the input. In the particular case where $m \leq \frac{\alpha}{4d} \log n$, for some fixed $\alpha>0$, our solutions imply an approximate near-neighbor data structure for the discrete Fr\'echet distance that uses space in $O(n^{1+\alpha}\log n)$ and achieves query time in $O(n^{\alpha}\log^2 n)$ and constant approximation factor. Furthermore, our solutions provide a trade-off between approximation quality and computational performance: for any parameter $k \in [m]$, we can give a data structure that uses space in $O(2^{2k}m^{k-1} n \log n + nm)$, answers queries in $O(2^{2k} m^{k}\log n)$ time and achieves approximation factor in $O(m/k)$.
    Comment: Proc. of 33rd International Symposium on Computational Geometry (SoCG), 2017
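A common building block for hashing curves is snapping vertices to a randomly shifted grid and collapsing consecutive duplicates, so that nearby curves land in the same bucket. The toy sketch below illustrates that idea only; it is a simplification for intuition, not the scheme analyzed in the paper:

```python
import numpy as np

def snap_hash(curve: np.ndarray, cell: float, shift: np.ndarray) -> tuple:
    """Hash a curve by snapping each vertex to a shifted grid of side
    length `cell` and collapsing consecutive duplicate cells -- a toy
    sketch of the randomly-shifted-grid idea used in curve LSH."""
    snapped = np.floor((curve + shift) / cell).astype(int)
    key = [tuple(snapped[0])]
    for v in snapped[1:]:
        if tuple(v) != key[-1]:   # drop repeats so small wiggles still collide
            key.append(tuple(v))
    return tuple(key)
```

In a full scheme the shift would be drawn at random per hash function, and the cell size tied to the query radius, so that curves within the radius collide with constant probability.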

    Statistical M-Estimation and Consistency in Large Deformable Models for Image Warping

    The problem of defining appropriate distances between shapes or images and modeling the variability of natural images by group transformations is at the heart of modern image analysis. A current trend is the study of probabilistic and statistical aspects of deformation models, and the development of consistent statistical procedures for the estimation of template images. In this paper, we consider a set of images randomly warped from a mean template which has to be recovered. For this, we define an appropriate statistical parametric model to generate random diffeomorphic deformations in two dimensions. Then, we focus on the problem of estimating the mean pattern when the images are observed with noise. This problem is challenging both from a theoretical and a practical point of view. M-estimation theory enables us to build an estimator defined as a minimizer of a well-tailored empirical criterion. We prove the convergence of this estimator and propose a gradient descent algorithm to compute this M-estimator in practice. Simulations of template extraction and an application to image clustering and classification are also provided.

    Computing a Subtrajectory Cluster from c-packed Trajectories

    We present a near-linear time approximation algorithm for the subtrajectory cluster problem on $c$-packed trajectories. The problem involves finding $m$ subtrajectories within a given trajectory $T$ such that their Fr\'echet distances are at most $(1 + \varepsilon)d$, and at least one subtrajectory must be of length $l$ or longer. A trajectory $T$ is $c$-packed if the intersection of $T$ and any ball $B$ with radius $r$ is at most $c \cdot r$ in length. Previous results by Gudmundsson and Wong \cite{GudmundssonWong2022Cubicupperlower} established an $\Omega(n^3)$ lower bound unless the Strong Exponential Time Hypothesis fails, and they presented an $O(n^3 \log^2 n)$ time algorithm. We circumvent this conditional lower bound by studying the subtrajectory cluster problem on $c$-packed trajectories, resulting in an algorithm with $O((c^2 n/\varepsilon^2)\log(c/\varepsilon)\log(n/\varepsilon))$ time complexity.
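The $c$-packedness condition above bounds the length of the curve inside any ball. The helper below evaluates that length for one ball by clipping each segment against the sphere (standard segment-sphere intersection, shown here only to make the definition concrete):

```python
import math

def length_inside_ball(curve, center, r):
    """Total length of the part of a polygonal curve inside the ball
    B(center, r). A curve is c-packed if this is at most c*r for every
    ball; this helper checks a single candidate ball."""
    total = 0.0
    for p, q in zip(curve, curve[1:]):
        d = [q[i] - p[i] for i in range(len(p))]   # segment direction
        f = [p[i] - center[i] for i in range(len(p))]
        a = sum(x * x for x in d)
        if a == 0:
            continue                               # degenerate segment
        b = 2 * sum(f[i] * d[i] for i in range(len(p)))
        c = sum(x * x for x in f) - r * r
        disc = b * b - 4 * a * c
        if disc <= 0:
            continue                               # line misses the ball
        s = math.sqrt(disc)
        t0 = max(0.0, (-b - s) / (2 * a))          # clip to parameter range
        t1 = min(1.0, (-b + s) / (2 * a))
        if t1 > t0:
            total += (t1 - t0) * math.sqrt(a)
    return total
```

Verifying $c$-packedness exactly would require checking all balls, which is non-trivial; the algorithm in the paper assumes the bound rather than testing it.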