Random projections for high-dimensional curves
Modern time series analysis requires the ability to handle datasets that are
inherently high-dimensional; examples include applications in climatology,
where measurements from numerous sensors must be taken into account, or
inventory tracking of large shops, where the dimension is defined by the number
of tracked items. The standard way to mitigate computational issues arising
from the high-dimensionality of the data is by applying some dimension
reduction technique that preserves the structural properties of the ambient
space. The dissimilarity between two time series is often measured by
``discrete'' notions of distance, e.g. dynamic time warping, the
discrete Fr\'echet distance, or simply the Euclidean distance. Since all these
distance functions are computed directly on the points of a time series, they
are sensitive to different sampling rates or gaps. The continuous Fr\'echet
distance offers a popular alternative which aims to alleviate this by taking
into account all points on the polygonal curve obtained by linearly
interpolating between any two consecutive points in a sequence.
We study the ability of random projections \`a la Johnson and Lindenstrauss
to preserve the continuous Fr\'echet distance of polygonal curves by
effectively reducing the dimension. In particular, we show that one can reduce
the dimension to , where is the total number of
input points while preserving the continuous Fr\'echet distance between any two
determined polygonal curves within a factor of . We conclude
with applications on clustering.Comment: 22 page
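The dimension-reduction step can be illustrated with a small sketch. This is not the paper's construction, only a generic Johnson-Lindenstrauss-style projection with a Gaussian matrix applied to the vertices of a curve; the function names and parameters are illustrative:

```python
import math
import random

def jl_project(points, target_dim, seed=0):
    """Map d-dimensional points to target_dim dimensions with a random
    Gaussian matrix scaled by 1/sqrt(target_dim) (Johnson-Lindenstrauss)."""
    rng = random.Random(seed)
    d = len(points[0])
    # One shared random matrix: rows are target_dim random directions.
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(target_dim) for _ in range(d)]
         for _ in range(target_dim)]
    return [[sum(row[i] * p[i] for i in range(d)) for row in R] for p in points]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Project the vertices of a 50-dimensional curve down to 8 dimensions;
# pairwise vertex distances are preserved up to small distortion w.h.p.
curve = [[float(i == j) for i in range(50)] for j in range(4)]
low = jl_project(curve, 8, seed=1)
```

Projecting the vertices immediately preserves pointwise distances; the paper's contribution is showing that this also suffices for the continuous Fr\'echet distance between the interpolated curves.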
Approximating $(k,\ell)$-center clustering for curves
The Euclidean $k$-center problem is a classical problem that has been
extensively studied in computer science. Given a set $G$ of $n$ points in
Euclidean space, the problem is to determine a set $C$ of $k$ centers (not
necessarily part of $G$) such that the maximum distance between a point in $G$
and its nearest neighbor in $C$ is minimized. In this paper we study the
corresponding $(k,\ell)$-center problem for polygonal curves under the
Fr\'echet distance, that is, given a set $G$ of polygonal curves in
$\mathbb{R}^d$, each of complexity $m$, determine a set $C$ of $k$ polygonal
curves in $\mathbb{R}^d$, each of complexity $\ell$, such that the maximum
Fr\'echet distance of a curve in $G$ to its closest curve in $C$ is minimized.
We substantially extend and improve the known approximation bounds for curves
in dimension $2$ and higher. We show that, if $k$ is part of the input, then
there is no polynomial-time approximation scheme unless $\mathrm{P}=\mathrm{NP}$.
Our constructions yield different bounds for one- and two-dimensional curves
and the discrete and continuous Fr\'echet distance. In the case of the discrete
Fr\'echet distance on two-dimensional curves, we show hardness of approximation
within a factor close to $2.598$. This result also holds when $k=1$, and the
$\mathrm{NP}$-hardness extends to the case $\ell=\infty$, i.e., for the problem
of computing the minimum-enclosing ball under the Fr\'echet distance. Finally,
we observe that a careful adaptation of Gonzalez' algorithm in combination with
a curve simplification yields a $3$-approximation in any dimension, provided
that an optimal simplification can be computed exactly. We conclude that our
approximation bounds are close to being tight.

Comment: 24 pages; results on minimum-enclosing ball added, additional author
added, general revision
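For context, the classical algorithm of Gonzalez that the authors adapt is a farthest-first traversal giving a 2-approximation for Euclidean $k$-center. A minimal point-based sketch follows; the curve version would replace `dist` with a Fr\'echet distance and add a simplification step, which is where the paper's technical work lies:

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def gonzalez(points, k):
    """Farthest-first traversal: a classic 2-approximation for k-center.
    Repeatedly add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist(p, c) for c in centers)))
    return centers

pts = [(0, 0), (10, 0), (0, 10), (1, 1)]
centers = gonzalez(pts, 2)
```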
On the Hardness of Computing an Average Curve
We study the complexity of clustering curves under $k$-median and $k$-center
objectives in the metric space of the Fr\'echet distance and related distance
measures. Building upon recent hardness results for the minimum-enclosing-ball
problem under the Fr\'echet distance, we show that also the $k$-median problem
is NP-hard. Furthermore, we show that the $k$-median problem is W[1]-hard with
the number of curves as parameter. We show this under the discrete and
continuous Fr\'echet and Dynamic Time Warping (DTW) distance. This yields an
independent proof of an earlier result by Bulteau et al. from 2018 for a
variant of DTW that uses squared distances, where the new proof is both simpler
and more general. On the positive side, we give approximation algorithms for
problem variants where the center curve may have complexity at most $\ell$
under the discrete Fr\'echet distance. In particular, for fixed $k$ and
$\varepsilon$, we give $(1+\varepsilon)$-approximation algorithms for the
$(k,\ell)$-median and $(k,\ell)$-center objectives and a polynomial-time exact
algorithm for the $(k,\ell)$-center objective.
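Since the positive results above are stated under the discrete Fr\'echet distance, a short reference implementation of that distance may help. This is the standard Eiter-Mannila dynamic program, not code from the paper; the table `ca[i][j]` stores the distance between the prefixes `P[:i+1]` and `Q[:j+1]`:

```python
import math

def discrete_frechet(P, Q):
    """Discrete Frechet distance between vertex sequences P and Q
    (Eiter-Mannila dynamic program, O(len(P) * len(Q)) time)."""
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = c
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], c)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], c)
            else:
                # advance on P, on Q, or on both; keep the cheapest option
                ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1],
                                   ca[i - 1][j - 1]), c)
    return ca[n - 1][m - 1]
```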
Using time-series similarity measures to compare animal movement trajectories in ecology
Identifying and understanding patterns in movement data are amongst the principal aims of movement ecology. By quantifying the similarity of movement trajectories, inferences can be made about diverse processes, ranging from individual specialisation to the ontogeny of foraging strategies. Movement analysis is not unique to ecology, however, and methods for estimating the similarity of movement trajectories have been developed in other fields but are currently under-utilised by ecologists. Here, we introduce five commonly used measures of trajectory similarity: dynamic time warping (DTW), longest common subsequence (LCSS), edit distance for real sequences (EDR), Fréchet distance and nearest neighbour distance (NND), of which only NND is routinely used by ecologists. We investigate the performance of each of these measures by simulating movement trajectories using an Ornstein-Uhlenbeck (OU) model in which we varied the following parameters: (1) the point of attraction, (2) the strength of attraction to this point and (3) the noise or volatility added to the movement process, in order to determine which measures were most responsive to such changes. In addition, we demonstrate how these measures can be applied using movement trajectories of breeding northern gannets (Morus bassanus) by performing trajectory clustering on a large ecological dataset. Simulations showed that DTW and Fréchet distance were most responsive to changes in movement parameters and were able to distinguish between all the different parameter combinations we trialled. In contrast, NND was the least sensitive measure trialled. When applied to our gannet dataset, the five similarity measures were highly correlated despite differences in their underlying calculation. Clustering of trajectories within and across individuals allowed us to easily visualise and compare patterns of space use over time across a large dataset.
Trajectory clusters reflected the bearing on which birds departed the colony and highlighted the use of well-known bathymetric features. As both the volume of movement data and the need to quantify similarity amongst animal trajectories grow, the measures described here and the bridge they provide to other fields of research will become increasingly useful in ecology.
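Of the five measures introduced above, dynamic time warping is the easiest to state precisely. A minimal sketch of the standard DTW dynamic program over coordinate tuples (not the specific implementation used in the study):

```python
import math

def dtw(P, Q):
    """Dynamic time warping: total cost of the cheapest monotone alignment
    of the two sequences, with Euclidean ground distance."""
    n, m = len(P), len(Q)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(P[i - 1], Q[j - 1])
            # match P[i-1] with Q[j-1]; the predecessor may repeat a point
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because one point of a trajectory may align with several points of the other, DTW tolerates different sampling rates, which is one reason it performed well in the simulations above.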
Locality-Sensitive Hashing of Curves
We study data structures for storing a set of polygonal curves in $\mathbb{R}^d$
such that, given a query curve, we can efficiently retrieve similar curves from
the set, where similarity is measured using the discrete Fr\'echet distance or
the dynamic time warping distance. To this end we devise the first
locality-sensitive hashing schemes for these distance measures. A major
challenge is posed by the fact that these distance measures internally optimize
the alignment between the curves. We give solutions for different types of
alignments including constrained and unconstrained versions. For unconstrained
alignments, we improve over a result by Indyk from 2002 for short curves. Let
$n$ be the number of input curves and let $m$ be the maximum complexity of a
curve in the input. In the particular case where $m \leq \frac{\alpha}{4d}
\log n$, for some fixed $\alpha > 0$, our solutions imply an approximate
near-neighbor data structure for the discrete Fr\'echet distance that uses
space in $O(n^{1+\alpha} \log n)$ and achieves query time in
$O(n^{\alpha} \log^2 n)$ and constant approximation factor. Furthermore, our
solutions provide a trade-off between approximation quality and computational
performance: for any parameter $k \in [m]$, we can give a data structure that
uses space in $O(2^{2k} m^{k-1} n \log n + nm)$, answers queries in
$O(2^{2k} m^{k} \log n)$ time and achieves approximation factor in $O(m/k)$.

Comment: Proc. of 33rd International Symposium on Computational Geometry
(SoCG), 2017
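The basic idea behind such hashing schemes can be conveyed with a randomly shifted grid: snap each vertex to its grid cell and collapse consecutive duplicates, so that nearby curves tend to produce the same key. The toy 2D version below is only meant to give the flavour; the parameter choices and the analysis in the paper are more involved, and the helper names are my own:

```python
import random

def make_shift(delta, seed):
    """Random offset for the grid; one shift is shared by all curves."""
    rng = random.Random(seed)
    return (rng.uniform(0, delta), rng.uniform(0, delta))

def curve_hash(curve, delta, shift):
    """Snap each 2D vertex to a shifted grid of cell width delta and
    collapse consecutive duplicates; the resulting tuple is the hash key."""
    key = []
    for x, y in curve:
        cell = (round((x + shift[0]) / delta), round((y + shift[1]) / delta))
        if not key or key[-1] != cell:
            key.append(cell)
    return tuple(key)
```

Curves whose vertices stay well within a cell width of each other usually collide, while distant curves land in different buckets; collapsing duplicates makes the key robust to oversampling.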
Statistical M-Estimation and Consistency in Large Deformable Models for Image Warping
The problem of defining appropriate distances between shapes or images and modeling the variability of natural images by group transformations is at the heart of modern image analysis. A current trend is the study of probabilistic and statistical aspects of deformation models, and the development of consistent statistical procedures for the estimation of template images. In this paper, we consider a set of images randomly warped from a mean template which has to be recovered. For this, we define an appropriate statistical parametric model to generate random diffeomorphic deformations in two dimensions. Then, we focus on the problem of estimating the mean pattern when the images are observed with noise. This problem is challenging both from a theoretical and a practical point of view. M-estimation theory enables us to build an estimator defined as a minimizer of a well-tailored empirical criterion. We prove the convergence of this estimator and propose a gradient descent algorithm to compute this M-estimator in practice. Simulations of template extraction and an application to image clustering and classification are also provided.
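The alternating structure of such template estimators can be conveyed with a toy one-dimensional analogue in which the deformations are circular shifts rather than diffeomorphisms. This is a crude sketch of minimizing an empirical least-squares criterion, not the paper's estimator: it alternates between aligning each observation to the current template and averaging.

```python
def estimate_template(signals, rounds=10):
    """Toy M-estimator: recover a 1D template from circularly shifted copies
    by alternating alignment and averaging (least-squares criterion)."""
    n = len(signals[0])
    template = list(signals[0])
    for _ in range(rounds):
        aligned = []
        for s in signals:
            # circular shift minimizing squared distance to current template
            best = min(range(n), key=lambda k: sum(
                (s[(i + k) % n] - template[i]) ** 2 for i in range(n)))
            aligned.append([s[(i + best) % n] for i in range(n)])
        # averaging the aligned copies is the least-squares update
        template = [sum(col) / len(col) for col in zip(*aligned)]
    return template
```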
Computing a Subtrajectory Cluster from c-packed Trajectories
We present a near-linear time approximation algorithm for the subtrajectory
cluster problem of $c$-packed trajectories. The problem involves finding $m$
subtrajectories within a given trajectory $T$ such that their Fr\'echet
distances are at most $(1+\varepsilon)d$, and at least one subtrajectory must
be of length~$\ell$ or longer. A trajectory $T$ is $c$-packed if the
intersection of $T$ and any ball $B$ with radius $r$ is at most $cr$ in
length. Previous results by Gudmundsson and Wong
\cite{GudmundssonWong2022Cubicupperlower} established an $\Omega(n^3)$ lower
bound unless the Strong Exponential Time Hypothesis fails, and they presented
an $O(n^3 \log^2 n)$ time algorithm. We circumvent this conditional lower
bound by studying subtrajectory cluster on $c$-packed trajectories, resulting
in an algorithm whose running time is near-linear in $n$ for constant $c$ and
$\varepsilon$.
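The $c$-packedness condition above can be checked empirically for sampled balls. The helper below (my own illustration, not from the paper) clips each segment of a polyline to a disk by solving the intersection quadratic and sums the clipped lengths:

```python
import math

def length_in_ball(curve, center, r):
    """Total length of the polyline `curve` inside the ball B(center, r).
    A trajectory is c-packed if this is at most c * r for every ball."""
    total = 0.0
    for p, q in zip(curve, curve[1:]):
        # parametrize p + t*(q - p), t in [0, 1];
        # solve |p + t*(q - p) - center|^2 = r^2 for the entry/exit times
        dx = [b - a for a, b in zip(p, q)]
        f = [a - c for a, c in zip(p, center)]
        A = sum(d * d for d in dx)
        if A == 0:
            continue  # degenerate zero-length segment
        B = 2 * sum(d * e for d, e in zip(dx, f))
        C = sum(e * e for e in f) - r * r
        disc = B * B - 4 * A * C
        if disc <= 0:
            continue  # the segment's line misses (or only touches) the ball
        s = math.sqrt(disc)
        t0 = max(0.0, (-B - s) / (2 * A))
        t1 = min(1.0, (-B + s) / (2 * A))
        if t1 > t0:
            total += (t1 - t0) * math.sqrt(A)
    return total
```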