
Locality-Sensitive Hashing of Curves

Author: Driemel Anne
Silvestri Francesco
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Computational Geometry (SoCG 2017)
Publication date: 01/01/2017

We study data structures for storing a set of polygonal curves in $\mathbb{R}^d$ such that, given a query curve, we can efficiently retrieve similar curves from the set, where similarity is measured using the discrete Fréchet distance or the dynamic time warping distance. To this end we devise the first locality-sensitive hashing schemes for these distance measures. A major challenge is posed by the fact that these distance measures internally optimize the alignment between the curves. We give solutions for different types of alignments including constrained and unconstrained versions. For unconstrained alignments, we improve over a result by Indyk from 2002 for short curves. Let $n$ be the number of input curves and let $m$ be the maximum complexity of a curve in the input. In the particular case where $m \leq \frac{\alpha}{4d} \log n$, for some fixed $\alpha > 0$, our solutions imply an approximate near-neighbor data structure for the discrete Fréchet distance that uses space in $O(n^{1+\alpha}\log n)$ and achieves query time in $O(n^{\alpha}\log^2 n)$ and constant approximation factor. Furthermore, our solutions provide a trade-off between approximation quality and computational performance: for any parameter $k \in [m]$, we can give a data structure that uses space in $O(2^{2k}m^{k-1} n \log n + nm)$, answers queries in $O(2^{2k} m^{k}\log n)$ time and achieves approximation factor in $O(m/k)$.

Comment: Proc. of 33rd International Symposium on Computational Geometry (SoCG), 201

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Dagstuhl Research Online Publication Server

Archivio istituzionale della ricerca - Università di Padova
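The grid-snapping idea behind hashing curves can be illustrated with a short Python sketch. This is our own minimal illustration under stated assumptions, not the authors' construction: each hash function snaps every vertex to a randomly shifted grid of cell side delta and then drops consecutive repeated cells, so that curves admitting a close alignment tend to receive the same key. The names make_grid_hash and CurveLSHIndex, and the parameters delta and num_tables, are hypothetical; the paper's parameter choices and guarantees are not reproduced here.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
Curve = Sequence[Point]
Key = Tuple[Tuple[int, ...], ...]


def make_grid_hash(delta: float, d: int, rng: random.Random) -> Callable[[Curve], Key]:
    # Hypothetical helper: one randomly shifted grid with cell side `delta`.
    shift = [rng.uniform(0.0, delta) for _ in range(d)]

    def h(curve: Curve) -> Key:
        # Snap each vertex to the integer index of the grid cell containing it.
        cells = [tuple(int((x + s) // delta) for x, s in zip(p, shift)) for p in curve]
        # Remove consecutive duplicate cells, so the key ignores how many
        # vertices in a row fall into the same cell.
        key: List[Tuple[int, ...]] = []
        for c in cells:
            if not key or c != key[-1]:
                key.append(c)
        return tuple(key)

    return h


class CurveLSHIndex:
    """Hypothetical index: `num_tables` independent grid hashes, one bucket table each."""

    def __init__(self, delta: float, d: int, num_tables: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hashes = [make_grid_hash(delta, d, rng) for _ in range(num_tables)]
        self.tables: List[Dict[Key, List[Hashable]]] = [
            defaultdict(list) for _ in range(num_tables)
        ]

    def insert(self, curve_id: Hashable, curve: Curve) -> None:
        for h, table in zip(self.hashes, self.tables):
            table[h(curve)].append(curve_id)

    def candidates(self, query: Curve) -> Set[Hashable]:
        # Union of the buckets the query falls into in every table.
        found: Set[Hashable] = set()
        for h, table in zip(self.hashes, self.tables):
            found.update(table.get(h(query), []))
        return found
```

In a near-neighbor search, the candidates returned by such an index would typically be verified with an exact discrete Fréchet or DTW computation before being reported.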

Adaptive MapReduce Similarity Joins

Author: McCauley Samuel
Silvestri Francesco
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

Similarity joins are a fundamental database operation. Given data sets S and R, the goal of a similarity join is to find all pairs of points x in S and y in R with distance at most r. Recent research has investigated how locality-sensitive hashing (LSH) can be used for similarity joins, and in particular two recent lines of work have made exciting progress on LSH-based join performance. Hu, Tao, and Yi (PODS '17) investigated joins in a massively parallel setting, showing strong results that adapt to the size of the output. Meanwhile, Ahle, Aumüller, and Pagh (SODA '17) gave a sequential algorithm that adapts to the structure of the data, matching classic bounds in the worst case but improving them significantly on more structured data. We show that this adaptive strategy can be carried over to the parallel setting, combining the advantages of these approaches. In particular, we show that a simple modification to Hu et al.'s algorithm achieves bounds that depend on the density of points in the dataset as well as the total size of the output. Our algorithm uses no extra parameters over other LSH approaches (in particular, its execution does not depend on the structure of the dataset), and is likely to be efficient in practice.

arXiv.org e-Print Archive

Crossref

The IT University of Copenhagen's Repository

Archivio istituzionale della ricerca - Università di Padova
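The basic bucket-and-verify step shared by LSH-based similarity joins can be sketched sequentially in Python. This is an assumption-laden illustration, not the MapReduce algorithm of the paper: it swaps in the standard p-stable (Gaussian projection) LSH family for Euclidean distance, and the function names and parameters w (bucket width), k (projections per key) and L (repetitions) are hypothetical choices.

```python
import math
import random
from collections import defaultdict
from typing import List, Sequence, Set, Tuple

Vec = Sequence[float]
HashFn = Tuple[List[float], float]  # (projection vector a, offset b)


def bucket_key(x: Vec, funcs: List[HashFn], w: float) -> Tuple[int, ...]:
    # p-stable LSH: h(x) = floor((a . x + b) / w), concatenated over k functions.
    return tuple(
        math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w) for a, b in funcs
    )


def lsh_similarity_join(
    S: List[Vec], R: List[Vec], r: float,
    w: float = 4.0, k: int = 2, L: int = 10, seed: int = 0,
) -> Set[Tuple[int, int]]:
    """Index pairs (i, j) with Euclidean dist(S[i], R[j]) <= r that share a
    bucket in at least one of L repetitions (hypothetical helper)."""
    rng = random.Random(seed)
    d = len(S[0]) if S else 0
    result: Set[Tuple[int, int]] = set()
    for _ in range(L):
        # Fresh hash functions per repetition: a ~ N(0, I), b ~ U[0, w).
        funcs = [([rng.gauss(0.0, 1.0) for _ in range(d)], rng.uniform(0.0, w))
                 for _ in range(k)]
        # Bucket the points of S, then probe the buckets with every point of R.
        buckets = defaultdict(list)
        for i, x in enumerate(S):
            buckets[bucket_key(x, funcs, w)].append(i)
        for j, y in enumerate(R):
            for i in buckets.get(bucket_key(y, funcs, w), ()):
                # Verify each candidate exactly, so no false positives are output.
                if math.dist(S[i], y) <= r:
                    result.add((i, j))
    return result
```

Because every reported pair is checked against r, the output contains no false positives; pairs that never share a bucket are missed, which is the usual LSH trade-off governed by k and L.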

On the complexity of range searching among curves

Author: Afshani Peyman
Driemel Anne
Publication date: 15/07/2017

Modern tracking technology has made the collection of large numbers of densely sampled trajectories of moving objects widely available. We consider a fundamental problem encountered when analysing such data: Given $n$ polygonal curves $S$ in $\mathbb{R}^d$, preprocess $S$ into a data structure that answers queries with a query curve $q$ and radius $\rho$ for the curves of $S$ that have Fréchet distance at most $\rho$ to $q$. We initiate a comprehensive analysis of the space/query-time trade-off for this data structuring problem. Our lower bounds imply that any data structure in the pointer model that achieves $Q(n) + O(k)$ query time, where $k$ is the output size, has to use roughly $\Omega\left((n/Q(n))^2\right)$ space in the worst case, even if queries are mere points (for the discrete Fréchet distance) or line segments (for the continuous Fréchet distance). More importantly, we show that more complex queries and input curves lead to additional logarithmic factors in the lower bound. Roughly speaking, the number of logarithmic factors added is linear in the number of edges added to the query and input curve complexity. This means that the space/query-time trade-off worsens by a factor exponential in the input and query complexity. This behaviour addresses an open question in the range searching literature: whether it is possible to avoid the additional logarithmic factors in the space and query time of a multilevel partition tree. We answer this question negatively. On the positive side, we show that we can build data structures for the Fréchet distance by using semialgebraic range searching. Our solution for the discrete Fréchet distance is in line with the lower bound, as the number of levels in the data structure is $O(t)$, where $t$ denotes the maximal number of vertices of a curve. For the continuous Fréchet distance, the number of levels increases to $O(t^2)$.

arXiv.org e-Print Archive

Crossref

Repository TU/e

Pure OAI Repository
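The trivial end of the space/query-time trade-off is a linear scan that decides, for each input curve, whether its discrete Fréchet distance to q is at most rho. The Python sketch below assumes curves are given as non-empty lists of vertex tuples and uses the standard reachability test over pairs of vertices; it is only a baseline taking O(n t^2) time per query for curves with at most t vertices, not the multilevel partition-tree structures of the paper. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Curve = Sequence[Point]


def discrete_frechet_at_most(P: Curve, Q: Curve, rho: float) -> bool:
    """Decide whether the discrete Fréchet distance between P and Q is <= rho."""
    n, m = len(P), len(Q)
    # reach[i][j]: can a coupling walk from (0, 0) to (i, j) using only
    # vertex pairs at distance <= rho, advancing in P, in Q, or in both?
    reach = [[False] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if math.dist(P[i], Q[j]) > rho:
                continue  # this vertex pair is too far apart
            if i == 0 and j == 0:
                reach[i][j] = True
            else:
                reach[i][j] = (
                    (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[n - 1][m - 1]


def frechet_range_query(S: List[Curve], q: Curve, rho: float) -> List[int]:
    """Linear-scan range query: indices of curves in S within rho of q."""
    return [idx for idx, P in enumerate(S) if discrete_frechet_at_most(P, q, rho)]
```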

Products of Euclidean Metrics and Applications to Proximity Questions among Curves

Author: Emiris Ioannis Z.
Psarros Ioannis
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th International Symposium on Computational Geometry (SoCG 2018)
Publication date: 01/01/2018

The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to point sets, whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments and molecular backbones to time series in general dimension. For p-products of Euclidean metrics, for any p ≥ 1, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves that generalizes both the discrete Fréchet and Dynamic Time Warping distances. These are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Our algorithms achieve comparable or significantly improved query time; our approach is especially efficient when the length of the curves is bounded.

2012 ACM Subject Classification: Theory of computation → Data structures design and analysi

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server
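To make the two notions concrete, the Python sketch below computes a distance in a p-product of Euclidean spaces and a traversal-based curve distance. We assume, for illustration only, that the curve distance is the minimum over monotone traversals of the ℓp norm of the pointwise Euclidean distances, which yields the discrete Fréchet distance for p = ∞ and DTW (sum of distances) for p = 1; the paper's precise definition and its randomized-projection data structures are not reproduced. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def lp_product_dist(X: Sequence[Point], Y: Sequence[Point], p: float) -> float:
    """p-product of Euclidean metrics: l_p norm of the component-wise
    Euclidean distances (p may be math.inf)."""
    ds = [math.dist(x, y) for x, y in zip(X, Y)]
    if math.isinf(p):
        return max(ds)
    return sum(t ** p for t in ds) ** (1.0 / p)


def traversal_dist(P: Sequence[Point], Q: Sequence[Point], p: float) -> float:
    """Assumed generalized curve distance: min over monotone traversals of the
    l_p norm of the distances between paired vertices."""
    n, m = len(P), len(Q)
    inf_p = math.isinf(p)
    # D[i][j]: best aggregated cost (max for p = inf, sum of c**p otherwise)
    # of a traversal ending at the pair (P[i], Q[j]).
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c = math.dist(P[i], Q[j])
            cost = c if inf_p else c ** p
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            best = min(
                D[i - 1][j] if i > 0 else math.inf,
                D[i][j - 1] if j > 0 else math.inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            D[i][j] = max(best, cost) if inf_p else best + cost
    total = D[n - 1][m - 1]
    return total if inf_p else total ** (1.0 / p)
```

The same dynamic program covers both special cases because both the maximum and the sum of per-pair costs decompose along a traversal; only the aggregation operator changes with p.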

Trajectory similarity learning with auxiliary supervision and optimal matching

Author: JIANG Qize
SUN Weiwei
SUN Zhenbang
WANG Changhu
ZHANG Hanyuan
ZHANG Xingyu
ZHENG Baihua
Publication date: 01/07/2020

Crossref

Institutional Knowledge at Singapore Management University

FRESH: Fréchet similarity with hashing

Author: A Driemel
H Alt
J Gudmundsson
K Bringmann
K Rong
M Berg de
M Dietzfelbinger
M Konzack
Martin Werner
N Sundaram
P Agarwal
Peyman Afshani
S Shang
T Wylie
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

This paper studies the r-range search problem for curves under the continuous Fréchet distance: given a dataset S of n polygonal curves and a threshold r > 0, construct a data structure that, for any query curve q, efficiently returns all entries in S with distance at most r from q. We propose FRESH, an approximate and randomized approach for r-range search that leverages a locality-sensitive hashing scheme for detecting candidate near neighbors of the query curve, and a subsequent pruning step based on a cascade of curve simplifications. We experimentally compare FRESH to exact and deterministic solutions, and we show that high performance can be reached by suitably relaxing precision and recall.

Crossref

The IT University of Copenhagen's Repository

Archivio istituzionale della ricerca - Università di Padova
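A cheap-to-expensive pruning cascade for candidates returned by an LSH step can be sketched in Python. This is our own simplified illustration, not FRESH's cascade of curve simplifications: it rejects a candidate when the endpoint distance (a lower bound on the continuous Fréchet distance) exceeds r, and accepts it when the discrete Fréchet distance between the vertex sequences (an upper bound on the continuous Fréchet distance) is at most r. Remaining candidates would need an exact continuous Fréchet decision procedure, which is not implemented here; all function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Curve = Sequence[Point]


def endpoint_lower_bound(P: Curve, Q: Curve) -> float:
    # Every Fréchet matching pairs start with start and end with end.
    return max(math.dist(P[0], Q[0]), math.dist(P[-1], Q[-1]))


def discrete_frechet_at_most(P: Curve, Q: Curve, r: float) -> bool:
    # Row-by-row reachability over vertex pairs at distance <= r.
    prev = [False] * len(Q)
    for i, p in enumerate(P):
        cur = [False] * len(Q)
        for j, q in enumerate(Q):
            if math.dist(p, q) <= r:
                if i == 0 and j == 0:
                    cur[j] = True
                else:
                    cur[j] = prev[j] or (j > 0 and (cur[j - 1] or prev[j - 1]))
        prev = cur
    return prev[-1]


def prune_candidates(
    S: List[Curve], q: Curve, r: float, candidates: Iterable[int]
) -> Tuple[List[int], List[int]]:
    """Split candidate indices into (accepted, undecided); rejected ones are dropped."""
    accepted: List[int] = []
    undecided: List[int] = []
    for idx in candidates:
        P = S[idx]
        if endpoint_lower_bound(P, q) > r:
            continue  # lower bound exceeds r: certainly not a result
        if discrete_frechet_at_most(P, q, r):
            accepted.append(idx)  # discrete bound <= r implies continuous <= r
        else:
            undecided.append(idx)  # would need an exact continuous Fréchet test
    return accepted, undecided
```

The cheapest test runs first, so most far-away candidates are discarded after two point distances; only candidates that survive both bounds reach the expensive exact step.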