Tradeoffs for nearest neighbors on the sphere
We consider tradeoffs between the query and update complexities for the
(approximate) nearest neighbor problem on the sphere, extending the recent
spherical filters to sparse regimes and generalizing the scheme and analysis to
account for different tradeoffs. In a nutshell, for the sparse regime the
tradeoff between the query complexity $n^{\rho_q}$ and update complexity
$n^{\rho_u}$ for data sets of size $n$ is given by the following equation in
terms of the approximation factor $c$ and the exponents $\rho_q$ and $\rho_u$:
$$c^2\sqrt{\rho_q} + (c^2-1)\sqrt{\rho_u} = \sqrt{2c^2-1}.$$
For small $c = 1+\epsilon$, minimizing the time for updates leads to a linear
space complexity at the cost of a query time complexity $n^{1-4\epsilon^2}$.
Balancing the query and update costs leads to optimal complexities
$n^{1/(2c^2-1)}$, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner,
IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn,
STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A
subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a
space complexity of the order $n^{1/(4\epsilon^2)}$, matching the bound
$n^{\Omega(1/\epsilon^2)}$ of [Andoni-Indyk-Patrascu, FOCS'06] and
[Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of
[Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98].
For large $c$, minimizing the update complexity results in a query complexity
of $n^{2/c^2+O(1/c^4)}$, improving upon the related exponent for large $c$ of
[Kapralov, PODS'15] by a factor $2$, and matching the bound $n^{\Omega(1/c^2)}$
of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal
complexities $n^{1/(2c^2-1)}$, while a minimum query time complexity can be
achieved with update complexity $n^{2/c^2+O(1/c^4)}$, improving upon the
previous best exponents of Kapralov by a factor $2$.
Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580
[cs.DS] (along with arXiv:1605.02701 [cs.DS])
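To make the tradeoff concrete, the following is a minimal sketch that evaluates the curve numerically. It assumes the sparse-regime relation $c^2\sqrt{\rho_q} + (c^2-1)\sqrt{\rho_u} = \sqrt{2c^2-1}$ as given in the arXiv version of this abstract; the function names are hypothetical and chosen only for illustration.

```python
import math


def query_exponent(c: float, rho_u: float) -> float:
    """Solve c^2*sqrt(rho_q) + (c^2-1)*sqrt(rho_u) = sqrt(2c^2-1) for rho_q.

    The relation itself is an assumption taken from the arXiv abstract.
    """
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    root = (math.sqrt(2 * c * c - 1) - (c * c - 1) * math.sqrt(rho_u)) / (c * c)
    if root < -1e-12:
        raise ValueError("rho_u lies beyond the point where rho_q reaches 0")
    return max(root, 0.0) ** 2  # clamp tiny negative rounding errors


def balanced_exponent(c: float) -> float:
    """Exponent when rho_q == rho_u, i.e. 1 / (2c^2 - 1)."""
    return 1.0 / (2 * c * c - 1)


def max_update_exponent(c: float) -> float:
    """Update exponent at which the query exponent drops to 0."""
    return (2 * c * c - 1) / (c * c - 1) ** 2


if __name__ == "__main__":
    c = 2.0
    for rho_u in (0.0, balanced_exponent(c), max_update_exponent(c)):
        print(f"rho_u = {rho_u:.4f}  ->  rho_q = {query_exponent(c, rho_u):.4f}")
```

For $c = 2$ the three sampled points are the minimum-update end ($\rho_u = 0$), the balanced point, and the minimum-query end of the curve.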
Hybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space
We study the $r$-near neighbors reporting problem ($r$-NN), i.e., reporting
all points in a high-dimensional point set $S$ that lie within a radius
$r$ of a given query point $q$. Our approach builds upon the
locality-sensitive hashing (LSH) framework due to its appealing asymptotic
sublinear query time for near neighbor search problems in high-dimensional
space. A bottleneck of the traditional LSH scheme for solving $r$-NN is that
its performance is sensitive to data and query-dependent parameters. On
datasets whose data distributions have diverse local density patterns, LSH with
inappropriate tuning parameters can sometimes be outperformed by a simple
linear search.
In this paper, we introduce a hybrid search strategy between LSH-based search
and linear search for $r$-NN in high-dimensional space. By integrating an
auxiliary data structure into LSH hash tables, we can efficiently estimate the
computational cost of LSH-based search for a given query regardless of the data
distribution. This means that we are able to choose the appropriate search
strategy between LSH-based search and linear search to achieve better
performance. Moreover, the integrated data structure is time efficient and fits
well with many recent state-of-the-art LSH-based approaches. Our experiments on
real-world datasets show that the hybrid search approach outperforms (or is
comparable to) both LSH-based search and linear search for a wide range of
search radii and data distributions in high-dimensional space.
Comment: Accepted as a short paper in EDBT 201
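The hybrid strategy described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the abstract does not specify the auxiliary data structure, so the sketch uses the sum of the probed bucket sizes as a stand-in cost estimate, and the class name `HybridLSH`, the random-projection hash family, and the `cost_ratio` parameter are all hypothetical.

```python
import math
import random
from collections import defaultdict


class HybridLSH:
    """Toy radius-reporting index that falls back to a linear scan when cheaper."""

    def __init__(self, points, r, num_tables=8, hashes_per_table=4, seed=0):
        rng = random.Random(seed)
        self.points = points
        self.r = r
        self.width = 4 * r  # bucket width: an arbitrary illustrative choice
        dim = len(points[0])
        # One list of (random direction, random offset) pairs per hash table.
        self.funcs = [
            [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, self.width))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        for idx, p in enumerate(points):
            for table, funcs in zip(self.tables, self.funcs):
                table[self._key(funcs, p)].append(idx)

    def _key(self, funcs, p):
        return tuple(
            math.floor((sum(a * x for a, x in zip(vec, p)) + b) / self.width)
            for vec, b in funcs
        )

    def query(self, q, cost_ratio=1.0):
        """Return (strategy, sorted indices of points within distance r of q)."""
        buckets = [t.get(self._key(f, q), []) for t, f in zip(self.tables, self.funcs)]
        # Stand-in cost estimate (assumption): total size of the probed buckets,
        # which upper-bounds the number of distinct candidates.
        estimated = sum(len(b) for b in buckets)
        if estimated >= cost_ratio * len(self.points):
            strategy, candidates = "linear", range(len(self.points))
        else:
            strategy, candidates = "lsh", set().union(*buckets)
        hits = sorted(i for i in candidates if math.dist(self.points[i], q) <= self.r)
        return strategy, hits
```

The linear branch is exact, while the LSH branch can miss neighbors that share no bucket with the query; lowering `cost_ratio` shifts more queries to the exact scan.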
Planar Visibility: Testing and Counting
In this paper we consider query versions of visibility testing and visibility
counting. Let $S$ be a set of $n$ disjoint line segments in $\mathbb{R}^2$ and let
$s$ be an element of $S$. Visibility testing is to preprocess $S$ so that we can
quickly determine if $s$ is visible from a query point $q$. Visibility counting
involves preprocessing $S$ so that one can quickly estimate the number of
segments in $S$ visible from a query point $q$.
We present several data structures for the two query problems. The structures
build upon a result by O'Rourke and Suri (1984) who showed that the subset,
$V_S(s)$, of $\mathbb{R}^2$ that is weakly visible from a segment $s$ can be
represented as the union of a set, $C_S(s)$, of $O(n^2)$ triangles, even though
the complexity of $V_S(s)$ can be $\Omega(n^4)$. We define a variant of their
covering, give efficient output-sensitive algorithms for computing it, and
prove additional properties needed to obtain approximation bounds. Some of our
bounds rely on a new combinatorial result that relates the number of segments
of $S$ visible from a point $p$ to the number of triangles in
$\bigcup_{s \in S} C_S(s)$ that contain $p$.
Comment: 22 pages
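As a point of reference for the two query problems, the following is a brute-force baseline, not one of the paper's data structures. It assumes a segment counts as visible when at least one of a fixed number of sample points on it has an unobstructed line of sight to the query point, so it can undercount segments visible only through a narrow gap; the function names and the `samples` parameter are hypothetical.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(p1, p2, p3, p4):
    """True if segments p1p2 and p3p4 cross at a point interior to both."""
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def is_visible(q, s, segments, samples=64):
    """Visibility testing by sampling: is some sampled point of s seen from q?"""
    (ax, ay), (bx, by) = s
    for k in range(samples + 1):
        t = k / samples
        p = (ax + t * (bx - ax), ay + t * (by - ay))
        blocked = any(
            properly_cross(q, p, other[0], other[1])
            for other in segments
            if other is not s
        )
        if not blocked:
            return True
    return False


def count_visible(q, segments, samples=64):
    """Visibility counting by testing every segment in turn."""
    return sum(is_visible(q, s, segments, samples) for s in segments)
```

Each query costs time proportional to `samples` times the square of the number of segments, which is the kind of cost that preprocessing is meant to avoid.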
Towards trajectory anonymization: a generalization-based approach
Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity for trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing anonymized trajectories may still leak private information. Therefore we propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.
Multimapper: Data Density Sensitive Topological Visualization
Mapper is an algorithm that summarizes the topological information contained
in a dataset and provides an insightful visualization. It takes as input a
point cloud which is possibly high-dimensional, a filter function on it and an
open cover on the range of the function. It returns the nerve simplicial
complex of the pullback of the cover. Mapper can be considered a discrete
approximation of the topological construct called Reeb space, as analysed in
the $1$-dimensional case by [Carriere et al., 2018]. Despite its success in
obtaining insights in various fields such as in [Kamruzzaman et al., 2016],
Mapper is an ad hoc technique requiring lots of parameter tuning. There is also
no measure to quantify goodness of the resulting visualization, which often
deviates from the Reeb space in practice. In this paper, we introduce a new
cover selection scheme for data that reduces the obscuration of topological
information at both the computation and visualisation steps. To achieve this,
we replace global scale selection of cover with a scale selection scheme
sensitive to local density of data points. We also propose a method to detect
some deviations in Mapper from Reeb space via computation of persistence
features on the Mapper graph.
Comment: Accepted at ICDM
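The standard Mapper pipeline summarized in this abstract (filter function, open cover of its range, clustering of each pullback set, nerve) can be sketched in a few lines. This shows plain Mapper with a uniform interval cover and a single global scale, not the density-sensitive cover the paper proposes; the helper names and the parameters `n_intervals`, `overlap`, and `eps` are assumptions made for illustration.

```python
import math
from itertools import combinations


def interval_cover(lo, hi, n_intervals, overlap):
    """Equal-length intervals covering [lo, hi]; neighbours share `overlap` of their length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Connected components of the graph joining points at distance <= eps."""
    remaining, clusters = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.3, eps=0.5):
    """Return (nodes, edges): the 1-skeleton of the nerve of the clustered pullback cover."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    tol = 1e-9 * ((hi - lo) or 1.0)  # guards against rounding at interval ends
    nodes = []
    for a, b in interval_cover(lo, hi, n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a - tol <= v <= b + tol]
        nodes.extend(single_linkage(members, points, eps))
    # Two clusters are joined by an edge when they share at least one data point.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges
```

On points sampled evenly from a circle, with the x-coordinate as filter, the resulting graph is a single cycle, mirroring the Reeb graph of the circle. The fixed `eps` and uniform cover are the global scale choices that a density-sensitive scheme would replace.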