Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing
We prove a tight lower bound for the exponent $\rho$ for data-dependent
Locality-Sensitive Hashing schemes, recently used to design efficient solutions
for the $c$-approximate nearest neighbor search. In particular, our lower bound
matches the bound of $\rho \le \frac{1}{2c-1} + o(1)$ for the $\ell_1$ space,
obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15].
In recent years it emerged that data-dependent hashing is strictly superior
to the classical Locality-Sensitive Hashing, when the hash function is
data-independent. In the latter setting, the best exponent was already
known: for the $\ell_1$ space, the tight bound is $\rho = 1/c$, with the upper
bound from [Indyk-Motwani, STOC'98] and the matching lower bound from
[O'Donnell-Wu-Zhou, ITCS'11].
We prove that, even if the hashing is data-dependent, it must hold that
$\rho \ge \frac{1}{2c-1} - o(1)$. To prove the result, we need to formalize the
exact notion of data-dependent hashing that also captures the complexity of the
hash functions (in addition to their collision properties). Without restricting
such complexity, we would allow for obviously infeasible solutions such as the
Voronoi diagram of a dataset. To preclude such solutions, we require our hash
functions to be succinct. This condition is satisfied by all the known
algorithmic results.
Comment: 16 pages, no figures
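The gap between the two regimes can be made concrete with a short calculation. The sketch below assumes the well-known exponents for the $\ell_1$ space, $\rho = 1/c$ for data-independent hashing and $\rho = \frac{1}{2c-1}$ for data-dependent hashing, and drops the $o(1)$ terms; the function names are illustrative and do not come from the paper.

```python
def classical_exponent(c: float) -> float:
    """Data-independent LSH exponent for the l1 space: rho = 1/c."""
    if c <= 1.0:
        raise ValueError("approximation factor c must exceed 1")
    return 1.0 / c


def data_dependent_exponent(c: float) -> float:
    """Data-dependent LSH exponent for the l1 space: rho = 1/(2c - 1).

    The o(1) term is ignored (an assumption made for illustration).
    """
    if c <= 1.0:
        raise ValueError("approximation factor c must exceed 1")
    return 1.0 / (2.0 * c - 1.0)


if __name__ == "__main__":
    # Print both exponents side by side for a few approximation factors.
    for c in (1.5, 2.0, 4.0):
        print(c, classical_exponent(c), data_dependent_exponent(c))
```

For every $c > 1$ the data-dependent exponent is the smaller of the two, which is the sense in which data-dependent hashing is strictly superior.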
Video retrieval based on deep convolutional neural network
Recently, with the enormous growth of online videos, fast video retrieval
research has received increasing attention. As an extension of image hashing
techniques, traditional video hashing methods mainly depend on hand-crafted
features and transform the real-valued features into binary hash codes. As
videos provide far more diverse and complex visual information than images,
extracting features from videos is much more challenging than that from images.
Therefore, high-level semantic features to represent videos are needed rather
than low-level hand-crafted methods. In this paper, a deep convolutional neural
network is proposed to extract high-level semantic features and a binary hash
function is then integrated into this framework to achieve an end-to-end
optimization. In particular, our approach combines a triplet loss function,
which preserves the relative similarity and difference of videos, with a
classification loss function as the optimization objective. Experiments have
been performed on two public datasets and the results demonstrate the
superiority of our proposed method compared with other state-of-the-art video
retrieval methods.
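A combined triplet and classification objective of the kind described above might look roughly like the following. This is a minimal sketch, not the authors' formulation: the hinge form of the triplet loss, the squared Euclidean distance, the `margin` and `weight` parameters, and all function names are assumptions, since the abstract does not specify them.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: the positive should be closer than the negative
    to the anchor by at least `margin` (assumed form)."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def combined_loss(anchor, positive, negative, logits, label,
                  margin=1.0, weight=1.0):
    """Triplet loss plus a weighted classification loss (hypothetical weighting)."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * cross_entropy(logits, label))
```

The triplet term vanishes once the negative is farther from the anchor than the positive by the margin, so the classification term alone drives training on triplets that are already well separated.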
Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search
The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a
general technique for constructing a data structure to answer approximate near
neighbor queries by using a distribution $\mathcal{H}$ over locality-sensitive
hash functions that partition space. For a collection of $n$ points, after
preprocessing, the query time is dominated by $O(n^{\rho} \log n)$ evaluations
of hash functions from $\mathcal{H}$ and $O(n^{\rho})$ hash table lookups and
distance computations where $\rho \in (0,1)$ is determined by the
locality-sensitivity properties of $\mathcal{H}$. It follows from a recent
result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive
hash functions can be reduced to $O(\log^2 n)$, leaving the query time to be
dominated by $O(n^{\rho})$ distance computations and $O(n^{\rho} \log n)$
additional word-RAM operations. We state this result as a general framework and
provide a simpler analysis showing that the number of lookups and distance
computations closely match the Indyk-Motwani framework, making it a viable
replacement in practice. Using ideas from another locality-sensitive hashing
framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of
additional word-RAM operations to $O(n^{\rho})$.
Comment: 15 pages, 3 figures
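The classical Indyk-Motwani structure that this abstract takes as its baseline can be sketched as several hash tables, each keyed by a concatenation of `k` locality-sensitive hash values. The sketch below assumes binary vectors under Hamming distance with bit-sampling hash functions; the class name, parameters, and choice of hash family are illustrative assumptions, and the sketch does not implement the faster frameworks the paper proposes.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of coordinates in which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Toy Indyk-Motwani style index for binary vectors (assumed setup)."""

    def __init__(self, dim, k, num_tables, seed=0):
        rng = random.Random(seed)
        # Each table uses k randomly sampled coordinates as its hash key.
        self.projections = [[rng.randrange(dim) for _ in range(k)]
                            for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    @staticmethod
    def _key(point, coords):
        return tuple(point[i] for i in coords)

    def add(self, point):
        """Store the point and register its index in every table."""
        index = len(self.points)
        self.points.append(point)
        for coords, table in zip(self.projections, self.tables):
            table[self._key(point, coords)].append(index)

    def query(self, q, radius):
        """Return the index of a stored point within `radius` of q, or None."""
        seen = set()
        for coords, table in zip(self.projections, self.tables):
            for index in table.get(self._key(q, coords), ()):
                if index in seen:
                    continue
                seen.add(index)
                # One distance computation per distinct candidate.
                if hamming(self.points[index], q) <= radius:
                    return index
        return None
```

Every query evaluates `k` hash functions per table, which is the cost the frameworks in this abstract aim to reduce while keeping the number of lookups and distance computations close to the original.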
Research statement: inference of human-computing algorithms from massive-scale educational interventions
The main goal of the present research statement is to develop an educational computerized framework able to detect in which tasks a child has difficulties and generate a personalized intervention based on automatic observation and evaluation of data, as part of an interdisciplinary project at the crossroads of Computer Science, Cognitive Science, Biology and Psychology.
(Paragraph extracted from the text as a summary)
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
Approximate Nearest Neighbor Search for Low Dimensional Queries
We study the Approximate Nearest Neighbor problem for metric spaces where the
query points are constrained to lie on a subspace of low doubling dimension,
while the data is high-dimensional. We show that this problem can be solved
efficiently despite the high dimensionality of the data.
Comment: 25 pages
Natural data structure extracted from neighborhood-similarity graphs
'Big' high-dimensional data are commonly analyzed in low dimensions, after
performing a dimensionality-reduction step that inherently distorts the data
structure. For the same purpose, clustering methods are also often used. These
methods also introduce a bias, either by starting from the assumption of a
particular geometric form of the clusters, or by using iterative schemes to
enhance cluster contours, with uncontrollable consequences. The goal of data
analysis should, however, be to encode and detect structural data features at
all scales and densities simultaneously, without assuming a parametric form of
data point distances, or modifying them. We propose a novel approach that
directly encodes data point neighborhood similarities as a sparse graph. Our
non-iterative framework permits a transparent interpretation of data, without
altering the original data dimension and metric. Several natural and synthetic
data applications demonstrate the efficacy of our novel approach.