274 research outputs found

Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

Author: Andoni Alexandr
Razenshteyn Ilya
Publication date: 01/01/2015

We prove a tight lower bound for the exponent $\rho$ for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for the $c$-approximate nearest neighbor search. In particular, our lower bound matches the bound of $\rho\le \frac{1}{2c-1}+o(1)$ for the $\ell_1$ space, obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15]. In recent years it emerged that data-dependent hashing is strictly superior to the classical Locality-Sensitive Hashing, when the hash function is data-independent. In the latter setting, the best exponent has been already known: for the $\ell_1$ space, the tight bound is $\rho=1/c$, with the upper bound from [Indyk-Motwani, STOC'98] and the matching lower bound from [O'Donnell-Wu-Zhou, ITCS'11]. We prove that, even if the hashing is data-dependent, it must hold that $\rho\ge \frac{1}{2c-1}-o(1)$. To prove the result, we need to formalize the exact notion of data-dependent hashing that also captures the complexity of the hash functions (in addition to their collision properties). Without restricting such complexity, we would allow for obviously infeasible solutions such as the Voronoi diagram of a dataset. To preclude such solutions, we require our hash functions to be succinct. This condition is satisfied by all the known algorithmic results.

Comment: 16 pages, no figures
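To make the gap between the two exponents in this abstract concrete, the following minimal sketch evaluates the data-independent bound $\rho = 1/c$ and the data-dependent bound $\rho = 1/(2c-1)$ for the $\ell_1$ space, with the $o(1)$ terms dropped. The function names and the dataset size `n` are illustrative assumptions, not taken from the paper; `n ** rho` is only the usual rough scaling of LSH query cost.

```python
def rho_data_independent(c: float) -> float:
    """Exponent 1/c for classical (data-independent) LSH in l_1."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    return 1.0 / c


def rho_data_dependent(c: float) -> float:
    """Exponent 1/(2c - 1) for data-dependent LSH in l_1, o(1) term dropped."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    return 1.0 / (2.0 * c - 1.0)


if __name__ == "__main__":
    n = 10**6  # hypothetical dataset size, chosen only for illustration
    for c in (1.5, 2.0, 3.0):
        r_ind = rho_data_independent(c)
        r_dep = rho_data_dependent(c)
        print(
            f"c={c}: 1/c={r_ind:.3f} (n^rho ~ {n ** r_ind:.0f}), "
            f"1/(2c-1)={r_dep:.3f} (n^rho ~ {n ** r_dep:.0f})"
        )
```

For every $c > 1$ the data-dependent exponent is strictly smaller, which is the sense in which the abstract calls data-dependent hashing strictly superior.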

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Video retrieval based on deep convolutional neural network

Author: Dong Yj
Li JG
Publication date: 30/11/2017

Recently, with the enormous growth of online videos, fast video retrieval research has received increasing attention. As an extension of image hashing techniques, traditional video hashing methods mainly depend on hand-crafted features and transform the real-valued features into binary hash codes. As videos provide far more diverse and complex visual information than images, extracting features from videos is much more challenging than extracting them from images. Therefore, high-level semantic features are needed to represent videos, rather than low-level hand-crafted ones. In this paper, a deep convolutional neural network is proposed to extract high-level semantic features, and a binary hash function is then integrated into this framework to achieve end-to-end optimization. In particular, our approach combines a triplet loss function, which preserves the relative similarity and difference of videos, with a classification loss function as the optimization objective. Experiments have been performed on two public datasets, and the results demonstrate the superiority of our proposed method compared with other state-of-the-art video retrieval methods.
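The combined objective mentioned in this abstract can be illustrated with a small, framework-free sketch. Everything specific here is an assumption rather than the authors' formulation: the squared Euclidean distance, the `margin` and `weight` parameters, softmax cross-entropy as the classification loss, and thresholding at zero to obtain binary codes.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive closer than the negative."""
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def combined_loss(anchor, positive, negative, logits, label,
                  margin=1.0, weight=1.0):
    """Triplet loss plus a weighted classification loss (weight is assumed)."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * cross_entropy(logits, label))


def binarize(code):
    """Turn a real-valued code into bits by thresholding at zero (assumed)."""
    return [1 if x >= 0 else 0 for x in code]


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))
```

At retrieval time, videos would be ranked by `hamming` distance between their binarized codes, which is what makes hash-based retrieval fast.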

arXiv.org e-Print Archive

Crossref

Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

Author: A Andoni
AL Zobrist
AZ Broder
JL Carter
K Terasawa
M Dubiner
MH Overmars
ML Fredman
N Sundaram
P Li
S Har-Peled
T Hagerup
Publication date: 16/02/2018

The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution $\mathcal{H}$ over locality-sensitive hash functions that partition space. For a collection of $n$ points, after preprocessing, the query time is dominated by $O(n^{\rho} \log n)$ evaluations of hash functions from $\mathcal{H}$ and $O(n^{\rho})$ hash table lookups and distance computations, where $\rho \in (0,1)$ is determined by the locality-sensitivity properties of $\mathcal{H}$. It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to $O(\log^2 n)$, leaving the query time to be dominated by $O(n^{\rho})$ distance computations and $O(n^{\rho} \log n)$ additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely matches that of the Indyk-Motwani framework, making it a viable replacement in practice. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006), we are able to reduce the number of additional word-RAM operations to $O(n^\rho)$.

Comment: 15 pages, 3 figures
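The classical Indyk-Motwani scheme that this abstract takes as its baseline can be illustrated with a minimal sketch. It assumes binary vectors under Hamming distance with bit sampling as the locality-sensitive family; the class `BitSamplingLSH`, the helper `lsh_parameters`, and all parameter values are hypothetical and not taken from the paper. It uses independent hash tables and does not implement the reduced-hash-function framework the paper proposes.

```python
import math
import random
from collections import defaultdict


def lsh_parameters(n, p1, p2):
    """Textbook parameter choice: p1 and p2 are the collision probabilities
    of near and far pairs; rho = ln(1/p1) / ln(1/p2)."""
    if not 0.0 < p2 < p1 < 1.0:
        raise ValueError("need 0 < p2 < p1 < 1")
    rho = math.log(1.0 / p1) / math.log(1.0 / p2)
    k = math.ceil(math.log(n) / math.log(1.0 / p2))  # hash length per table
    num_tables = math.ceil(n ** rho)                 # roughly n^rho tables
    return rho, k, num_tables


def hamming(a, b):
    """Number of coordinates where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Each table hashes a point by k randomly sampled coordinates."""

    def __init__(self, dim, k, num_tables, seed=0):
        rng = random.Random(seed)
        self.projections = [
            [rng.randrange(dim) for _ in range(k)] for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    @staticmethod
    def _key(point, coords):
        return tuple(point[i] for i in coords)

    def insert(self, point):
        idx = len(self.points)
        self.points.append(point)
        for coords, table in zip(self.projections, self.tables):
            table[self._key(point, coords)].append(idx)

    def query(self, q, radius):
        """Return some stored point within `radius` of q, or None."""
        seen = set()
        for coords, table in zip(self.projections, self.tables):
            for idx in table.get(self._key(q, coords), ()):
                if idx in seen:
                    continue
                seen.add(idx)
                # One distance computation per distinct candidate.
                if hamming(self.points[idx], q) <= radius:
                    return self.points[idx]
        return None
```

In this sketch a query evaluates `k * num_tables` sampled coordinates, which corresponds to the hash-evaluation cost that the abstract's framework aims to reduce.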

arXiv.org e-Print Archive

Crossref

Research statement: inference of human-computing algorithms from massive-scale educational interventions

Author: López y Rosenfeld Matías
Publication date: 23/10/2014

The main goal of the present research statement is to develop an educational computerized framework able to detect in which tasks a child has difficulties and to generate a personalized intervention based on automatic observation and evaluation of data, as part of an interdisciplinary project at the crossroads of Computer Science, Cognitive Science, Biology and Psychology. (Paragraph extracted from the text as a summary.)

Sociedad Argentina de Informática e Investigación Operativa (SADIO)

Approximate Nearest Neighbor Search for Low Dimensional Queries

Author: Har-Peled Sariel
Kumar Nirman
Publication date: 01/01/2010

We study the Approximate Nearest Neighbor problem for metric spaces where the query points are constrained to lie on a subspace of low doubling dimension, while the data is high-dimensional. We show that this problem can be solved efficiently despite the high dimensionality of the data.

Comment: 25 pages

arXiv.org e-Print Archive

University of Memphis Digital Commons

CiteSeerX

Crossref

Natural data structure extracted from neighborhood-similarity graphs

Author: Kanders Karlis
Lorimer Tom
Stoop Ruedi
Publication venue: 'Elsevier BV'
Publication date: 15/02/2018

'Big' high-dimensional data are commonly analyzed in low dimensions, after performing a dimensionality-reduction step that inherently distorts the data structure. For the same purpose, clustering methods are also often used. These methods also introduce a bias, either by starting from the assumption of a particular geometric form of the clusters, or by using iterative schemes to enhance cluster contours, with uncontrollable consequences. The goal of data analysis should, however, be to encode and detect structural data features at all scales and densities simultaneously, without assuming a parametric form of data point distances, or modifying them. We propose a novel approach that directly encodes data point neighborhood similarities as a sparse graph. Our non-iterative framework permits a transparent interpretation of data, without altering the original data dimension and metric. Several natural and synthetic data applications demonstrate the efficacy of our novel approach.
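The abstract does not spell out how the graph is built. As a generic illustration only, and explicitly not the authors' construction, the sketch below builds a shared-nearest-neighbor graph: two points are linked when each is among the other's `k` nearest neighbors, and the edge weight counts the neighbors they have in common. The function names, `k`, and `min_shared` are assumptions. The construction is non-iterative and uses the original Euclidean distances without modifying them.

```python
import math


def k_nearest(points, k):
    """For each point, the index set of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def shared_neighbor_graph(points, k, min_shared=1):
    """Sparse graph as {(i, j): weight} with i < j.

    An edge exists when i and j are mutual k-nearest neighbors and share
    at least `min_shared` neighbors; the weight is the shared count.
    """
    nn = k_nearest(points, k)
    edges = {}
    for i in range(len(points)):
        for j in nn[i]:
            if i < j and i in nn[j]:
                shared = len(nn[i] & nn[j])
                if shared >= min_shared:
                    edges[(i, j)] = shared
    return edges


if __name__ == "__main__":
    # Two well-separated toy clusters (made-up data).
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for edge, weight in sorted(shared_neighbor_graph(pts, k=2).items()):
        print(edge, weight)
```

On the toy data the graph contains edges only within each cluster, so the two groups appear as separate connected components without any cluster shape being assumed.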

arXiv.org e-Print Archive

ZORA