
Optimal Las Vegas Locality Sensitive Data Structures

Abstract

We show that approximate similarity (near neighbour) search can be solved in high dimensions with performance matching state-of-the-art (data-independent) Locality Sensitive Hashing, but with a guarantee of no false negatives. Specifically, we give two data structures for common problems. For $c$-approximate near neighbour in Hamming space we get query time $dn^{1/c+o(1)}$ and space $dn^{1+1/c+o(1)}$, matching that of \cite{indyk1998approximate} and answering a long-standing open question from~\cite{indyk2000dimensionality} and~\cite{pagh2016locality} in the affirmative. By means of a new deterministic reduction from $\ell_1$ to Hamming we also solve $\ell_1$ and $\ell_2$ with query time $d^2 n^{1/c+o(1)}$ and space $d^2 n^{1+1/c+o(1)}$. For $(s_1,s_2)$-approximate Jaccard similarity we get query time $dn^{\rho+o(1)}$ and space $dn^{1+\rho+o(1)}$, where $\rho=\log\frac{1+s_1}{2s_1}\big/\log\frac{1+s_2}{2s_2}$, when sets have equal size, matching the performance of~\cite{tobias2016}. The algorithms are based on space partitions, as with classic LSH, but we construct these using a combination of brute force, tensoring, perfect hashing and splitter functions à la~\cite{naor1995splitters}. We also show a new dimensionality reduction lemma with 1-sided error.
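As a concrete illustration of the Jaccard bound above, the following short Python sketch (not part of the paper; the function name and example parameters are our own) evaluates the exponent $\rho=\log\frac{1+s_1}{2s_1}\big/\log\frac{1+s_2}{2s_2}$ for a given similarity pair $(s_1, s_2)$ and prints the resulting query-time and space exponents.

```python
import math

def jaccard_rho(s1: float, s2: float) -> float:
    """Exponent rho = log((1+s1)/(2*s1)) / log((1+s2)/(2*s2))
    from the (s1, s2)-approximate Jaccard similarity bound.
    Requires 0 < s2 < s1 < 1 so that both logarithms are positive."""
    assert 0 < s2 < s1 < 1, "expected 0 < s2 < s1 < 1"
    return math.log((1 + s1) / (2 * s1)) / math.log((1 + s2) / (2 * s2))

if __name__ == "__main__":
    # Illustrative choice: report sets with Jaccard similarity >= s1 = 0.5,
    # while similarity below s2 = 0.25 may be ignored.
    s1, s2 = 0.5, 0.25
    rho = jaccard_rho(s1, s2)
    print(f"rho     = {rho:.4f}")      # query time ~ d * n^(rho + o(1))
    print(f"1 + rho = {1 + rho:.4f}")  # space      ~ d * n^(1 + rho + o(1))
```

For this choice of $(s_1, s_2)$ the sketch gives $\rho \approx 0.44$, i.e. query time sublinear in $n$; $\rho$ approaches $1$ as $s_2$ approaches $s_1$.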
