Fast Exact Search in Hamming Space with Multi-Index Hashing
There is growing interest in representing image data and feature descriptors
using compact binary codes for fast near neighbor search. Although binary codes
are motivated by their use as direct indices (addresses) into a hash table,
codes longer than 32 bits are not used this way, because doing so was thought to
be ineffective. We introduce a rigorous way to build multiple hash tables on
binary code substrings that enables exact k-nearest neighbor search in Hamming
space. The approach is storage efficient and straightforward to implement.
Theoretical analysis shows that the algorithm exhibits sub-linear run-time
behavior for uniformly distributed codes. Empirical results show dramatic
speedups over a linear scan baseline for datasets of up to one billion codes of
64, 128, or 256 bits.
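
The substring idea in the abstract above rests on a pigeonhole argument: if two codes differ in at most r bits and each code is split into m disjoint substrings, at least one pair of substrings differs in at most floor(r/m) bits. The following is a minimal Python sketch of that idea for exact r-neighbor search, not the authors' implementation. The class name `MultiIndexHash`, the helper `linear_scan`, and the choice to store codes as Python integers are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Hamming distance between two codes stored as non-negative integers."""
    return bin(a ^ b).count("1")


def linear_scan(codes, query, r):
    """Baseline: indices of all codes within Hamming distance r of query."""
    return [i for i, c in enumerate(codes) if hamming(c, query) <= r]


class MultiIndexHash:
    """Hypothetical index: one hash table per disjoint substring of each code."""

    def __init__(self, codes, bits, m):
        if bits % m != 0:
            raise ValueError("this sketch assumes m divides the code length")
        self.codes = list(codes)
        self.m = m
        self.sub_bits = bits // m
        self.mask = (1 << self.sub_bits) - 1
        self.tables = [defaultdict(list) for _ in range(m)]
        for idx, code in enumerate(self.codes):
            for j in range(m):
                self.tables[j][self._substring(code, j)].append(idx)

    def _substring(self, code, j):
        """Extract the j-th substring (counted from the low-order bits)."""
        return (code >> (j * self.sub_bits)) & self.mask

    def _ball(self, center, radius):
        """Yield every substring value within Hamming distance radius of center."""
        for k in range(radius + 1):
            for positions in combinations(range(self.sub_bits), k):
                value = center
                for p in positions:
                    value ^= 1 << p
                yield value

    def search(self, query, r):
        """Exact r-neighbor search: probe each table, then verify full codes."""
        sub_r = r // self.m  # pigeonhole bound on the closest substring
        candidates = set()
        for j in range(self.m):
            center = self._substring(query, j)
            for key in self._ball(center, sub_r):
                candidates.update(self.tables[j].get(key, ()))
        return sorted(i for i in candidates if hamming(self.codes[i], query) <= r)
```

Because every candidate is checked against the full code, the result matches a linear scan. The tables only reduce how many codes need to be checked. Extending this to exact k-nearest-neighbor search would mean growing r step by step until k neighbors are found, which this sketch leaves out.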
Faster tuple lattice sieving using spherical locality-sensitive filters
To overcome the large memory requirement of classical lattice sieving
algorithms for solving hard lattice problems, Bai-Laarhoven-Stehlé [ANTS
2016] studied tuple lattice sieving, where tuples instead of pairs of lattice
vectors are combined to form shorter vectors. Herold-Kirshanova [PKC 2017]
recently improved upon their results for arbitrary tuple sizes, for example
showing that a triple sieve can solve the shortest vector problem (SVP) in
dimension in time , using a technique similar to
locality-sensitive hashing for finding nearest neighbors.
In this work, we generalize the spherical locality-sensitive filters of
Becker-Ducas-Gama-Laarhoven [SODA 2016] to obtain space-time tradeoffs for near
neighbor searching on dense data sets, and we apply these techniques to tuple
lattice sieving to obtain even better time complexities. For instance, our
triple sieve heuristically solves SVP in time . For
practical sieves based on Micciancio-Voulgaris' GaussSieve [SODA 2010], this
shows that a triple sieve uses less space and less time than the current best
near-linear space double sieve.
Comment: 12 pages + references, 2 figures. Subsumed/merged into Cryptology
ePrint Archive 2017/228, available at https://ia.cr/2017/122
Fast Deterministic Selection
The Median of Medians (also known as BFPRT) algorithm, although a landmark
theoretical achievement, is seldom used in practice because it and its variants
are slower than simple approaches based on sampling. The main contribution of
this paper is a fast linear-time deterministic selection algorithm
QuickselectAdaptive based on a refined definition of MedianOfMedians. The
algorithm's performance brings deterministic selection---along with its
desirable properties of reproducible runs, predictable run times, and immunity
to pathological inputs---in the range of practicality. We demonstrate results
on independent and identically distributed random inputs and on
normally-distributed inputs. Measurements show that QuickselectAdaptive is
faster than state-of-the-art baselines.
Comment: Pre-publication draft
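
For context, the classic Median of Medians (BFPRT) scheme that the abstract above takes as its starting point can be sketched as follows. This is the textbook groups-of-five variant, not QuickselectAdaptive, whose refinements the abstract does not describe. The function name `select` and the list-based (not in-place) partitioning are choices made here for readability.

```python
def select(items, k):
    """Return the k-th smallest element (0-based) of items.

    Textbook median-of-medians selection with groups of five; deterministic
    and worst-case linear time, at the cost of extra list copies.
    """
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        if len(items) <= 5:
            return sorted(items)[k]
        # Median of each group of (at most) five elements.
        groups = [items[i:i + 5] for i in range(0, len(items), 5)]
        medians = [sorted(g)[len(g) // 2] for g in groups]
        # Recursively pick the median of those medians as the pivot.
        pivot = select(medians, len(medians) // 2)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            items = highs
```

The pivot chosen this way is guaranteed to discard a constant fraction of the input at each step, which is what gives the deterministic linear bound. The constant factors of this simple version are the practical overhead the abstract refers to.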
Fast optimization algorithms and the cosmological constant
Denef and Douglas have observed that in certain landscape models the problem
of finding small values of the cosmological constant is a large instance of an
NP-hard problem. The number of elementary operations (quantum gates) needed to
solve this problem by brute force search exceeds the estimated computational
capacity of the observable universe. Here we describe a way out of this
puzzling circumstance: despite being NP-hard, the problem of finding a small
cosmological constant can be attacked by more sophisticated algorithms whose
performance vastly exceeds brute force search. In fact, in some parameter
regimes the average-case complexity is polynomial. We demonstrate this by
explicitly finding a cosmological constant of order in a randomly
generated -dimensional ADK landscape.
Comment: 19 pages, 5 figures
Memory vectors for similarity search in high-dimensional spaces
We study an indexing architecture to store and search in a database of
high-dimensional vectors from the perspective of statistical signal processing
and decision theory. This architecture is composed of several memory units,
each of which summarizes a fraction of the database by a single representative
vector. The potential similarity of the query to one of the vectors stored in
the memory unit is gauged by a simple correlation with the memory unit's
representative vector. This representative optimizes the test of the following
hypothesis: the query is independent from any vector in the memory unit vs. the
query is a simple perturbation of one of the stored vectors.
Compared to exhaustive search, our approach finds the most similar database
vectors significantly faster without a noticeable reduction in search quality.
Interestingly, the reduction of complexity is provably better in
high-dimensional spaces. We empirically demonstrate its practical interest in a
large-scale image search scenario with off-the-shelf state-of-the-art
descriptors.
Comment: Accepted to IEEE Transactions on Big Data
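
The architecture described in the abstract above (memory units, one representative vector per unit, a correlation test against the query) can be illustrated with a small Python sketch. The abstract does not say how the representative is computed, so this sketch assumes the simplest choice, the sum of the vectors in the unit. The function names `build_units` and `search` and the `n_probe` parameter are likewise hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def build_units(vectors, unit_size):
    """Split the database into memory units of up to unit_size vectors.

    Each unit is a (representative, ids) pair. The representative is assumed
    here to be the plain sum of the unit's vectors.
    """
    units = []
    for start in range(0, len(vectors), unit_size):
        ids = list(range(start, min(start + unit_size, len(vectors))))
        representative = [sum(col) for col in zip(*(vectors[i] for i in ids))]
        units.append((representative, ids))
    return units


def search(vectors, units, query, n_probe=1):
    """Return the id of the best match found in the n_probe best-scoring units."""
    # Step 1: one correlation per unit instead of one per database vector.
    ranked = sorted(units, key=lambda unit: dot(unit[0], query), reverse=True)
    # Step 2: exhaustive comparison restricted to the selected units.
    best_id, best_score = None, float("-inf")
    for _, ids in ranked[:n_probe]:
        for i in ids:
            score = dot(vectors[i], query)
            if score > best_score:
                best_id, best_score = i, score
    return best_id
```

With N database vectors and units of size n, the first step costs N/n correlations and the second costs n per probed unit, compared with N for exhaustive search. Search is approximate: a true match is missed whenever its unit is not among the probed ones.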