Maximum Inner-Product Search using Tree Data-structures
The problem of {\em efficiently} finding the best match for a query in a
given set with respect to the Euclidean distance or the cosine similarity has
been extensively studied in the literature. However, the closely related problem of
efficiently finding the best match with respect to the inner product has never
been explored in the general setting to the best of our knowledge. In this
paper we consider this general problem and contrast it with the existing
best-match algorithms. First, we propose a general branch-and-bound algorithm
using a tree data structure. Subsequently, we present a dual-tree algorithm for
the case where there are multiple queries. Finally we present a new data
structure for increasing the efficiency of the dual-tree algorithm. These
branch-and-bound algorithms involve novel bounds suited for the purpose of
best-matching with inner products. We evaluate our proposed algorithms on a
variety of data sets from various applications, and exhibit up to five orders
of magnitude improvement in query time over the naive search technique.
Comment: Under submission in KDD 201
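The node bounds at the heart of such a branch-and-bound search can be illustrated with the standard ball bound that follows from the Cauchy-Schwarz inequality: the inner product of a query with any point inside a ball is at most the inner product with the ball's center plus the query norm times the radius. This is a minimal sketch under that assumption; the paper's actual tree bounds may be tighter, and all names here are illustrative.

```python
import numpy as np

def mip_upper_bound(q, center, radius):
    """Upper bound on max_{p in ball(center, radius)} <q, p>.

    By Cauchy-Schwarz:
      <q, p> = <q, center> + <q, p - center>
            <= <q, center> + ||q|| * ||p - center||
            <= <q, center> + ||q|| * radius.
    A branch-and-bound search prunes a tree node whenever this bound
    falls below the best inner product found so far.
    """
    return q @ center + np.linalg.norm(q) * radius

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 5))   # toy data set
q = rng.normal(size=5)               # toy query

# Treat the whole data set as one tree node (one bounding ball).
center = points.mean(axis=0)
radius = np.max(np.linalg.norm(points - center, axis=1))

best = np.max(points @ q)                      # exact best inner product
bound = mip_upper_bound(q, center, radius)     # never smaller than `best`
```

A real implementation would apply the same bound recursively to the children of each tree node, descending into a node only if its bound exceeds the current best match.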
A new Lenstra-type Algorithm for Quasiconvex Polynomial Integer Minimization with Complexity 2^O(n log n)
We study the integer minimization of a quasiconvex polynomial with
quasiconvex polynomial constraints. We propose a new algorithm that is an
improvement upon the best known algorithm due to Heinz (Journal of Complexity,
2005). This improvement is achieved by applying a modern Lenstra-type
algorithm, finding optimal ellipsoid roundings, and considering sparse
encodings of polynomials. For the bounded case, our algorithm attains a
time-complexity of s (r l M d)^{O(1)} 2^{2n log_2(n) + O(n)} when M is a bound
on the number of monomials in each polynomial and r is the binary encoding
length of a bound on the feasible region. In the general case, the
time-complexity is s l^{O(1)} d^{O(n)} 2^{2n log_2(n) + O(n)}. In each case we
assume d >= 2 is a bound on the total degree of the polynomials and l bounds
the maximum binary encoding size of the input.
Comment: 28 pages, 10 figures
GOGMA: Globally-Optimal Gaussian Mixture Alignment
Gaussian mixture alignment is a family of approaches that are frequently used
for robustly solving the point-set registration problem. However, since they
use local optimisation, they are susceptible to local minima and can only
guarantee local optimality. Consequently, their accuracy is strongly dependent
on the quality of the initialisation. This paper presents the first
globally-optimal solution to the 3D rigid Gaussian mixture alignment problem
under the L2 distance between mixtures. The algorithm, named GOGMA, employs a
branch-and-bound approach to search the space of 3D rigid motions SE(3),
guaranteeing global optimality regardless of the initialisation. The geometry
of SE(3) was used to find novel upper and lower bounds for the objective
function and local optimisation was integrated into the scheme to accelerate
convergence without voiding the optimality guarantee. The evaluation
empirically supported the optimality proof and showed that the method performed
much more robustly on two challenging datasets than an existing
globally-optimal registration solution.
Comment: Manuscript in press, 2016 IEEE Conference on Computer Vision and
Pattern Recognition
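The L2 distance between Gaussian mixtures that such methods minimise has a closed form, because the integral of a product of two Gaussians is itself a Gaussian density evaluated at the difference of the means. Below is a minimal sketch of that objective for the simplified case of equal weights and a single shared isotropic bandwidth (both simplifying assumptions of this sketch, not details taken from the paper):

```python
import numpy as np

def gauss_overlap(mu1, mu2, var1, var2):
    """Closed-form integral of a product of two isotropic Gaussians:
    int N(x; mu1, var1*I) N(x; mu2, var2*I) dx = N(mu1; mu2, (var1+var2)*I).
    """
    d = len(mu1)
    v = var1 + var2
    diff = mu1 - mu2
    return (2 * np.pi * v) ** (-d / 2) * np.exp(-(diff @ diff) / (2 * v))

def gmm_l2_distance(mus_f, mus_g, var=0.1):
    """L2 distance between two equally weighted isotropic GMMs:
    ||f - g||^2 = int f^2 - 2 int f*g + int g^2, expanded pairwise."""
    def cross(A, B):
        w = 1.0 / (len(A) * len(B))
        return sum(w * gauss_overlap(a, b, var, var) for a in A for b in B)
    return cross(mus_f, mus_f) - 2 * cross(mus_f, mus_g) + cross(mus_g, mus_g)

pts = np.random.default_rng(1).normal(size=(5, 3))
d_same = gmm_l2_distance(pts, pts)          # identical mixtures -> 0
d_shift = gmm_l2_distance(pts, pts + 1.0)   # translated copy -> positive
```

In a registration setting, one mixture would be transformed by a candidate rigid motion from SE(3) before evaluating this objective, and the branch-and-bound search would bound its value over regions of the motion space.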
Optimal Data-Dependent Hashing for Approximate Near Neighbors
We show an optimal data-dependent hashing scheme for the approximate near
neighbor problem. For an n-point data set in a d-dimensional space our data
structure achieves query time d n^{rho + o(1)} and space d n + n^{1 + rho + o(1)},
where rho = 1/(2c^2 - 1) for the Euclidean space and approximation factor c.
For the Hamming space, we obtain an exponent of rho = 1/(2c - 1).
Our result completes the direction set forth in [AINR14], who gave a
proof-of-concept that data-dependent hashing can outperform classical Locality
Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only
optimal, but in fact improves over the best (optimal) LSH data structures
[IM98, AI06] for all approximation factors c > 1.
From the technical perspective, we proceed by decomposing an arbitrary
dataset into several subsets that are, in a certain sense, pseudo-random.
Comment: 36 pages, 5 figures; an extended abstract appeared in the proceedings
of the 47th ACM Symposium on Theory of Computing (STOC 2015)
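The classical LSH baseline that this abstract contrasts against can be illustrated for the Hamming space with the simple bit-sampling family: hash a point by keeping k randomly chosen coordinates, so that two points collide on each sampled bit with probability 1 - dist(x, y)/d. This is a minimal sketch with illustrative parameters, not the data-dependent construction of the paper.

```python
import numpy as np

def bit_sample_hash(v, idx):
    """Bit-sampling LSH for Hamming space: keep the k sampled coordinates.
    Nearby points agree on most bits, so they collide far more often."""
    return tuple(v[idx])

rng = np.random.default_rng(2)
d, k, trials = 64, 8, 2000

x = rng.integers(0, 2, size=d)
near = x.copy()
near[:2] ^= 1                           # Hamming distance 2 from x
far = x ^ rng.integers(0, 2, size=d)    # distance ~ d/2 in expectation

def collision_rate(a, b):
    """Empirical probability that a and b fall in the same hash bucket."""
    hits = 0
    for _ in range(trials):
        idx = rng.choice(d, size=k, replace=False)
        hits += bit_sample_hash(a, idx) == bit_sample_hash(b, idx)
    return hits / trials

p_near = collision_rate(x, near)   # roughly (1 - 2/d)^k, close to 1
p_far = collision_rate(x, far)     # roughly (1/2)^k, close to 0
```

A full index would build several hash tables from independent samples of coordinates and probe only the buckets the query falls into; data-dependent schemes improve on this by choosing the hash family after seeing the data set.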