
17,982 research outputs found

Maximum Inner-Product Search using Tree Data-structures

Authors: Gray, Alexander G.; Ram, Parikshit
Publication date: 01/01/2012

The problem of efficiently finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in the literature. However, to the best of our knowledge, the closely related problem of efficiently finding the best match with respect to the inner product has never been explored in the general setting. In this paper, we consider this general problem and contrast it with the existing best-match algorithms. First, we propose a general branch-and-bound algorithm using a tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Finally, we present a new data structure for increasing the efficiency of the dual-tree algorithm. These branch-and-bound algorithms involve novel bounds suited for the purpose of best-matching with inner products. We evaluate our proposed algorithms on a variety of data sets from various applications, and exhibit up to five orders of magnitude improvement in query time over the naive search technique.

Comment: Under submission in KDD 201
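To make the branch-and-bound idea in this abstract concrete, here is a minimal single-query sketch in Python. The abstract does not state the paper's bounds, so this sketch assumes a simple ball tree and the Cauchy-Schwarz bound q·p ≤ q·c + R‖q‖ for any point p inside a ball with center c and radius R. The names `Ball`, `upper_bound`, and `mips` are illustrative, and the splitting rule (median split along the widest dimension) is likewise an assumption.

```python
import numpy as np

class Ball:
    """Ball-tree node over the rows of `points` selected by `idx` (illustrative)."""
    def __init__(self, points, idx, leaf_size=8):
        self.idx = idx
        pts = points[idx]
        self.center = pts.mean(axis=0)
        self.radius = float(np.max(np.linalg.norm(pts - self.center, axis=1)))
        self.left = self.right = None
        if len(idx) > leaf_size:
            # Assumed splitting rule: median split along the widest dimension.
            dim = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
            order = idx[np.argsort(pts[:, dim])]
            mid = len(order) // 2
            self.left = Ball(points, order[:mid], leaf_size)
            self.right = Ball(points, order[mid:], leaf_size)

def upper_bound(q, node):
    # For any p in the ball: q.p = q.c + q.(p - c) <= q.c + ||q|| * R.
    return float(q @ node.center) + node.radius * float(np.linalg.norm(q))

def mips(points, root, q):
    """Return (index, score) of a point maximizing the inner product with q."""
    best = [-np.inf, -1]

    def visit(node):
        if upper_bound(q, node) <= best[0]:
            return  # no point in this ball can beat the current best
        if node.left is None:
            scores = points[node.idx] @ q
            j = int(np.argmax(scores))
            if scores[j] > best[0]:
                best[0], best[1] = float(scores[j]), int(node.idx[j])
            return
        # Visit the more promising child first so pruning kicks in sooner.
        for child in sorted((node.left, node.right),
                            key=lambda n: -upper_bound(q, n)):
            visit(child)

    visit(root)
    return best[1], best[0]
```

Building the tree with `root = Ball(X, np.arange(len(X)))` and calling `mips(X, root, q)` should return the same best score as the naive scan `np.max(X @ q)`, while skipping every ball whose bound falls below the best score found so far.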

arXiv.org e-Print Archive

CiteSeerX

Tradeoffs for nearest neighbors on the sphere

Author: Laarhoven, Thijs
Publication date: 01/01/2015

We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the recent spherical filters to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity $n^{\rho_q}$ and update complexity $n^{\rho_u}$ for data sets of size $n$ is given by the following equation in terms of the approximation factor $c$ and the exponents $\rho_q$ and $\rho_u$:

$$c^2\sqrt{\rho_q}+(c^2-1)\sqrt{\rho_u}=\sqrt{2c^2-1}.$$

For small $c=1+\epsilon$, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity $n^{1-4\epsilon^2}$. Balancing the query and update costs leads to optimal complexities $n^{1/(2c^2-1)}$, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner, IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn, STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a space complexity of the order $n^{1/(4\epsilon^2)}$, matching the bound $n^{\Omega(1/\epsilon^2)}$ of [Andoni-Indyk-Patrascu, FOCS'06] and [Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of [Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98]. For large $c$, minimizing the update complexity results in a query complexity of $n^{2/c^2+O(1/c^4)}$, improving upon the related exponent for large $c$ of [Kapralov, PODS'15] by a factor $2$, and matching the bound $n^{\Omega(1/c^2)}$ of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal complexities $n^{1/(2c^2-1)}$, while a minimum query time complexity can be achieved with update complexity $n^{2/c^2+O(1/c^4)}$, improving upon the previous best exponents of Kapralov by a factor $2$.

Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580 [cs.DS] (along with arXiv:1605.02701 [cs.DS]
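The special cases quoted in this abstract can be checked numerically against the tradeoff equation. The following is a small sketch, with illustrative function names that do not come from the paper. It solves the equation for one exponent given the other and evaluates the balanced point where both exponents are equal.

```python
import math

def rho_q_from_rho_u(c: float, rho_u: float) -> float:
    """Solve c^2*sqrt(rho_q) + (c^2-1)*sqrt(rho_u) = sqrt(2c^2-1) for rho_q."""
    root = (math.sqrt(2 * c**2 - 1) - (c**2 - 1) * math.sqrt(rho_u)) / c**2
    if root < 0:
        raise ValueError("rho_u is too large for this c: no valid rho_q")
    return root**2

def rho_u_from_rho_q(c: float, rho_q: float) -> float:
    """Solve the same equation for rho_u (requires c > 1)."""
    root = (math.sqrt(2 * c**2 - 1) - c**2 * math.sqrt(rho_q)) / (c**2 - 1)
    if root < 0:
        raise ValueError("rho_q is too large for this c: no valid rho_u")
    return root**2

def balanced_rho(c: float) -> float:
    """Exponent when rho_q == rho_u: (2c^2-1)*sqrt(rho) = sqrt(2c^2-1)."""
    return 1.0 / (2 * c**2 - 1)
```

Setting `rho_u = 0` gives `rho_q = (2c^2 - 1) / c^4 = 1 - ((c^2 - 1) / c^2)^2`. For `c = 1 + eps` this is approximately `1 - 4*eps^2`, and for large `c` it equals `2/c^2 - 1/c^4`. Both agree with the exponents stated in the abstract.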

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Scalable Image Retrieval by Sparse Product Quantization

Authors: Chen, Chun; Hoi, Steven C. H.; Ning, Qingqun; Zhong, Zhiyuan; Zhu, Jianke
Publication date: 15/03/2016

Fast approximate nearest neighbor (ANN) search for high-dimensional feature indexing and retrieval is the crux of large-scale image retrieval. A recent promising technique is Product Quantization, which attempts to index high-dimensional image features by decomposing the feature space into a Cartesian product of low-dimensional subspaces and quantizing each of them separately. Despite the promising results reported, this quantization approach follows the typical hard assignment of traditional quantization methods, which may result in large quantization errors and thus inferior search performance. Unlike the existing approaches, in this paper we propose a novel approach called Sparse Product Quantization (SPQ) that encodes the high-dimensional feature vectors into sparse representations. We optimize the sparse representations of the feature vectors by minimizing their quantization errors, so that the resulting representation is close to the original data in practice. Experiments show that the proposed SPQ technique is not only able to compress data, but is also an effective encoding technique. We obtain state-of-the-art results for ANN search on four public image datasets, and the promising results of content-based image retrieval further validate the efficacy of our proposed method.

Comment: 12 page
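For orientation, the baseline that this abstract contrasts itself with, plain Product Quantization with hard assignment, can be sketched as follows. This is not SPQ itself, whose sparse encoding is not detailed in the abstract. The function names and the use of a few Lloyd (k-means) iterations to train the per-subspace codebooks are assumptions made for illustration.

```python
import numpy as np

def _sq_dists(sub, centroids):
    # Squared Euclidean distance from every row of `sub` to every centroid.
    return ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

def train_codebooks(X, M, K, iters=10, seed=0):
    """Train one K-word codebook per subspace with plain Lloyd iterations."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    assert d % M == 0, "dimension must split evenly into M subspaces"
    ds = d // M
    books = []
    for m in range(M):
        sub = X[:, m * ds:(m + 1) * ds]
        C = sub[rng.choice(n, size=K, replace=False)].copy()
        for _ in range(iters):
            assign = _sq_dists(sub, C).argmin(axis=1)
            for k in range(K):
                members = sub[assign == k]
                if len(members):  # leave empty clusters where they are
                    C[k] = members.mean(axis=0)
        books.append(C)
    return np.stack(books)  # shape (M, K, d // M)

def encode(X, books):
    """Hard assignment: each sub-vector is replaced by its nearest codeword index."""
    M, _, ds = books.shape
    codes = np.empty((X.shape[0], M), dtype=np.int64)
    for m in range(M):
        codes[:, m] = _sq_dists(X[:, m * ds:(m + 1) * ds], books[m]).argmin(axis=1)
    return codes

def decode(codes, books):
    """Reconstruct vectors by concatenating the selected codewords."""
    return np.hstack([books[m][codes[:, m]] for m in range(books.shape[0])])
```

Each vector is stored as `M` small integers, and the gap between `X` and `decode(encode(X, books), books)` is the quantization error that, according to the abstract, SPQ aims to reduce by replacing the single-codeword assignment with a sparse representation.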

arXiv.org e-Print Archive

Institutional Knowledge at Singapore Management University

An Efficient Index for Visual Search in Appearance-based SLAM

Authors: Hajebi, Kiana; Zhang, Hong
Publication date: 27/09/2013

Vector quantization can be a computationally expensive step in visual bag-of-words (BoW) search when the vocabulary is large. A BoW-based appearance SLAM system needs to tackle this problem for efficient real-time operation. We propose an effective method to speed up the vector-quantization process in BoW-based visual SLAM. We employ a graph-based nearest neighbor search (GNNS) algorithm for this purpose, and experimentally show that it can outperform the state of the art. The graph-based search structure used in GNNS can be efficiently integrated into the BoW model and the SLAM framework. The graph-based index, which is a k-NN graph, is built over the vocabulary words and can be extracted from the BoW vocabulary construction procedure by adding one iteration to the k-means clustering, at a small extra cost. Moreover, because images acquired for appearance-based SLAM are sequential, the GNNS search can be initialized judiciously, which considerably increases the speedup of the quantization process.
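A graph-based search over a k-NN graph of vocabulary words might look like the sketch below. The abstract does not spell out the search procedure, so the greedy hill-climbing walk shown here is an assumption, as are the function names and the brute-force graph construction (the abstract instead describes extracting the graph from the k-means vocabulary construction).

```python
import numpy as np

def build_knn_graph(X, k):
    """Brute-force k-NN graph: row i lists the k nearest other rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a word is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def greedy_search(X, graph, q, start, max_steps=100):
    """Walk the graph, moving to the neighbor closest to q until none improves."""
    cur = int(start)
    cur_d = float(((X[cur] - q) ** 2).sum())
    for _ in range(max_steps):
        nbrs = graph[cur]
        d = ((X[nbrs] - q) ** 2).sum(axis=1)
        j = int(np.argmin(d))
        if d[j] >= cur_d:
            break  # local minimum: no neighbor is closer to q
        cur, cur_d = int(nbrs[j]), float(d[j])
    return cur, cur_d
```

The result is approximate and depends on the starting node. For sequential images, one plausible choice of `start` is the word matched to a similar feature in the previous frame, which is in the spirit of the judicious initialization the abstract mentions.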

arXiv.org e-Print Archive

CiteSeerX

Crossref

Faster tuple lattice sieving using spherical locality-sensitive filters

Author: Laarhoven, Thijs
Publication date: 08/05/2017

To overcome the large memory requirement of classical lattice sieving algorithms for solving hard lattice problems, Bai-Laarhoven-Stehlé [ANTS 2016] studied tuple lattice sieving, where tuples instead of pairs of lattice vectors are combined to form shorter vectors. Herold-Kirshanova [PKC 2017] recently improved upon their results for arbitrary tuple sizes, for example showing that a triple sieve can solve the shortest vector problem (SVP) in dimension $d$ in time $2^{0.3717d + o(d)}$, using a technique similar to locality-sensitive hashing for finding nearest neighbors. In this work, we generalize the spherical locality-sensitive filters of Becker-Ducas-Gama-Laarhoven [SODA 2016] to obtain space-time tradeoffs for near neighbor searching on dense data sets, and we apply these techniques to tuple lattice sieving to obtain even better time complexities. For instance, our triple sieve heuristically solves SVP in time $2^{0.3588d + o(d)}$. For practical sieves based on Micciancio-Voulgaris' GaussSieve [SODA 2010], this shows that a triple sieve uses less space and less time than the current best near-linear space double sieve.

Comment: 12 pages + references, 2 figures. Subsumed/merged into Cryptology ePrint Archive 2017/228, available at https://ia.cr/2017/122
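To get a rough feel for the gap between the two triple-sieve exponents quoted in this abstract, the leading terms can be compared directly. This sketch ignores the $o(d)$ terms, which is a strong simplifying assumption, so the numbers only indicate the asymptotic trend and say nothing reliable about concrete running times. The function name is illustrative.

```python
def leading_term_speedup(d: int, old_exp: float = 0.3717, new_exp: float = 0.3588) -> float:
    """Ratio 2^(old_exp*d) / 2^(new_exp*d), ignoring the o(d) terms entirely."""
    return 2.0 ** ((old_exp - new_exp) * d)

# Leading-term ratio at a few dimensions (illustrative only).
for dim in (100, 200, 400):
    print(dim, leading_term_speedup(dim))
```

The exponent difference is 0.0129 per dimension, so under this simplification the ratio doubles roughly every 78 dimensions.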

arXiv.org e-Print Archive

Pure OAI Repository

Reverse k Nearest Neighbor Search over Trajectories

Authors: Bao, Zhifeng; Cong, Gao; Culpepper, J. Shane; Sellis, Timos; Wang, Sheng
Publication date: 01/01/2017

GPS enables mobile devices to continuously provide new opportunities to improve our daily lives. For example, the data collected in applications created by Uber or Public Transport Authorities can be used to plan transportation routes, estimate capacities, and proactively identify low coverage areas. In this paper, we study a new kind of query, Reverse k Nearest Neighbor Search over Trajectories (RkNNT), which can be used for route planning and capacity estimation. Given a set of existing routes DR, a set of passenger transitions DT, and a query route Q, an RkNNT query returns all transitions that take Q as one of their k nearest travel routes. To solve the problem, we first develop an index to handle dynamic trajectory updates, so that the most up-to-date transition data are available for answering an RkNNT query. Then we introduce a filter refinement framework for processing RkNNT queries using the proposed indexes. Next, we show how to use RkNNT to solve the optimal route planning problem MaxRkNNT (MinRkNNT), which is to search for the optimal route from a start location to an end location that could attract the maximum (or minimum) number of passengers based on a pre-defined travel distance threshold. Experiments on real datasets demonstrate the efficiency and scalability of our approaches. To the best of our knowledge, this is the first work to study the RkNNT problem for route planning.

Comment: 12 page
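The query definition in this abstract can be illustrated with a brute-force sketch that has no index and no filter-refinement step. The abstract does not define the distance between a transition and a route, so this sketch assumes that a transition is an (origin, destination) pair of 2D points, that a route is a list of 2D points, and that the distance is the sum of the origin's and the destination's distances to their nearest route points. All function names are illustrative.

```python
import math

def point_route_dist(p, route):
    """Distance from a point to the nearest point of a route."""
    return min(math.dist(p, r) for r in route)

def transition_route_dist(transition, route):
    """Assumed distance: origin-to-route plus destination-to-route."""
    origin, dest = transition
    return point_route_dist(origin, route) + point_route_dist(dest, route)

def rknnt(routes, transitions, query, k):
    """Indices of transitions for which `query` is among the k nearest routes."""
    result = []
    for i, t in enumerate(transitions):
        dq = transition_route_dist(t, query)
        # Count existing routes that are strictly closer than the query route.
        closer = sum(1 for r in routes if transition_route_dist(t, r) < dq)
        if closer < k:
            result.append(i)
    return result
```

This scan costs one distance computation per (transition, route) pair, which is the kind of work the index and filter-refinement framework described in the abstract are meant to avoid.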

arXiv.org e-Print Archive

RMIT Research Repository

DR-NTU (Digital Repository of NTU)