Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search
Approximate Nearest Neighbor Search (ANNS) is the task of finding the
database vector that is closest to a given query vector. Graph-based ANNS is
the family of methods with the best balance of accuracy and speed for
million-scale datasets. However, graph-based methods have the disadvantage of
long index construction time. Recently, many researchers have improved the
tradeoff between accuracy and speed during a search. However, there is little
research on accelerating index construction. We propose a fast graph
construction algorithm, Relative NN-Descent (RNN-Descent). RNN-Descent combines
NN-Descent, an algorithm for constructing approximate K-nearest neighbor graphs
(K-NN graphs), and RNG Strategy, an algorithm for selecting edges that are effective for
search. This algorithm allows graph-based indexes to be constructed directly,
without running ANNS during construction. Experimental results demonstrate that the proposed method
achieves the fastest index construction, while its search performance is
comparable to that of existing state-of-the-art methods such as NSG. For example, in
experiments on the GIST1M dataset, the proposed method constructs its index
2x faster than NSG. It is even faster than NN-Descent.
Comment: Accepted by ACMMM 202
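The RNG Strategy mentioned above prunes each node's candidate neighbors so that the kept edges point in diverse directions. The following is a minimal Python sketch of that kind of relative-neighborhood pruning, assuming Euclidean distance. The names `rng_prune` and `build_pruned_graph` are hypothetical, and the candidate lists here come from brute force rather than from NN-Descent, so this illustrates only the edge-selection rule, not RNN-Descent itself.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rng_prune(p: int, candidates: List[int], data: List[Vector], max_degree: int) -> List[int]:
    """Select out-edges of node p with an RNG-style (relative neighborhood) rule.

    Candidates are visited from nearest to farthest. A candidate c is kept only
    if no already-kept neighbor r is strictly closer to c than p is; otherwise
    the edge p -> c is treated as redundant because c is reachable via r.
    """
    ordered = sorted((c for c in candidates if c != p),
                     key=lambda c: math.dist(data[p], data[c]))
    kept: List[int] = []
    for c in ordered:
        d_pc = math.dist(data[p], data[c])
        if all(math.dist(data[r], data[c]) >= d_pc for r in kept):
            kept.append(c)
            if len(kept) == max_degree:
                break
    return kept


def build_pruned_graph(data: List[Vector], num_candidates: int, max_degree: int) -> List[List[int]]:
    """Build a pruned graph index from brute-force candidate lists.

    Hypothetical simplification: candidates are the exact num_candidates nearest
    neighbors (O(n^2) distance computations), whereas the abstract describes
    obtaining them through NN-Descent-style refinement.
    """
    n = len(data)
    graph: List[List[int]] = []
    for p in range(n):
        cands = sorted((q for q in range(n) if q != p),
                       key=lambda q: math.dist(data[p], data[q]))[:num_candidates]
        graph.append(rng_prune(p, cands, data, max_degree))
    return graph


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
    # Node 2 is pruned from node 0's list because node 1 is closer to it than node 0 is.
    print(rng_prune(0, [1, 2, 3], pts, max_degree=4))  # [1, 3]
```

Pruning keeps the graph sparse while preserving short "hops" in distinct directions, which is what makes greedy search on the resulting index effective.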
Fast Approximate Algorithms for k-NN Search and k-NN Graph Construction (k-NN 검색 및 k-NN 그래프 생성을 위한 고속 근사 알고리즘)
Ph.D. thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2015. Advisor: Sang-goo Lee.
Finding k-nearest neighbors (k-NN) is an essential part of recommender systems, information retrieval, and many data mining and machine learning algorithms. However, there are two main problems in finding k-nearest neighbors: 1) existing approaches require a huge amount of time as the number of objects or dimensions scales up; 2) k-NN computation methods do not show consistent performance across different search tasks and types of data. In this dissertation, we present fast and versatile algorithms for finding k-nearest neighbors to cope with these problems. The main contributions are summarized as follows. First, we present an efficient and scalable algorithm for finding an approximate k-NN graph by filtering out node pairs whose large-value dimensions do not match at all. Second, we present a fast collaborative filtering algorithm that utilizes a k-NN graph; the main idea of this approach is to reverse the process of finding k-nearest neighbors in item-based collaborative filtering. Last, we propose a fast approximate algorithm for k-NN search that selects query-specific signatures from a signature pool to pick high-quality k-NN candidates. The experimental results show that the proposed algorithms achieve a high level of accuracy while being much faster than the other algorithms across different types of search tasks and datasets.
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation and Challenges
1.1.1 Fast Approximation
1.1.2 Versatility
1.2 Our Solutions
1.2.1 Greedy Filtering
1.2.2 Signature Selection LSH
1.2.3 Reversed CF
1.3 Contributions
1.4 Outline
Chapter 2 Background and Related Work
2.1 k-NN Search
2.1.1 Locality Sensitive Hashing
2.1.2 LSH-based k-NN Search
2.2 k-NN Graph Construction
2.2.1 LSH-based Approach
2.2.2 Clustering-based Approach
2.2.3 Heuristic-based Approach
2.2.4 Similarity Join
2.3 Summary
Chapter 3 Fast Approximate k-NN Graph Construction
3.1 Introduction
3.2 Problem Formulation
3.3 Constructing a k-Nearest Neighbor Graph
3.3.1 Greedy Filtering
3.3.2 Prefix Selection Scheme
3.3.3 Optimization
3.4 Theoretical Analysis
3.4.1 Preliminaries
3.4.2 Graph Construction Time
3.4.3 Graph Accuracy
3.5 Experiments
3.5.1 Experimental Setup
3.5.2 Performance Comparison
3.6 Summary
Chapter 4 Fast Collaborative Filtering
4.1 Introduction
4.2 Related Work
4.3 Fast Collaborative Filtering
4.3.1 Nearest Neighbor Graph Construction
4.3.2 Fast Recommendation Algorithm
4.4 Experiments
4.4.1 Experimental Setup
4.4.2 Overall Comparison
4.4.3 Effects of Parameter Changes
4.5 Summary
Chapter 5 Fast Approximate k-NN Search
5.1 Introduction
5.2 Signature Selection LSH
5.2.1 Data-dependent LSH
5.2.2 Signature Pool Generation
5.2.3 Signature Selection
5.2.4 Optimization Techniques
5.3 S2LSH for Graph Construction
5.3.1 Feature Selection
5.3.2 Signature Selection
5.3.3 Optimization Techniques
5.4 Theoretical Analysis
5.5 Experiments
5.5.1 Experimental Setup
5.5.2 Experimental Results
5.5.3 Performance Analysis
5.6 Summary
Chapter 6 Conclusion
Bibliography
Abstract (in Korean)
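The first contribution in the dissertation abstract above builds an approximate k-NN graph by comparing only node pairs whose large-value dimensions overlap. Below is a minimal Python sketch of that filtering idea, assuming sparse non-negative vectors, cosine similarity, and a fixed-size prefix `prefix_len` of each vector's largest dimensions. The dissertation's own prefix selection scheme (Section 3.3.2) is not reproduced, and the names `greedy_filter_knn` and `cosine` are hypothetical.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, List, Set

SparseVec = Dict[int, float]  # dimension -> (non-negative) value


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity of two sparse vectors."""
    dot = sum(v * b.get(d, 0.0) for d, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def greedy_filter_knn(vectors: List[SparseVec], k: int, prefix_len: int) -> List[List[int]]:
    """Approximate k-NN graph: only pairs sharing a large-value dimension are compared."""
    # 1. Each vector's "prefix": its prefix_len dimensions with the largest values.
    prefixes = [sorted(v, key=v.get, reverse=True)[:prefix_len] for v in vectors]
    # 2. Inverted index: dimension -> vectors whose prefix contains that dimension.
    index: Dict[int, List[int]] = defaultdict(list)
    for i, dims in enumerate(prefixes):
        for d in dims:
            index[d].append(i)
    graph: List[List[int]] = []
    for i, dims in enumerate(prefixes):
        # 3. Candidates share at least one prefix dimension; all other pairs are filtered out.
        cands: Set[int] = set()
        for d in dims:
            cands.update(index[d])
        cands.discard(i)
        # 4. Exact similarity only for surviving candidates; keep the k most similar.
        top = heapq.nlargest(k, ((cosine(vectors[i], vectors[j]), j) for j in cands))
        graph.append([j for _, j in top])
    return graph


if __name__ == "__main__":
    vecs = [{0: 5.0, 1: 1.0}, {0: 4.0, 2: 1.0}, {3: 5.0, 1: 0.5}, {3: 4.0, 2: 0.5}]
    # Vectors 0 and 2 share dimension 1, but it is not a large-value dimension,
    # so that pair is never compared.
    print(greedy_filter_knn(vecs, k=1, prefix_len=1))  # [[1], [0], [3], [2]]
```

The trade-off is controlled by `prefix_len`: a longer prefix admits more candidate pairs (higher accuracy, more similarity computations), and a node whose prefix overlaps no other node's ends up with fewer than k neighbors.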
Fast k-means based on KNN Graph
In the era of big data, k-means clustering has been widely adopted as a basic
processing tool in various contexts. However, its computational cost can be
prohibitively high when the data size and the number of clusters are large. It is
well known that the processing bottleneck of k-means lies in the operation of
seeking the closest centroid in each iteration. In this paper, a novel solution
to the scalability issue of k-means is presented. In the proposal, k-means
is supported by an approximate k-nearest neighbor graph. In each k-means
iteration, each data sample is compared only to the clusters in which its nearest
neighbors reside. Since the number of nearest neighbors considered is much
smaller than k, the processing cost of this step becomes minor and independent of
k. The processing bottleneck is therefore overcome. Notably, the
k-nearest neighbor graph is itself constructed by iteratively calling the fast
k-means. Compared with existing fast k-means variants, the proposed
algorithm achieves speed-ups of hundreds to thousands of times while maintaining high
clustering quality. When tested on 10 million 512-dimensional vectors, it
takes only 5.2 hours to produce 1 million clusters. In contrast, clustering at the
same scale would take traditional k-means 3 years.
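The assignment step described above can be sketched as follows: each sample is compared only against the centroids of clusters that its graph neighbors currently belong to, so the per-sample cost grows with the neighbor count rather than with the number of clusters k. This is a minimal Python sketch under stated assumptions: Euclidean distance, a precomputed k-NN graph `knn` (the paper builds its graph with the fast k-means itself, which is not reproduced), and the sample's own current cluster added as a candidate. The names `assign_via_knn_graph`, `update_centroids`, and `knn_graph_kmeans` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def update_centroids(data: List[Vector], labels: List[int],
                     old: List[List[float]]) -> List[List[float]]:
    """Mean of each cluster; an empty cluster keeps its previous centroid."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in old]
    counts = [0] * len(old)
    for x, c in zip(data, labels):
        counts[c] += 1
        for t in range(dim):
            sums[c][t] += x[t]
    return [[s / counts[c] for s in sums[c]] if counts[c] else list(old[c])
            for c in range(len(old))]


def assign_via_knn_graph(data: List[Vector], labels: List[int],
                         centroids: List[List[float]], knn: List[List[int]]) -> List[int]:
    """Closest centroid among the clusters of the sample and its graph neighbors only."""
    new_labels = []
    for i, x in enumerate(data):
        candidates = {labels[i]} | {labels[j] for j in knn[i]}
        new_labels.append(min(candidates, key=lambda c: math.dist(x, centroids[c])))
    return new_labels


def knn_graph_kmeans(data: List[Vector], labels: List[int], centroids: List[List[float]],
                     knn: List[List[int]], iterations: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style iterations with the graph-restricted assignment step."""
    for _ in range(iterations):
        new_labels = assign_via_knn_graph(data, labels, centroids, knn)
        centroids = update_centroids(data, new_labels, centroids)
        if new_labels == labels:  # converged: no sample changed cluster
            break
        labels = new_labels
    return labels, centroids
```

For example, with six 1-D points `[[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]`, their exact 2-NN lists, and a poor initial labeling `[0, 0, 1, 1, 1, 1]`, the point at 0.2 moves to cluster 0 because its neighbors belong to that cluster. One consequence of the restriction is that a sample can only move to a cluster already held by one of its neighbors, so the quality of the k-NN graph directly bounds clustering quality.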
An Efficient Index for Visual Search in Appearance-based SLAM
Vector quantization can be a computationally expensive step in visual
bag-of-words (BoW) search when the vocabulary is large. A BoW-based appearance
SLAM system needs to tackle this problem for efficient real-time operation. We
propose an effective method to speed up the vector-quantization process in
BoW-based visual SLAM. We employ a graph-based nearest neighbor search (GNNS)
algorithm for this purpose, and experimentally show that it can outperform the
state of the art. The graph-based search structure used in GNNS can be efficiently
integrated into the BoW model and the SLAM framework. The graph-based index,
which is a k-NN graph, is built over the vocabulary words and can be extracted
from the BoW vocabulary construction procedure by adding one iteration to
the k-means clustering, at a small extra cost. Moreover, exploiting the
fact that images acquired for appearance-based SLAM are sequential, GNNS search
can be initialized judiciously, which considerably increases the speedup of the
quantization process.
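The quantization step above replaces an exhaustive scan over the vocabulary with a greedy walk on a k-NN graph of the vocabulary words. A minimal Python sketch follows, assuming Euclidean distance and a single greedy hill-climb per query; the full GNNS procedure typically adds random restarts and a step limit, which are omitted here. Seeding each search with the word found for the previous descriptor is a hypothetical stand-in for the paper's initialization strategy, and an exact brute-force k-NN graph stands in for the graph extracted from vocabulary construction. All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def brute_force_knn(points: List[Vector], k: int) -> List[List[int]]:
    """Exact k-NN graph over the vocabulary words (stand-in for the k-means-derived graph)."""
    n = len(points)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))[:k]
            for i in range(n)]


def gnns_search(query: Vector, words: List[Vector], knn: List[List[int]], start: int) -> int:
    """Greedy hill-climbing on the k-NN graph; returns a locally nearest word index."""
    current = start
    current_d = math.dist(query, words[current])
    while True:
        best, best_d = current, current_d
        for nb in knn[current]:
            d = math.dist(query, words[nb])
            if d < best_d:
                best, best_d = nb, d
        if best == current:  # no neighbor is closer: local minimum reached
            return current
        current, current_d = best, best_d


def quantize_sequence(descriptors: List[Vector], words: List[Vector],
                      knn: List[List[int]], start: int = 0) -> List[int]:
    """Quantize descriptors in order, seeding each search with the previous result."""
    assigned: List[int] = []
    seed = start
    for q in descriptors:
        seed = gnns_search(q, words, knn, seed)
        assigned.append(seed)
    return assigned
```

With ten 1-D words `[[0.0], ..., [9.0]]` and their 2-NN graph, the first query `[7.2]` walks from word 0 to word 7, while the following queries `[7.9]` and `[8.4]` start next to their answer and need only one or two steps. That short walk is the intended benefit of a good starting point; the greedy result is a local minimum, so it is not guaranteed to be the true nearest word.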