Search CORE

17,188 research outputs found

Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search

Author: Matsui Yusuke
Ono Naoki
Publication venue
Publication date: 31/10/2023
Field of study

Approximate Nearest Neighbor Search (ANNS) is the task of finding the database vector that is closest to a given query vector. Graph-based ANNS is the family of methods with the best balance of accuracy and speed for million-scale datasets. However, graph-based methods have the disadvantage of long index construction time. Recently, many researchers have improved the tradeoff between accuracy and speed during a search. However, there is little research on accelerating index construction. We propose a fast graph construction algorithm, Relative NN-Descent (RNN-Descent). RNN-Descent combines NN-Descent, an algorithm for constructing approximate K-nearest neighbor graphs (K-NN graphs), and RNG Strategy, an algorithm for selecting edges effective for search. This algorithm allows the direct construction of graph-based indexes without ANNS. Experimental results demonstrated that the proposed method had the fastest index construction speed, while its search performance is comparable to existing state-of-the-art methods such as NSG. For example, in experiments on the GIST1M dataset, the construction of the proposed method is 2x faster than NSG. Additionally, it was even faster than the construction speed of NN-Descent.Comment: Accepted by ACMMM 202

arXiv.org e-Print Archive

k-NN 검색 및 k-NN 그래프 생성을 위한 고속 근사 알고리즘

Author: Youngki Park
Publication venue: 서울대학교 대학원
Publication date: 01/02/2015
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2015. 2. 이상구.Finding k-nearest neighbors (k-NN) is an essential part of recommeder systems, information retrieval, and many data mining and machine learning algorithms. However, there are two main problems in finding k-nearest neighbors: 1) Existing approaches require a huge amount of time when the number of objects or dimensions is scale up. 2) The k-NN computation methods do not show the consistent performance over different search tasks and types of data. In this dissertation, we present fast and versatile algorithms for finding k-nearest neighbors in order to cope with these problems. The main contributions are summarized as follows: first, we present an efficient and scalable algorithm for finding an approximate k-NN graph by filtering node pairs whose large value dimensions do not match at all. Second, a fast collaborative filtering algorithm that utilizes k-NN graph is presented. The main idea of this approach is to reverse the process of finding k-nearest neighbors in item-based collaborative filtering. Last, we propose a fast approximate algorithm for k-NN search by selecting query-specific signatures from a signature pool to pick high-quality k-NN candidates.The experimental results show that the proposed algorithms guarantee a high level of accuracy while also being much faster than the other algorithms over different types of search tasks and datasets.Abstract i Contents iii List of Figures vii List of Tables xi Chapter 1 Introduction 1 1.1 Motivation and Challenges . . . . . . . . . . . . . . . . . . . . . . 2 1.1.1 Fast Approximation . . . . . . . . . . . . . . . . . . . . . 3 1.1.2 Versatility . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Our Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2.1 Greedy Filtering . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.2 Signature Selection LSH . . . . . . . . . . . . . . . . . . . 7 1.2.3 Reversed CF . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.4 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 Chapter 2 Background and Related Work 14 2.1 k-NN Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.1.1 Locality Sensitive Hashing . . . . . . . . . . . . . . . . . . 15 2.1.2 LSH-based k-NN Search . . . . . . . . . . . . . . . . . . . 16 2.2 k-NN Graph Construction . . . . . . . . . . . . . . . . . . . . . . 17 2.2.1 LSH-based Approach . . . . . . . . . . . . . . . . . . . . . 19 2.2.2 Clustering-based Approach . . . . . . . . . . . . . . . . . 19 2.2.3 Heuristic-based Approach . . . . . . . . . . . . . . . . . . 20 2.2.4 Similarity Join . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Chapter 3 Fast Approximate k-NN Graph Construction 26 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.3 Constructing a k-Nearest Neighbor Graph . . . . . . . . . . . . . 29 3.3.1 Greedy Filtering . . . . . . . . . . . . . . . . . . . . . . . 29 3.3.2 Prefix Selection Scheme . . . . . . . . . . . . . . . . . . . 32 3.3.3 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.4 Theoretical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.4.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.4.2 Graph Construction Time . . . . . . . . . . . . . . . . . . 39 3.4.3 Graph Accuracy . . . . . . . . . . . . . . . . . . . . . . . 40 3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 44 3.5.2 Performance Comparison . . . . . . . . . . . . . . . . . . 48 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 Chapter 4 Fast Collaborative Filtering 53 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.3 Fast Collaborative Filtering . . . . . . . . . . . . . . . . . . . . . 58 4.3.1 Nearest Neighbor Graph Construction . . . . . . . . . . . 58 4.3.2 Fast Recommendation Algorithm . . . . . . . . . . . . . . 60 4.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 64 4.4.2 Overall Comparison . . . . . . . . . . . . . . . . . . . . . 65 4.4.3 Effects of Parameter Changes . . . . . . . . . . . . . . . . 68 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Chapter 5 Fast Approximate k-NN Search 72 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.2 Signature Selection LSH . . . . . . . . . . . . . . . . . . . . . . . 74 5.2.1 Data-dependent LSH . . . . . . . . . . . . . . . . . . . . . 75 5.2.2 Signature Pool Generation . . . . . . . . . . . . . . . . . . 76 5.2.3 Signature Selection . . . . . . . . . . . . . . . . . . . . . . 79 5.2.4 Optimization Techniques . . . . . . . . . . . . . . . . . . 83 5.3 S2LSH for Graph Construction . . . . . . . . . . . . . . . . . . . 84 5.3.1 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . 84 5.3.2 Signature Selection . . . . . . . . . . . . . . . . . . . . . . 84 5.3.3 Optimization Techniques . . . . . . . . . . . . . . . . . . 85 5.4 Theoretical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 86 5.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 5.5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 87 5.5.2 Experimental Results . . . . . . . . . . . . . . . . . . . . 91 5.5.3 Performance Analysis . . . . . . . . . . . . . . . . . . . . 97 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 Chapter 6 Conclusion 103 Bibliography 105 초록 113Docto

SNU Open Repository and Archive

Fast k-means based on KNN Graph

Author: Deng Cheng-Hao
Zhao Wan-Lei
Publication venue
Publication date: 04/05/2017
Field of study

In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost could be prohibitively high as the data size and the cluster number are large. It is well known that the processing bottleneck of k-means lies in the operation of seeking closest centroid in each iteration. In this paper, a novel solution towards the scalability issue of k-means is presented. In the proposal, k-means is supported by an approximate k-nearest neighbors graph. In the k-means iteration, each data sample is only compared to clusters that its nearest neighbors reside. Since the number of nearest neighbors we consider is much less than k, the processing cost in this step becomes minor and irrelevant to k. The processing bottleneck is therefore overcome. The most interesting thing is that k-nearest neighbor graph is constructed by iteratively calling the fast

k

-means itself. Comparing with existing fast k-means variants, the proposed algorithm achieves hundreds to thousands times speed-up while maintaining high clustering quality. As it is tested on 10 million 512-dimensional data, it takes only 5.2 hours to produce 1 million clusters. In contrast, to fulfill the same scale of clustering, it would take 3 years for traditional k-means

arXiv.org e-Print Archive

Crossref

An Efficient Index for Visual Search in Appearance-based SLAM

Author: Hajebi Kiana
Zhang Hong
Publication venue
Publication date: 27/09/2013
Field of study

Vector-quantization can be a computationally expensive step in visual bag-of-words (BoW) search when the vocabulary is large. A BoW-based appearance SLAM needs to tackle this problem for an efficient real-time operation. We propose an effective method to speed up the vector-quantization process in BoW-based visual SLAM. We employ a graph-based nearest neighbor search (GNNS) algorithm to this aim, and experimentally show that it can outperform the state-of-the-art. The graph-based search structure used in GNNS can efficiently be integrated into the BoW model and the SLAM framework. The graph-based index, which is a k-NN graph, is built over the vocabulary words and can be extracted from the BoW's vocabulary construction procedure, by adding one iteration to the k-means clustering, which adds small extra cost. Moreover, exploiting the fact that images acquired for appearance-based SLAM are sequential, GNNS search can be initiated judiciously which helps increase the speedup of the quantization process considerably

arXiv.org e-Print Archive

CiteSeerX

Crossref