
    Fast Nearest Neighbor Machine Translation

    Though nearest neighbor Machine Translation (kNN-MT) \citep{khandelwal2020nearest} has been shown to deliver significant performance gains over standard neural MT systems, it is prohibitively slow because it uses the entire reference corpus as the datastore for the nearest neighbor search: every step of every beam in the beam search must search over the entire reference corpus. kNN-MT is thus two orders of magnitude slower than vanilla MT models, which makes it hard to deploy in real-world applications, especially online services. In this work, we propose Fast kNN-MT to address this issue. Fast kNN-MT constructs a significantly smaller datastore for the nearest neighbor search: for each word in a source sentence, Fast kNN-MT first selects its nearest token-level neighbors, restricted to tokens identical to the query token. Then, at each decoding step, instead of using the entire corpus as the datastore, the search space is limited to the target tokens corresponding to the previously selected reference source tokens. This strategy avoids searching the whole datastore for nearest neighbors and drastically improves decoding efficiency. Without loss of performance, Fast kNN-MT is two orders of magnitude faster than kNN-MT and only two times slower than the standard NMT model, enabling the practical use of kNN-MT systems in real-world MT applications. The code is available at \url{https://github.com/ShannonAI/fast-knn-nmt}
    Comment: To appear at ACL 2022 Findings
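    To make the restriction concrete, here is a minimal Python sketch of the token-restricted datastore idea described above, not the authors' implementation; the `token_index` mapping and the helper names are illustrative, and random vectors stand in for decoder hidden states.

```python
# Minimal sketch of Fast kNN-MT's datastore restriction (illustrative names).
import numpy as np

# Full datastore, keyed by source token: for each occurrence of the token in
# the reference corpus, a (decoder-state, target-token) pair. Random vectors
# stand in for real decoder hidden states here.
token_index = {
    "cat": [(np.random.rand(4), "Katze"), (np.random.rand(4), "Kater")],
    "sat": [(np.random.rand(4), "sass")],
}

def build_sentence_datastore(source_tokens, neighbors_per_token=2):
    """Keep only entries whose source token also appears in this sentence."""
    entries = []
    for tok in source_tokens:
        entries.extend(token_index.get(tok, [])[:neighbors_per_token])
    if not entries:
        return np.empty((0, 4)), []
    keys = np.stack([key for key, _ in entries])
    values = [val for _, val in entries]
    return keys, values

def knn_step(query_state, keys, values, k=2):
    """One decoding step: search the small per-sentence datastore, not the
    full corpus, which is where the speedup comes from."""
    if not values:
        return []
    dists = np.linalg.norm(keys - query_state, axis=1)
    nearest = np.argsort(dists)[:k]
    return [(values[i], float(dists[i])) for i in nearest]

keys, values = build_sentence_datastore(["the", "cat", "sat"])
print(knn_step(np.random.rand(4), keys, values))
```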

    Fast Approximate Algorithms for k-NN Search and k-NN Graph Construction

    Doctoral dissertation, Department of Electrical and Computer Engineering, Seoul National University, February 2015. Advisor: Sang-goo Lee. Finding k-nearest neighbors (k-NN) is an essential part of recommender systems, information retrieval, and many data mining and machine learning algorithms. However, there are two main problems in finding k-nearest neighbors: 1) existing approaches require a huge amount of time when the number of objects or dimensions scales up; 2) k-NN computation methods do not show consistent performance across different search tasks and types of data. In this dissertation, we present fast and versatile algorithms for finding k-nearest neighbors in order to cope with these problems. The main contributions are summarized as follows. First, we present an efficient and scalable algorithm for finding an approximate k-NN graph by filtering node pairs whose large-value dimensions do not match at all. Second, a fast collaborative filtering algorithm that utilizes the k-NN graph is presented; the main idea of this approach is to reverse the process of finding k-nearest neighbors in item-based collaborative filtering. Last, we propose a fast approximate algorithm for k-NN search that selects query-specific signatures from a signature pool to pick high-quality k-NN candidates. The experimental results show that the proposed algorithms guarantee a high level of accuracy while also being much faster than the other algorithms across different types of search tasks and datasets.
    Contents: Chapter 1 Introduction; Chapter 2 Background and Related Work; Chapter 3 Fast Approximate k-NN Graph Construction; Chapter 4 Fast Collaborative Filtering; Chapter 5 Fast Approximate k-NN Search; Chapter 6 Conclusion.
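    The first contribution's filtering idea can be illustrated with a small sketch: candidate pairs are generated only for vectors whose top-valued dimensions overlap, so pairs whose large-value dimensions do not match are never compared. This is a toy Python rendering under that assumption, not the dissertation's implementation.

```python
# Toy sketch of the greedy-filtering idea: only vector pairs sharing at least
# one of their top-valued dimensions become comparison candidates.
import numpy as np
from collections import defaultdict
from itertools import combinations

def knn_graph_greedy_filtering(vectors, k=2, prefix=2):
    # Index each vector under its `prefix` largest dimensions.
    buckets = defaultdict(list)
    for i, vec in enumerate(vectors):
        for dim in np.argsort(vec)[-prefix:]:
            buckets[int(dim)].append(i)
    # Candidate pairs share a top dimension; all other pairs are filtered out
    # without ever computing their similarity.
    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))
    neighbors = defaultdict(list)
    for i, j in candidates:
        sim = float(np.dot(vectors[i], vectors[j]))
        neighbors[i].append((sim, j))
        neighbors[j].append((sim, i))
    # Keep each node's k most similar candidates as its graph edges.
    return {i: [j for _, j in sorted(ns, reverse=True)[:k]]
            for i, ns in neighbors.items()}

print(knn_graph_greedy_filtering(np.random.rand(6, 8)))
```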

    Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search

    Approximate Nearest Neighbor Search (ANNS) is the task of finding the database vector that is closest to a given query vector. Graph-based ANNS is the family of methods with the best balance of accuracy and speed for million-scale datasets. However, graph-based methods have the disadvantage of long index construction time. Recently, many researchers have improved the tradeoff between accuracy and speed during a search, but there is little research on accelerating index construction. We propose a fast graph construction algorithm, Relative NN-Descent (RNN-Descent). RNN-Descent combines NN-Descent, an algorithm for constructing approximate K-nearest neighbor graphs (K-NN graphs), with RNG Strategy, an algorithm for selecting edges effective for search. This algorithm allows the direct construction of graph-based indexes without ANNS. Experimental results demonstrate that the proposed method has the fastest index construction speed, while its search performance is comparable to existing state-of-the-art methods such as NSG. For example, in experiments on the GIST1M dataset, the proposed method's construction is 2x faster than NSG's, and even faster than the construction of NN-Descent.
    Comment: Accepted by ACMMM 202
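    The RNG Strategy mentioned above follows a well-known greedy occlusion rule for relative-neighborhood-style graphs: an edge to a neighbor is dropped when an already-kept, closer neighbor is also closer to that neighbor. A minimal Python sketch of that pruning rule, illustrative rather than the paper's code:

```python
# Sketch of a greedy relative-neighborhood-graph (RNG) pruning rule of the
# kind RNN-Descent applies to an approximate K-NN graph (illustrative).
import numpy as np

def rng_prune(point_id, neighbor_ids, vectors):
    """Drop the edge to a neighbor q whenever an already-kept, closer
    neighbor r is also closer to q: r then covers q during search."""
    p = vectors[point_id]
    kept = []
    # Consider candidate neighbors from nearest to farthest.
    for q in sorted(neighbor_ids, key=lambda n: np.linalg.norm(p - vectors[n])):
        d_pq = np.linalg.norm(p - vectors[q])
        if all(np.linalg.norm(vectors[r] - vectors[q]) >= d_pq for r in kept):
            kept.append(q)
    return kept

vectors = np.random.rand(20, 8)
print(rng_prune(0, list(range(1, 10)), vectors))
```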

    Efficient k-NN Search with Cross-Encoders using Adaptive Multi-Round CUR Decomposition

    Cross-encoder models, which jointly encode and score a query-item pair, are prohibitively expensive for direct k-nearest neighbor (k-NN) search. Consequently, k-NN search typically employs a fast approximate retrieval (e.g. using BM25 or dual-encoder vectors), followed by reranking with a cross-encoder; however, the retrieval approximation often has detrimental recall regret. This problem is tackled by ANNCUR (Yadav et al., 2022), a recent work that employs a cross-encoder only, making search efficient by using a relatively small number of anchor items and a CUR matrix factorization. While ANNCUR's one-time selection of anchors tends to approximate the cross-encoder distances on average, doing so forfeits the capacity to accurately estimate distances to items near the query, leading to regret in the crucial end-task: recall of top-k items. In this paper, we propose ADACUR, a method that adaptively, iteratively, and efficiently minimizes the approximation error for the practically important top-k neighbors. It does so by iteratively performing k-NN search using the anchors available so far, then adding these retrieved nearest neighbors to the anchor set for the next round. Empirically, on multiple datasets, in comparison to previous traditional and state-of-the-art methods such as ANNCUR and dual-encoder-based retrieve-and-rerank, our proposed approach ADACUR consistently reduces recall error (by up to 70% in the important k = 1 setting) while using no more compute than its competitors.
    Comment: Findings of EMNLP 202
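    A much-simplified sketch of the adaptive loop may help. The assumptions are loud: a dot product stands in for the expensive cross-encoder, item-vs-anchor scores are treated as precomputable offline (as in ANNCUR), and the CUR step is reduced to a least-squares fit in the anchor basis. Everything here is illustrative, not the authors' implementation.

```python
# Much-simplified sketch of an ADACUR-style adaptive anchor loop.
import numpy as np

rng = np.random.default_rng(0)
items = rng.normal(size=(500, 16))              # stand-in item representations

def score_items_vs_anchors(anchor_ids):         # assumed offline in practice
    return items @ items[anchor_ids].T

def score_query_vs_anchors(query, anchor_ids):  # the only online model calls
    return items[anchor_ids] @ query

def adaptive_topk(query, k=5, n_init=20, rounds=3):
    anchors = list(range(n_init))
    top = []
    for _ in range(rounds):
        C = score_items_vs_anchors(anchors)     # (n_items, n_anchors)
        q = score_query_vs_anchors(query, anchors)
        # CUR-style estimate: express the query's scores in the anchor basis,
        # then extrapolate to every item.
        w, *_ = np.linalg.lstsq(C[anchors], q, rcond=None)
        top = np.argsort(-(C @ w))[:k].tolist()
        # Feed this round's retrieved neighbors back in as anchors, so the
        # next estimate is most accurate exactly around the current top-k.
        anchors = sorted(set(anchors) | set(top))
    return top

print(adaptive_topk(rng.normal(size=16)))
```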

    Fast Parameterless Ballistic Launch Point Estimation based on k-NN Search

    This paper discusses the problem of estimating a ballistic trajectory and its launch point by using a trajectory similarity search in a database. The major difficulty of this problem is that estimation accuracy is guaranteed only when an identical trajectory exists in the trajectory database (TD); hence, the TD must comprise an impractically large number of trajectories from various launch points. The authors propose a simplified trajectory database with a single launch point and a trajectory similarity search algorithm that decomposes trajectory similarity into velocity and position components. These similarities are then applied to k-NN estimation. Furthermore, the iDistance technique is used to partition the data space of the high-dimensional database for an efficient k-NN search. The effectiveness of the proposed algorithm is demonstrated by experiment.
    Defence Science Journal, Vol. 64, No. 1, January 2014, DOI: 10.14429/dsj.64.295
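    A toy sketch of the decomposed similarity: each database trajectory is scored by separate position and velocity distances, and the launch point is estimated from the k nearest matches. The field names, dimensions, and equal weighting are assumptions; the iDistance index is omitted and the database is scanned linearly.

```python
# Toy sketch of decomposed trajectory similarity and k-NN launch-point
# estimation (illustrative data layout; no iDistance partitioning).
import numpy as np

def trajectory_distance(obs, ref, w_pos=1.0, w_vel=1.0):
    d_pos = np.linalg.norm(obs["pos"] - ref["pos"])   # position component
    d_vel = np.linalg.norm(obs["vel"] - ref["vel"])   # velocity component
    return w_pos * d_pos + w_vel * d_vel

def estimate_launch_point(obs, database, k=3):
    nearest = sorted(database, key=lambda ref: trajectory_distance(obs, ref))[:k]
    # k-NN estimate: average the launch points of the k most similar tracks.
    return np.mean([ref["launch"] for ref in nearest], axis=0)

rng = np.random.default_rng(1)
database = [{"pos": rng.random(6), "vel": rng.random(6), "launch": rng.random(2)}
            for _ in range(100)]
observed = {"pos": rng.random(6), "vel": rng.random(6)}
print(estimate_launch_point(observed, database))
```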

    ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms

    This paper describes ANN-Benchmarks, a tool for evaluating the performance of in-memory approximate nearest neighbor algorithms. It provides a standard interface for measuring the performance and quality achieved by nearest neighbor algorithms on different standard data sets. It supports several different ways of integrating k-NN algorithms, and its configuration system automatically tests a range of parameter settings for each algorithm. Algorithms are compared with respect to many different (approximate) quality measures, and adding more is easy and fast; the included plotting front-ends can visualise these as images, LaTeX plots, and websites with interactive plots. ANN-Benchmarks aims to provide a constantly updated overview of the current state of the art of k-NN algorithms. In the short term, this overview allows users to choose the right k-NN algorithm and parameters for their similarity search task; in the longer term, algorithm designers will be able to use this overview to test and refine automatic parameter tuning. The paper gives an overview of the system, evaluates the results of the benchmark, and points out directions for future work. Interestingly, very different approaches to k-NN search yield comparable quality-performance trade-offs. The system is available at http://ann-benchmarks.com .
    Comment: Full version of the SISAP 2017 conference paper. v2: Updated the abstract to avoid arXiv linking to the wrong URL
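    The kind of interface the tool standardises can be illustrated with a minimal harness; this is not ANN-Benchmarks' actual API, just a sketch of the pattern: an algorithm exposes fit/query methods, and the harness sweeps a parameter setting, measuring recall against exact neighbors and queries per second.

```python
# Minimal illustration of an ANN-Benchmarks-style harness (not its real API).
import time
import numpy as np

rng = np.random.default_rng(0)

class SubsampledSearch:
    """Deliberately crude stand-in algorithm: scans a random subsample, so a
    larger `sample` trades speed for recall, as real parameters do."""
    def __init__(self, sample):
        self.sample = sample
    def fit(self, data):
        self.data = data
    def query(self, q, k):
        idx = rng.choice(len(self.data), self.sample, replace=False)
        dists = np.linalg.norm(self.data[idx] - q, axis=1)
        return idx[np.argsort(dists)[:k]]

def recall(found, truth):
    return len(set(found) & set(truth)) / len(truth)

data, queries = rng.random((1000, 16)), rng.random((10, 16))
exact = [np.argsort(np.linalg.norm(data - q, axis=1))[:10] for q in queries]

for sample in (100, 400, 1000):       # the parameter sweep the harness runs
    algo = SubsampledSearch(sample)
    algo.fit(data)
    start = time.perf_counter()
    results = [algo.query(q, 10) for q in queries]
    qps = len(queries) / (time.perf_counter() - start)
    mean_recall = np.mean([recall(r, t) for r, t in zip(results, exact)])
    print(f"sample={sample}: recall={mean_recall:.2f}, qps={qps:.0f}")
```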