Learning binary codes for maximum inner product search
Binary coding or hashing techniques are recognized to accomplish efficient near neighbor search, and have thus attracted broad interest in recent vision and learning studies. However, such studies have rarely been dedicated to Maximum Inner Product Search (MIPS), which plays a critical role in various vision applications. In this paper, we investigate learning binary codes to exclusively handle the MIPS problem. Inspired by the latest advances in asymmetric hashing schemes, we propose an asymmetric binary code learning framework based on inner product fitting. Specifically, two sets of coding functions are learned such that the inner products between their generated binary codes reveal the inner products between the original data vectors. We also propose an alternative, simpler objective which maximizes the correlation between the inner products of the produced binary codes and those of the raw data vectors. In both objectives, the binary codes and coding functions are learned simultaneously without continuous relaxation, which is the key to achieving high-quality binary codes. We evaluate the proposed method, dubbed Asymmetric Inner-product Binary Coding (AIBC), under both objectives on several large-scale image datasets. Both variants are superior to state-of-the-art binary coding and hashing methods on MIPS tasks.
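The core idea, scoring with inner products of binary codes instead of raw vectors, can be illustrated with a minimal sketch. Note the coding function below is a plain random sign projection, a hypothetical stand-in: AIBC instead *learns* two distinct sets of coding functions by fitting inner products directly, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, b, n = 32, 256, 1000          # data dim, code length, database size

# Toy stand-in for learned coding functions: a shared random projection.
# (Assumption for illustration only; AIBC learns asymmetric functions.)
W = rng.standard_normal((b, d)) / np.sqrt(d)

def encode(V):
    """Map real vectors to {-1, +1} codes via the sign of a projection."""
    return np.sign(V @ W.T)

X = rng.standard_normal((n, d))   # database vectors
q = rng.standard_normal(d)        # query vector

# Inner products between binary codes act as a cheap proxy for the
# true inner products between the raw vectors.
scores_true = X @ q
scores_code = encode(X) @ encode(q[None, :]).ravel()

k = 10
top_true = set(np.argsort(-scores_true)[:k].tolist())
top_code = set(np.argsort(-scores_code)[:k].tolist())
print(f"top-{k} overlap of code-based ranking: {len(top_true & top_code)}/{k}")
```

Since the codes live in {-1, +1}, the code-space inner product reduces to popcount-style bit operations at search time, which is what makes this proxy cheap.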
Revisiting Wedge Sampling for Budgeted Maximum Inner Product Search
Top-k maximum inner product search (MIPS) is a central task in many machine learning applications. This paper extends top-k MIPS to a budgeted setting, which asks for the best approximate top-k MIPS given a limit of B computational operations. We investigate recent advanced sampling algorithms, including wedge and diamond sampling, to solve it. Though the design of these sampling schemes naturally supports budgeted top-k MIPS, they suffer from the linear cost of scanning all data points to retrieve top-k results and from performance degradation when handling negative inputs.
This paper makes two main contributions. First, we show that diamond sampling is essentially a combination of wedge sampling and basic sampling for top-k MIPS. Our theoretical analysis and empirical evaluation show that wedge sampling is competitive with (and often superior to) diamond sampling for approximating top-k MIPS in both efficiency and accuracy. Second, we propose a series of algorithmic engineering techniques to deploy wedge sampling for budgeted top-k MIPS. Our novel deterministic wedge-based algorithm runs significantly faster than the state-of-the-art methods for budgeted and exact top-k MIPS while maintaining top-5 precision of at least 80% on standard recommender system data sets.

Comment: ECML-PKDD 202
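The classical randomized wedge-sampling scheme the paper builds on can be sketched as follows: sample a coordinate with probability proportional to the query entry times the column sum, then a data point with probability proportional to its entry in that coordinate, and use hit counts as approximate scores before an exact re-rank. This is only an illustrative baseline under the nonnegativity assumption; the paper's deterministic wedge variant and its budget-B engineering are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 50
X = rng.random((n, d))            # nonnegative data (wedge sampling assumes this)
q = rng.random(d)                 # nonnegative query

def wedge_sample_topk(X, q, k=10, num_samples=200_000, rerank=200):
    """Randomized wedge sampling followed by exact re-ranking (a sketch)."""
    col = X.sum(axis=0)                       # column sums c_j
    pj = q * col
    pj /= pj.sum()                            # P(pick dimension j) ∝ q_j * c_j
    counts = np.zeros(X.shape[0])
    js = rng.choice(X.shape[1], size=num_samples, p=pj)
    for j, reps in zip(*np.unique(js, return_counts=True)):
        pi = X[:, j] / col[j]                 # P(pick point i | j) ∝ x_ij
        hits = rng.choice(X.shape[0], size=reps, p=pi)
        np.add.at(counts, hits, 1)            # E[count_i] ∝ q · x_i
    # Candidate set from sample counts, then exact re-rank of the shortlist.
    cand = np.argsort(-counts)[:rerank]
    exact = X[cand] @ q
    return cand[np.argsort(-exact)[:k]]

approx = set(wedge_sample_topk(X, q).tolist())
exact_topk = set(np.argsort(-(X @ q))[:10].tolist())
print(f"top-10 recall: {len(approx & exact_topk) / 10:.2f}")
```

The count vector is an unbiased, rescaled estimate of the inner products, which is why hit counts alone already rank candidates; the final exact re-rank over a small shortlist is what a computational budget B constrains in the budgeted setting.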