A Bandit Approach to Maximum Inner Product Search
There has been substantial research on sub-linear time approximate algorithms
for Maximum Inner Product Search (MIPS). To achieve fast query time,
state-of-the-art techniques require significant preprocessing, which can be a
burden when the number of subsequent queries is not sufficiently large to
amortize the cost. Furthermore, existing methods do not have the ability to
directly control the suboptimality of their approximate results with
theoretical guarantees. In this paper, we propose the first approximate
algorithm for MIPS that does not require any preprocessing, and allows users to
control and bound the suboptimality of the results. We cast MIPS as a Best Arm
Identification problem, and introduce a new bandit setting that can fully
exploit the special structure of MIPS. Our approach outperforms
state-of-the-art methods on both synthetic and real-world datasets.
Comment: AAAI 201
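The idea of casting MIPS as Best Arm Identification can be illustrated with a minimal sketch. It is a generic successive-elimination scheme, not the algorithm proposed in the paper: each item is treated as an arm, and "pulling" an arm means evaluating the product of one query coordinate with the matching item coordinate. The function name `bandit_mips`, the parameters `delta` and `batch`, and the Hoeffding-style confidence radius are all assumptions made for illustration.

```python
import math
import random

def bandit_mips(query, items, delta=0.05, batch=8, seed=0):
    """Return the index of the item with the (estimated) largest inner product.

    Generic successive elimination; hypothetical helper, not the paper's method.
    """
    rng = random.Random(seed)
    n, d = len(items), len(query)
    order = list(range(d))
    rng.shuffle(order)  # shared coordinate order, i.e. sampling without replacement

    # Assumed bound on the magnitude of any single product query[j] * item[j].
    bound = max(abs(q) for q in query) * max(abs(v) for x in items for v in x)

    alive = list(range(n))   # arms that have not been eliminated yet
    sums = [0.0] * n         # partial inner products over the sampled coordinates
    used = 0                 # number of coordinates sampled so far

    while len(alive) > 1 and used < d:
        coords = order[used:used + batch]
        for i in alive:
            sums[i] += sum(query[j] * items[i][j] for j in coords)
        used += len(coords)

        # Hoeffding radius with a union bound over arms and rounds.
        radius = bound * math.sqrt(2.0 * math.log(2.0 * n * d / delta) / used)
        means = {i: sums[i] / used for i in alive}
        best_lower = max(means.values()) - radius

        # Keep only arms whose upper bound still reaches the best lower bound.
        alive = [i for i in alive if means[i] + radius >= best_lower]

    # If every coordinate was sampled, the partial sums are exact inner products.
    return max(alive, key=lambda i: sums[i])
```

No index is built in advance, so the only cost is paid at query time. A smaller `delta` widens the confidence radius, which makes elimination more cautious and costs more coordinate evaluations.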
Understanding and Improving Proximity Graph based Maximum Inner Product Search
The inner-product navigable small world graph (ip-NSW) represents the
state-of-the-art method for approximate maximum inner product search (MIPS) and
it can achieve an order of magnitude speedup over the fastest baseline.
However, to date it is still unclear where its exceptional performance comes
from. In this paper, we show that there is a strong norm bias in the MIPS
problem, which means that the large norm items are very likely to become the
result of MIPS. Then we explain the good performance of ip-NSW as matching the
norm bias of the MIPS problem - large norm items have big in-degrees in the
ip-NSW proximity graph and a walk on the graph spends the majority of
computation on these items, thus effectively avoiding unnecessary computation on
small norm items. Furthermore, we propose the ip-NSW+ algorithm, which improves
ip-NSW by introducing an additional angular proximity graph. Search is first
conducted on the angular graph to find the angular neighbors of a query and
then the MIPS neighbors of these angular neighbors are used to initialize the
candidate pool for search on the inner-product proximity graph. Experimental
results show that ip-NSW+ consistently and significantly outperforms ip-NSW and
provides more robust performance under different data distributions.
Comment: 8 pages, 8 figure
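The norm bias described in this abstract can be checked empirically with a small brute-force sketch. It measures the fraction of queries whose exact MIPS result falls among the largest-norm items. The helper names `mips` and `large_norm_hit_rate` and the parameter `top_fraction` are hypothetical and do not come from the paper.

```python
import math

def mips(query, items):
    """Index of the item with the largest inner product with the query (brute force)."""
    return max(range(len(items)),
               key=lambda i: sum(q * v for q, v in zip(query, items[i])))

def large_norm_hit_rate(queries, items, top_fraction=0.1):
    """Fraction of queries whose MIPS result is among the largest-norm items."""
    n_top = max(1, int(len(items) * top_fraction))

    def norm(i):
        return math.sqrt(sum(v * v for v in items[i]))

    # Indices of the n_top items with the largest Euclidean norm.
    large = set(sorted(range(len(items)), key=norm, reverse=True)[:n_top])

    hits = sum(1 for q in queries if mips(q, items) in large)
    return hits / len(queries)
```

A hit rate far above `top_fraction` on a given data set would be consistent with the norm bias the abstract describes.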
Revisiting Wedge Sampling for Budgeted Maximum Inner Product Search
Top-k maximum inner product search (MIPS) is a central task in many machine
learning applications. This paper extends top-k MIPS to a budgeted setting
that asks for the best approximate top-k MIPS given a limit of B computational
operations. We investigate recent advanced sampling algorithms, including wedge
and diamond sampling, to solve it. Although the design of these sampling schemes
naturally supports budgeted top-k MIPS, they suffer from the linear cost of
scanning all data points to retrieve top-k results and from performance
degradation when handling negative inputs.
This paper makes two main contributions. First, we show that diamond sampling
is essentially a combination of wedge sampling and basic sampling for
top-k MIPS. Our theoretical analysis and empirical evaluation show that wedge
sampling is competitive with (and often superior to) diamond sampling in
approximating top-k MIPS in terms of both efficiency and accuracy. Second, we propose a series of
algorithmic engineering techniques to deploy wedge sampling on budgeted top-k
MIPS. Our novel deterministic wedge-based algorithm runs significantly faster
than the state-of-the-art methods for budgeted and exact top-k MIPS while
maintaining a top-5 precision of at least 80% on standard recommender system
data sets.
Comment: ECML-PKDD 202
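For orientation, classic randomized wedge sampling for top-k MIPS can be sketched as follows. This is not the deterministic wedge-based algorithm the paper proposes. The sketch assumes nonnegative query and item entries; the function name `wedge_mips` and the parameters `n_samples` and `n_candidates` are hypothetical. Each sample first picks a dimension j with probability proportional to q_j times the j-th column sum, then picks an item i with probability proportional to x_ij, so an item's counter grows in proportion to its inner product with the query.

```python
import random
from itertools import accumulate

def wedge_mips(query, items, n_samples=2000, n_candidates=10, k=1, seed=0):
    """Approximate top-k MIPS by wedge sampling plus exact re-ranking.

    Assumes all entries of `query` and `items` are nonnegative.
    """
    rng = random.Random(seed)
    n, d = len(items), len(query)

    # Column-wise cumulative weights; the last entry of each is the column sum.
    cum_cols = [list(accumulate(x[j] for x in items)) for j in range(d)]
    dim_weights = [query[j] * cum_cols[j][-1] for j in range(d)]

    counts = [0] * n
    # Step 1: sample dimensions with probability proportional to q_j * column sum.
    for j in rng.choices(range(d), weights=dim_weights, k=n_samples):
        # Step 2: sample an item with probability proportional to x_ij.
        i = rng.choices(range(n), cum_weights=cum_cols[j])[0]
        counts[i] += 1

    # Linear scan over the counters to pick candidates, then exact re-ranking.
    candidates = sorted(range(n), key=lambda i: counts[i], reverse=True)[:n_candidates]

    def inner(i):
        return sum(q * v for q, v in zip(query, items[i]))

    return sorted(candidates, key=inner, reverse=True)[:k]
```

Here `n_samples` and `n_candidates` together play the role of the budget: more samples sharpen the counters, and more candidates increase the number of exact inner products computed during re-ranking. The candidate selection step scans all counters, which is the linear cost the abstract mentions.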