AdANNS: A Framework for Adaptive Semantic Search
Web-scale search systems learn an encoder to embed a given query which is
then hooked into an approximate nearest neighbor search (ANNS) pipeline to
retrieve similar data points. To accurately capture tail queries and data
points, learned representations typically are rigid, high-dimensional vectors
that are generally used as-is in the entire ANNS pipeline and can lead to
computationally expensive retrieval. In this paper, we argue that instead of
rigid representations, different stages of ANNS can leverage adaptive
representations of varying capacities to achieve significantly better
accuracy-compute trade-offs, i.e., stages of ANNS that can get away with more
approximate computation should use a lower-capacity representation of the same
data point. To this end, we introduce AdANNS, a novel ANNS design framework
that explicitly leverages the flexibility of Matryoshka Representations. We
demonstrate state-of-the-art accuracy-compute trade-offs using novel
AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF)
and quantization (AdANNS-OPQ). For example, on ImageNet retrieval, AdANNS-IVF is
up to 1.5% more accurate than the rigid representations-based IVF at the same
compute budget; and matches accuracy while being up to 90x faster in wall-clock
time. For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the
64-byte OPQ baseline constructed using rigid representations -- same accuracy
at half the cost! We further show that the gains from AdANNS translate to
modern-day composite ANNS indices that combine search structures and
quantization. Finally, we demonstrate that AdANNS can enable inference-time
adaptivity for compute-aware search on ANNS indices built non-adaptively on
Matryoshka Representations. Code is open-sourced at
https://github.com/RAIVNLab/AdANNS.
Comment: 25 pages, 15 figures. NeurIPS 2023 camera ready publication
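The core idea in the abstract, using a lower-capacity representation for the more approximate stage of search, can be illustrated with a toy inverted-file (IVF) index. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes that the first `d_cluster` coordinates of each vector form a usable low-capacity prefix (as Matryoshka-style embeddings are trained to provide), it uses plain k-means for clustering, and every name in it (`build_ivf`, `search`, `d_cluster`, `d_search`, `n_probe`) is hypothetical.

```python
import random


def sq_dist(a, b, dims):
    """Squared L2 distance using only the first `dims` coordinates."""
    return sum((a[i] - b[i]) ** 2 for i in range(dims))


def nearest(point, centroids, dims):
    """Index of the centroid closest to `point` in the first `dims` coordinates."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c], dims))


def build_ivf(db, d_cluster, n_clusters, n_iters=10, seed=0):
    """Cluster the database with k-means on the low-dimensional prefix only."""
    rng = random.Random(seed)
    picks = rng.sample(range(len(db)), n_clusters)
    centroids = [list(db[i][:d_cluster]) for i in picks]
    for _ in range(n_iters):
        assign = [nearest(v, centroids, d_cluster) for v in db]
        for c in range(n_clusters):
            members = [db[i] for i, a in enumerate(assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [
                    sum(m[j] for m in members) / len(members)
                    for j in range(d_cluster)
                ]
    # Build inverted lists from the final centroids.
    lists = [[] for _ in range(n_clusters)]
    for i, v in enumerate(db):
        lists[nearest(v, centroids, d_cluster)].append(i)
    return centroids, lists


def search(query, db, centroids, lists, d_cluster, d_search, n_probe, k):
    """Pick clusters with the cheap prefix, then rank candidates with a longer one."""
    order = sorted(
        range(len(centroids)),
        key=lambda c: sq_dist(query, centroids[c], d_cluster),
    )
    candidates = [i for c in order[:n_probe] for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, db[i], d_search))
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    # Random vectors stand in for real embeddings purely to exercise the code.
    db = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
    centroids, lists = build_ivf(db, d_cluster=4, n_clusters=8)
    print(search(db[5], db, centroids, lists,
                 d_cluster=4, d_search=16, n_probe=2, k=3))
```

Here the coarse stage (cluster selection) touches only 4 of the 16 coordinates, while the finer stage (ranking within the probed clusters) uses all 16. With random vectors the prefix carries no special meaning, so the sketch shows the mechanics only; any accuracy-compute benefit depends on embeddings whose prefixes are themselves trained to be informative.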