Efficient Document Re-Ranking for Transformers by Precomputing Term Representations
Deep pretrained transformer networks are effective at a variety of ranking tasks,
such as question answering and ad-hoc document ranking. However, their
computational cost makes them prohibitively expensive to run in practice. Our proposed
approach, called PreTTR (Precomputing Transformer Term Representations),
considerably reduces the query-time latency of deep transformer networks (up to
a 42x speedup on web document ranking), making these networks more practical to
use in real-time ranking scenarios. Specifically, we precompute part of the
document term representations at indexing time (without a query), and merge
them with the query representation at query time to compute the final ranking
score. Because the token representations are large, we also propose an
effective approach to reduce the storage requirement by training a compression
layer to match attention scores. Our compression technique reduces the
required storage by up to 95% and can be applied without a substantial degradation in
ranking performance.
Comment: Accepted at SIGIR 2020 (long paper).
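As a rough illustration of the split-computation idea described above, the sketch below precomputes the first few transformer layers over document tokens alone (offline, no query) and merges the query at scoring time. This is not the authors' code: the layer count, the split point, the toy dimensions, and the final scoring head are all illustrative assumptions.

    # Hedged sketch of the PreTTR-style split; all sizes and the ranking head are illustrative.
    import torch
    import torch.nn as nn

    D_MODEL, N_HEADS, N_LAYERS, SPLIT = 64, 4, 6, 3  # SPLIT = layers run query-independently

    layers = nn.ModuleList(
        nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
        for _ in range(N_LAYERS)
    )

    def precompute_doc(doc_emb):
        """Indexing time: run document tokens through the first SPLIT layers, no query attached."""
        h = doc_emb
        for layer in layers[:SPLIT]:
            h = layer(h)
        return h  # stored in the index (and optionally compressed)

    def score(query_emb, doc_cache):
        """Query time: process the query alone, then let query and document attend jointly."""
        q = query_emb
        for layer in layers[:SPLIT]:
            q = layer(q)
        h = torch.cat([q, doc_cache], dim=1)  # merge query with precomputed doc representations
        for layer in layers[SPLIT:]:
            h = layer(h)
        return h[:, 0].sum()  # stand-in for the final ranking head

    doc_cache = precompute_doc(torch.randn(1, 128, D_MODEL))  # offline, per document
    s = score(torch.randn(1, 8, D_MODEL), doc_cache)          # online, per query

The savings come from the fact that the first SPLIT layers over (typically long) document text run once at indexing time, so only the short query and the joint top layers are computed per request.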
Discrete Factorization Machines for Fast Feature-based Recommendation
User and item features, i.e., side information, are crucial for accurate
recommendation. However, the large number of feature dimensions, e.g., usually
larger than 10^7, results in expensive storage and computational costs. This
prohibits fast recommendation, especially on mobile applications where
computational resources are very limited. In this paper, we develop a generic
feature-based recommendation model, called Discrete Factorization Machine
(DFM), for fast and accurate recommendation. DFM binarizes the real-valued
model parameters (e.g., float32) of every feature embedding into binary codes
(e.g., boolean), and thus supports efficient storage and fast user-item score
computation. To avoid the severe quantization loss of binarization, we
propose a convergent updating rule that solves the challenging discrete
optimization problem posed by DFM. Through extensive experiments on two real-world datasets,
we show that 1) DFM consistently outperforms state-of-the-art binarized
recommendation models, and 2) DFM achieves performance very competitive with
its real-valued counterpart (FM), demonstrating minimal quantization loss.
Comment: Appeared in IJCAI 2018.
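To make the binary-scoring idea concrete, here is a minimal sketch of how {-1,+1} codes turn embedding inner products into XOR-and-popcount operations. The details are assumptions for illustration: a sign-based quantizer and a 64-bit code, whereas DFM learns its binary codes directly with its own convergent updating rule rather than quantizing after the fact.

    # Hedged sketch of binary-code scoring; the quantizer and code length are illustrative.
    import numpy as np

    K = 64  # code length

    def binarize(v):
        """Quantize a real-valued embedding to a {-1,+1} code (sign function, for illustration)."""
        return np.where(v >= 0, 1, -1).astype(np.int8)

    def pack(code):
        """Pack a {-1,+1} code into one 64-bit word so inner products become XOR + popcount."""
        bits = (code > 0).astype(np.uint8)
        return int(np.packbits(bits).view('>u8')[0])

    def binary_inner(a, b):
        """<a, b> for {-1,+1}^K codes equals K - 2 * Hamming(a, b): one XOR plus a popcount."""
        return K - 2 * bin(a ^ b).count('1')

    user_code = pack(binarize(np.random.randn(K)))
    item_code = pack(binarize(np.random.randn(K)))
    score = binary_inner(user_code, item_code)  # fast user-item match score

Storing one 64-bit word per embedding instead of K float32 values is what yields the storage savings the abstract describes, and the XOR/popcount trick is why scoring is cheap on resource-limited devices.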
DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries
We study the problem of vector set search with vector set queries. This task is analogous to traditional near-neighbor search, with the
exception that both the query and each element in the collection are
sets of vectors. We identify this problem as a core subroutine for
semantic search applications and find that existing solutions are unacceptably
slow. To this end, we present a new approximate search algorithm, DESSERT
(DESSERT Efficiently Searches Sets of Embeddings via Retrieval Tables). DESSERT is a general tool
with strong theoretical guarantees and excellent empirical performance. When we
integrate DESSERT into ColBERT, a state-of-the-art semantic search model, we
find a 2-5x speedup on the MS MARCO and LoTTE retrieval benchmarks with minimal
loss in recall, underscoring the effectiveness and practical applicability of
our proposal.
Comment: Code available, https://github.com/ThirdAIResearch/Desser
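The toy sketch below illustrates the general retrieval-table idea with a single SimHash table: every vector of every document set is hashed at indexing time, and each query vector then votes for the documents whose vectors collide with it. This is a simplification, not the DESSERT algorithm itself, which uses many tables and a principled similarity estimator; every name and parameter here is illustrative.

    # Hedged single-table sketch of vector set search via hashing; all sizes are toy values.
    import numpy as np

    rng = np.random.default_rng(0)
    D, N_BITS = 16, 8                           # vector dimension and hash width
    planes = rng.standard_normal((N_BITS, D))   # random hyperplanes for SimHash

    def simhash(v):
        return tuple((planes @ v > 0).astype(int))

    def index_sets(doc_sets):
        """Indexing: hash every vector of every document set into a table keyed by its sketch."""
        table = {}
        for doc_id, vecs in enumerate(doc_sets):
            for v in vecs:
                table.setdefault(simhash(v), set()).add(doc_id)
        return table

    def search(table, query_set, n_docs):
        """Query: collision counts across query vectors approximate set-to-set similarity."""
        scores = np.zeros(n_docs)
        for q in query_set:
            for doc_id in table.get(simhash(q), ()):
                scores[doc_id] += 1.0
        return scores.argsort()[::-1]  # doc ids ranked by estimated similarity

    docs = [rng.standard_normal((5, D)) for _ in range(100)]  # 100 sets of 5 vectors each
    ranking = search(index_sets(docs), rng.standard_normal((3, D)), len(docs))

The appeal of this table-based formulation is that query cost scales with the number of query vectors and their collisions rather than with a brute-force comparison against every vector of every set in the collection.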