Recent studies show that BM25-driven dynamic index skipping can greatly
accelerate MaxScore-based document retrieval over the learned sparse
representation derived by DeepImpact. This paper investigates the effectiveness
of such a traversal guidance strategy during top-k retrieval when using other
models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven
skipping can visibly degrade relevance when the BM25 model is not
well aligned with a learned weight model or when the retrieval depth k is small.
This paper generalizes the previous work and optimizes BM25-guided index
traversal with a two-level pruning control scheme and model alignment for fast
retrieval using a sparse representation. Although this control can come at a
cost of increased latency, the proposed scheme is much faster than the original
MaxScore method without BM25 guidance while retaining the relevance
effectiveness. This paper analyzes the competitiveness of this two-level
pruning scheme and evaluates its tradeoff between ranking relevance and time
efficiency on several test datasets.

Comment: This paper is published in WWW'2