Efficient Index-based Snippet Generation


Ranked result lists with query-dependent snippets have become state of the art in text search. They are typically implemented by searching, at query time, for occurrences of the query words in the top-ranked documents. This \emph{document-based} approach has three inherent problems: (i) when a document is indexed by terms which it does not contain literally (e.g., related words or spelling variants), localization of the corresponding snippets becomes problematic; (ii) each query operator (e.g., phrase or proximity search) has to be implemented twice, on the index side in order to compute the correct result set, and on the snippet generation side to generate the appropriate snippets; and (iii) in a worst case, the whole document needs to be scanned for occurrences of the query words, which is problematic for very long documents. We present a new \emph{index-based} method that localizes snippets by information solely computed from the index, and that overcomes all three problems. Unlike previous index-based methods, we show how to achieve this at essentially no extra cost in query processing time, by a technique we call \emph{query rotation}. We also show how our index-based method allows the caching of individual segments instead of complete documents, which enables a significantly larger cache hit ratio as compared to the document-based approach. We have fully integrated our implementation with the CompleteSearch engine

Similar works

Full text

oaioai:pure.mpg.de:item_1...Last time updated on 6/15/2019

This paper was published in MPG.PuRe.

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.