Document expansion versus query expansion for ad-hoc retrieval

Abstract

In document information retrieval, the terminology given by a user may not match the terminology of a relevant document. Query expansion seeks to address this mismatch; it can significantly increase effectiveness, but is slow and resource-intensive. We investigate the use of document expansion as an alternative, in which documents are augmented with related terms extracted from the corpus during indexing, and the overheads at query time are small. We propose and explore a range of corpus-based document expansion techniques and compare them to corpus-based query expansion on TREC data. These experiments show that document expansion delivers at best limited benefits, while query expansion (including standard techniques and efficient approaches described in recent work) delivers consistent gains. We conclude that document expansion is unpromising, but it is likely that the efficiency of query expansion can be further improved.
