An Axiomatic Study of Query Terms Order in Ad-hoc Retrieval
Classic retrieval methods use simple bag-of-words representations for queries
and documents. This representation fails to capture the full semantic richness
of queries and documents. More recent retrieval models have tried to overcome
this deficiency by using approaches such as incorporating dependencies between
query terms, using bi-gram representations of documents, proximity heuristics,
and passage retrieval. While some of these previous works have implicitly
accounted for term order, to the best of our knowledge, term order has not been
the primary focus of any research. In this paper, we focus solely on the effect
of term order in information retrieval. We show that documents that contain
two query terms in the same order as in the query have a higher probability of
being relevant than documents that contain them in the reverse order.
Using the axiomatic framework for information retrieval, we introduce a
constraint that retrieval models must adhere to in order to effectively utilize
term order dependency among query terms. We modify existing retrieval models
based on this constraint so that if the order of a pair of query terms is
semantically important, a document that includes these query terms in the same
order as the query should receive a higher score compared to a document that
includes them in the reverse order. Our empirical evaluation using both TREC
newswire and web corpora demonstrates that the modified retrieval models
significantly outperform their original counterparts.
Comment: 7 pages, 1 figure
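
The term-order constraint described in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the function names, the proximity window, and the additive bonus weights are all hypothetical, and the sketch treats every query-term pair as order-sensitive, whereas the abstract applies the constraint only to pairs whose order is semantically important. It shows only the intended behaviour: given the same base score, a document matching a pair in query order ends up scored above one matching it in reverse order.

```python
def ordered_pair_matches(query, doc, window=5):
    """Count query-term pairs that co-occur in `doc` within `window` tokens.

    Returns (same_order, reverse_order): co-occurrences in query order
    and in reversed order. `window` is an illustrative assumption.
    """
    # Index the positions of every token in the document.
    positions = {}
    for i, tok in enumerate(doc):
        positions.setdefault(tok, []).append(i)

    same = reverse = 0
    for qi in range(len(query)):
        for qj in range(qi + 1, len(query)):
            a, b = query[qi], query[qj]
            if a == b:
                continue
            for pa in positions.get(a, []):
                for pb in positions.get(b, []):
                    gap = pb - pa
                    if 0 < gap <= window:
                        same += 1      # a appears before b, as in the query
                    elif -window <= gap < 0:
                        reverse += 1   # b appears before a
    return same, reverse


def order_aware_score(base_score, query, doc,
                      same_weight=0.2, reverse_weight=0.1, window=5):
    """Add a hypothetical order-sensitive bonus to an existing score.

    Choosing same_weight > reverse_weight means a same-order match earns
    a larger bonus than a reverse-order match.
    """
    same, reverse = ordered_pair_matches(query, doc, window)
    return base_score + same_weight * same + reverse_weight * reverse


query = ["machine", "learning"]
doc_a = "introduction to machine learning methods".split()
doc_b = "learning about the machine shop".split()

# Both documents start from the same (made-up) base score.
score_a = order_aware_score(1.0, query, doc_a)
score_b = order_aware_score(1.0, query, doc_b)
```

Here `doc_a` contains the pair in query order and `doc_b` in reverse order, so `score_a` exceeds `score_b`. In practice the base score would come from an existing retrieval model such as BM25 or a language model, and the bonus would need to be restricted to order-sensitive pairs.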