2 research outputs found

    An Axiomatic Study of Query Terms Order in Ad-hoc Retrieval

    Classic retrieval methods use simple bag-of-words representations for queries and documents. This representation fails to capture the full semantic richness of queries and documents. More recent retrieval models have tried to overcome this deficiency by incorporating dependencies between query terms, bi-gram representations of documents, proximity heuristics, and passage retrieval. While some of these previous works have implicitly accounted for term order, to the best of our knowledge, term order has not been the primary focus of any research. In this paper, we focus solely on the effect of term order in information retrieval. We show that documents that contain two query terms in the same order as in the query have a higher probability of being relevant than documents that contain them in the reverse order. Using the axiomatic framework for information retrieval, we introduce a constraint that retrieval models must satisfy in order to effectively utilize term order dependency among query terms. We modify existing retrieval models based on this constraint so that, if the order of a pair of query terms is semantically important, a document that includes these query terms in the same order as the query receives a higher score than a document that includes them in the reverse order. Our empirical evaluation on both TREC newswire and web corpora demonstrates that the modified retrieval models significantly outperform their original counterparts.
    Comment: 7 pages, 1 figure
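
    The term-order constraint described in this abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the function names, the proximity `window`, the weight `alpha`, and the multiplicative adjustment are all assumptions chosen only to show the idea that a document with a query term pair in query order should outscore one with the pair reversed.

```python
def ordered_pair_counts(doc_tokens, first, second, window=5):
    """Count co-occurrences of a query term pair within `window` positions.

    Returns (forward, reverse): `forward` counts pairs where `first` appears
    before `second` (the query order), `reverse` counts the opposite order.
    The window size is an illustrative assumption.
    """
    pos_first = [i for i, tok in enumerate(doc_tokens) if tok == first]
    pos_second = [i for i, tok in enumerate(doc_tokens) if tok == second]
    forward = sum(1 for i in pos_first for j in pos_second if 0 < j - i <= window)
    reverse = sum(1 for i in pos_first for j in pos_second if 0 < i - j <= window)
    return forward, reverse


def order_aware_score(base_score, forward, reverse, alpha=0.1):
    """Hypothetical adjustment: boost query-order matches, penalize reversed ones.

    `base_score` stands in for any existing retrieval score; `alpha` is an
    assumed tuning weight, not a value from the paper.
    """
    total = forward + reverse
    if total == 0:
        return base_score  # the pair does not co-occur, leave the score unchanged
    return base_score * (1.0 + alpha * (forward - reverse) / total)


# Toy example: the query is "boston denver", in that order.
doc_same = "flights from boston to denver".split()
doc_reversed = "flights from denver to boston".split()

score_same = order_aware_score(1.0, *ordered_pair_counts(doc_same, "boston", "denver"))
score_reversed = order_aware_score(1.0, *ordered_pair_counts(doc_reversed, "boston", "denver"))
```

    With equal base scores, the document that preserves the query order ends up with the higher adjusted score, which is the behaviour the constraint asks for.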

    Deep Neural Networks for Query Expansion using Word Embeddings

    Query expansion is a method for alleviating the vocabulary mismatch problem present in information retrieval tasks. Previous works have shown that terms selected for query expansion by traditional methods such as pseudo-relevance feedback are not always helpful to the retrieval process. In this paper, we show that this is also true for more recently proposed embedding-based query expansion methods. We then introduce an artificial neural network classifier to predict the usefulness of query expansion terms. This classifier uses term word embeddings as inputs. Experiments on four TREC newswire and web collections show that using terms selected by the classifier for expansion significantly improves retrieval performance when compared to competitive baselines. The results are also shown to be more robust than the baselines.
    Comment: 8 pages, 1 figure
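
    The pipeline outlined in this abstract (propose expansion terms from embeddings, then keep only those a classifier judges useful) can be sketched as follows. Everything here is an assumption for illustration: the two-dimensional toy vectors, the centroid-plus-cosine candidate selection, and the `is_useful` callable, which is a stand-in for the trained neural network classifier rather than the authors' model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expansion_candidates(query_terms, embeddings, k=2):
    """Return the k vocabulary terms closest to the centroid of the query terms."""
    vecs = [embeddings[t] for t in query_terms if t in embeddings]
    if not vecs:
        return []
    dim = len(vecs[0])
    centroid = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    scored = [
        (cosine(centroid, vec), term)
        for term, vec in embeddings.items()
        if term not in query_terms
    ]
    scored.sort(reverse=True)
    return [term for _, term in scored[:k]]


def filter_terms(candidates, embeddings, is_useful):
    """Keep candidates whose embedding the (hypothetical) classifier accepts."""
    return [t for t in candidates if is_useful(embeddings[t])]


# Toy embeddings, invented purely for this example.
toy_embeddings = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "vehicle": [0.8, 0.3],
    "banana": [0.0, 1.0],
}

candidates = expansion_candidates(["car"], toy_embeddings, k=2)
# Stand-in for a trained classifier: an arbitrary rule on the toy vectors.
selected = filter_terms(candidates, toy_embeddings, lambda vec: vec[1] < 0.2)
```

    Separating candidate generation from the usefulness filter mirrors the abstract's point that embedding similarity alone does not guarantee a helpful expansion term.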