14,758 research outputs found
Exploring Topic-based Language Models for Effective Web Information Retrieval
The main obstacle for providing focused search is the relative opaqueness of search request -- searchers tend to express their complex information needs in only a couple of keywords. Our overall aim is to find out if, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on TREC8 small Web data collection for ad-hoc search.Our experimental results show that the topic-based model outperforms the standard language model and parsimonious model
Task-Oriented Query Reformulation with Reinforcement Learning
Search engines play an important role in our everyday lives by assisting us
in finding the information we need. When we input a complex query, however,
results are often far from satisfactory. In this work, we introduce a query
reformulation system based on a neural network that rewrites a query to
maximize the number of relevant documents returned. We train this neural
network with reinforcement learning. The actions correspond to selecting terms
to build a reformulated query, and the reward is the document recall. We
evaluate our approach on three datasets against strong baselines and show a
relative improvement of 5-20% in terms of recall. Furthermore, we present a
simple method to estimate a conservative upper-bound performance of a model in
a particular environment and verify that there is still large room for
improvements.Comment: EMNLP 201
Recommended from our members
A quasi-current representation for information needs inspired by Two-State Vector Formalism
Recently, a number of quantum theory (QT)-based information retrieval (IR) models have been proposed for modeling session search task that users issue queries continuously in order to describe their evolving information needs (IN). However, the standard formalism of QT cannot provide a complete description for users’ current IN in a sense that it does not take the ‘future’ information into consideration. Therefore, to seek a more proper and complete representation for users’ IN, we construct a representation of quasi-current IN inspired by an emerging Two-State Vector Formalism (TSVF). With the enlightenment of the completeness of TSVF, a “two-state vector” derived from the ‘future’ (the current query) and the ‘history’ (the previous query) is employed to describe users’ quasi-current IN in a more complete way. Extensive experiments are conducted on the session tracks of TREC 2013 & 2014, and show that our model outperforms a series of compared IR models
Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval
Although more and more language pairs are covered by machine translation
services, there are still many pairs that lack translation resources.
Cross-language information retrieval (CLIR) is an application which needs
translation functionality of a relatively low level of sophistication since
current models for information retrieval (IR) are still based on a
bag-of-words. The Web provides a vast resource for the automatic construction
of parallel corpora which can be used to train statistical translation models
automatically. The resulting translation models can be embedded in several ways
in a retrieval model. In this paper, we will investigate the problem of
automatically mining parallel texts from the Web and different ways of
integrating the translation models within the retrieval process. Our
experiments on standard test collections for CLIR show that the Web-based
translation models can surpass commercial MT systems in CLIR tasks. These
results open the perspective of constructing a fully automatic query
translation device for CLIR at a very low cost.Comment: 37 page
- …