3,598 research outputs found
Overview of the personalized and collaborative information retrieval (PIR) track at FIRE-2011
The Personalized and collaborative Information Retrieval (PIR) track at FIRE 2011 was organized with an aim to extend standard information retrieval (IR) ad-hoc test collection design to facilitate research on personalized and collaborative IR by collecting additional meta-information during the topic (query) development process. A controlled query generation process through task-based activities with activity logging was used for each topic developer to construct the final list of topics. The standard ad-hoc collection is thus accompanied by a new set of thematically related topics and the associated log information. We believe this can better simulate a real-world search scenario and encourage mining user information from the logs to improve IR effectiveness. A set of 25 TREC formatted topics and the associated metadata of activity logs were released for the participants to use. In this paper we illustrate the data construction phase in detail and also outline two simple ways of using the additional information from the logs to improve retrieval effectiveness
The Lucene for Information Access and Retrieval Research (LIARR) Workshop at SIGIR 2017
As an empirical discipline, information access and retrieval research requires substantial software infrastructure to index and search large collections. This workshop is motivated by the desire to better align information retrieval research with the practice of building search applications from the perspective of open-source information retrieval systems. Our goal is to promote the use of Lucene for information access and retrieval research
Exploring sentence level query expansion in language modeling based information retrieval
We introduce two novel methods for query expansion in information retrieval (IR). The basis of these methods is to add the most similar sentences extracted from
pseudo-relevant documents to the original query. The first method adds a fixed number of sentences to the original query, the second a progressively decreasing number of sentences. We evaluate these methods on the English and Bengali test collections from the FIRE workshops. The major
findings of this study are that: i) performance is similar for both English and Bengali; ii) employing a smaller context (similar sentences) yields a considerably higher
mean average precision (MAP) compared to extracting terms from full documents (up to 5.9% improvemnent in MAP for
English and 10.7% for Bengali compared to standard Blind Relevance Feedback (BRF); iii) using a variable number of sentences for query expansion performs better and shows less variance in the best MAP for different parameter settings; iv) query expansion based on sentences can
improve performance even for topics with low initial retrieval precision where standard BRF fails
Query Resolution for Conversational Search with Limited Supervision
In this work we focus on multi-turn passage retrieval as a crucial component
of conversational search. One of the key challenges in multi-turn passage
retrieval comes from the fact that the current turn query is often
underspecified due to zero anaphora, topic change, or topic return. Context
from the conversational history can be used to arrive at a better expression of
the current turn query, defined as the task of query resolution. In this paper,
we model the query resolution task as a binary term classification problem: for
each term appearing in the previous turns of the conversation decide whether to
add it to the current turn query or not. We propose QuReTeC (Query Resolution
by Term Classification), a neural query resolution model based on bidirectional
transformers. We propose a distant supervision method to automatically generate
training data by using query-passage relevance labels. Such labels are often
readily available in a collection either as human annotations or inferred from
user interactions. We show that QuReTeC outperforms state-of-the-art models,
and furthermore, that our distant supervision method can be used to
substantially reduce the amount of human-curated data required to train
QuReTeC. We incorporate QuReTeC in a multi-turn, multi-stage passage retrieval
architecture and demonstrate its effectiveness on the TREC CAsT dataset.Comment: SIGIR 2020 full conference pape
Indexing with WordNet synsets can improve Text Retrieval
The classical, vector space model for text retrieval is shown to give better
results (up to 29% better in our experiments) if WordNet synsets are chosen as
the indexing space, instead of word forms. This result is obtained for a
manually disambiguated test collection (of queries and documents) derived from
the Semcor semantic concordance. The sensitivity of retrieval performance to
(automatic) disambiguation errors when indexing documents is also measured.
Finally, it is observed that if queries are not disambiguated, indexing by
synsets performs (at best) only as good as standard word indexing.Comment: 7 pages, LaTeX2e, 3 eps figures, uses epsfig, colacl.st
Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation
This paper reports the use of a document distance-based approach to
automatically expand the number of available relevance judgements when these
are limited and reduced to only positive judgements. This may happen, for
example, when the only available judgements are extracted from a list of
references in a published review paper. We compare the results on two document
sets: OHSUMED, based on medical research publications, and TREC-8, based on
news feeds. We show that evaluations based on these expanded relevance
judgements are more reliable than those using only the initially available
judgements, especially when the number of available judgements is very limited.Comment: SIGIR 2014 Workshop on Gathering Efficient Assessments of Relevanc
- …