Query Resolution for Conversational Search with Limited Supervision
In this work we focus on multi-turn passage retrieval as a crucial component
of conversational search. One of the key challenges in multi-turn passage
retrieval comes from the fact that the current turn query is often
underspecified due to zero anaphora, topic change, or topic return. Context
from the conversational history can be used to arrive at a better expression of
the current turn query, defined as the task of query resolution. In this paper,
we model the query resolution task as a binary term classification problem: for
each term appearing in the previous turns of the conversation, we decide whether
or not to add it to the current turn query. We propose QuReTeC (Query Resolution
by Term Classification), a neural query resolution model based on bidirectional
transformers. We propose a distant supervision method to automatically generate
training data by using query-passage relevance labels. Such labels are often
readily available in a collection, either as human annotations or as labels
inferred from user interactions. We show that QuReTeC outperforms
state-of-the-art models,
and furthermore, that our distant supervision method can be used to
substantially reduce the amount of human-curated data required to train
QuReTeC. We incorporate QuReTeC in a multi-turn, multi-stage passage retrieval
architecture and demonstrate its effectiveness on the TREC CAsT dataset.
Comment: SIGIR 2020 full conference paper.
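To make the term-classification formulation concrete, below is a minimal sketch of both pieces, the token classifier and the distant-supervision labelling rule, assuming a BERT-style token classifier from the Hugging Face transformers library. The function names and the span arithmetic are illustrative assumptions, not QuReTeC's published implementation.

```python
# Sketch: query resolution as binary term classification over the
# conversation history. Assumes a BERT-style token classifier; names
# (resolve_query, distant_labels) are illustrative, not QuReTeC's API.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = drop term, 1 = add to query
)
model.eval()

def resolve_query(history_turns, current_query, threshold=0.5):
    """Expand the current turn query with history terms the model
    classifies as relevant."""
    context = " ".join(history_turns)
    enc = tokenizer(context, current_query,
                    return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits            # (1, seq_len, 2)
    p_add = logits.softmax(-1)[0, :, 1]         # P(add this term)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    # Positions 1 .. len(context tokens) hold the history wordpieces
    # ([CLS] sits at position 0); in practice "##" pieces would be
    # merged back into whole terms before adding them to the query.
    n_hist = len(tokenizer.tokenize(context))
    picked = [t for t, p in zip(tokens[1:1 + n_hist], p_add[1:1 + n_hist])
              if p > threshold]
    return current_query + " " + " ".join(dict.fromkeys(picked))

def distant_labels(history_terms, relevant_passage_terms, current_query_terms):
    """Distant supervision from query-passage relevance labels: a history
    term is a positive example if it occurs in a relevant passage but is
    missing from the current turn query."""
    return {t: int(t in relevant_passage_terms and t not in current_query_terms)
            for t in history_terms}
```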
Training Curricula for Open Domain Answer Re-Ranking
In precision-oriented tasks like answer ranking, it is more important to rank
many relevant answers highly than to retrieve all relevant answers. It follows
that a good ranking strategy would be to learn how to identify the easiest
correct answers first (i.e., assign a high ranking score to answers that have
characteristics that usually indicate relevance, and a low ranking score to
those with characteristics that do not), before incorporating more complex
logic to handle difficult cases (e.g., semantic matching or reasoning). In this
work, we apply this idea to the training of neural answer rankers using
curriculum learning. We propose several heuristics to estimate the difficulty
of a given training sample. We show that the proposed heuristics can be used to
build a training curriculum that down-weights difficult samples early in the
training process. As the training process progresses, our approach gradually
shifts to weighting all samples equally, regardless of difficulty. We present a
comprehensive evaluation of our proposed idea on three answer ranking datasets.
Results show that our approach improves the performance of two leading
neural ranking architectures, namely BERT and ConvKNRM, using both pointwise
and pairwise losses. When applied to a BERT-based ranker, our method yields up
to a 4% improvement in MRR and a 9% improvement in P@1 (compared to the model
trained without a curriculum). This results in models that can achieve
comparable performance to more expensive state-of-the-art techniques.
Comment: Accepted at SIGIR 2020 (long paper).
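As a rough sketch of such a curriculum, the snippet below down-weights difficult samples early in training and anneals linearly toward uniform weights; the difficulty score in [0, 1] stands in for whichever heuristic is used, and all names are illustrative rather than the paper's implementation.

```python
# Sketch: curriculum-weighted training loss. Assumes each sample carries
# a heuristic difficulty score in [0, 1]; names are illustrative.
import torch
import torch.nn.functional as F

def curriculum_weight(difficulty, step, curriculum_steps):
    """Easy samples (difficulty ~ 0) weigh ~1 throughout; hard samples
    start near 0 and rise to 1 as training progresses."""
    progress = min(step / curriculum_steps, 1.0)  # 0 -> 1 over the curriculum
    return (1.0 - difficulty) * (1.0 - progress) + progress

def weighted_pointwise_loss(scores, labels, difficulties, step, curriculum_steps):
    """Pointwise BCE loss with per-sample curriculum weights.
    scores/labels: float tensors of shape (batch,)."""
    per_sample = F.binary_cross_entropy_with_logits(scores, labels,
                                                    reduction="none")
    weights = torch.tensor([curriculum_weight(d, step, curriculum_steps)
                            for d in difficulties])
    return (weights * per_sample).mean()

# After `curriculum_steps` optimisation steps, every weight is 1.0 and
# training proceeds exactly as it would without a curriculum.
```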
Query Exposure Prediction for Groups of Documents in Rankings
The main objective of an Information Retrieval (IR) system is to provide a user
with the documents most relevant to the user's query. To do this, modern IR
systems typically deploy a re-ranking pipeline in which a set of documents is
retrieved by a lightweight first-stage retrieval process and then re-ranked by
a more effective but expensive model. However, the success of a re-ranking
pipeline is heavily dependent on the performance of the first-stage retrieval,
since new documents are not usually identified during the re-ranking stage.
Moreover, this can impact the amount of exposure that a particular group of
documents, such as documents from a particular demographic group, can receive
in the final ranking. For example, the fair allocation of exposure becomes more
challenging, or even impossible, if the first-stage retrieval returns too few
documents from certain groups, since the number of a group's documents in the
ranking affects the group's exposure more than the documents' positions do. With
this in mind, it is beneficial to predict the amount of exposure that a group of
documents is likely to receive in the results of the first-stage retrieval
process, in order to ensure that a sufficient number of documents from each
group is included. In this paper, we introduce the novel task of
query exposure prediction (QEP). Specifically, we propose the first approach
for predicting the distribution of exposure that groups of documents will
receive for a given query. Our new approach, called GEP, uses lexical
information from individual groups of documents to estimate the exposure the
groups will receive in a ranking. Our experiments on the TREC 2021 and 2022
Fair Ranking Track test collections show that our proposed GEP approach results
in exposure predictions that are up to 40% more accurate than those of adapted
existing query performance prediction and resource allocation approaches.
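To illustrate the quantity being predicted, the sketch below computes the exposure distribution a ranking actually gives each group, next to a crude lexical predictor of that distribution. The log-based position discount and all names are assumptions made for illustration; they are not the paper's GEP model.

```python
# Sketch: group exposure in a ranking, plus a crude lexical predictor of
# it. The log2 position discount and all names are assumptions made for
# illustration, not the paper's GEP implementation.
import math
from collections import Counter

def exposure_at_rank(rank):
    """Position-based exposure discount (DCG-style); rank is 1-based."""
    return 1.0 / math.log2(rank + 1)

def group_exposure(ranking, group_of):
    """Share of total exposure each group receives in a ranked doc list."""
    totals = Counter()
    for rank, doc_id in enumerate(ranking, start=1):
        totals[group_of[doc_id]] += exposure_at_rank(rank)
    z = sum(totals.values()) or 1.0
    return {g: e / z for g, e in totals.items()}

def predict_group_exposure(query_terms, group_term_counts):
    """Lexical estimate: predict a group's exposure share in proportion
    to how often the query terms occur in that group's documents."""
    scores = {g: sum(counts.get(t, 0) for t in query_terms)
              for g, counts in group_term_counts.items()}
    z = sum(scores.values()) or 1.0
    return {g: s / z for g, s in scores.items()}
```

Comparing `predict_group_exposure` against `group_exposure` on held-out rankings is one simple way to measure prediction accuracy for this task.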