Query Resolution for Conversational Search with Limited Supervision
In this work we focus on multi-turn passage retrieval as a crucial component
of conversational search. One of the key challenges in multi-turn passage
retrieval comes from the fact that the current turn query is often
underspecified due to zero anaphora, topic change, or topic return. Context
from the conversational history can be used to arrive at a better expression of
the current turn query, defined as the task of query resolution. In this paper,
we model the query resolution task as a binary term classification problem: for
each term appearing in the previous turns of the conversation decide whether to
add it to the current turn query or not. We propose QuReTeC (Query Resolution
by Term Classification), a neural query resolution model based on bidirectional
transformers. We propose a distant supervision method to automatically generate
training data by using query-passage relevance labels. Such labels are often
readily available in a collection either as human annotations or inferred from
user interactions. We show that QuReTeC outperforms state-of-the-art models,
and furthermore, that our distant supervision method can be used to
substantially reduce the amount of human-curated data required to train
QuReTeC. We incorporate QuReTeC in a multi-turn, multi-stage passage retrieval
architecture and demonstrate its effectiveness on the TREC CAsT dataset.
Comment: SIGIR 2020 full conference paper
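The distant supervision idea above can be sketched in a few lines. This is a minimal, illustrative labeling routine, not the paper's implementation (QuReTeC itself is a BERT-based term classifier): a term from the conversation history is a positive training example when it occurs in a passage relevant to the current turn but not in the current query itself.

```python
# Illustrative sketch of QuReTeC-style distant supervision: label history
# terms using query-passage relevance, no human query resolutions needed.
# Function and variable names here are hypothetical.

def distant_labels(history_turns, current_query, relevant_passage):
    """Label each history term: 1 if it should be added to the query."""
    query_terms = set(current_query.lower().split())
    passage_terms = set(relevant_passage.lower().split())
    labels = {}
    for turn in history_turns:
        for term in turn.lower().split():
            if term in query_terms:
                continue  # already specified in the current turn
            # Positive iff the term appears in the relevant passage.
            labels[term] = 1 if term in passage_terms else 0
    return labels

history = ["tell me about the bronze age collapse"]
query = "what caused it"
passage = "Historians debate what caused the bronze age collapse ..."
labels = distant_labels(history, query, passage)
# Terms such as "bronze" and "collapse" get label 1 and would be
# appended to the resolved query; terms like "tell" get label 0.
```

At inference time the classifier's positive terms are appended to the current turn query before passage retrieval.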
GNN-encoder: Learning a Dual-encoder Architecture via Graph Neural Networks for Passage Retrieval
Recently, retrieval models based on dense representations have become dominant in
passage retrieval tasks, due to their outstanding ability to capture the
semantics of input text compared with traditional sparse vector space models.
A common practice of dense retrieval models is to exploit a dual-encoder
architecture to represent a query and a passage independently. Though
efficient, such a structure loses interaction between the query-passage pair,
resulting in inferior accuracy. To enhance the performance of dense retrieval
models without loss of efficiency, we propose a GNN-encoder model in which
query (passage) information is fused into passage (query) representations via
graph neural networks that are constructed by queries and their top retrieved
passages. By this means, we maintain a dual-encoder structure, and retain some
interaction information between query-passage pairs in their representations,
which enables us to achieve both efficiency and efficacy in passage retrieval.
Evaluation results indicate that our method significantly outperforms the
existing models on MSMARCO, Natural Questions and TriviaQA datasets, and
achieves a new state of the art on these datasets.
Comment: 11 pages, 6 figures
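The fusion step described above can be illustrated with a single message-passing pass over a small query-passage graph. This is a simplified sketch under stated assumptions (mean-pooling aggregation with a mixing weight `alpha`, both hypothetical); the paper uses learned graph neural network layers, and crucially the fused representations are still produced per node, preserving the dual-encoder inference setup.

```python
import numpy as np

# Sketch of the GNN-encoder idea: mix each node's vector with the mean of
# its neighbours' vectors on a graph linking queries to their top
# retrieved passages. Aggregation here is illustrative, not the paper's.

def fuse(node_vecs, edges, alpha=0.5):
    """One message-passing step over the query-passage graph."""
    fused = node_vecs.copy()
    for node, neighbours in edges.items():
        if neighbours:
            msg = node_vecs[neighbours].mean(axis=0)
            fused[node] = (1 - alpha) * node_vecs[node] + alpha * msg
    return fused

# Node 0 = a query; nodes 1-2 = its top retrieved passages.
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
edges = {0: [1, 2], 1: [0], 2: [0]}
fused = fuse(vecs, edges)
# Each passage vector now carries a component of the query vector, so a
# plain dot-product at retrieval time retains some interaction signal.
```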
The State-of-the-arts in Focused Search
The continuous influx of text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user's topic of interest has gone beyond search for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange, and enables focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, in particular techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.
Learning to Rank in Generative Retrieval
Generative retrieval is a promising new paradigm in text retrieval that
generates identifier strings of relevant passages as the retrieval target. It
leverages powerful generation models and is distinct from traditional
learning-to-rank methods. However, despite its rapid
development, current generative retrieval methods are still limited. They
typically rely on a heuristic function to transform predicted identifiers into
a passage rank list, which creates a gap between the learning objective of
generative retrieval and the desired passage ranking target. Moreover, the
inherent exposure bias problem of text generation also persists in generative
retrieval. To address these issues, we propose a novel framework, called LTRGR,
that combines generative retrieval with the classical learning-to-rank
paradigm. Our approach involves training an autoregressive model using a
passage rank loss, which directly optimizes the autoregressive model toward the
optimal passage ranking. This framework only requires an additional training
step to enhance current generative retrieval systems and does not add any
burden to the inference stage. We conducted experiments on three public
datasets, and our results demonstrate that LTRGR achieves state-of-the-art
performance among generative retrieval methods, indicating its effectiveness
and robustness.
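The passage rank loss described above can be sketched as a pairwise hinge loss over candidate passages. This is an illustrative stand-in, not LTRGR's exact formulation: the scores would come from the autoregressive model's likelihood of generating each passage identifier, and the loss pushes relevant passages to outscore irrelevant ones by a margin.

```python
# Sketch of a pairwise rank loss of the kind LTRGR adds on top of a
# generative retriever (the exact loss and names are illustrative).

def margin_rank_loss(scores, labels, margin=1.0):
    """Hinge loss: each relevant passage should outscore each irrelevant
    one by at least `margin`. Scores are identifier-generation scores."""
    loss, pairs = 0.0, 0
    for s_pos, l_pos in zip(scores, labels):
        if not l_pos:
            continue
        for s_neg, l_neg in zip(scores, labels):
            if l_neg:
                continue
            loss += max(0.0, margin - (s_pos - s_neg))
            pairs += 1
    return loss / max(pairs, 1)

# Four candidate passages; labels mark which are relevant.
loss = margin_rank_loss([2.0, 1.5, 1.8, 0.1], [1, 0, 1, 0])
```

Because this loss only adds a training step on top of an already-trained generative retriever, inference is unchanged: the model still generates identifiers and ranks by score.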
DCU at the NTCIR-12 SpokenQuery&Doc-2 task
We describe DCU's participation in the NTCIR-12 SpokenQuery&Doc (SQD-2) task. In the context of the slide-group
retrieval sub-task, we experiment with a passage retrieval
method that re-scores each passage according to the relevance score of the document from which the passage is taken.
This is performed by linearly interpolating their relevance
scores which are calculated using the Okapi BM25 model of
probabilistic retrieval for passages and documents independently. In conjunction with this, we assess the benefits of
using pseudo-relevance feedback for expanding the textual
representation of the spoken queries with terms found in the
top-ranked documents and passages, and experiment with
a general multidimensional optimisation method to jointly
tune the BM25 and query expansion parameters with queries
and relevance data from the NTCIR-11 SQD-1 task. Retrieval
experiments over the SQD-1 and SQD-2 queries confirm previous
findings that integrating document information when ranking passages can
improve passage retrieval effectiveness. Furthermore, results indicate
that no significant gains in retrieval effectiveness are obtained by using
query expansion in combination with our retrieval models over these two
query sets.
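The interpolation step described above is straightforward to sketch. This is a minimal illustration under stated assumptions (the weight `lam` and all names are hypothetical): each passage's BM25 score is linearly combined with the BM25 score of the document it comes from, and passages are re-ranked by the combined score.

```python
# Sketch of the score interpolation used for the slide-group retrieval
# sub-task: combine a passage's BM25 score with its source document's
# BM25 score. The weight `lam` is a tunable parameter, of the kind
# jointly optimised with the BM25 parameters in the paper.

def interpolate(passage_score, document_score, lam=0.7):
    """Linear interpolation of passage and document relevance scores."""
    return lam * passage_score + (1 - lam) * document_score

# Two passages with their own BM25 scores and their documents' scores:
# p1 scores lower on its own, but its document is strongly relevant.
ranked = sorted(
    [("p1", interpolate(8.2, 12.0)), ("p2", interpolate(9.0, 4.0))],
    key=lambda kv: kv[1],
    reverse=True,
)
```

With these illustrative scores, p1 is promoted above p2 once its document's relevance is taken into account, which is the effect the experiments above report as beneficial.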