Information Retrieval: Recent Advances and Beyond
In this paper, we provide a detailed overview of the models used for
information retrieval in the first and second stages of the typical processing
chain. We discuss the current state-of-the-art models, including term-based
methods, semantic retrieval, and neural approaches. Additionally, we delve into
the key topics related to the learning process of these models. In this way,
the survey offers a comprehensive understanding of the field and is of interest
to researchers and practitioners entering or working in the information
retrieval domain.
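For orientation, here is a minimal sketch of the two-stage processing chain the
survey covers: a cheap first-stage retriever narrows the corpus to a candidate
set, and a costlier second-stage model reranks it. The scoring functions below
are illustrative placeholders (term overlap standing in for BM25 or a
bi-encoder), not models from the survey.

```python
# Minimal two-stage retrieval pipeline with placeholder scorers; real systems
# would use BM25 or dense retrieval in stage 1 and a neural reranker in stage 2.

def first_stage_score(query: str, doc: str) -> float:
    """Cheap lexical score: term overlap (stand-in for BM25 or a bi-encoder)."""
    q_terms, d_terms = set(query.lower().split()), set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def second_stage_score(query: str, doc: str) -> float:
    """Costlier rescoring (stand-in for a cross-encoder); here a toy refinement."""
    return first_stage_score(query, doc) + 0.1 * (query.lower() in doc.lower())

def retrieve(query: str, corpus: list, k: int = 10, rerank_depth: int = 100) -> list:
    # Stage 1: score the whole corpus cheaply, keep a shortlist.
    candidates = sorted(corpus, key=lambda d: first_stage_score(query, d),
                        reverse=True)[:rerank_depth]
    # Stage 2: rerank only the shortlist with the expensive model.
    return sorted(candidates, key=lambda d: second_stage_score(query, d),
                  reverse=True)[:k]

corpus = ["dense passage retrieval", "sparse vector space models", "neural reranking"]
print(retrieve("dense retrieval models", corpus, k=2))
```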
GNN-encoder: Learning a Dual-encoder Architecture via Graph Neural Networks for Passage Retrieval
Recently, retrieval models based on dense representations have become dominant
in passage retrieval tasks, owing to their superior ability to capture the
semantics of input text compared with traditional sparse vector space models.
A common practice of dense retrieval models is to exploit a dual-encoder
architecture to represent a query and a passage independently. Though
efficient, such a structure loses interaction between the query-passage pair,
resulting in inferior accuracy. To enhance the performance of dense retrieval
models without loss of efficiency, we propose a GNN-encoder model in which
query (passage) information is fused into passage (query) representations via
graph neural networks constructed from queries and their top retrieved
passages. By this means, we maintain a dual-encoder structure, and retain some
interaction information between query-passage pairs in their representations,
which enables us to achieve both efficiency and efficacy in passage retrieval.
Evaluation results indicate that our method significantly outperforms existing
models on the MSMARCO, Natural Questions, and TriviaQA datasets, achieving a
new state of the art.
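A minimal sketch of the fusion idea under my own assumptions about shapes and
aggregation: dual-encoder embeddings of queries and passages form a bipartite
graph (edges link each query to its top retrieved passages), and one round of
mean-aggregation message passing fuses neighbor information into each node's
representation. All names and the single linear fusion layer are illustrative,
not the authors' implementation.

```python
import torch
import torch.nn as nn

class GNNDualEncoder(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)  # combines a node's own view with its neighbors'

    def forward(self, node_emb, edges):
        """node_emb: (N, dim) dual-encoder embeddings of queries and passages.
        edges: (E, 2) pairs (query_idx, passage_idx) from top-k retrieval."""
        agg = torch.zeros_like(node_emb)
        deg = torch.zeros(node_emb.size(0), 1)
        src, dst = edges[:, 0], edges[:, 1]
        # One round of bidirectional mean aggregation: passages inform their
        # queries and vice versa.
        for a, b in ((src, dst), (dst, src)):
            agg.index_add_(0, a, node_emb[b])
            deg.index_add_(0, a, torch.ones(a.size(0), 1))
        agg = agg / deg.clamp(min=1.0)
        # The fused representation is still one vector per node, so indexing
        # and search remain as cheap as with a plain dual encoder.
        return self.fuse(torch.cat([node_emb, agg], dim=-1))

enc = GNNDualEncoder(dim=4)
emb = torch.randn(3, 4)                    # node 0: a query; nodes 1-2: passages
edges = torch.tensor([[0, 1], [0, 2]])     # query linked to its top-2 retrieved passages
print(enc(emb, edges).shape)               # torch.Size([3, 4])
```

Because the fusion happens when representations are computed, query-time search
still only compares precomputed vectors, which is how the dual-encoder
efficiency is preserved.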
Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models
Neural conversational models require substantial amounts of dialogue data for
their parameter estimation and are therefore usually learned on large corpora
such as chat forums or movie subtitles. These corpora are, however, often
challenging to work with, notably due to their frequent lack of turn
segmentation and the presence of multiple references external to the dialogue
itself. This paper shows that these challenges can be mitigated by adding a
weighting model into the architecture. The weighting model, which is itself
estimated from dialogue data, associates each training example with a numerical
weight that reflects its intrinsic quality for dialogue modelling. At training
time, these sample weights are incorporated into the empirical loss to be
minimised. Evaluation results on retrieval-based models trained on movie and TV
subtitles demonstrate that including such a weighting model improves model
performance on unsupervised metrics.
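A minimal sketch of the instance-weighting scheme, assuming a small network
that maps example features to a scalar weight; the feature set and both
architectures are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class WeightModel(nn.Module):
    """Maps features of a (context, response) pair to a quality weight in (0, 1)."""
    def __init__(self, feat_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 16), nn.ReLU(),
                                 nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, feats):
        return self.net(feats).squeeze(-1)

def weighted_loss(per_example_loss, weights):
    # Each example's contribution to the empirical loss is scaled by its
    # weight, so low-quality samples are down-weighted rather than discarded.
    w = weights.detach()           # assume the weighting model is trained separately
    return (w * per_example_loss).sum() / w.sum()

weight_model = WeightModel()
feats = torch.randn(32, 8)         # hypothetical features of 32 training examples
losses = torch.rand(32)            # per-example losses from the conversational model
print(weighted_loss(losses, weight_model(feats)).item())
```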
Transfer Learning via Contextual Invariants for One-to-Many Cross-Domain Recommendation
The rapid proliferation of new users and items on the social web has
aggravated the gray-sheep user/long-tail item challenge in recommender systems.
Historically, cross-domain co-clustering methods have successfully leveraged
shared users and items across dense and sparse domains to improve inference
quality. However, they rely on shared rating data and cannot scale to multiple
sparse target domains (i.e., the one-to-many transfer setting). This, combined
with the increasing adoption of neural recommender architectures, motivates us
to develop scalable neural layer-transfer approaches for cross-domain learning.
Our key intuition is to guide neural collaborative filtering with
domain-invariant components shared across the dense and sparse domains,
improving the user and item representations learned in the sparse domains. We
leverage contextual invariances across domains to develop these shared modules,
and demonstrate that with user-item interaction context, we can learn-to-learn
informative representation spaces even with sparse interaction data. We show
the effectiveness and scalability of our approach on two public datasets and a
massive transaction dataset from Visa, a global payments technology company
(19% Item Recall, 3x faster vs. training separate models for each domain). Our
approach is applicable to both implicit and explicit feedback settings.
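A minimal sketch of the layer-transfer idea under stated assumptions: a context
module encoding domain-invariant interaction context is pretrained on the dense
source domain, then frozen and shared across sparse target domains, where only
user/item embeddings are learned per domain. Module shapes and all names are
hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ContextModule(nn.Module):
    """Domain-invariant layers shared across domains (pretrained on the dense one)."""
    def __init__(self, ctx_dim: int = 8, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ctx_dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, ctx):
        return self.net(ctx)

class DomainRecommender(nn.Module):
    """Per-domain model: its own embeddings plus the shared, frozen context module."""
    def __init__(self, n_users, n_items, shared, dim: int = 16):
        super().__init__()
        self.users = nn.Embedding(n_users, dim)
        self.items = nn.Embedding(n_items, dim)
        self.shared = shared
        for p in self.shared.parameters():   # layer transfer: shared module stays fixed
            p.requires_grad = False

    def forward(self, u, i, ctx):
        # Domain-invariant context guides collaborative filtering in the sparse domain.
        h = self.users(u) * self.items(i) + self.shared(ctx)
        return h.sum(-1)                     # interaction score

shared = ContextModule()                     # assume pretrained on the dense source domain
sparse_domain = DomainRecommender(n_users=100, n_items=50, shared=shared)
score = sparse_domain(torch.tensor([3]), torch.tensor([7]), torch.randn(1, 8))
print(score)
```

Only the embedding tables are trained per target domain, which is what lets
the approach scale to the one-to-many setting instead of training a full model
per domain.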
SeDR: Segment Representation Learning for Long Documents Dense Retrieval
Recently, Dense Retrieval (DR) has become a promising solution to document
retrieval, where document representations are used to perform effective and
efficient semantic search. However, DR remains challenging on long documents,
due to the quadratic complexity of its Transformer-based encoder and the finite
capacity of a low-dimensional embedding. Current DR models resort to suboptimal
strategies for long documents, such as truncation or splitting-and-pooling,
leading to poor utilization of whole-document information. In this work, to
tackle this problem, we propose Segment representation learning for long
documents Dense Retrieval (SeDR). In SeDR, a Segment-Interaction Transformer is
proposed to encode long documents into document-aware and segment-sensitive
representations, while retaining the complexity of splitting-and-pooling and
outperforming other segment-interaction patterns on DR. Since the GPU memory
demands of long-document encoding leave insufficient negatives for DR training,
Late-Cache Negative is further proposed to supply additional cached negatives
for optimizing representation learning. Experiments on MS MARCO and
TREC-DL datasets show that SeDR achieves superior performance among DR models,
and confirm its effectiveness on long-document retrieval.
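A minimal sketch of the cached-negatives idea as I read it (the paper's exact
Late-Cache Negative mechanism may differ): document embeddings from earlier
training steps are kept, detached, in a FIFO cache and appended as extra
negatives to the in-batch contrastive loss, compensating for the small batches
that long-document encoding forces. Cache size, loss form, and all names are
assumptions.

```python
import torch
import torch.nn.functional as F

class NegativeCache:
    """FIFO cache of detached document embeddings from earlier steps."""
    def __init__(self, size: int = 1024):
        self.size = size
        self.store = []

    def add(self, embs):
        self.store.append(embs.detach())     # cached negatives carry no gradient
        while sum(e.size(0) for e in self.store) > self.size:
            self.store.pop(0)

    def negatives(self):
        return torch.cat(self.store) if self.store else None

def contrastive_loss(q, d_pos, cache, temp: float = 0.05):
    """In-batch negatives plus cached negatives from earlier steps."""
    docs = d_pos
    cached = cache.negatives()
    if cached is not None:
        docs = torch.cat([d_pos, cached])    # columns beyond the batch are negatives
    logits = q @ docs.t() / temp             # (B, B + |cache|)
    labels = torch.arange(q.size(0))         # each query's positive is on the diagonal
    loss = F.cross_entropy(logits, labels)
    cache.add(d_pos)                         # current positives refill the cache
    return loss

cache = NegativeCache(size=8)
for _ in range(3):                           # a few toy training steps
    q = F.normalize(torch.randn(4, 16), dim=-1)
    d = F.normalize(torch.randn(4, 16), dim=-1)
    print(contrastive_loss(q, d, cache).item())
```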