4 research outputs found
NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
Pseudo-relevance feedback (PRF) is commonly used to boost the performance of
traditional information retrieval (IR) models by using top-ranked documents to
identify and weight new query terms, thereby reducing the effect of
query-document vocabulary mismatches. While neural retrieval models have
recently demonstrated strong results for ad-hoc retrieval, combining them with
PRF is not straightforward due to incompatibilities between existing PRF
approaches and neural architectures. To bridge this gap, we propose an
end-to-end neural PRF framework that can be used with existing neural IR models
by embedding different neural models as building blocks. Extensive experiments
on two standard test collections confirm the effectiveness of the proposed NPRF
framework in improving the performance of two state-of-the-art neural IR
models.
Comment: Full paper in EMNLP 2018.
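The core NPRF idea stated above, re-scoring a target document against the top-ranked pseudo-feedback documents through an embedded neural matching model, can be sketched in a few lines. The sketch below covers only the combination step and is an illustrative assumption: the `neural_matcher` callable stands in for whatever neural IR model is plugged into the framework, and the softmax weighting of initial retrieval scores is a simplification of the learned, end-to-end combination in the paper.

```python
import math
from typing import Callable, List, Tuple

def nprf_score(
    target_doc: str,
    feedback_docs: List[Tuple[str, float]],      # (top-ranked doc text, initial retrieval score)
    neural_matcher: Callable[[str, str], float], # any doc-to-doc neural relevance model
) -> float:
    """Combine doc-to-doc relevance signals from the pseudo-feedback set.

    Illustrative sketch only: NPRF learns this combination end to end;
    here the initial retrieval scores are simply softmax-normalized into weights.
    """
    init_scores = [s for _, s in feedback_docs]
    z = sum(math.exp(s) for s in init_scores)
    weights = [math.exp(s) / z for s in init_scores]

    # Relevance of the target document with respect to each feedback document,
    # treating the feedback document as an expanded query representation.
    rels = [neural_matcher(fb_text, target_doc) for fb_text, _ in feedback_docs]

    return sum(w * r for w, r in zip(weights, rels))
```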
Pretrained Language Model based Web Search Ranking: From Relevance to Satisfaction
Search engines play a crucial role in satisfying users' diverse information
needs. Recently, Pretrained Language Models (PLMs) based text ranking models
have achieved huge success in web search. However, many state-of-the-art text
ranking approaches only focus on core relevance while ignoring other dimensions
that contribute to user satisfaction, e.g., document quality, recency,
authority, etc. In this work, we focus on ranking user satisfaction rather than
relevance in web search, and propose a PLM-based framework, namely SAT-Ranker,
which comprehensively models different dimensions of user satisfaction in a
unified manner. In particular, we leverage the capacities of PLMs on both
textual and numerical inputs, and apply a multi-field input that modularizes
each dimension of user satisfaction as an input field. Overall, SAT-Ranker is
an effective, extensible, and data-centric framework that has huge potential
for industrial applications. In rigorous offline and online experiments,
SAT-Ranker obtains remarkable gains on various evaluation sets targeting
different dimensions of user satisfaction. It is now fully deployed online to
improve the usability of our search engine.
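The multi-field design mentioned above, one input field per satisfaction dimension fed jointly to the PLM, might look roughly like the following. The field names, special tokens, and bucketing of numerical signals are hypothetical choices for illustration; the abstract does not specify how SAT-Ranker encodes them.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SearchResult:
    title: str
    snippet: str
    quality: float    # e.g. page-quality estimate in [0, 1]
    age_days: int     # recency signal
    authority: float  # e.g. site-authority estimate in [0, 1]

def bucketize(value: float, edges: List[float]) -> str:
    """Map a numerical signal onto a coarse textual bucket so a text encoder can consume it."""
    for i, edge in enumerate(edges):
        if value < edge:
            return f"bucket_{i}"
    return f"bucket_{len(edges)}"

def build_multifield_input(query: str, doc: SearchResult) -> str:
    """Assemble one field per satisfaction dimension for a BERT-style ranker (illustrative)."""
    fields = [
        f"[QUERY] {query}",
        f"[TITLE] {doc.title}",
        f"[SNIPPET] {doc.snippet}",
        f"[QUALITY] {bucketize(doc.quality, [0.25, 0.5, 0.75])}",
        f"[RECENCY] {bucketize(float(doc.age_days), [7, 30, 365])}",
        f"[AUTHORITY] {bucketize(doc.authority, [0.3, 0.6, 0.9])}",
    ]
    return " [SEP] ".join(fields)
```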
A Feedback-Based Approach to Utilizing Embeddings for Clinical Decision Support
Clinical Decision Support (CDS) is widely seen as an information retrieval (IR) application in the medical domain. The goal of CDS is to help physicians find useful information from a collection of medical articles with respect to the given patient records, in order to take the best care of their patients. Most existing CDS methods do not sufficiently consider the semantic relations between texts, leaving untapped potential for improving performance in biomedical article retrieval. This paper proposes a novel feedback-based approach that considers the semantic association between a retrieved biomedical article and a pseudo-feedback set. Evaluation results show that our method outperforms strong baselines and is able to improve over the best runs in the TREC CDS tasks.
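One plausible instantiation of the feedback-based rescoring described above is to interpolate the first-pass retrieval score with the embedding similarity between the candidate article and the centroid of the pseudo-feedback set. The sketch below assumes pre-computed document embeddings and a retrieval score normalized to [0, 1]; the interpolation weight and the centroid-based association measure are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def feedback_rescore(
    doc_vec: np.ndarray,        # embedding of the candidate biomedical article
    feedback_vecs: np.ndarray,  # (k, dim) embeddings of the top-k pseudo-feedback articles
    retrieval_score: float,     # first-pass retrieval score, assumed normalized to [0, 1]
    alpha: float = 0.7,         # interpolation weight (illustrative default)
) -> float:
    """Interpolate the retrieval score with a semantic association score."""
    centroid = feedback_vecs.mean(axis=0)
    cosine = float(
        doc_vec @ centroid
        / (np.linalg.norm(doc_vec) * np.linalg.norm(centroid) + 1e-12)
    )
    return alpha * retrieval_score + (1.0 - alpha) * cosine
```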
PARADE: Passage Representation Aggregation for Document Reranking
Pre-trained transformer models, such as BERT and T5, have been shown to be highly effective at ad-hoc passage and document ranking. Due to the inherent sequence length limits of these models, they need to process document passages one at a time rather than processing the entire document sequence at once. Although several approaches for aggregating passage-level signals into a document-level relevance score have been proposed, there has not yet been an extensive comparison of these techniques. In this work, we explore strategies for aggregating relevance signals from a document’s passages into a final ranking score. We find that passage representation aggregation techniques can significantly improve over score aggregation techniques proposed in prior work, such as taking the maximum passage score. We call this new approach PARADE. In particular, PARADE can significantly improve results on collections with broad information needs where relevance signals can be spread throughout the document (such as TREC Robust04 and GOV2). Meanwhile, less complex aggregation techniques may work better on collections with an information need that can often be pinpointed to a single passage (such as TREC DL and TREC Genomics). We also conduct efficiency analyses and highlight several strategies for improving transformer-based aggregation.
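The contrast the abstract draws between score aggregation and representation aggregation can be made concrete with a small sketch: score each passage independently and keep the maximum (the prior-work baseline), or pool the per-passage representations first and score the pooled document vector once. The mean pooling below is only a stand-in for PARADE's learned aggregators (e.g., attention or a small transformer over passage vectors), and the encoder producing the passage vectors is assumed, not shown.

```python
import torch
import torch.nn as nn

class PassageAggregator(nn.Module):
    """Contrast max-score aggregation with a simple representation aggregation.

    `passage_vecs` is assumed to hold one [CLS]-style vector per passage from a
    shared pretrained encoder; mean pooling stands in for PARADE's learned aggregators.
    """

    def __init__(self, hidden: int = 768):
        super().__init__()
        self.score_head = nn.Linear(hidden, 1)

    def score_aggregation(self, passage_vecs: torch.Tensor) -> torch.Tensor:
        # Baseline from prior work: score every passage, keep the maximum score.
        per_passage = self.score_head(passage_vecs).squeeze(-1)  # (num_passages,)
        return per_passage.max()

    def representation_aggregation(self, passage_vecs: torch.Tensor) -> torch.Tensor:
        # PARADE-style idea: aggregate the passage representations into a single
        # document representation, then score that representation once.
        doc_vec = passage_vecs.mean(dim=0)                       # (hidden,)
        return self.score_head(doc_vec).squeeze(-1)
```

In the full model the passage vectors would come from a BERT-style encoder run over overlapping document passages, and the pooling step is where the paper's attention- and transformer-based aggregators differ from the simple mean used here.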