Context-Driven Interactive Query Simulations Based on Generative Large Language Models
Simulating user interactions enables a more user-oriented evaluation of
information retrieval (IR) systems. While user simulations are cost-efficient
and reproducible, many approaches lack fidelity with respect to real user
behavior. Most notably, current user models neglect the user's context, which
is the primary driver of perceived relevance and the interactions with the
search results. To this end, this work introduces the simulation of
context-driven query reformulations. The proposed query generation methods
build upon recent Large Language Model (LLM) approaches and consider the user's
context throughout the simulation of a search session. Compared to simple
context-free query generation approaches, these methods show better
effectiveness and allow the simulation of more efficient IR sessions.
Similarly, our evaluations consider more interaction context than current
session-based measures and reveal interesting complementary insights in
addition to the established evaluation protocols. We conclude with directions
for future work and provide an entirely open experimental setup. Comment: Accepted at ECIR 2024 (Full Paper).
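As a minimal sketch of how such context-driven reformulation can be wired up (the prompt wording and the `llm` callable are assumptions, not the paper's released setup):

```python
def build_reformulation_prompt(topic_description, past_queries, clicked_snippets):
    """Assemble an LLM prompt that carries the simulated user's context:
    the information need plus the interaction history of the session."""
    history = "\n".join(f"- {q}" for q in past_queries) or "- (none yet)"
    evidence = "\n".join(f"- {s}" for s in clicked_snippets) or "- (none yet)"
    return (
        "You simulate a searcher with the following information need:\n"
        f"{topic_description}\n\n"
        f"Queries issued so far:\n{history}\n\n"
        f"Result snippets the user found relevant:\n{evidence}\n\n"
        "Propose the next query this user would issue. Reply with the query only."
    )

def next_query(llm, topic_description, past_queries, clicked_snippets):
    # `llm` is any callable str -> str wrapping a generative model.
    prompt = build_reformulation_prompt(topic_description, past_queries, clicked_snippets)
    return llm(prompt).strip()
```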
Category-Aware Location Embedding for Point-of-Interest Recommendation
Recently, Point-of-Interest (POI) recommendation has gained ever-increasing
importance in various Location-Based Social Networks (LBSNs). With recent
advances in neural models, much work has sought to leverage neural networks to
learn, in a pre-training phase, embeddings that yield improved representations
of POIs and, consequently, better recommendations. However,
previous studies fail to capture crucial information about POIs such as
categorical information.
In this paper, we propose a novel neural model that generates a POI embedding
incorporating sequential and categorical information from POIs. Our model
consists of a check-in module and a category module. The check-in module
captures the geographical influence of POIs derived from the sequence of users'
check-ins, while the category module captures the characteristics of POIs
derived from the category information. To validate the efficacy of the model,
we experimented with two large-scale LBSN datasets. Our experimental results
demonstrate that our approach significantly outperforms state-of-the-art POI
recommendation methods. Comment: 4 pages, 1 figure.
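A minimal sketch of the two-module fusion described above (module names and dimensions are assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class CategoryAwarePOIEmbedding(nn.Module):
    """Fuse a check-in (sequential/geographical) embedding with a category
    embedding into one POI representation; sizes are illustrative."""
    def __init__(self, num_pois, num_categories, dim=128):
        super().__init__()
        self.checkin_emb = nn.Embedding(num_pois, dim)         # trained on check-in sequences
        self.category_emb = nn.Embedding(num_categories, dim)  # trained on category labels
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, poi_ids, category_ids):
        z = torch.cat([self.checkin_emb(poi_ids), self.category_emb(category_ids)], dim=-1)
        return self.fuse(z)  # final embedding fed to a downstream recommender
```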
Neural Document Expansion with User Feedback
This paper presents a neural document expansion approach (NeuDEF) that
enriches document representations for neural ranking models. NeuDEF harvests
expansion terms from queries which lead to clicks on the document and weights
these expansion terms with learned attention. It is plugged into a standard
neural ranker and learned end-to-end. Experiments on a commercial search log
demonstrate that NeuDEF significantly improves the accuracy of state-of-the-art
neural rankers and expansion methods on queries with different frequencies.
Further studies show the contribution of click queries and learned expansion
weights, as well as the influence of document popularity on NeuDEF's
effectiveness. Comment: The 2019 ACM SIGIR International Conference on the Theory of
Information Retrieval.
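As a rough sketch of the mechanism described above (an assumed attention formulation, not the authors' released code), the expansion step might look like:

```python
import torch
import torch.nn as nn

class ClickQueryExpansion(nn.Module):
    """Weight candidate expansion terms, harvested from queries that clicked on
    the document, with learned attention; the scoring form is an assumption."""
    def __init__(self, dim=128):
        super().__init__()
        self.attn = nn.Linear(dim, dim, bias=False)

    def forward(self, doc_vec, term_vecs):
        # doc_vec: (dim,); term_vecs: (num_terms, dim) embeddings of click-query terms
        scores = term_vecs @ self.attn(doc_vec)   # one attention logit per candidate term
        weights = torch.softmax(scores, dim=0)    # learned expansion-term weights
        expansion = weights @ term_vecs           # attention-weighted expansion vector
        return doc_vec + expansion                # enriched document representation
```

The enriched representation is then fed into a standard neural ranker, so the expansion weights are learned end-to-end with the ranking objective.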
Session-level Normalization and Click-through Data Enhancement for Session-based Evaluation
Since a user usually has to issue a sequence of queries and examine multiple
documents to resolve a complex information need in a search session,
researchers have paid much attention to evaluating search systems at the
session level rather than the single-query level. Most existing session-level
metrics evaluate each query separately and then aggregate the query-level
scores using a session-level weighting function. The assumptions behind these
metrics are that all queries in the session should be involved, and their
orders are fixed. However, if a search system can satisfy the user within her
first few queries, she may not need any subsequent queries. Moreover,
in most real-world search scenarios, due to a lack of explicit feedback from
real users, we can only leverage some implicit feedback, such as users' clicks,
as relevance labels for offline evaluation. Such implicit feedback might differ
from the real relevance in a search session, as some documents may be
overlooked under an earlier query but identified in later reformulations. To
address the above issues, we make two assumptions about session-based
evaluation, which explicitly describe an ideal session-search system and how to
enhance click-through data in computing session-level evaluation metrics. Based
on our assumptions, we design a session-level metric called Normalized
U-Measure (NUM). NUM evaluates a session as a whole and utilizes an ideal
session to normalize the result of the actual session. It also infers
session-level relevance labels based on implicit feedback. Experiments on two
public datasets demonstrate the effectiveness of NUM by comparing it with
existing session-based metrics in terms of correlation with user satisfaction
and intuitiveness. We also conduct ablation studies to explore whether these
assumptions hold.
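As a toy illustration of the normalization idea (the gain inputs and decay below are assumptions, not the paper's exact U-Measure formulation):

```python
def normalized_u_measure(actual_gains, ideal_gains, decay=0.8):
    """Score a whole session as one examined stream and normalize by the score
    an ideal session-search system would obtain (NUM-style normalization)."""
    def u(gains):
        return sum(g * decay**i for i, g in enumerate(gains))
    ideal = u(sorted(ideal_gains, reverse=True))  # ideal system surfaces best items first
    return u(actual_gains) / ideal if ideal > 0 else 0.0
```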
Diagnostic Evaluation of Policy-Gradient-Based Ranking
Learning-to-rank has been intensively studied and has shown significant value in a wide range of domains, such as web search, recommender systems, dialogue systems, machine translation, and even computational biology, to name a few. In light of recent advances in neural networks, there has been a strong and continuing interest in exploring how to deploy popular techniques, such as reinforcement learning and adversarial learning, to solve ranking problems. However, armed with these popular techniques, most studies tend to show how effective a new method is; a comprehensive comparison between techniques and an in-depth analysis of their deficiencies are often overlooked. This paper is motivated by the observation that recent ranking methods based on either reinforcement learning or adversarial learning boil down to policy-gradient-based optimization. Based on widely used benchmark collections with complete information (where relevance labels are known for all items), such as MSLR-WEB30K and Yahoo-Set1, we thoroughly investigate the extent to which policy-gradient-based ranking methods are effective. On the one hand, we analytically identify the pitfalls of policy-gradient-based ranking. On the other hand, we experimentally compare a wide range of representative methods. The experimental results echo our analysis and show that policy-gradient-based ranking methods are, by a large margin, inferior to many conventional ranking methods. Regardless of whether we use reinforcement learning or adversarial learning, the failures are largely attributable to gradient estimation based on sampled rankings, which diverge significantly from ideal rankings. In particular, the larger the number of documents per query and the more fine-grained the ground-truth labels, the greater the degradation policy-gradient-based ranking suffers. Careful examination of this weakness is highly recommended for developing enhanced methods based on policy gradients.
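For concreteness, the gradient estimation under scrutiny is, in generic form, a score-function (REINFORCE) estimator over sampled rankings. A minimal sketch with a Plackett-Luce sampler and a DCG reward (illustrative assumptions, not the exact methods compared in the paper):

```python
import torch

def plackett_luce_reinforce_loss(scores, relevance, num_samples=8):
    """REINFORCE over rankings sampled from a Plackett-Luce model, with DCG as
    the reward. scores: (n,) model outputs for one query's documents;
    relevance: (n,) float graded labels. Hyperparameters are illustrative."""
    probs = torch.softmax(scores, dim=0)
    losses = []
    for _ in range(num_samples):
        # Sampling without replacement proportional to probs is a Plackett-Luce draw.
        perm = torch.multinomial(probs, len(scores))
        ranked = scores[perm]
        # Log-probability of the sampled permutation under Plackett-Luce.
        log_prob = sum(ranked[i] - torch.logsumexp(ranked[i:], dim=0)
                       for i in range(len(perm)))
        discounts = torch.log2(torch.arange(2, len(perm) + 2, dtype=torch.float))
        reward = ((2.0 ** relevance[perm] - 1.0) / discounts).sum()  # DCG of the sample
        losses.append(-reward.detach() * log_prob)  # score-function gradient estimator
    # Variance grows with the number of documents per query and label granularity --
    # the failure mode the paper diagnoses.
    return torch.stack(losses).mean()
```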
Neural Models for Information Retrieval without Labeled Data
Recent developments in machine learning models, and in particular deep neural networks, have yielded significant improvements on several computer vision, natural language processing, and speech recognition tasks. Progress on information retrieval (IR) tasks has been slower, however, due to the lack of large-scale training data and of neural network models specifically designed for effective information retrieval. In this dissertation, we address these two issues by introducing task-specific neural network architectures for a set of IR tasks and proposing novel unsupervised or weakly supervised solutions for training the models. The proposed learning solutions do not require labeled training data. Instead, in our weak supervision approach, neural models are trained on a large set of noisy and biased training data obtained from external resources, existing models, or heuristics.
We first introduce relevance-based embedding models that learn distributed representations for words and queries. We show that the learned representations can be effectively employed for a set of IR tasks, including query expansion, pseudo-relevance feedback, and query classification.
We further propose a standalone learning to rank model based on deep neural networks. Our model learns a sparse representation for queries and documents. This enables us to perform efficient retrieval by constructing an inverted index in the learned semantic space. Our model outperforms state-of-the-art retrieval models, while performing as efficiently as term matching retrieval models.
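An illustrative sketch of this sparse-representation idea (layer sizes and the sparsifying activation are assumptions, not the dissertation's exact architecture):

```python
import torch
import torch.nn as nn

class SparseTextEncoder(nn.Module):
    """Map text to a high-dimensional sparse vector whose nonzero dimensions
    act like latent 'terms' and can be stored in an inverted index."""
    def __init__(self, vocab_size, latent_dim=10000):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, 300, mode="mean")
        # A ReLU over a wide projection induces sparsity in the output.
        self.proj = nn.Sequential(nn.Linear(300, latent_dim), nn.ReLU())

    def forward(self, token_ids):  # token_ids: (batch, seq_len)
        return self.proj(self.embed(token_ids))  # (batch, latent_dim), mostly zeros

# Ranking then reduces to a dot product over the few shared nonzero dimensions,
# exactly as a term-matching engine scores shared vocabulary terms.
```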
We additionally propose a neural network framework for predicting the performance of a retrieval model for a given query. Inspired by existing query performance prediction models, our framework integrates several information sources, such as retrieval score distribution and term distribution in the top retrieved documents. This leads to state-of-the-art results for the performance prediction task on various standard collections.
We finally bridge the gap between retrieval and recommendation models, the two key components in most information systems. Search and recommendation often share the same goal: helping people get the information they need at the right time. Therefore, joint modeling and optimization of search engines and recommender systems could potentially benefit both. In more detail, we introduce a retrieval model that is trained using user-item interactions (e.g., recommendation data), with no need for query-document relevance information during training.
Our solutions and findings in this dissertation smooth the path towards learning efficient and effective models for various information retrieval and related tasks, especially when large-scale training data is not available.
Performance Prediction for Multi-hop Questions
We study the problem of Query Performance Prediction (QPP) for open-domain
multi-hop Question Answering (QA), where the task is to estimate the difficulty
of evaluating a multi-hop question over a corpus. Despite the extensive
research on predicting the performance of ad-hoc and QA retrieval models, there
has been a lack of study on the estimation of the difficulty of multi-hop
questions. The problem is challenging due to the multi-step nature of the
retrieval process, the potential dependency between steps, and the reasoning
involved. To tackle this challenge, we propose multHP, a novel pre-retrieval
method for predicting the performance of open-domain multi-hop questions. Our
extensive evaluation on the largest multi-hop QA dataset using several modern
QA systems shows that the proposed model is a strong predictor of the
performance, outperforming traditional single-hop QPP models. Additionally, we
demonstrate that our approach can be effectively used to optimize the
parameters of QA systems, such as the number of documents to be retrieved,
resulting in improved overall retrieval performance. Comment: 10 pages.
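For intuition, pre-retrieval predictors of this kind are computed from the question and corpus statistics alone, before any retrieval step. The sketch below shows generic single-hop features only (the multi-hop-specific signals of multHP are not reproduced here):

```python
import math

def pre_retrieval_features(question_terms, doc_freq, num_docs):
    """Features computed from the question and corpus statistics alone,
    before any retrieval step is executed."""
    idfs = [math.log(num_docs / (1 + doc_freq.get(t, 0))) for t in question_terms]
    return {
        "avg_idf": sum(idfs) / len(idfs),  # average term rarity
        "max_idf": max(idfs),              # most discriminative term
        "num_terms": len(question_terms),  # longer multi-hop questions tend to be harder
    }
```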
Semantic Representations of Mathematical Expressions in a Continuous Vector Space
Mathematical notation makes up a large portion of STEM literature, yet
finding semantic representations for formulae remains a challenging problem.
Because mathematical notation is precise and its meaning changes significantly
with small character shifts, the methods that work for natural text do not
necessarily work well for mathematical expressions. In this work, we describe
an approach for representing mathematical expressions in a continuous vector
space. We use the encoder of a sequence-to-sequence architecture, trained on
visually different but mathematically equivalent expressions, to generate
vector representations (or embeddings). We compare this approach with an
autoencoder and show that the former is better at capturing mathematical
semantics. Finally, to expedite future research, we publish a corpus of
equivalent transcendental and algebraic expression pairs. Comment: 17 pages, 2 figures.
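A hedged sketch of the embedding extraction (layer types and sizes are assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

class ExpressionEncoder(nn.Module):
    """Encode a tokenized expression with the encoder of a seq2seq model and
    use the final hidden state as its embedding. Training on visually
    different but mathematically equivalent pairs pushes equivalent
    expressions toward nearby points in the vector space."""
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, token_ids):      # token_ids: (batch, seq_len)
        _, h = self.rnn(self.embed(token_ids))
        return h[-1]                   # (batch, dim) expression embedding
```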