A Weighted Correlation Index for Rankings with Ties
Understanding the correlation between two different scores for the same set
of items is a common problem in information retrieval, and the most commonly
used statistic for quantifying this correlation is Kendall's τ. However,
the standard definition fails to capture that discordances between items with
high rank are more important than those between items with low rank. Recently,
a new measure of correlation based on average precision has been proposed to
solve this problem, but like many alternative proposals in the literature it
assumes that there are no ties in the scores. This is a major deficiency in a
number of contexts, in particular when comparing centrality scores on
large graphs, as the obvious baseline, indegree, has a very large number of
ties in web and social graphs. We propose to extend Kendall's definition in a
natural way to take into account weights in the presence of ties. We prove a
number of interesting mathematical properties of our generalization and
describe an algorithm for its computation. We also validate the
usefulness of our weighted measure of correlation using experimental data.
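As a concrete illustration of the kind of measure this abstract describes, below is a minimal O(n²) sketch of a rank-weighted Kendall's τ that tolerates ties, written in Kendall's general correlation-coefficient form (tied pairs contribute sign 0). The additive hyperbolic pair weight and the choice to rank items by the first score are assumptions made for illustration; the paper itself proves properties of its generalization and describes a more efficient algorithm.

```python
import numpy as np

def weighted_tau(x, y, rank_weight=lambda r: 1.0 / (r + 1)):
    """Rank-weighted Kendall's tau that tolerates ties (O(n^2) sketch)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    # Rank items by the first score, 0 = top. (Tied x-values still get
    # distinct ranks here; a fuller treatment would average them.)
    rank = np.empty(n, int)
    rank[np.argsort(-x)] = np.arange(n)
    num = den_x = den_y = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Additive pair weight: discordances near the top count more.
            w = rank_weight(rank[i]) + rank_weight(rank[j])
            a = np.sign(x[i] - x[j])  # 0 for a tie in x
            b = np.sign(y[i] - y[j])  # 0 for a tie in y
            num += w * a * b
            den_x += w * a * a
            den_y += w * b * b
    return num / np.sqrt(den_x * den_y)

# Ties in either score simply drop out of the numerator, e.g.:
# weighted_tau([3, 2, 2, 1], [3, 2, 1, 1])
```

Note that tied pairs contribute zero to the numerator but are also excluded from the corresponding denominator term, which is what lets the coefficient remain meaningful on heavily tied scores such as indegree.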
Query Expansion with Locally-Trained Word Embeddings
Continuous space word embeddings have received a great deal of attention in
the natural language processing and machine learning communities for their
ability to model term similarity and other relationships. We study the use of
term relatedness in the context of query expansion for ad hoc information
retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when
trained globally, underperform corpus- and query-specific embeddings for
retrieval tasks. These results suggest that other tasks benefiting from global
embeddings may also benefit from local embeddings.
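A rough sketch of the local-embedding idea, assuming gensim's word2vec implementation: train a small model on only the top-ranked documents of an initial retrieval pass, then expand the query with each term's nearest neighbors. The helper below (`expand_query`, `top_docs`) is hypothetical and omits details the paper handles, such as weighting expansion terms and interpolating with the original query model.

```python
from gensim.models import Word2Vec

def expand_query(query_terms, top_docs, k=5):
    """Expand a query with neighbors from locally-trained embeddings.

    top_docs: tokenized top-ranked documents from an initial retrieval
    pass (the "local" corpus); each document is a list of tokens.
    """
    # Train a small word2vec model on the query-specific corpus only,
    # rather than on a large global corpus.
    local = Word2Vec(sentences=top_docs, vector_size=100, window=5,
                     min_count=2, epochs=20)
    expanded = list(query_terms)
    for term in query_terms:
        if term in local.wv:
            expanded += [w for w, _ in local.wv.most_similar(term, topn=k)]
    return expanded
```

The design intuition is that term co-occurrence statistics in a topically focused sub-corpus reflect the query's sense of a word better than global statistics do.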
Towards Group-aware Search Success
Traditional measures of search success often overlook the varying information
needs of different demographic groups. To address this gap, we introduce a
novel metric, named Group-aware Search Success (GA-SS). GA-SS redefines search
success to ensure that all demographic groups achieve satisfaction from search
outcomes. We introduce a comprehensive mathematical framework to calculate
GA-SS, incorporating both static and stochastic ranking policies and
integrating user browsing models for a more accurate assessment. In addition,
we propose a Group-aware Most Popular Completion (gMPC) ranking model to
account for demographic variances in user intent, aligning more closely with
the diverse needs of all user groups. We empirically validate our metric and
approach with two real-world datasets: one focusing on query auto-completion
and the other on movie recommendations, where the results highlight the impact
of stochasticity and the complex interplay among various search success
metrics. Our findings advocate for a more inclusive approach in measuring
search success and motivate future investigations into the quality of
service of search.
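The abstract does not spell out the GA-SS formula, so the following is only a hedged illustration of how a group-aware aggregate might be computed, not the paper's definition: per-group expected success under an RBP-style browsing model with continuation probability `persistence`, aggregated with `min` so the worst-off group bounds the overall score. Both the browsing model and the aggregation are assumptions made for this sketch.

```python
import numpy as np

def group_aware_success(rels_by_group, persistence=0.8, agg=min):
    """Hypothetical group-aware aggregate; NOT the paper's exact GA-SS.

    rels_by_group: {group: relevance vector of the ranked list as judged
    by that group}. Per-group success is the expected rank-discounted
    gain under an RBP-style browsing model; agg=min makes the worst-off
    group determine the overall score.
    """
    def success(rels):
        rels = np.asarray(rels, float)
        # The user examines rank i with probability (1 - p) * p**i.
        discounts = (1 - persistence) * persistence ** np.arange(len(rels))
        return float(discounts @ rels)
    return agg(success(r) for r in rels_by_group.values())

# Example: group_aware_success({"group_a": [1, 0, 1], "group_b": [0, 1, 0]})
```

Swapping `agg` for a weighted mean would trade the worst-case guarantee for an average-case view, which is the kind of design choice a framework spanning static and stochastic ranking policies has to make explicit.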
Training Curricula for Open Domain Answer Re-Ranking
In precision-oriented tasks like answer ranking, it is more important to rank
many relevant answers highly than to retrieve all relevant answers. It follows
that a good ranking strategy would be to learn how to identify the easiest
correct answers first (i.e., assign a high ranking score to answers that have
characteristics that usually indicate relevance, and a low ranking score to
those with characteristics that do not), before incorporating more complex
logic to handle difficult cases (e.g., semantic matching or reasoning). In this
work, we apply this idea to the training of neural answer rankers using
curriculum learning. We propose several heuristics to estimate the difficulty
of a given training sample. We show that the proposed heuristics can be used to
build a training curriculum that down-weights difficult samples early in the
training process. As the training process progresses, our approach gradually
shifts to weighting all samples equally, regardless of difficulty. We present a
comprehensive evaluation of our proposed idea on three answer ranking datasets.
Results show that our approach leads to superior performance of two leading
neural ranking architectures, namely BERT and ConvKNRM, using both pointwise
and pairwise losses. When applied to a BERT-based ranker, our method yields up
to a 4% improvement in MRR and a 9% improvement in P@1 (compared to the model
trained without a curriculum). This results in models that can achieve
comparable performance to more expensive state-of-the-art techniques.
Comment: Accepted at SIGIR 2020 (long paper).
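A small sketch of the weighting schedule the abstract describes: each training sample gets a difficulty score in [0, 1] from some heuristic (the concrete heuristics are the paper's; a hypothetical example would be the rank the first-stage retriever assigned to the correct answer), hard samples are down-weighted early, and the weight is linearly interpolated toward 1 so that all samples eventually count equally.

```python
def curriculum_weight(difficulty, step, curriculum_steps):
    """Loss weight for one sample at a given training step.

    difficulty in [0, 1] (1 = hardest) comes from a heuristic; the
    sample starts at weight 1 - difficulty and is linearly interpolated
    toward 1, so after `curriculum_steps` every sample counts equally.
    """
    start = 1.0 - difficulty
    progress = min(step / curriculum_steps, 1.0)
    return start + progress * (1.0 - start)

# Pointwise use (illustrative): scale each sample's loss before averaging,
#   batch_loss = mean(curriculum_weight(d_i, step, T) * loss_i)
```

The same per-sample scaling applies to pairwise losses, with the pair inheriting the difficulty of its positive answer under whichever heuristic is in use.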