303 research outputs found
Bayesian Ranker Comparison Based on Historical User Interactions
ABSTRACT We address the problem of how to safely compare rankers for information retrieval. In particular, we consider how to control the risks associated with switching from an existing production ranker to a new candidate ranker. Whereas existing online comparison methods require showing potentially suboptimal result lists to users during the comparison process, which can lead to user frustration and abandonment, our approach only requires interaction data generated through users' natural use of the production ranker. Specifically, we propose a Bayesian approach for (1) comparing the production ranker to candidate rankers and (2) estimating the confidence of this comparison. The comparison of rankers is performed using click model-based information retrieval metrics, while the confidence of the comparison is derived from Bayesian estimates of uncertainty in the underlying click model. These confidence estimates are then used to determine whether a risk-averse decision criterion for switching to the candidate ranker has been satisfied. Experimental results on several learning to rank datasets and on a click log show that the proposed approach outperforms an existing ranker comparison method that does not take uncertainty into account.
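The risk-averse switching rule can be sketched roughly as follows. The click counts, the Beta(1, 1) prior, the rank-discounted utility metric, and the 0.95 threshold are all illustrative assumptions for this sketch, not the paper's actual click model or criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-document click and impression counts from production logs
# (illustrative numbers, not from the paper).
clicks = np.array([120, 60, 30])
views = np.array([1000, 1000, 1000])

def sample_attractiveness(n_samples):
    # Beta posteriors per document under a Beta(1, 1) prior on click probability.
    return rng.beta(1 + clicks, 1 + views - clicks, size=(n_samples, len(clicks)))

def expected_utility(order, attractiveness):
    # Rank-discounted expected clicks for a given document ordering.
    discounts = 1.0 / np.log2(np.arange(2, len(order) + 2))
    return attractiveness[:, order] @ discounts

samples = sample_attractiveness(10_000)
production = expected_utility([0, 1, 2], samples)   # current ordering
candidate = expected_utility([1, 0, 2], samples)    # candidate swaps the top two

# Risk-averse rule: switch only if the candidate wins with high posterior probability.
p_better = float(np.mean(candidate > production))
switch = p_better > 0.95
```

Because the comparison is run entirely over posterior samples fitted to logged interactions, no user is ever shown the candidate's result lists before the criterion is met.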
Search engines that learn from their users
More than half the world's population uses web search engines, issuing over half a billion queries every single day. For many people, web search engines such as Baidu, Bing, Google, and Yandex are among the first resources they turn to when a question arises. Moreover, for many, search engines have become the most trusted route to information, more so even than traditional media such as newspapers, news websites, or news channels on television. What web search engines present people with greatly influences what they believe to be true, and consequently it influences their thoughts, opinions, decisions, and the actions they take. With this in mind, two things are important from an information retrieval research perspective. First, it is important to understand how well search engines (rankers) perform; second, this knowledge should be used to improve them. This thesis is about these two topics: the evaluation of search engines and learning search engines.
In the first part of this thesis we investigate how user interactions with search engines can be used to evaluate those search engines. In particular, we introduce a new online evaluation paradigm called multileaving, which extends interleaving. With multileaving, many rankers can be compared at once by combining document lists from these rankers into a single result list and attributing user interactions with this list to the rankers. We then investigate the relation between A/B testing and interleaved comparison methods. Both studies lead to much higher sensitivity of the evaluation methods, meaning that fewer user interactions are required to arrive at reliable conclusions. This has the important implication that fewer users need to be exposed to results from possibly inferior search engines.
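The multileaving idea described above can be sketched as a team-draft variant: each ranker takes turns contributing its next unused document, and clicks are credited to whichever ranker placed the clicked document. The function names and the team-draft scheme here are an illustrative sketch, not the thesis's exact algorithm.

```python
import random

def team_draft_multileave(rankings, length, seed=0):
    # Combine several rankers' document lists into one result list, recording
    # which ranker ("team") contributed each slot.
    rng = random.Random(seed)
    result, teams = [], []
    while len(result) < length:
        added = False
        for r in rng.sample(range(len(rankings)), len(rankings)):
            if len(result) >= length:
                break
            doc = next((d for d in rankings[r] if d not in result), None)
            if doc is not None:
                result.append(doc)
                teams.append(r)
                added = True
        if not added:          # all rankers exhausted
            break
    return result, teams

def credit_clicks(teams, clicked_positions):
    # Attribute each clicked slot to the ranker that placed its document.
    credit = {}
    for pos in clicked_positions:
        credit[teams[pos]] = credit.get(teams[pos], 0) + 1
    return credit
```

A single impression of the combined list thus yields preference evidence about every participating ranker at once, which is where the gain in sensitivity over pairwise interleaving comes from.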
In the second part of this thesis we turn to online learning to rank, building on the evaluation methods introduced and extended in the first part. We learn the parameters of base rankers from user interactions, and we use the multileaving methods as feedback in our learning method, leading to much faster convergence than existing methods. Again, the important implication is that fewer users need to be exposed to possibly inferior search engines, because the rankers adapt more quickly to changes in user preferences. The last part of this thesis is of a different nature than the earlier two parts: as opposed to the earlier chapters, we no longer study algorithms. Progress in information retrieval research has always been driven by a combination of algorithms, shared resources, and evaluation.
In the last part we focus on the latter two: we introduce a new shared resource and a new evaluation paradigm. Firstly, we propose Lerot, an online evaluation framework that allows us to simulate users interacting with a search engine. Our implementation has been released as open source software and is currently being used by researchers around the world. Secondly, we introduce OpenSearch, a new evaluation paradigm involving real users of real search engines. We describe an implementation of this paradigm that has already been widely adopted by the research community through challenges at CLEF and TREC.
Efficient Exploration of Gradient Space for Online Learning to Rank
Online learning to rank (OL2R) optimizes the utility of returned search results based on implicit feedback gathered directly from users. To improve its estimates, an OL2R algorithm examines one or more exploratory gradient directions and updates the current ranker if a proposed one is preferred by users in an interleaved test. In this paper, we accelerate the online learning process through efficient exploration of the gradient space. Our algorithm, named Null Space Gradient Descent, reduces the exploration space to the null space of recent poorly performing gradients. This prevents the algorithm from repeatedly exploring directions that have been discouraged by the most recent interactions with users. To improve the sensitivity of the resulting interleaved test, we selectively construct candidate rankers to maximize the chance that they can be differentiated by candidate ranking documents in the current query, and we use historically difficult queries to identify the best ranker when a tie occurs in comparing the rankers. Extensive experimental comparisons with state-of-the-art OL2R algorithms on several public benchmarks confirm the effectiveness of the proposed algorithm, especially its fast learning convergence and promising ranking quality at an early stage.
Comment: To appear in SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
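The core null-space idea can be sketched with an SVD: take the recently rejected gradients, compute an orthonormal basis of their null space, and project random exploration proposals onto it so every candidate direction is orthogonal to what users have already discouraged. This is a minimal sketch of the geometry only; the tolerance, sampling scheme, and function names are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def null_space_directions(bad_gradients, n_dirs, dim):
    # Project random proposals onto the null space of recently rejected
    # gradients, so exploration avoids already-discouraged directions.
    G = np.asarray(bad_gradients)           # shape: (k, dim)
    _, s, vt = np.linalg.svd(G, full_matrices=True)
    rank = int(np.sum(s > 1e-10))
    basis = vt[rank:]                       # orthonormal basis of the null space
    proposals = rng.standard_normal((n_dirs, dim))
    projected = proposals @ basis.T @ basis # drop components along bad gradients
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / np.clip(norms, 1e-12, None)

bad = [[1.0, 0.0, 0.0, 0.0]]                # one discouraged gradient direction
dirs = null_space_directions(bad, n_dirs=3, dim=4)
# Every proposed unit direction is orthogonal to the discouraged gradient.
```

Restricting proposals to this subspace is what saves interactions: no user feedback is spent re-testing directions that recent interleaved comparisons have already voted down.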
Cognitive Personalized Search Integrating Large Language Models with an Efficient Memory Mechanism
Traditional search engines usually provide identical search results for all users, overlooking individual preferences. To counter this limitation, personalized search re-ranks results based on user preferences derived from query logs. Deep learning-based personalized search methods have shown promise, but they rely heavily on abundant training data, making them susceptible to data sparsity challenges. This paper proposes a Cognitive Personalized Search (CoPS) model, which integrates Large Language Models (LLMs) with a cognitive memory mechanism inspired by human cognition. CoPS employs LLMs to enhance user modeling and the user search experience. The cognitive memory mechanism comprises sensory memory for quick sensory responses, working memory for sophisticated cognitive responses, and long-term memory for storing historical interactions. CoPS handles new queries in three steps: identifying re-finding behaviors, constructing user profiles with relevant historical information, and ranking documents based on personalized query intent. Experiments show that CoPS outperforms baseline models in zero-shot scenarios.
Comment: Accepted by WWW 202
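The three-step query handling over the memory tiers can be sketched as a cascade. All names and the dict/list memory layout below are illustrative assumptions for the sketch, not CoPS's actual API or data structures.

```python
def handle_query(query, sensory, working, long_term, rank_fn):
    # 1. Re-finding: an exact hit in fast sensory memory is answered at once.
    if query in sensory:
        return sensory[query]
    # 2. Profile construction: gather relevant history from working and
    #    long-term memory to stand in for a user profile.
    related = working.get(query, []) + [h for h in long_term if query in h["query"]]
    # 3. Personalized ranking over candidate documents, given that history.
    results = rank_fn(query, related)
    sensory[query] = results   # cache so an immediate re-finding query is fast
    return results
```

The point of the tiering is cost: cheap lookups answer repeated queries immediately, and the expensive LLM-backed profiling and ranking step only runs for genuinely new intents.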
Learning Colour Representations of Search Queries
Image search engines rely on appropriately designed ranking features that capture various aspects of the content semantics as well as historic popularity. In this work, we consider the role of colour in this relevance matching process. Our work is motivated by the observation that a significant fraction of user queries have an inherent colour associated with them. While some queries contain explicit colour mentions (such as 'black car' and 'yellow daisies'), other queries have implicit notions of colour (such as 'sky' and 'grass'). Furthermore, grounding queries in colour is not a mapping to a single colour, but a distribution in colour space. For instance, a search for 'trees' tends to have a bimodal distribution around the colours green and brown. We leverage historical clickthrough data to produce a colour representation for search queries and propose a recurrent neural network architecture to encode unseen queries into colour space. We also show how this embedding can be learnt alongside a cross-modal relevance ranker from impression logs in which a subset of the result images were clicked. We demonstrate that the use of a query-image colour distance feature leads to an improvement in ranker performance as measured by users' preferences for clicked versus skipped images.
Comment: Accepted as a full paper at SIGIR 202
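A minimal sketch of the query-image colour distance feature described above: both the query's predicted colour representation and the image's palette are treated as distributions over colour bins, and their distance becomes one feature for the ranker. The 4-bin palette, the example numbers, and the histogram-intersection distance are assumptions for this sketch, not the paper's actual representation.

```python
import numpy as np

# Illustrative colour histograms over a tiny 4-bin palette
# (green, brown, blue, grey). A query like 'trees' gets a bimodal
# distribution over green and brown, as the abstract describes.
query_colour = np.array([0.55, 0.35, 0.05, 0.05])   # hypothetical model output
image_colour = np.array([0.60, 0.30, 0.05, 0.05])   # histogram of an image's pixels

def colour_distance(p, q):
    # Histogram intersection distance between two colour distributions:
    # 0 for identical histograms, approaching 1 for disjoint ones.
    return 1.0 - np.minimum(p, q).sum()

feature = colour_distance(query_colour, image_colour)
# A small distance means the image's palette matches the query's implied
# colours; the value is fed to the relevance ranker alongside other features.
```

Representing the query as a distribution rather than a single colour is what lets multimodal cases like 'trees' score well against both green and brown images.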