36 research outputs found
Sensitive and Scalable Online Evaluation with Theoretical Guarantees
Multileaved comparison methods generalize interleaved comparison methods to
provide a scalable approach for comparing ranking systems based on regular user
interactions. Such methods enable the increasingly rapid research and
development of search engines. However, existing multileaved comparison methods
that provide reliable outcomes do so by degrading the user experience during
evaluation. Conversely, current multileaved comparison methods that maintain
the user experience cannot guarantee correctness. Our contribution is two-fold.
First, we propose a theoretical framework for systematically comparing
multileaved comparison methods using the notions of considerateness, which
concerns maintaining the user experience, and fidelity, which concerns reliable
correct outcomes. Second, we introduce a novel multileaved comparison method,
Pairwise Preference Multileaving (PPM), that performs comparisons based on
document-pair preferences, and prove that it is considerate and has fidelity.
We show empirically that, compared to previous multileaved comparison methods,
PPM is more sensitive to user preferences and scales better with the number of
rankers being compared.
Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
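To make the pairwise idea concrete, here is a minimal Python sketch of inferring document-pair preferences from clicks on a multileaved result list and crediting the rankers that agree with them. The function names and the simple "a clicked document is preferred over unclicked documents above it" rule are illustrative assumptions, not the exact PPM estimator from the paper.

```python
# Illustrative sketch: infer document-pair preferences from clicks on a
# multileaved result list, then credit each ranker that agrees with them.

def pairwise_preferences(result_list, clicks):
    """Return (preferred, dispreferred) document pairs inferred from clicks.

    result_list: documents in displayed order.
    clicks: set of clicked documents.
    """
    prefs = []
    for i, doc in enumerate(result_list):
        if doc in clicks:
            # An unclicked document shown above a clicked one was likely
            # examined and skipped, so the click signals a preference.
            for other in result_list[:i]:
                if other not in clicks:
                    prefs.append((doc, other))
    return prefs

def credit_rankers(prefs, rankers):
    """Credit each ranker for every inferred preference it agrees with.

    rankers: mapping from ranker name to its full ranking (list of docs).
    """
    credit = {name: 0 for name in rankers}
    for preferred, dispreferred in prefs:
        for name, ranking in rankers.items():
            if ranking.index(preferred) < ranking.index(dispreferred):
                credit[name] += 1
    return credit

# Example: two rankers, one click on document "b", which was shown below "a".
rankers = {"A": ["a", "b", "c"], "B": ["b", "a", "c"]}
shown = ["a", "b", "c"]
prefs = pairwise_preferences(shown, clicks={"b"})  # [("b", "a")]
print(credit_rankers(prefs, rankers))              # {'A': 0, 'B': 1}
```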
Learning from User Interactions with Rankings: A Unification of the Field
Ranking systems form the basis for online search engines and recommendation
services. They process large collections of items, for instance web pages or
e-commerce products, and present the user with a small ordered selection. The
goal of a ranking system is to help a user find the items they are looking for
with the least amount of effort. Thus the rankings they produce should place
the most relevant or preferred items at the top. Learning to
rank is a field within machine learning that covers methods which optimize
ranking systems w.r.t. this goal. Traditional supervised learning to rank
methods utilize expert judgements to evaluate and learn; in many situations,
however, such judgements are impossible or infeasible to obtain. As a
solution, methods have been introduced that perform learning to rank based on
user clicks instead. The difficulty with clicks is that they are affected not
only by user preferences but also by which rankings were displayed.
These methods therefore have to avoid being biased by factors other than
user preference. This thesis concerns learning to rank methods based on user
clicks and specifically aims to unify the different families of these methods.
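The debiasing problem can be illustrated with a small example. Below is a minimal Python sketch of inverse propensity scoring (IPS), the standard counterfactual correction for position bias in click data; the propensity values and function name are illustrative assumptions, not the thesis's exact estimators.

```python
import numpy as np

# Illustrative sketch of inverse propensity scoring (IPS) for click data.
# Clicks at low ranks are rare partly because users rarely examine them;
# weighting each click by 1 / P(examined) corrects for this position bias,
# assuming the examination probabilities are known or estimated beforehand.

def ips_weighted_clicks(clicks, propensities):
    """Weight each observed click by the inverse of its examination propensity.

    clicks: binary array, clicks[i] = 1 if the document at rank i was clicked.
    propensities: array, propensities[i] = P(user examines rank i).
    """
    clicks = np.asarray(clicks, dtype=float)
    propensities = np.asarray(propensities, dtype=float)
    return clicks / propensities

# Example: a click at rank 3 counts for more than a click at rank 0,
# because rank 3 is examined far less often.
propensities = np.array([1.0, 0.5, 0.25, 0.125])  # assumed examination model
clicks = np.array([1, 0, 0, 1])
print(ips_weighted_clicks(clicks, propensities))  # [1. 0. 0. 8.]
```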
The second part of this thesis proposes a framework that bridges
many gaps between online, counterfactual, and supervised learning to
rank. It takes approaches previously considered independent and unifies
them into a single methodology for widely applicable and effective learning to
rank from user clicks.
Comment: PhD Thesis of Harrie Oosterhuis, defended at the University of Amsterdam on November 27th, 2020
Search engines that learn from their users
More than half the world's population uses web search engines, resulting in over half a billion queries every single day. For many people, web search engines such as Baidu, Bing, Google, and Yandex are among the first resources they go to when a question arises. Moreover, for many, search engines have become the most trusted route to information, more so even than traditional media such as newspapers, news websites, or news channels on television. What web search engines present people with greatly influences what they believe to be true, and consequently it influences their thoughts, opinions, decisions, and the actions they take. With this in mind, two things are important from an information retrieval research perspective. First, it is important to understand how well search engines (rankers) perform; second, this knowledge should be used to improve them. This thesis is about these two topics: evaluation of search engines and learning search engines.
In the first part of this thesis we investigate how user interactions with search engines can be used to evaluate search engines. In particular, we introduce a new online evaluation paradigm called multileaving, which extends interleaving. With multileaving, many rankers can be compared at once by combining document lists from these rankers into a single result list and attributing user interactions with this list to the rankers. We then investigate the relation between A/B testing and interleaved comparison methods. Both studies lead to much more sensitive evaluation methods, meaning that fewer user interactions are required to arrive at reliable conclusions. This has the important implication that fewer users need to be exposed to the results of possibly inferior search engines.
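As a concrete illustration of combining rankers' lists into one result list, here is a minimal Python sketch in the style of team-draft multileaving; the round-robin team assignment is one known multileaving variant, and the names are illustrative rather than the thesis's exact method.

```python
import random

# Illustrative team-draft-style multileaving: rankers take turns contributing
# their highest not-yet-picked document and remember which documents belong
# to their "team". Clicks on a team's documents credit that ranker.

def multileave(rankings, length):
    """Combine several rankings into one result list with team assignments."""
    combined, teams = [], {name: set() for name in rankings}
    while len(combined) < length:
        progressed = False
        # Randomize turn order each round to avoid systematic position bias.
        for name in random.sample(list(rankings), len(rankings)):
            doc = next((d for d in rankings[name] if d not in combined), None)
            if doc is not None:
                combined.append(doc)
                teams[name].add(doc)
                progressed = True
            if len(combined) == length:
                break
        if not progressed:
            break  # all rankings exhausted
    return combined, teams

def attribute_clicks(teams, clicked_docs):
    """Credit each ranker with the clicks on its team's documents."""
    return {name: len(team & clicked_docs) for name, team in teams.items()}

rankings = {"A": ["d1", "d2", "d3"], "B": ["d3", "d1", "d4"]}
combined, teams = multileave(rankings, length=4)
print(attribute_clicks(teams, clicked_docs={"d3"}))
```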
In the second part of this thesis we turn to online learning to rank, building on the evaluation methods introduced and extended in the first part. We learn the parameters of base rankers from user interactions, and we use the multileaving methods as feedback in our learning method, leading to much faster convergence than existing methods. Again, the important implication is that fewer users need to be exposed to possibly inferior search engines, as the rankers adapt more quickly to changes in user preferences.
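The online learning loop can be sketched in a few lines. Below is a minimal Python sketch of a dueling-bandit-style update in the spirit of multileave gradient descent: candidate rankers are sampled around the current model, compared against it via multileaved feedback, and the model moves toward the winners. The step sizes, candidate count, and the `multileave_outcome` feedback function are illustrative assumptions, not the thesis's exact algorithm.

```python
import numpy as np

# Illustrative online-learning step in the spirit of multileave gradient
# descent. The feedback callable stands in for a real multileaved comparison
# on live user clicks.

def oltr_step(weights, multileave_outcome, n_candidates=4,
              unit_step=1.0, learning_rate=0.1, rng=np.random):
    """One update of the current ranker's weight vector."""
    candidates = []
    for _ in range(n_candidates):
        direction = rng.normal(size=weights.shape)
        direction /= np.linalg.norm(direction)  # random unit vector
        candidates.append(weights + unit_step * direction)
    # multileave_outcome returns the indices of candidates that beat the
    # current model according to clicks on a multileaved result list.
    winners = multileave_outcome(weights, candidates)
    if not winners:
        return weights  # current model won; keep it unchanged
    mean_winner = np.mean([candidates[i] for i in winners], axis=0)
    return weights + learning_rate * (mean_winner - weights)

# Toy feedback: pretend candidates closer to a hidden optimum always win.
optimum = np.array([1.0, 2.0, -1.0])
def toy_outcome(current, candidates):
    dist = lambda w: np.linalg.norm(w - optimum)
    return [i for i, c in enumerate(candidates) if dist(c) < dist(current)]

w = np.zeros(3)
for _ in range(100):
    w = oltr_step(w, toy_outcome)
print(w)  # drifts toward the hidden optimum
```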
The last part of this thesis is of a different nature than the earlier two parts: as opposed to the earlier chapters, we no longer study algorithms. Progress in information retrieval research has always been driven by a combination of algorithms, shared resources, and evaluation, and in this last part we focus on the latter two. First, we propose Lerot, an online evaluation framework that allows us to simulate users interacting with a search engine. Our implementation has been released as open source software and is currently being used by researchers around the world. Second, we introduce OpenSearch, a new evaluation paradigm involving real users of real search engines. We describe an implementation of this paradigm that has already been widely adopted by the research community through challenges at CLEF and TREC.
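To give a flavor of what such simulation involves, here is a minimal Python sketch of a cascade-style click model over a simulated result list; the relevance-to-click probabilities are illustrative assumptions, not Lerot's actual configuration or API.

```python
import random

# Illustrative cascade-style click simulation, in the spirit of what an
# online-evaluation framework like Lerot does: a simulated user scans the
# result list top-down, clicks relevant-looking documents with some
# probability, and may stop after each click. The probabilities below are
# illustrative, not Lerot's actual configuration.

def simulate_clicks(relevances, p_click={0: 0.05, 1: 0.5, 2: 0.95},
                    p_stop={0: 0.0, 1: 0.2, 2: 0.5}, rng=random):
    """Return the ranks clicked by one simulated user.

    relevances: graded relevance (0, 1, or 2) of each displayed document.
    """
    clicked = []
    for rank, rel in enumerate(relevances):
        if rng.random() < p_click[rel]:
            clicked.append(rank)
            if rng.random() < p_stop[rel]:
                break  # user is satisfied and stops scanning
    return clicked

# Example: a list whose second and fourth documents are highly relevant.
print(simulate_clicks([0, 2, 1, 2, 0]))
```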
Balancing Speed and Quality in Online Learning to Rank for Information Retrieval
In Online Learning to Rank (OLTR) the aim is to find an optimal ranking model
by interacting with users. When learning from user behavior, a system must
serve users while simultaneously learning from those interactions.
Unlike other Learning to Rank (LTR) settings, existing research in this field
has been limited to linear models. This is due to the speed-quality tradeoff
that arises when selecting models: complex models are more expressive and can
find the best rankings but need more user interactions to do so, a requirement
that risks frustrating users during training. Conversely, simpler models can be
optimized on fewer interactions and thus provide a better user experience, but
they will converge towards suboptimal rankings. This tradeoff creates a
deadlock: novel models cannot improve either the user
experience or the final convergence point without sacrificing the other. Our
contribution is twofold. First, we introduce a fast OLTR model called Sim-MGD
that addresses the speed aspect of the speed-quality tradeoff. Sim-MGD ranks
documents based on similarities with reference documents. It converges rapidly
and, hence, gives a better user experience but it does not converge towards the
optimal rankings. Second, we contribute Cascading Multileave Gradient Descent
(C-MGD) for OLTR that directly addresses the speed-quality tradeoff by using a
cascade that combines the best of both worlds: fast learning and
high-quality final convergence. C-MGD can provide the better user experience of
Sim-MGD while maintaining the same convergence as the state-of-the-art MGD
model. This opens the door for future work to design new models for OLTR
without having to deal with the speed-quality tradeoff.
Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
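To make the Sim-MGD idea concrete, here is a minimal Python sketch of scoring documents with learned weights over their similarities to a small set of reference documents; the cosine similarity and the random choice of references are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

# Illustrative sketch of the Sim-MGD idea: instead of learning one weight per
# feature, learn one weight per reference document and score each document by
# a weighted sum of its similarities to those references. Fewer parameters
# mean fewer user interactions are needed to learn them.

def similarity_scores(doc_features, reference_features, weights):
    """Score documents as weighted cosine similarities to reference docs."""
    def normalize(m):
        return m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = normalize(doc_features) @ normalize(reference_features).T
    return sims @ weights  # one learned weight per reference document

def rank(doc_features, reference_features, weights):
    """Return document indices ordered from highest to lowest score."""
    scores = similarity_scores(doc_features, reference_features, weights)
    return np.argsort(-scores)

rng = np.random.default_rng(0)
docs = rng.normal(size=(10, 50))        # 10 candidate documents, 50 features
references = docs[rng.choice(10, size=3, replace=False)]  # 3 reference docs
weights = rng.normal(size=3)            # only 3 parameters to learn
print(rank(docs, references, weights))
```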