Unbiased Learning to Rank: Counterfactual and Online Approaches
This tutorial covers and contrasts the two main methodologies in unbiased Learning to Rank (LTR): Counterfactual LTR and Online LTR. There has long been interest in LTR from user interactions; however, this form of implicit feedback is heavily biased. In recent years, unbiased LTR methods have been introduced to remove the effect of different types of bias caused by user behavior in search. For instance, a well-addressed type of bias is position bias: the rank at which a document is displayed heavily affects the interactions it receives. Counterfactual LTR methods deal with such types of bias by learning from historical interactions while correcting for the effect of explicitly modelled biases. Online LTR, in contrast, does not use an explicit user model; it learns through an interactive process in which randomized results are displayed to the user. Through randomization, the effect of different types of bias can be removed from the learning process. Although both methodologies lead to unbiased LTR, their approaches differ considerably, as do their theoretical guarantees, empirical results, effects on the user experience during learning, and applicability. Consequently, for practitioners the choice between the two is highly consequential. By providing an overview of both approaches and contrasting them, we aim to provide an essential guide to unbiased LTR that aids in understanding and choosing between the methodologies.
Comment: Abstract for tutorial appearing at SIGIR 201
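To make the position-bias correction concrete, here is a minimal Python sketch of how counterfactual LTR reweights logged clicks by inverse examination propensities; the propensity curve and the toy click log are invented for illustration and are not taken from the tutorial.

```python
# Minimal illustration of position bias and its inverse-propensity correction.
# The propensity curve and the click log below are made up for this sketch.

# Assumed probability that a user examines each display rank.
propensities = {1: 1.0, 2: 0.6, 3: 0.3, 4: 0.15}

# Toy click log: (document id, rank at which it was shown, clicked?).
click_log = [
    ("doc_a", 1, True),
    ("doc_b", 2, False),
    ("doc_c", 3, True),
    ("doc_d", 4, False),
]

# Naive estimate: raw clicks, which favour documents shown at high ranks.
naive = {doc: float(clicked) for doc, _, clicked in click_log}

# Counterfactual (IPS) estimate: each click is divided by the examination
# propensity of its rank, so a click observed at rank 3 counts for more.
ips = {doc: float(clicked) / propensities[rank] for doc, rank, clicked in click_log}

print(naive)  # doc_a and doc_c both register one click
print(ips)    # doc_c's click is up-weighted to 1 / 0.3 ≈ 3.33
```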
Accelerated Convergence for Counterfactual Learning to Rank
Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from
logged user interactions, often collected using a production system. Employing
such an offline learning approach has many benefits compared to an online one,
but it is challenging as user feedback often contains high levels of bias.
Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning
from logged user interactions. One of the major difficulties in applying
Stochastic Gradient Descent (SGD) approaches to counterfactual learning
problems is the large variance introduced by the propensity weights. In this
paper, we show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from this variance: convergence is slow, especially when large IPS weights occur. To overcome
this limitation, we propose a novel learning algorithm, called CounterSample,
that has provably better convergence than standard IPS-weighted gradient
descent methods. We prove that CounterSample converges faster and complement
our theoretical findings with empirical results by performing extensive
experimentation in a number of biased LTR scenarios -- across optimizers, batch
sizes, and different degrees of position bias.
Comment: SIGIR 2020 full conference paper
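As a rough illustration of the variance problem and of the sampling-based remedy the abstract alludes to, the sketch below contrasts an IPS-weighted SGD step with a step computed on items sampled in proportion to their IPS weights. The exact CounterSample procedure is not reproduced here; all weights and gradients are synthetic.

```python
# Hedged sketch: IPS-weighted SGD vs. weight-proportional sampling.
# Both steps have the same expectation, but the second avoids multiplying any
# single gradient by a large IPS weight, which is the source of high variance.
import numpy as np

rng = np.random.default_rng(0)
ips_weights = np.array([1.0, 1.7, 3.3, 6.7])   # large weights for low-rank clicks
per_item_grads = rng.normal(size=(4, 8))        # toy per-item gradients (8 params)

# (1) Standard IPS-weighted SGD: pick an item uniformly, scale its gradient by
#     its IPS weight. Unbiased, but rare large weights dominate the variance.
i = rng.integers(len(ips_weights))
weighted_step = ips_weights[i] * per_item_grads[i]

# (2) Weight-proportional sampling: pick items with probability proportional to
#     their IPS weight, then take a step scaled only by a constant.
p = ips_weights / ips_weights.sum()
j = rng.choice(len(ips_weights), p=p)
sampled_step = ips_weights.sum() / len(ips_weights) * per_item_grads[j]

print(weighted_step[:3], sampled_step[:3])
```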
Unbiased Learning to Rank with Unbiased Propensity Estimation
Learning to rank with biased click data is a well-known challenge. A variety
of methods has been explored to debias click data for learning to rank, such as click models, result interleaving, and, more recently, the unbiased
learning-to-rank framework based on inverse propensity weighting. Despite their
differences, most existing studies separate the estimation of click bias
(namely the \textit{propensity model}) from the learning of ranking algorithms.
To estimate click propensities, they either conduct online result
randomization, which can negatively affect the user experience, or offline
parameter estimation, which has special requirements for click data and is
optimized for objectives (e.g. click likelihood) that are not directly related
to the ranking performance of the system. In this work, we address those
problems by unifying the learning of propensity models and ranking models. We
find that the problem of estimating a propensity model from click data is a
dual problem of unbiased learning to rank. Based on this observation, we
propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker
and an \textit{unbiased propensity model}. DLA is an automatic unbiased
learning-to-rank framework as it directly learns unbiased ranking models from
biased click data without any preprocessing. It can adapt to the change of bias
distributions and is applicable to online learning. Our empirical experiments
with synthetic and real-world data show that the models trained with DLA
significantly outperformed the unbiased learning-to-rank algorithms based on
result randomization and the models trained with relevance signals extracted by
click models.
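A minimal sketch, with simulated data, of the dual/alternating idea described above: relevance estimates are refined using clicks reweighted by the current propensity estimates, and propensity estimates are refined using clicks reweighted by the current relevance estimates. The paper's actual losses and models differ; the document relevances, examination curve, and logging randomization below are all made up.

```python
# Toy alternating estimation in the spirit of a dual learning setup.
import numpy as np

rng = np.random.default_rng(1)
n_docs, n_positions, n_sessions = 20, 5, 20000
true_exam = np.array([1.0, 0.7, 0.45, 0.3, 0.2])   # hidden examination curve
true_rel = rng.uniform(0.1, 0.9, size=n_docs)       # hidden document relevance

# Simulate sessions in which 5 random documents are shown in a random order.
shown = np.zeros((n_docs, n_positions))
clicked = np.zeros((n_docs, n_positions))
for _ in range(n_sessions):
    docs = rng.choice(n_docs, size=n_positions, replace=False)
    for pos, d in enumerate(docs):
        shown[d, pos] += 1
        clicked[d, pos] += rng.random() < true_exam[pos] * true_rel[d]

ctr = clicked / np.maximum(shown, 1)        # click-through rate per (doc, position)
exam_est = np.ones(n_positions)             # propensity model, initialised flat
for _ in range(20):
    # "Ranker" update: debias CTRs with the current propensities, average per doc.
    rel_est = (ctr / exam_est).mean(axis=1)
    # "Propensity" update: debias CTRs with the current relevances, average per
    # position, and normalise so that the top position has propensity 1.
    exam_est = (ctr / rel_est[:, None]).mean(axis=0)
    exam_est = exam_est / exam_est[0]

print(np.round(exam_est, 2))  # should approach [1.0, 0.7, 0.45, 0.3, 0.2]
```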
Controlling Fairness and Bias in Dynamic Learning-to-Rank
Rankings are the primary interface through which many online platforms match
users to items (e.g. news, products, music, video). In these two-sided markets, not only do users draw utility from the rankings, but the rankings also
determine the utility (e.g. exposure, revenue) for the item providers (e.g.
publishers, sellers, artists, studios). It has already been noted that
myopically optimizing utility to the users, as done by virtually all
learning-to-rank algorithms, can be unfair to the item providers. We,
therefore, present a learning-to-rank approach for explicitly enforcing
merit-based fairness guarantees to groups of items (e.g. articles by the same
publisher, tracks by the same artist). In particular, we propose a learning
algorithm that ensures notions of amortized group fairness, while
simultaneously learning the ranking function from implicit feedback data. The
algorithm takes the form of a controller that integrates unbiased estimators
for both fairness and utility, dynamically adapting both as more data becomes
available. In addition to its rigorous theoretical foundation and convergence
guarantees, we find empirically that the algorithm is highly practical and
robust.
Comment: First two authors contributed equally. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval 2020
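As a rough sketch of the controller idea, assuming a simple exposure model, two item groups, and an invented gain parameter, the snippet below boosts the scores of items whose group has accumulated less exposure per unit of merit; the paper's actual error term, unbiased estimators, and guarantees are not reproduced here.

```python
# Toy fairness controller: relevance score plus a boost for under-exposed groups.
import numpy as np

group = np.array([0, 0, 1, 1])                  # group membership per item
relevance = np.array([0.8, 0.7, 0.6, 0.5])      # current relevance estimates
merit = np.array([relevance[group == g].sum() for g in (0, 1)])
exposure = np.zeros(2)                          # accumulated exposure per group
pos_weight = np.array([1.0, 0.5, 0.33, 0.25])   # exposure granted by each rank
lam = 2.0                                       # controller gain (made up)

for step in range(10):
    # Boost items whose group lags behind the leader in exposure per unit merit.
    per_merit = exposure / merit
    disparity = np.maximum(0.0, per_merit.max() - per_merit)
    scores = relevance + lam * disparity[group]
    ranking = np.argsort(-scores)               # item indices, best score first
    for rank, item in enumerate(ranking):
        exposure[group[item]] += pos_weight[rank]

print(exposure / merit)  # exposure per unit of merit ends up roughly equal
```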
Estimating Position Bias without Intrusive Interventions
Presentation bias is one of the key challenges when learning from implicit
feedback in search engines, as it confounds the relevance signal. While it was
recently shown how counterfactual learning-to-rank (LTR) approaches
\cite{Joachims/etal/17a} can provably overcome presentation bias when
observation propensities are known, it remains to show how to effectively
estimate these propensities. In this paper, we propose the first method for
producing consistent propensity estimates without manual relevance judgments,
disruptive interventions, or restrictive relevance modeling assumptions. First,
we show how to harvest a specific type of intervention data from historic
feedback logs of multiple different ranking functions, and show that this data
is sufficient for consistent propensity estimation in the position-based model.
Second, we propose a new extremum estimator that makes effective use of this
data. In an empirical evaluation, we find that the new estimator provides
superior propensity estimates in two real-world systems -- Arxiv Full-text
Search and Google Drive Search. Beyond these two points, we find that the method is robust across a wide range of settings in simulation studies.
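The core "intervention harvesting" idea can be illustrated with a small simulation: when different historical rankers place the same query-document pair at different ranks, the relevance factor cancels out of the click-rate ratio, leaving an estimate of the relative propensity. The paper's extremum estimator is considerably more general; the simulation parameters below are invented.

```python
# Toy intervention harvesting: relative propensities from naturally logged swaps.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
true_prop = np.array([1.0, 0.6, 0.35])          # examination propensity, ranks 1-3

# Simulate logs from two historical rankers that place the same (query, document)
# pair at two different ranks; a click requires examination and relevance.
shown = defaultdict(lambda: np.zeros(3))         # impressions per rank, per rank pair
clicks = defaultdict(lambda: np.zeros(3))
for _ in range(5000):
    rel = rng.uniform(0.2, 0.9)
    r_a, r_b = sorted(int(x) for x in rng.choice(3, size=2, replace=False))
    for r in (r_a, r_b):
        shown[(r_a, r_b)][r] += 1
        clicks[(r_a, r_b)][r] += rng.random() < true_prop[r] * rel

# Within the pairs shown at both rank 1 (index 0) and rank r, the relevance
# distribution is identical at both ranks, so the click-rate ratio estimates
# the propensity ratio p_r / p_1.
for r in (1, 2):
    ctr = clicks[(0, r)] / np.maximum(shown[(0, r)], 1)
    print(f"estimated p{r + 1}/p1 = {ctr[r] / ctr[0]:.2f}  (true {true_prop[r]:.2f})")
```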
Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using
logged user interactions that contain interaction biases. Existing methods are
only unbiased if users are presented with all relevant items in every ranking.
Currently, no counterfactual unbiased LTR method exists for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR
metrics that can account for the effect of a stochastic logging policy. We
prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability of appearing in the top-k ranking. Our experimental results
show that the performance of our estimator is not affected by the size of k:
for any k, the policy-aware estimator reaches the same retrieval performance
while learning from top-k feedback as when learning from feedback on the full
ranking. Lastly, we introduce novel extensions of traditional LTR methods to
perform counterfactual LTR and to optimize top-k metrics. Together, our
contributions introduce the first policy-aware unbiased LTR approach that
learns from top-k feedback and optimizes top-k metrics. As a result,
counterfactual LTR is now applicable to the very prevalent top-k ranking
setting in search and recommendation.
Comment: SIGIR 2020 full conference paper
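A hedged sketch of the policy-aware propensity idea, assuming a Plackett-Luce logging policy, a two-position display (k = 2), and an invented examination curve: the IPS weight of a click becomes one over the expected probability that the document is examined under the stochastic logging policy, which is non-zero as long as the policy can place the document in the top-k.

```python
# Toy policy-aware propensities under a stochastic (Plackett-Luce) logging policy.
import itertools
import numpy as np

docs = ["d1", "d2", "d3", "d4"]
k = 2                                            # only the top-2 is shown to users
exam = np.array([1.0, 0.5])                      # examination probability per shown rank
policy_logits = np.array([2.0, 1.0, 0.5, 0.1])   # logging policy scores (made up)

def plackett_luce_prob(order, logits):
    """Probability that the policy samples this full ordering of document indices."""
    p, remaining = 1.0, np.exp(logits).copy()
    for i in order:
        p *= remaining[i] / remaining.sum()
        remaining[i] = 0.0
    return p

# Policy-aware propensity: expectation over rankings of each document's
# examination probability, which is zero whenever it falls outside the top-k.
propensity = np.zeros(len(docs))
for order in itertools.permutations(range(len(docs))):
    p_order = plackett_luce_prob(order, policy_logits)
    for rank, doc_idx in enumerate(order[:k]):
        propensity[doc_idx] += p_order * exam[rank]

for doc, rho in zip(docs, propensity):
    print(f"{doc}: propensity {rho:.3f}, IPS weight {1 / rho:.2f}")
```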
Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank
Unbiased counterfactual learning to rank (CLTR) requires click propensities to compensate for the difference between user clicks and the true relevance of search results via inverse propensity scoring (IPS). Current propensity estimation methods assume that user click behavior follows the position-based model (PBM) and estimate click propensities based on this assumption. However, in reality, user clicks often follow the cascade model (CM), where users scan search results from top to bottom and each next click depends on the previous one. In this cascade scenario, PBM-based estimates of propensities are not accurate, which, in turn,
hurts CLTR performance. In this paper, we propose a propensity estimation
method for the cascade scenario, called CM-IPS. We show that CM-IPS keeps CLTR performance close to the full-information performance when user clicks follow the CM, while PBM-based CLTR shows a significant gap from full-information performance. The opposite is true if user clicks follow the PBM instead of the CM. Finally, we suggest a way to select between CM- and PBM-based propensity estimation methods based on historical user clicks.
Comment: 4 pages, 2 figures, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20)
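A hedged sketch contrasting position-based and cascade-based examination propensities derived from the same click log: under a strict cascade assumption the user scans top-down and stops at the first click, so rank r is examined only when no result above it was clicked. The exact CM-IPS weighting in the paper may differ; the log and the PBM curve below are made up.

```python
# Toy comparison of PBM-style and cascade-style examination propensities.
import numpy as np

# Click log: one row per session, one column per rank (at most one click per row).
clicks = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])
click_rate = clicks.mean(axis=0)                 # per-rank click probability

# Position-based view: a fixed, rank-dependent examination curve (assumed here).
pbm_propensity = np.array([1.0, 0.6, 0.4, 0.25])

# Cascade view: rank r is examined iff there was no click above it, i.e. with
# probability 1 minus the fraction of sessions that clicked at an earlier rank.
cm_propensity = 1.0 - np.concatenate(([0.0], np.cumsum(click_rate)[:-1]))

print("click rate per rank:", click_rate)        # [0.4 0.2 0.2 0. ]
print("PBM propensities:   ", pbm_propensity)
print("CM propensities:    ", cm_propensity)     # [1.  0.6 0.4 0.2]
# The resulting IPS weights (1 / propensity) differ between the two user models,
# which is why propensities from the wrong model hurt counterfactual LTR.
```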