Unbiased Learning to Rank: Counterfactual and Online Approaches
This tutorial covers and contrasts the two main methodologies in unbiased Learning to Rank (LTR): Counterfactual LTR and Online LTR. There has long been interest in LTR from user interactions; however, this form of implicit feedback is heavily biased. In recent years, unbiased LTR methods have been introduced to remove the effect of different types of bias caused by user behavior in search. For instance, a well-addressed type of bias is position bias: the rank at which a document is displayed heavily affects the interactions it receives. Counterfactual LTR methods deal with such types of bias by learning from historical interactions while correcting for the effect of explicitly modelled biases. Online LTR, in contrast, does not use an explicit user model; it learns through an interactive process in which randomized results are displayed to the user. Through randomization, the effect of different types of bias can be removed from the learning process. Although both methodologies lead to unbiased LTR, their approaches differ considerably, as do their theoretical guarantees, empirical results, effects on the user experience during learning, and applicability. Consequently, for practitioners the choice between the two is highly consequential. By providing an overview of both approaches and contrasting them, we aim to provide an essential guide to unbiased LTR that aids in understanding and choosing between the methodologies.
Comment: Abstract for tutorial appearing at SIGIR 201
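To make the position-bias correction concrete, here is a minimal Python sketch of how counterfactual LTR reweights logged clicks by inverse examination propensities; the propensity curve and the toy click log are invented for illustration and are not taken from the tutorial.

```python
# Minimal illustration of position bias and its inverse-propensity correction.
# The propensity curve and the click log below are made up for this sketch.

# Assumed probability that a user examines each display rank.
propensities = {1: 1.0, 2: 0.6, 3: 0.3, 4: 0.15}

# Toy click log: (document id, rank at which it was shown, clicked?).
click_log = [
    ("doc_a", 1, True),
    ("doc_b", 2, False),
    ("doc_c", 3, True),
    ("doc_d", 4, False),
]

# Naive estimate: raw clicks, which favour documents shown at high ranks.
naive = {doc: float(clicked) for doc, _, clicked in click_log}

# Counterfactual (IPS) estimate: each click is divided by the examination
# propensity of its rank, so a click observed at rank 3 counts for more.
ips = {doc: float(clicked) / propensities[rank] for doc, rank, clicked in click_log}

print(naive)  # doc_a and doc_c both register one click
print(ips)    # doc_c's click is up-weighted to 1 / 0.3 ≈ 3.33
```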
Accelerated Convergence for Counterfactual Learning to Rank
Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from
logged user interactions, often collected using a production system. Employing
such an offline learning approach has many benefits compared to an online one,
but it is challenging as user feedback often contains high levels of bias.
Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning
from logged user interactions. One of the major difficulties in applying
Stochastic Gradient Descent (SGD) approaches to counterfactual learning
problems is the large variance introduced by the propensity weights. In this
paper, we show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from this variance: convergence is slow, especially when large IPS weights occur. To overcome
this limitation, we propose a novel learning algorithm, called CounterSample,
that has provably better convergence than standard IPS-weighted gradient
descent methods. We prove that CounterSample converges faster and complement
our theoretical findings with empirical results by performing extensive
experimentation in a number of biased LTR scenarios -- across optimizers, batch
sizes, and different degrees of position bias.
Comment: SIGIR 2020 full conference paper
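As a rough illustration of the variance problem and of the sampling-based remedy the abstract alludes to, the sketch below contrasts an IPS-weighted SGD step with a step computed on items sampled in proportion to their IPS weights. The exact CounterSample procedure is not reproduced here; all weights and gradients are synthetic.

```python
# Hedged sketch: IPS-weighted SGD vs. weight-proportional sampling.
# Both steps have the same expectation, but the second avoids multiplying any
# single gradient by a large IPS weight, which is the source of high variance.
import numpy as np

rng = np.random.default_rng(0)
ips_weights = np.array([1.0, 1.7, 3.3, 6.7])   # large weights for low-rank clicks
per_item_grads = rng.normal(size=(4, 8))        # toy per-item gradients (8 params)

# (1) Standard IPS-weighted SGD: pick an item uniformly, scale its gradient by
#     its IPS weight. Unbiased, but rare large weights dominate the variance.
i = rng.integers(len(ips_weights))
weighted_step = ips_weights[i] * per_item_grads[i]

# (2) Weight-proportional sampling: pick items with probability proportional to
#     their IPS weight, then take a step scaled only by a constant.
p = ips_weights / ips_weights.sum()
j = rng.choice(len(ips_weights), p=p)
sampled_step = ips_weights.sum() / len(ips_weights) * per_item_grads[j]

print(weighted_step[:3], sampled_step[:3])
```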
Unbiased Learning to Rank with Unbiased Propensity Estimation
Learning to rank with biased click data is a well-known challenge. A variety
of methods has been explored to debias click data for learning to rank, such as click models, result interleaving, and, more recently, the unbiased
learning-to-rank framework based on inverse propensity weighting. Despite their
differences, most existing studies separate the estimation of click bias
(namely the \textit{propensity model}) from the learning of ranking algorithms.
To estimate click propensities, they either conduct online result
randomization, which can negatively affect the user experience, or offline
parameter estimation, which has special requirements for click data and is
optimized for objectives (e.g. click likelihood) that are not directly related
to the ranking performance of the system. In this work, we address those
problems by unifying the learning of propensity models and ranking models. We
find that the problem of estimating a propensity model from click data is a
dual problem of unbiased learning to rank. Based on this observation, we
propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker
and an \textit{unbiased propensity model}. DLA is an automatic unbiased
learning-to-rank framework as it directly learns unbiased ranking models from
biased click data without any preprocessing. It can adapt to the change of bias
distributions and is applicable to online learning. Our empirical experiments
with synthetic and real-world data show that the models trained with DLA
significantly outperformed the unbiased learning-to-rank algorithms based on
result randomization and the models trained with relevance signals extracted by
click models.
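A minimal sketch, with simulated data, of the dual/alternating idea described above: relevance estimates are refined using clicks reweighted by the current propensity estimates, and propensity estimates are refined using clicks reweighted by the current relevance estimates. The paper's actual losses and models differ; the document relevances, examination curve, and logging randomization below are all made up.

```python
# Toy alternating estimation in the spirit of a dual learning setup.
import numpy as np

rng = np.random.default_rng(1)
n_docs, n_positions, n_sessions = 20, 5, 20000
true_exam = np.array([1.0, 0.7, 0.45, 0.3, 0.2])   # hidden examination curve
true_rel = rng.uniform(0.1, 0.9, size=n_docs)       # hidden document relevance

# Simulate sessions in which 5 random documents are shown in a random order.
shown = np.zeros((n_docs, n_positions))
clicked = np.zeros((n_docs, n_positions))
for _ in range(n_sessions):
    docs = rng.choice(n_docs, size=n_positions, replace=False)
    for pos, d in enumerate(docs):
        shown[d, pos] += 1
        clicked[d, pos] += rng.random() < true_exam[pos] * true_rel[d]

ctr = clicked / np.maximum(shown, 1)        # click-through rate per (doc, position)
exam_est = np.ones(n_positions)             # propensity model, initialised flat
for _ in range(20):
    # "Ranker" update: debias CTRs with the current propensities, average per doc.
    rel_est = (ctr / exam_est).mean(axis=1)
    # "Propensity" update: debias CTRs with the current relevances, average per
    # position, and normalise so that the top position has propensity 1.
    exam_est = (ctr / rel_est[:, None]).mean(axis=0)
    exam_est = exam_est / exam_est[0]

print(np.round(exam_est, 2))  # should approach [1.0, 0.7, 0.45, 0.3, 0.2]
```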
Controlling Fairness and Bias in Dynamic Learning-to-Rank
Rankings are the primary interface through which many online platforms match
users to items (e.g. news, products, music, video). In these two-sided markets, not only do users draw utility from the rankings, but the rankings also
determine the utility (e.g. exposure, revenue) for the item providers (e.g.
publishers, sellers, artists, studios). It has already been noted that
myopically optimizing utility to the users, as done by virtually all
learning-to-rank algorithms, can be unfair to the item providers. We,
therefore, present a learning-to-rank approach for explicitly enforcing
merit-based fairness guarantees to groups of items (e.g. articles by the same
publisher, tracks by the same artist). In particular, we propose a learning
algorithm that ensures notions of amortized group fairness, while
simultaneously learning the ranking function from implicit feedback data. The
algorithm takes the form of a controller that integrates unbiased estimators
for both fairness and utility, dynamically adapting both as more data becomes
available. In addition to its rigorous theoretical foundation and convergence
guarantees, we find empirically that the algorithm is highly practical and
robust.
Comment: First two authors contributed equally. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval 2020
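As a rough sketch of the controller idea, assuming a simple exposure model, two item groups, and an invented gain parameter, the snippet below boosts the scores of items whose group has accumulated less exposure per unit of merit; the paper's actual error term, unbiased estimators, and guarantees are not reproduced here.

```python
# Toy fairness controller: relevance score plus a boost for under-exposed groups.
import numpy as np

group = np.array([0, 0, 1, 1])                  # group membership per item
relevance = np.array([0.8, 0.7, 0.6, 0.5])      # current relevance estimates
merit = np.array([relevance[group == g].sum() for g in (0, 1)])
exposure = np.zeros(2)                          # accumulated exposure per group
pos_weight = np.array([1.0, 0.5, 0.33, 0.25])   # exposure granted by each rank
lam = 2.0                                       # controller gain (made up)

for step in range(10):
    # Boost items whose group lags behind the leader in exposure per unit merit.
    per_merit = exposure / merit
    disparity = np.maximum(0.0, per_merit.max() - per_merit)
    scores = relevance + lam * disparity[group]
    ranking = np.argsort(-scores)               # item indices, best score first
    for rank, item in enumerate(ranking):
        exposure[group[item]] += pos_weight[rank]

print(exposure / merit)  # exposure per unit of merit ends up roughly equal
```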
Estimating Position Bias without Intrusive Interventions
Presentation bias is one of the key challenges when learning from implicit
feedback in search engines, as it confounds the relevance signal. While it was
recently shown how counterfactual learning-to-rank (LTR) approaches
\cite{Joachims/etal/17a} can provably overcome presentation bias when
observation propensities are known, it remains to show how to effectively
estimate these propensities. In this paper, we propose the first method for
producing consistent propensity estimates without manual relevance judgments,
disruptive interventions, or restrictive relevance modeling assumptions. First,
we show how to harvest a specific type of intervention data from historic
feedback logs of multiple different ranking functions, and show that this data
is sufficient for consistent propensity estimation in the position-based model.
Second, we propose a new extremum estimator that makes effective use of this
data. In an empirical evaluation, we find that the new estimator provides
superior propensity estimates in two real-world systems -- Arxiv Full-text
Search and Google Drive Search. Beyond these two points, we find that the method is robust across a wide range of settings in simulation studies.
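The core "intervention harvesting" idea can be illustrated with a small simulation: when different historical rankers place the same query-document pair at different ranks, the relevance factor cancels out of the click-rate ratio, leaving an estimate of the relative propensity. The paper's extremum estimator is considerably more general; the simulation parameters below are invented.

```python
# Toy intervention harvesting: relative propensities from naturally logged swaps.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
true_prop = np.array([1.0, 0.6, 0.35])          # examination propensity, ranks 1-3

# Simulate logs from two historical rankers that place the same (query, document)
# pair at two different ranks; a click requires examination and relevance.
shown = defaultdict(lambda: np.zeros(3))         # impressions per rank, per rank pair
clicks = defaultdict(lambda: np.zeros(3))
for _ in range(5000):
    rel = rng.uniform(0.2, 0.9)
    r_a, r_b = sorted(int(x) for x in rng.choice(3, size=2, replace=False))
    for r in (r_a, r_b):
        shown[(r_a, r_b)][r] += 1
        clicks[(r_a, r_b)][r] += rng.random() < true_prop[r] * rel

# Within the pairs shown at both rank 1 (index 0) and rank r, the relevance
# distribution is identical at both ranks, so the click-rate ratio estimates
# the propensity ratio p_r / p_1.
for r in (1, 2):
    ctr = clicks[(0, r)] / np.maximum(shown[(0, r)], 1)
    print(f"estimated p{r + 1}/p1 = {ctr[r] / ctr[0]:.2f}  (true {true_prop[r]:.2f})")
```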
Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using
logged user interactions that contain interaction biases. Existing methods are
only unbiased if users are presented with all relevant items in every ranking.
Currently, no counterfactual unbiased LTR method exists for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR
metrics that can account for the effect of a stochastic logging policy. We
prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability of appearing in the top-k ranking. Our experimental results
show that the performance of our estimator is not affected by the size of k:
for any k, the policy-aware estimator reaches the same retrieval performance
while learning from top-k feedback as when learning from feedback on the full
ranking. Lastly, we introduce novel extensions of traditional LTR methods to
perform counterfactual LTR and to optimize top-k metrics. Together, our
contributions introduce the first policy-aware unbiased LTR approach that
learns from top-k feedback and optimizes top-k metrics. As a result,
counterfactual LTR is now applicable to the very prevalent top-k ranking
setting in search and recommendation.
Comment: SIGIR 2020 full conference paper
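A hedged sketch of the policy-aware propensity idea, assuming a Plackett-Luce logging policy, a two-position display (k = 2), and an invented examination curve: the IPS weight of a click becomes one over the expected probability that the document is examined under the stochastic logging policy, which is non-zero as long as the policy can place the document in the top-k.

```python
# Toy policy-aware propensities under a stochastic (Plackett-Luce) logging policy.
import itertools
import numpy as np

docs = ["d1", "d2", "d3", "d4"]
k = 2                                            # only the top-2 is shown to users
exam = np.array([1.0, 0.5])                      # examination probability per shown rank
policy_logits = np.array([2.0, 1.0, 0.5, 0.1])   # logging policy scores (made up)

def plackett_luce_prob(order, logits):
    """Probability that the policy samples this full ordering of document indices."""
    p, remaining = 1.0, np.exp(logits).copy()
    for i in order:
        p *= remaining[i] / remaining.sum()
        remaining[i] = 0.0
    return p

# Policy-aware propensity: expectation over rankings of each document's
# examination probability, which is zero whenever it falls outside the top-k.
propensity = np.zeros(len(docs))
for order in itertools.permutations(range(len(docs))):
    p_order = plackett_luce_prob(order, policy_logits)
    for rank, doc_idx in enumerate(order[:k]):
        propensity[doc_idx] += p_order * exam[rank]

for doc, rho in zip(docs, propensity):
    print(f"{doc}: propensity {rho:.3f}, IPS weight {1 / rho:.2f}")
```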
Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank
Unbiased counterfactual learning to rank (CLTR) requires click propensities to compensate for the difference between user clicks and the true relevance of search results via inverse propensity scoring (IPS). Current propensity estimation methods assume that user click behavior follows the position-based model (PBM) and estimate click propensities based on this assumption. However, in reality, user clicks often follow the cascade model (CM), where users scan search results from top to bottom and each next click depends on the previous one. In this cascade scenario, PBM-based estimates of propensities are not accurate, which, in turn,
hurts CLTR performance. In this paper, we propose a propensity estimation
method for the cascade scenario, called CM-IPS. We show that CM-IPS keeps CLTR performance close to the full-information performance when user clicks follow the CM, while PBM-based CLTR shows a significant gap from full-information performance. The opposite is true if user clicks follow the PBM instead of the CM. Finally, we suggest a way to select between CM- and PBM-based propensity estimation methods based on historical user clicks.
Comment: 4 pages, 2 figures, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20)
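A hedged sketch contrasting position-based and cascade-based examination propensities derived from the same click log: under a strict cascade assumption the user scans top-down and stops at the first click, so rank r is examined only when no result above it was clicked. The exact CM-IPS weighting in the paper may differ; the log and the PBM curve below are made up.

```python
# Toy comparison of PBM-style and cascade-style examination propensities.
import numpy as np

# Click log: one row per session, one column per rank (at most one click per row).
clicks = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])
click_rate = clicks.mean(axis=0)                 # per-rank click probability

# Position-based view: a fixed, rank-dependent examination curve (assumed here).
pbm_propensity = np.array([1.0, 0.6, 0.4, 0.25])

# Cascade view: rank r is examined iff there was no click above it, i.e. with
# probability 1 minus the fraction of sessions that clicked at an earlier rank.
cm_propensity = 1.0 - np.concatenate(([0.0], np.cumsum(click_rate)[:-1]))

print("click rate per rank:", click_rate)        # [0.4 0.2 0.2 0. ]
print("PBM propensities:   ", pbm_propensity)
print("CM propensities:    ", cm_propensity)     # [1.  0.6 0.4 0.2]
# The resulting IPS weights (1 / propensity) differ between the two user models,
# which is why propensities from the wrong model hurt counterfactual LTR.
```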