8,714 research outputs found
Unifying Online and Counterfactual Learning to Rank
Optimizing ranking systems based on user interactions is a well-studied
problem. State-of-the-art methods are divided into online approaches, which
learn by directly interacting with users, and counterfactual approaches, which
learn from historical interactions. Existing online methods are hindered
without online interventions and thus should not be applied counterfactually.
Conversely,
counterfactual methods cannot directly benefit from online interventions. We
propose a novel intervention-aware estimator for both counterfactual and online
Learning to Rank (LTR). With the intervention-aware estimator we aim to bridge
the online/counterfactual LTR division, as it is shown to be highly effective
in both online and counterfactual scenarios. The
estimator corrects for the effect of position bias, trust bias, and
item-selection bias by using corrections based on the behavior of the logging
policy and on online interventions: changes to the logging policy made during
the gathering of click data. Our experimental results, conducted in a
semi-synthetic setup, show that, unlike existing counterfactual
LTR methods, the intervention-aware estimator can greatly benefit from online
interventions.
Comment: Harrie Oosterhuis and Maarten de Rijke. 2021. Unifying Online and
Counterfactual Learning to Rank: A Novel Counterfactual Estimator that
Effectively Utilizes Online Interventions. In The 14th ACM International
Conference on Web Search and Data Mining (WSDM '21), March 8-12, 2021,
Jerusalem, Israel. ACM, New York, NY, USA, 9 pages.
https://doi.org/10.1145/3437963.344179
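The intervention-aware idea can be illustrated with a small sketch: clicks are re-weighted by an observation propensity averaged over every logging policy that was live while the click data was gathered, so online interventions (policy switches) enter the correction. The policy names and probabilities below are illustrative assumptions, not values from the paper.

```python
# Hypothetical per-rank examination probabilities for two logging
# policies that were deployed during click gathering (illustrative
# assumptions, not values from the paper).
EXAM_PROBS = {
    "policy_a": [1.0, 0.6, 0.3],
    "policy_b": [1.0, 0.8, 0.5],
}

def mixed_propensity(rank, policy_shares):
    """Observation propensity at `rank` (0-based), averaged over the
    logging policies, weighted by the share of data each collected."""
    return sum(share * EXAM_PROBS[name][rank]
               for name, share in policy_shares.items())

def ips_weight(rank, policy_shares):
    """Inverse-propensity weight for a click observed at `rank`."""
    return 1.0 / mixed_propensity(rank, policy_shares)
```

With an even split between the two policies, a click at rank 2 gets propensity 0.5 * 0.3 + 0.5 * 0.5 = 0.4 and thus weight 2.5.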
Unbiased Learning to Rank: Counterfactual and Online Approaches
This tutorial covers and contrasts the two main methodologies in unbiased
Learning to Rank (LTR): Counterfactual LTR and Online LTR. There has long been
an interest in LTR from user interactions; however, this form of implicit
feedback is very biased. In recent years, unbiased LTR methods have been
introduced to remove the effect of different types of bias caused by
user behavior in search. For instance, a well-addressed type of bias is
position bias: the rank at which a document is displayed heavily affects the
interactions it receives. Counterfactual LTR methods deal with such types of
bias by learning from historical interactions while correcting for the effect
of the explicitly modelled biases. Online LTR, in contrast, does not use an
explicit user model; it learns through an interactive process in which
randomized results are displayed to the user. Through randomization, the effect
of different types of bias can be removed from the learning process. Though
both methodologies lead to unbiased LTR, their approaches differ considerably;
furthermore, so do their theoretical guarantees, empirical results, effects on
the user experience during learning, and applicability. Consequently, for
practitioners the choice between the two is highly consequential. By providing an
overview of both approaches and contrasting them, we aim to provide an
essential guide to unbiased LTR so as to aid in understanding and choosing
between methodologies.
Comment: Abstract for tutorial appearing at SIGIR 201
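The position bias described above is the standard motivating example for counterfactual corrections, and can be sketched concretely: clicks are re-weighted by the inverse of a rank-based observation propensity. The (1/rank)^eta parameterization is a common assumption in the unbiased-LTR literature, not something fixed by the tutorial.

```python
def position_propensity(rank, eta=1.0):
    """P(result at `rank` is observed) under a common position-bias
    model, (1/rank)^eta; eta controls how severe the bias is.
    Ranks are 1-based."""
    return (1.0 / rank) ** eta

def debiased_click_total(clicked_ranks, eta=1.0):
    """Clicks re-weighted by inverse observation propensity, so a
    click far down the ranking counts for more than one at the top."""
    return sum(1.0 / position_propensity(r, eta) for r in clicked_ranks)
```

For eta = 1, clicks at ranks 1, 2, and 4 contribute weights 1, 2, and 4 respectively, compensating for the lower chance that deep results are seen at all.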
Learning from User Interactions with Rankings: A Unification of the Field
Ranking systems form the basis for online search engines and recommendation
services. They process large collections of items, for instance web pages or
e-commerce products, and present the user with a small ordered selection. The
goal of a ranking system is to help a user find the items they are looking for
with the least amount of effort. Thus the rankings they produce should place
the most relevant or preferred items at the top of the ranking. Learning to
rank is a field within machine learning that covers methods which optimize
ranking systems w.r.t. this goal. Traditional supervised learning to rank
methods utilize expert judgements to evaluate and learn; however, in many
situations such judgements are impossible or infeasible to obtain. As a
solution, methods have been introduced that perform learning to rank based on
user clicks instead. The difficulty with clicks is that they are not only
affected by user preferences, but also by what rankings were displayed.
Therefore, these methods have to avoid being biased by factors other than
user preference. This thesis concerns learning to rank methods based on user
clicks and specifically aims to unify the different families of these methods.
As a whole, the second part of this thesis proposes a framework that bridges
many gaps between areas of online, counterfactual, and supervised learning to
rank. It has taken approaches, previously considered independent, and unified
them into a single methodology for widely applicable and effective learning to
rank from user clicks.
Comment: PhD Thesis of Harrie Oosterhuis defended at the University of
Amsterdam on November 27th 202
To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions
Learning to Rank (LTR) from user interactions is challenging as user feedback
often contains high levels of bias and noise. At the moment, two methodologies
for dealing with bias prevail in the field of LTR: counterfactual methods that
learn from historical data and model user behavior to deal with biases; and
online methods that perform interventions to deal with bias but use no explicit
user models. For practitioners the choice between the two methodologies is very
important because of its direct impact on end users. Nevertheless, there has
never been a direct comparison between these two approaches to unbiased LTR. In
this study we provide the first benchmarking of both counterfactual and online
LTR methods under different experimental conditions. Our results show that the
choice between the methodologies is consequential and depends on the presence
of selection bias, and the degree of position bias and interaction noise. In
settings with little bias or noise counterfactual methods can obtain the
highest ranking performance; however, in other circumstances their optimization
can be detrimental to the user experience. Conversely, online methods are very
robust to bias and noise but require control over the displayed rankings. Our
findings confirm and contradict existing expectations on the impact of
model-based and intervention-based methods in LTR, and allow practitioners to
make an informed decision between the two methodologies.
Comment: SIGIR 201
Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using
logged user interactions that contain interaction biases. Existing methods are
only unbiased if users are presented with all relevant items in every ranking.
There is currently no counterfactual unbiased LTR method for top-k
rankings. We introduce a novel policy-aware counterfactual estimator for LTR
metrics that can account for the effect of a stochastic logging policy. We
prove that the policy-aware estimator is unbiased if every relevant item has a
non-zero probability to appear in the top-k ranking. Our experimental results
show that the performance of our estimator is not affected by the size of k:
for any k, the policy-aware estimator reaches the same retrieval performance
while learning from top-k feedback as when learning from feedback on the full
ranking. Lastly, we introduce novel extensions of traditional LTR methods to
perform counterfactual LTR and to optimize top-k metrics. Together, our
contributions introduce the first policy-aware unbiased LTR approach that
learns from top-k feedback and optimizes top-k metrics. As a result,
counterfactual LTR is now applicable to the very prevalent top-k ranking
setting in search and recommendation.
Comment: SIGIR 2020 full conference paper
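A minimal sketch of the policy-aware idea: a document's propensity is the probability that the stochastic logging policy displays it in the top-k, which can be estimated by sampling rankings from the policy; the estimator then divides the document's click signal by this probability. The policy and helper names below are hypothetical.

```python
import random

def uniform_policy(rng, docs=("a", "b", "c", "d")):
    """A toy stochastic logging policy: a uniformly random ranking."""
    order = list(docs)
    rng.shuffle(order)
    return order

def topk_propensity(policy, doc, k, n_samples=20000, seed=0):
    """Monte-Carlo estimate of P(`doc` appears in the top-k) under a
    stochastic logging policy; unbiasedness requires this probability
    to be non-zero for every relevant document."""
    rng = random.Random(seed)
    hits = sum(doc in policy(rng)[:k] for _ in range(n_samples))
    return hits / n_samples
```

Under the uniform toy policy, each of the four documents lands in the top-2 with probability 0.5, so the estimate should be close to that value.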
Accelerated Convergence for Counterfactual Learning to Rank
Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from
logged user interactions, often collected using a production system. Employing
such an offline learning approach has many benefits compared to an online one,
but it is challenging as user feedback often contains high levels of bias.
Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning
from logged user interactions. One of the major difficulties in applying
Stochastic Gradient Descent (SGD) approaches to counterfactual learning
problems is the large variance introduced by the propensity weights. In this
paper we show that the convergence rate of SGD approaches with IPS-weighted
gradients suffers from the large variance introduced by the IPS weights:
convergence is slow, especially when there are large IPS weights. To overcome
this limitation, we propose a novel learning algorithm, called CounterSample,
that has provably better convergence than standard IPS-weighted gradient
descent methods. We prove that CounterSample converges faster and complement
our theoretical findings with empirical results by performing extensive
experimentation in a number of biased LTR scenarios -- across optimizers, batch
sizes, and different degrees of position bias.
Comment: SIGIR 2020 full conference paper
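The variance problem above suggests importance sampling at the data level: draw training instances with probability proportional to their IPS weights and then apply plain, unweighted updates to the sampled instances, instead of multiplying gradients by high-variance weights. The sketch below illustrates that sampling idea; it is not the paper's exact algorithm.

```python
import random

def countersample_batch(ips_weights, batch_size, rng):
    """Draw instance indices with probability proportional to their
    IPS weights; the sampled instances then receive unweighted
    gradient updates (a sketch of the sampling idea, not the
    paper's exact algorithm)."""
    indices = range(len(ips_weights))
    return rng.choices(indices, weights=ips_weights, k=batch_size)
```

With weights [1, 1, 8], the third instance should appear in roughly 80% of sampled positions, mirroring the contribution it would otherwise make through an 8x gradient weight.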
Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank
Unbiased Counterfactual Learning to Rank (CLTR) requires click propensities to
compensate for the difference between user clicks and the true relevance of
search results via Inverse Propensity Scoring (IPS). Current propensity
estimation methods assume that user click behavior follows the position-based
model (PBM) and estimate click propensities based on this assumption. However,
in reality, user clicks often follow the cascade model (CM), where users scan
search results from top to bottom and where each next click depends on the
previous one. In this cascade
scenario, PBM-based estimates of propensities are not accurate, which, in turn,
hurts CLTR performance. In this paper, we propose a propensity estimation
method for the cascade scenario, called CM-IPS. We show that CM-IPS keeps CLTR
performance close to the full-information performance in case the user clicks
follow the CM, while PBM-based CLTR has a significant gap to full-information
performance. The opposite is true if the user clicks follow the PBM instead of
the CM. Finally, we suggest a way to select between CM- and PBM-based
propensity estimation methods based on historical user clicks.
Comment: 4 pages, 2 figures, 43rd International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR '20)
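Under the basic cascade model the user scans top-down and stops at the first click, so a rank is examined only if nothing above it was clicked; this gives a simple closed form for examination propensities. A minimal sketch with illustrative per-rank click probabilities (a simplification of the cascade scenario, not the paper's CM-IPS estimator):

```python
def cascade_exam_prob(click_probs, rank):
    """Under the cascade model the user scans top-down and stops at
    the first click, so `rank` (0-based) is examined only if nothing
    above it was clicked: P(examine) = prod_{i < rank} (1 - c_i).
    `click_probs` holds illustrative per-rank click probabilities."""
    p = 1.0
    for c in click_probs[:rank]:
        p *= 1.0 - c
    return p
```

For click probabilities [0.5, 0.2, 0.1], rank 0 is always examined, while rank 2 is examined with probability 0.5 * 0.8 = 0.4; a PBM, by contrast, would assign rank 2 a propensity that ignores the clicks above it.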