Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using
logged user interactions that contain interaction biases. Existing methods are
only unbiased if users are presented with all relevant items in every ranking.
There is currently no counterfactual unbiased LTR method for top-k
rankings. We introduce a novel policy-aware counterfactual estimator for LTR
metrics that can account for the effect of a stochastic logging policy. We
prove that the policy-aware estimator is unbiased if every relevant item has a
non-zero probability of appearing in the top-k ranking. Our experimental results
show that the performance of our estimator is not affected by the size of k:
for any k, the policy-aware estimator reaches the same retrieval performance
while learning from top-k feedback as when learning from feedback on the full
ranking. Lastly, we introduce novel extensions of traditional LTR methods to
perform counterfactual LTR and to optimize top-k metrics. Together, our
contributions introduce the first policy-aware unbiased LTR approach that
learns from top-k feedback and optimizes top-k metrics. As a result,
counterfactual LTR is now applicable to the very prevalent top-k ranking
setting in search and recommendation.
Comment: SIGIR 2020 full conference pape
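The unbiasedness condition stated in this abstract can be illustrated with a minimal sketch. It assumes a position-based examination model and a logging policy given as an explicit, small distribution over rankings; the function names, the `exam_probs` values, and the per-document metric weights are all hypothetical and are not taken from the paper. Each document's propensity is its expected probability of being observed under the policy, with ranks beyond k contributing nothing.

```python
from typing import Dict, List, Sequence, Tuple

Ranking = Sequence[str]


def policy_aware_propensities(
    policy: List[Tuple[Ranking, float]],
    exam_probs: Sequence[float],
    k: int,
) -> Dict[str, float]:
    """Expected observation probability of each document under a stochastic policy.

    policy: (ranking, probability) pairs whose probabilities sum to 1 (assumed).
    exam_probs: assumed probability that a user examines each displayed rank.
    k: cutoff; documents ranked below k are never displayed, hence never observed.
    """
    propensities: Dict[str, float] = {}
    for ranking, prob in policy:
        for doc in ranking:
            propensities.setdefault(doc, 0.0)
        for rank, doc in enumerate(ranking[:k]):
            propensities[doc] += prob * exam_probs[rank]
    return propensities


def ips_estimate(
    clicked_docs: Sequence[str],
    propensities: Dict[str, float],
    metric_weight: Dict[str, float],
) -> float:
    """Inverse-propensity-weighted reward for one logged interaction.

    Only valid when every clicked document has a non-zero propensity.
    """
    return sum(metric_weight[d] / propensities[d] for d in clicked_docs)
```

With a deterministic policy and k smaller than the number of documents, some documents would receive a propensity of zero in this sketch, which is the situation in which the abstract says existing estimators are no longer unbiased. Randomizing over rankings so that every relevant document can reach the top k avoids that.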
Differentiable Unbiased Online Learning to Rank
Online Learning to Rank (OLTR) methods optimize rankers based on user
interactions. State-of-the-art OLTR methods are built specifically for linear
models. Their approaches do not extend well to non-linear models such as neural
networks. We introduce an entirely novel approach to OLTR that constructs a
weighted differentiable pairwise loss after each interaction: Pairwise
Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional
approach that relies on interleaving or multileaving and extensive sampling of
models to estimate gradients. Instead, its gradient is based on inferring
preferences between document pairs from user clicks and can optimize any
differentiable model. We prove that the gradient of PDGD is unbiased w.r.t.
user document pair preferences. Our experiments on the largest publicly
available Learning to Rank (LTR) datasets show considerable and significant
improvements under all levels of interaction noise. PDGD outperforms existing
OLTR methods in both learning speed and final convergence.
Furthermore, unlike previous OLTR methods, PDGD also allows for non-linear
models to be optimized effectively. Our results show that using a neural
network leads to even better performance at convergence than a linear model. In
summary, PDGD is an efficient and unbiased OLTR approach that provides a better
user experience than previously possible.
Comment: Conference on Information and Knowledge Management 201
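The idea of inferring document-pair preferences from clicks and turning them into a differentiable pairwise gradient can be sketched as follows. This is a simplified illustration rather than the PDGD algorithm itself: it assumes a linear scoring model, assumes that every document up to one position past the last click was examined, and omits the per-pair reweighting on which the abstract's unbiasedness claim rests. All function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def infer_preferences(ranking: Sequence[str], clicked: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (preferred, other) pairs inferred from clicks on a displayed ranking.

    Assumption: documents up to one position past the last click were examined,
    and each clicked document is preferred over each examined unclicked one.
    """
    clicked_set = set(clicked)
    click_positions = [i for i, d in enumerate(ranking) if d in clicked_set]
    if not click_positions:
        return []
    last_examined = min(click_positions[-1] + 1, len(ranking) - 1)
    examined = ranking[: last_examined + 1]
    return [
        (c, u)
        for c in examined if c in clicked_set
        for u in examined if u not in clicked_set
    ]


def pairwise_gradient(
    weights: Sequence[float],
    features: Dict[str, Sequence[float]],
    pairs: Sequence[Tuple[str, str]],
) -> List[float]:
    """Unweighted gradient of sum over pairs of P(win > lose) for a linear scorer.

    P(win > lose) = exp(s_win) / (exp(s_win) + exp(s_lose)).
    """
    grad = [0.0] * len(weights)
    for win, lose in pairs:
        s_win = sum(w * x for w, x in zip(weights, features[win]))
        s_lose = sum(w * x for w, x in zip(weights, features[lose]))
        p_win = 1.0 / (1.0 + math.exp(s_lose - s_win))
        coeff = p_win * (1.0 - p_win)  # derivative of the pairwise probability
        for j in range(len(weights)):
            grad[j] += coeff * (features[win][j] - features[lose][j])
    return grad
```

A gradient-ascent step would then add a small multiple of this gradient to the weights. Because the gradient only requires the scoring function to be differentiable, the same pair inference could in principle feed a neural model instead of the linear scorer assumed here.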
Optimizing Ranking Models in an Online Setting
Online Learning to Rank (OLTR) methods optimize ranking models by directly
interacting with users, which allows them to be very efficient and responsive.
All OLTR methods introduced during the past decade have extended the
original OLTR method: Dueling Bandit Gradient Descent (DBGD). Recently, a
fundamentally different approach was introduced with the Pairwise
Differentiable Gradient Descent (PDGD) algorithm. To date, comparisons of the
two approaches have been limited to simulations with cascading click models
and low levels of noise. The main outcome so far is that PDGD converges at
higher levels of performance and learns considerably faster than DBGD-based
methods. However, the PDGD algorithm assumes cascading user behavior,
potentially giving it an unfair advantage. Furthermore, the robustness of both
methods to high levels of noise has not been investigated. Therefore, it is
unclear whether the reported advantages of PDGD over DBGD generalize to
different experimental conditions. In this paper, we investigate whether the
previous conclusions about the PDGD and DBGD comparison generalize from ideal
to worst-case circumstances. We do so in two ways. First, we compare the
theoretical properties of PDGD and DBGD, by taking a critical look at
previously proven properties in the context of ranking. Second, we estimate an
upper and lower bound on the performance of methods by simulating both ideal
user behavior and extremely difficult behavior, i.e., almost-random
non-cascading user models. Our findings show that the theoretical bounds of
DBGD do not apply to any common ranking model and, furthermore, that the
performance of DBGD is substantially worse than PDGD in both ideal and
worst-case circumstances. These results reproduce previously published findings
about the relative performance of PDGD vs. DBGD and generalize them to
extremely noisy and non-cascading circumstances.
Comment: European Conference on Information Retrieval (ECIR) 201
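For contrast with the pairwise approach, a single DBGD-style update can be sketched as below. This is a minimal illustration of the general dueling-bandit idea, not the experimental setup of the paper: the `candidate_wins` callback stands in for an interleaved comparison based on user clicks, and its name, the step sizes `delta` and `eta`, and the use of a Gaussian sample to draw a direction are assumptions made for the example.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def dbgd_step(
    weights: Sequence[float],
    candidate_wins: Callable[[Sequence[float], Sequence[float]], bool],
    delta: float = 1.0,
    eta: float = 0.01,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """One dueling-bandit update for a linear ranker.

    Samples a random unit direction, builds a candidate ranker at distance
    delta, and moves the current weights by eta along that direction only if
    the (hypothetical) comparison says the candidate won.
    """
    rng = rng or random.Random()
    direction = [rng.gauss(0.0, 1.0) for _ in weights]
    norm = math.sqrt(sum(x * x for x in direction)) or 1.0
    unit = [x / norm for x in direction]
    candidate = [w + delta * u for w, u in zip(weights, unit)]
    if candidate_wins(weights, candidate):
        return [w + eta * u for w, u in zip(weights, unit)]
    return list(weights)
```

Each step in this sketch learns from one sampled direction and one win-or-lose outcome, whereas a pairwise method can derive several preference pairs from a single interaction.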
Unbiased Learning to Rank: Counterfactual and Online Approaches
This tutorial covers and contrasts the two main methodologies in unbiased
Learning to Rank (LTR): Counterfactual LTR and Online LTR. There has long been
an interest in LTR from user interactions; however, this form of implicit
feedback is very biased. In recent years, unbiased LTR methods have been
introduced to remove the effect of different types of bias caused by
user behavior in search. For instance, a well-addressed type of bias is
position bias: the rank at which a document is displayed heavily affects the
interactions it receives. Counterfactual LTR methods deal with such types of
bias by learning from historical interactions while correcting for the effect
of the explicitly modelled biases. In contrast, Online LTR does not use an
explicit user model; it learns through an interactive process where randomized
results are displayed to the user. Through randomization the effect of
different types of bias can be removed from the learning process. Though both
methodologies lead to unbiased LTR, their approaches differ considerably, as do
their theoretical guarantees, empirical results, effects on the user experience
during learning, and applicability. Consequently, for practitioners the choice
between the two has substantial consequences. By providing an
overview of both approaches and contrasting them, we aim to provide an
essential guide to unbiased LTR so as to aid in understanding and choosing
between methodologies.
Comment: Abstract for tutorial appearing at SIGIR 201