Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using
logged user interactions that contain interaction biases. Existing methods are
only unbiased if users are presented with all relevant items in every ranking.
No existing counterfactual unbiased LTR method handles top-k
rankings. We introduce a novel policy-aware counterfactual estimator for LTR
metrics that can account for the effect of a stochastic logging policy. We
prove that the policy-aware estimator is unbiased if every relevant item has a
non-zero probability to appear in the top-k ranking. Our experimental results
show that the performance of our estimator is not affected by the size of k:
for any k, the policy-aware estimator reaches the same retrieval performance
while learning from top-k feedback as when learning from feedback on the full
ranking. Lastly, we introduce novel extensions of traditional LTR methods to
perform counterfactual LTR and to optimize top-k metrics. Together, our
contributions introduce the first policy-aware unbiased LTR approach that
learns from top-k feedback and optimizes top-k metrics. As a result,
counterfactual LTR is now applicable to the very prevalent top-k ranking
setting in search and recommendation. Comment: SIGIR 2020 full conference paper.
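The estimator described above can be sketched in a few lines: under a stochastic logging policy, a document's propensity is its expected examination probability over the rankings the policy may show, and each click is weighted by the inverse of that propensity. The 1/(rank+1) examination curve, the function names, and the data layout below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative position-based examination model (an assumption):
# users examine rank r of the top-k with probability 1 / (r + 1).
def examination_prob(rank, k):
    return 1.0 / (rank + 1) if rank < k else 0.0

def policy_aware_propensities(ranking_distribution, n_docs, k):
    """rho[d] = sum over rankings R of  pi(R) * P(examined | rank of d in R).
    ranking_distribution: list of (probability, ranking) pairs, where a
    ranking is a list of doc ids ordered from the top."""
    rho = np.zeros(n_docs)
    for prob, ranking in ranking_distribution:
        for rank, doc in enumerate(ranking[:k]):
            rho[doc] += prob * examination_prob(rank, k)
    return rho

def estimate_relevance(click_log, rho):
    """Policy-aware IPS estimate of per-document relevance:
    each click on d counts 1 / rho[d], averaged over sessions."""
    rel = np.zeros_like(rho)
    for clicked_docs in click_log:
        for d in clicked_docs:
            rel[d] += 1.0 / rho[d]
    return rel / len(click_log)
```

A document the policy never shows in the top-k has rho[d] = 0 and cannot be corrected for, which mirrors the paper's unbiasedness condition that every relevant item needs a non-zero probability of appearing in the top-k.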
Learning from User Interactions with Rankings: A Unification of the Field
Ranking systems form the basis for online search engines and recommendation
services. They process large collections of items, for instance web pages or
e-commerce products, and present the user with a small ordered selection. The
goal of a ranking system is to help a user find the items they are looking for
with the least amount of effort. Thus the rankings they produce should place
the most relevant or preferred items at the top of the ranking. Learning to
rank is a field within machine learning that covers methods which optimize
ranking systems w.r.t. this goal. Traditional supervised learning to rank
methods utilize expert judgements to evaluate and learn; however, in many
situations such judgements are impossible or infeasible to obtain. As a
solution, methods have been introduced that perform learning to rank based on
user clicks instead. The difficulty with clicks is that they are not only
affected by user preferences, but also by what rankings were displayed.
Therefore, these methods have to avoid being biased by factors other than
user preference. This thesis concerns learning to rank methods based on user
clicks and specifically aims to unify the different families of these methods.
As a whole, the second part of this thesis proposes a framework that bridges
many gaps between areas of online, counterfactual, and supervised learning to
rank. It has taken approaches, previously considered independent, and unified
them into a single methodology for widely applicable and effective learning to
rank from user clicks. Comment: PhD Thesis of Harrie Oosterhuis, defended at the
University of Amsterdam on November 27th, 2020.
Recent Advances in the Foundations and Applications of Unbiased Learning to Rank
Since its inception, the field of unbiased learning to rank (ULTR) has remained very active and has seen several impactful advancements in recent years. This tutorial provides both an introduction to the core concepts of the field and an overview of recent advancements in its foundations along with several applications of its methods. The tutorial is divided into four parts: Firstly, we give an overview of the different forms of bias that can be addressed with ULTR methods. Secondly, we present a comprehensive discussion of the latest estimation techniques in the ULTR field. Thirdly, we survey published results of ULTR in real-world applications. Fourthly, we discuss the connection between ULTR and fairness in ranking. We end by briefly reflecting on the future of ULTR research and its applications. This tutorial is intended to benefit both researchers and industry practitioners who are interested in developing new ULTR solutions or utilizing them in real-world applications.
Unifying Online and Counterfactual Learning to Rank
Optimizing ranking systems based on user interactions is a well-studied
problem. State-of-the-art methods for optimizing ranking systems based on user
interactions are divided into online approaches - that learn by directly
interacting with users - and counterfactual approaches - that learn from
historical interactions. Existing online methods are hindered without online
interventions and thus should not be applied counterfactually. Conversely,
counterfactual methods cannot directly benefit from online interventions. We
propose a novel intervention-aware estimator for both counterfactual and online
Learning to Rank (LTR). With the introduction of the intervention-aware
estimator, we aim to bridge the online/counterfactual LTR division as it is
shown to be highly effective in both online and counterfactual scenarios. The
estimator corrects for the effect of position bias, trust bias, and
item-selection bias by using corrections based on the behavior of the logging
policy and on online interventions: changes to the logging policy made during
the gathering of click data. Our experimental results, conducted in a
semi-synthetic experimental setup, show that, unlike existing counterfactual
LTR methods, the intervention-aware estimator can greatly benefit from online
interventions. Comment: Harrie Oosterhuis and Maarten de Rijke. 2021. Unifying Online and
Counterfactual Learning to Rank: A Novel Counterfactual Estimator that
Effectively Utilizes Online Interventions. In The 14th ACM International
Conference on Web Search and Data Mining (WSDM '21), March 8-12, 2021,
Jerusalem, Israel. ACM, New York, NY, USA, 9 pages.
https://doi.org/10.1145/3437963.344179
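The key idea of conditioning on interventions can be illustrated with a small sketch: when the logging policy changes during data gathering, the effective propensity of a document becomes the session-weighted average of its exposure under each policy that was deployed. The names and data layout are hypothetical, and the paper's estimator additionally corrects for trust bias, which is omitted here.

```python
import numpy as np

def intervention_aware_propensities(policy_exposures):
    """Session-weighted average exposure over all deployed logging policies.
    policy_exposures: list of (n_sessions, exposure_array) pairs, one pair
    per logging policy used during click-data gathering."""
    total_sessions = sum(n for n, _ in policy_exposures)
    rho = np.zeros_like(policy_exposures[0][1], dtype=float)
    for n_sessions, exposure in policy_exposures:
        rho += (n_sessions / total_sessions) * exposure
    return rho
```

An intervention that starts showing a previously under-exposed item raises that item's averaged propensity, which is one intuition for why online interventions can speed up counterfactual learning.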
An Offline Metric for the Debiasedness of Click Models
A well-known problem when learning from user clicks is the inherent bias
prevalent in the data, such as position or trust bias. Click models are a
common method for extracting information from user clicks, such as document
relevance in web search, or to estimate click biases for downstream
applications such as counterfactual learning-to-rank, ad placement, or fair
ranking. Recent work shows that the current evaluation practices in the
community fail to guarantee that a well-performing click model generalizes well
to downstream tasks in which the ranking distribution differs from the training
distribution, i.e., under covariate shift. In this work, we propose an
evaluation metric based on conditional independence testing to detect a lack of
robustness to covariate shift in click models. We introduce the concept of
debiasedness and a metric for measuring it. We prove that debiasedness is a
necessary condition for recovering unbiased and consistent relevance scores and
for the invariance of click prediction under covariate shift. In extensive
semi-synthetic experiments, we show that our proposed metric helps to predict
the downstream performance of click models under covariate shift and is useful
in an off-policy model selection setting. Comment: SIGIR23 - Full paper.
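As a rough illustration of the kind of conditional-independence test involved, the sketch below checks whether a click model's relevance scores still correlate with the logging policy's scores after controlling (linearly) for true relevance. The paper's metric is more general; the linear residualization and all names here are simplifying assumptions.

```python
import numpy as np

def partial_correlation(x, y, z):
    """Correlation between x and y after linearly regressing z out of both:
    a simple proxy for conditional dependence of x and y given z. A value
    near zero is consistent with the click model being debiased w.r.t. the
    logging policy's scores."""
    def residual(v):
        A = np.column_stack([z, np.ones_like(z)])
        coef, *_ = np.linalg.lstsq(A, v, rcond=None)
        return v - A @ coef
    rx, ry = residual(x), residual(y)
    return float(np.corrcoef(rx, ry)[0, 1])
```

Here x could be the click model's estimated relevance, y the logging policy's ranking scores, and z the (held-out) true relevance labels of a semi-synthetic setup.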
Equity of Attention: Amortizing Individual Fairness in Rankings
Rankings of people and items are at the heart of selection-making,
match-making, and recommender systems, ranging from employment sites to sharing
economy platforms. As ranking positions influence the amount of attention the
ranked subjects receive, biases in rankings can lead to unfair distribution of
opportunities and resources, such as jobs or income.
This paper proposes new measures and mechanisms to quantify and mitigate
unfairness from a bias inherent to all rankings, namely, the position bias,
which leads to disproportionately less attention being paid to low-ranked
subjects. Our approach differs from recent fair ranking approaches in two
important ways. First, existing works measure unfairness at the level of
subject groups while our measures capture unfairness at the level of individual
subjects, and as such subsume group unfairness. Second, as no single ranking
can achieve individual attention fairness, we propose a novel mechanism that
achieves amortized fairness, where attention accumulated across a series of
rankings is proportional to accumulated relevance.
We formulate the challenge of achieving amortized individual fairness subject
to constraints on ranking quality as an online optimization problem and show
that it can be solved as an integer linear program. Our experimental evaluation
reveals that unfair attention distribution in rankings can be substantial, and
demonstrates that our method can improve individual fairness while retaining
high ranking quality. Comment: Accepted to SIGIR 2018.
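A single round of the amortized re-ranking idea can be sketched as an assignment problem: items whose accumulated attention lags their accumulated relevance are boosted toward positions that deliver more attention, traded off against ranking quality. The paper formulates this as an integer linear program with explicit quality constraints; the weighted assignment below, with hypothetical names and a simple trade-off parameter theta, is only a simplified stand-in.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def amortized_fair_ranking(relevance, acc_attention, acc_relevance,
                           position_attention, theta=0.5):
    """One round of an amortized-fairness re-ranker (a sketch, not the
    paper's exact ILP). relevance: current relevance per item;
    acc_attention / acc_relevance: attention and relevance accumulated over
    past rankings; position_attention: attention each rank position delivers.
    Returns the item id placed at each position."""
    n = len(relevance)
    deficit = acc_relevance - acc_attention  # items that are "owed" attention
    # score[i, pos]: quality term plus fairness term, both scaled by the
    # attention that position pos will deliver.
    score = (theta * np.outer(relevance, position_attention)
             + (1 - theta) * np.outer(deficit, position_attention))
    row, col = linear_sum_assignment(-score)  # negate to maximize total score
    ranking = np.empty(n, dtype=int)
    ranking[col] = row  # item `row[i]` is assigned to position `col[i]`
    return ranking
```

With theta = 1 this reduces to pure relevance ranking; lowering theta lets attention-deprived items overtake slightly more relevant ones, which is the amortization effect.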
Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization
Counterfactual learning to rank (CLTR) relies on exposure-based inverse
propensity scoring (IPS), a LTR-specific adaptation of IPS to correct for
position bias. While IPS can provide unbiased and consistent estimates, it
often suffers from high variance. Especially when little click data is
available, this variance can cause CLTR to learn sub-optimal ranking behavior.
Consequently, existing CLTR methods bring significant risks with them, as
naively deploying their models can result in very negative user experiences. We
introduce a novel risk-aware CLTR method with theoretical guarantees for safe
deployment. We apply a novel exposure-based concept of risk regularization to
IPS estimation for LTR. Our risk regularization penalizes the mismatch between
the ranking behavior of a learned model and a given safe model. Thereby, it
ensures that learned ranking models stay close to a trusted model, when there
is high uncertainty in IPS estimation, which greatly reduces the risks during
deployment. Our experimental results demonstrate the efficacy of our proposed
method, which is effective at avoiding initial periods of bad performance when
little data is available, while also maintaining high performance at
convergence. For the CLTR field, our novel exposure-based risk minimization
method enables practitioners to adopt CLTR methods in a safer manner that
mitigates many of the risks attached to previous methods. Comment: SIGIR 2023 - Full paper.
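The shape of such a risk-regularized objective can be sketched as follows: take the IPS utility estimate and subtract a penalty on the exposure mismatch between the learned model and the trusted safe model, with the penalty decaying as click data accumulates. The squared-difference mismatch, the 1/sqrt(N) decay, and all names are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def risk_regularized_objective(ips_utility, model_exposure, safe_exposure,
                               n_clicks, alpha=1.0):
    """IPS utility minus an exposure-mismatch penalty against a safe model.
    With little click data the penalty dominates, keeping the learned ranker
    close to the trusted one; as n_clicks grows, the (unbiased but
    high-variance) IPS estimate takes over."""
    mismatch = float(np.sum((model_exposure - safe_exposure) ** 2))
    return ips_utility - alpha * mismatch / np.sqrt(n_clicks)
```

Maximizing this objective over the model's exposure distribution yields a ranker that deviates from the safe model only where the data supports it, which is the safe-deployment behavior described above.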