66,717 research outputs found
Unbiased Learning to Rank with Unbiased Propensity Estimation
Learning to rank with biased click data is a well-known challenge. A variety
of methods has been explored to debias click data for learning to rank such as
click models, result interleaving and, more recently, the unbiased
learning-to-rank framework based on inverse propensity weighting. Despite their
differences, most existing studies separate the estimation of click bias
(namely the \textit{propensity model}) from the learning of ranking algorithms.
To estimate click propensities, they either conduct online result
randomization, which can negatively affect the user experience, or offline
parameter estimation, which has special requirements for click data and is
optimized for objectives (e.g. click likelihood) that are not directly related
to the ranking performance of the system. In this work, we address those
problems by unifying the learning of propensity models and ranking models. We
find that the problem of estimating a propensity model from click data is a
dual problem of unbiased learning to rank. Based on this observation, we
propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker
and an \textit{unbiased propensity model}. DLA is an automatic unbiased
learning-to-rank framework as it directly learns unbiased ranking models from
biased click data without any preprocessing. It can adapt to the change of bias
distributions and is applicable to online learning. Our empirical experiments
with synthetic and real-world data show that the models trained with DLA
significantly outperformed the unbiased learning-to-rank algorithms based on
result randomization and the models trained with relevance signals extracted by
click models
Differentiable Unbiased Online Learning to Rank
Online Learning to Rank (OLTR) methods optimize rankers based on user
interactions. State-of-the-art OLTR methods are built specifically for linear
models. Their approaches do not extend well to non-linear models such as neural
networks. We introduce an entirely novel approach to OLTR that constructs a
weighted differentiable pairwise loss after each interaction: Pairwise
Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional
approach that relies on interleaving or multileaving and extensive sampling of
models to estimate gradients. Instead, its gradient is based on inferring
preferences between document pairs from user clicks and can optimize any
differentiable model. We prove that the gradient of PDGD is unbiased w.r.t.
user document pair preferences. Our experiments on the largest publicly
available Learning to Rank (LTR) datasets show considerable and significant
improvements under all levels of interaction noise. PDGD outperforms existing
OLTR methods both in terms of learning speed as well as final convergence.
Furthermore, unlike previous OLTR methods, PDGD also allows for non-linear
models to be optimized effectively. Our results show that using a neural
network leads to even better performance at convergence than a linear model. In
summary, PDGD is an efficient and unbiased OLTR approach that provides a better
user experience than previously possible.Comment: Conference on Information and Knowledge Management 201
Unconfounded Propensity Estimation for Unbiased Ranking
The goal of unbiased learning to rank (ULTR) is to leverage implicit user
feedback for optimizing learning-to-rank systems. Among existing solutions,
automatic ULTR algorithms that jointly learn user bias models (i.e., propensity
models) with unbiased rankers have received a lot of attention due to their
superior performance and low deployment cost in practice. Despite their
theoretical soundness, the effectiveness is usually justified under a weak
logging policy, where the ranking model can barely rank documents according to
their relevance to the query. However, when the logging policy is strong, e.g.,
an industry-deployed ranking policy, the reported effectiveness cannot be
reproduced. In this paper, we first investigate ULTR from a causal perspective
and uncover a negative result: existing ULTR algorithms fail to address the
issue of propensity overestimation caused by the query-document relevance
confounder. Then, we propose a new learning objective based on backdoor
adjustment and highlight its differences from conventional propensity models,
which reveal the prevalence of propensity overestimation. On top of that, we
introduce a novel propensity model called Logging-Policy-aware Propensity (LPP)
model and its distinctive two-step optimization strategy, which allows for the
joint learning of LPP and ranking models within the automatic ULTR framework,
and actualize the unconfounded propensity estimation for ULTR. Extensive
experiments on two benchmarks demonstrate the effectiveness and
generalizability of the proposed method.Comment: 11 pages, 5 figure
Feature-Enhanced Network with Hybrid Debiasing Strategies for Unbiased Learning to Rank
Unbiased learning to rank (ULTR) aims to mitigate various biases existing in
user clicks, such as position bias, trust bias, presentation bias, and learn an
effective ranker. In this paper, we introduce our winning approach for the
"Unbiased Learning to Rank" task in WSDM Cup 2023. We find that the provided
data is severely biased so neural models trained directly with the top 10
results with click information are unsatisfactory. So we extract multiple
heuristic-based features for multi-fields of the results, adjust the click
labels, add true negatives, and re-weight the samples during model training.
Since the propensities learned by existing ULTR methods are not decreasing
w.r.t. positions, we also calibrate the propensities according to the click
ratios and ensemble the models trained in two different ways. Our method won
the 3rd prize with a DCG@10 score of 9.80, which is 1.1% worse than the 2nd and
25.3% higher than the 4th.Comment: 5 pages, 1 figure, WSDM Cup 202
Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using
logged user interactions that contain interaction biases. Existing methods are
only unbiased if users are presented with all relevant items in every ranking.
There is currently no existing counterfactual unbiased LTR method for top-k
rankings. We introduce a novel policy-aware counterfactual estimator for LTR
metrics that can account for the effect of a stochastic logging policy. We
prove that the policy-aware estimator is unbiased if every relevant item has a
non-zero probability to appear in the top-k ranking. Our experimental results
show that the performance of our estimator is not affected by the size of k:
for any k, the policy-aware estimator reaches the same retrieval performance
while learning from top-k feedback as when learning from feedback on the full
ranking. Lastly, we introduce novel extensions of traditional LTR methods to
perform counterfactual LTR and to optimize top-k metrics. Together, our
contributions introduce the first policy-aware unbiased LTR approach that
learns from top-k feedback and optimizes top-k metrics. As a result,
counterfactual LTR is now applicable to the very prevalent top-k ranking
setting in search and recommendation.Comment: SIGIR 2020 full conference pape
Controlling Fairness and Bias in Dynamic Learning-to-Rank
Rankings are the primary interface through which many online platforms match
users to items (e.g. news, products, music, video). In these two-sided markets,
not only the users draw utility from the rankings, but the rankings also
determine the utility (e.g. exposure, revenue) for the item providers (e.g.
publishers, sellers, artists, studios). It has already been noted that
myopically optimizing utility to the users, as done by virtually all
learning-to-rank algorithms, can be unfair to the item providers. We,
therefore, present a learning-to-rank approach for explicitly enforcing
merit-based fairness guarantees to groups of items (e.g. articles by the same
publisher, tracks by the same artist). In particular, we propose a learning
algorithm that ensures notions of amortized group fairness, while
simultaneously learning the ranking function from implicit feedback data. The
algorithm takes the form of a controller that integrates unbiased estimators
for both fairness and utility, dynamically adapting both as more data becomes
available. In addition to its rigorous theoretical foundation and convergence
guarantees, we find empirically that the algorithm is highly practical and
robust.Comment: First two authors contributed equally. In Proceedings of the 43rd
International ACM SIGIR Conference on Research and Development in Information
Retrieval 202
Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank
Unbiased CLTR requires click propensities to compensate for the difference
between user clicks and true relevance of search results via IPS. Current
propensity estimation methods assume that user click behavior follows the PBM
and estimate click propensities based on this assumption. However, in reality,
user clicks often follow the CM, where users scan search results from top to
bottom and where each next click depends on the previous one. In this cascade
scenario, PBM-based estimates of propensities are not accurate, which, in turn,
hurts CLTR performance. In this paper, we propose a propensity estimation
method for the cascade scenario, called CM-IPS. We show that CM-IPS keeps CLTR
performance close to the full-information performance in case the user clicks
follow the CM, while PBM-based CLTR has a significant gap towards the
full-information. The opposite is true if the user clicks follow PBM instead of
the CM. Finally, we suggest a way to select between CM- and PBM-based
propensity estimation methods based on historical user clicks.Comment: 4 pages, 2 figures, 43rd International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR '20
Estimating Position Bias without Intrusive Interventions
Presentation bias is one of the key challenges when learning from implicit
feedback in search engines, as it confounds the relevance signal. While it was
recently shown how counterfactual learning-to-rank (LTR) approaches
\cite{Joachims/etal/17a} can provably overcome presentation bias when
observation propensities are known, it remains to show how to effectively
estimate these propensities. In this paper, we propose the first method for
producing consistent propensity estimates without manual relevance judgments,
disruptive interventions, or restrictive relevance modeling assumptions. First,
we show how to harvest a specific type of intervention data from historic
feedback logs of multiple different ranking functions, and show that this data
is sufficient for consistent propensity estimation in the position-based model.
Second, we propose a new extremum estimator that makes effective use of this
data. In an empirical evaluation, we find that the new estimator provides
superior propensity estimates in two real-world systems -- Arxiv Full-text
Search and Google Drive Search. Beyond these two points, we find that the
method is robust to a wide range of settings in simulation studies
- …