Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using
logged user interactions that contain interaction biases. Existing methods are
only unbiased if users are presented with all relevant items in every ranking.
There is currently no existing counterfactual unbiased LTR method for top-k
rankings. We introduce a novel policy-aware counterfactual estimator for LTR
metrics that can account for the effect of a stochastic logging policy. We
prove that the policy-aware estimator is unbiased if every relevant item has a
non-zero probability to appear in the top-k ranking. Our experimental results
show that the performance of our estimator is not affected by the size of k:
for any k, the policy-aware estimator reaches the same retrieval performance
while learning from top-k feedback as when learning from feedback on the full
ranking. Lastly, we introduce novel extensions of traditional LTR methods to
perform counterfactual LTR and to optimize top-k metrics. Together, our
contributions introduce the first policy-aware unbiased LTR approach that
learns from top-k feedback and optimizes top-k metrics. As a result,
counterfactual LTR is now applicable to the very prevalent top-k ranking
setting in search and recommendation.
Comment: SIGIR 2020 full conference paper
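The idea of a policy-aware propensity can be made concrete with a minimal Python sketch. It assumes the stochastic logging policy can be enumerated as a short list of (ranking, probability) pairs and that examination depends only on rank. The function names, `exam_prob`, and `metric_weight` are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple


def policy_aware_propensities(
    policy: Sequence[Tuple[Sequence[str], float]],
    exam_prob: Sequence[float],
    k: int,
) -> Dict[str, float]:
    """Expected examination probability of each document under the policy.

    An item outside the top-k of a ranking contributes nothing for that
    ranking, so its propensity is the policy-weighted sum over the rankings
    in which it is displayed.
    """
    rho: Dict[str, float] = {}
    for ranking, prob in policy:
        for rank, doc in enumerate(ranking[:k]):
            rho[doc] = rho.get(doc, 0.0) + prob * exam_prob[rank]
    return rho


def ips_estimate(
    clicked_docs: List[str],
    rho: Dict[str, float],
    metric_weight: Dict[str, float],
) -> float:
    """Inverse-propensity-weighted sum over clicked documents.

    metric_weight is a hypothetical per-document weight that the evaluated
    ranking assigns to each document (for example a rank discount).
    """
    return sum(metric_weight[d] / rho[d] for d in clicked_docs)


# Two rankings shown with equal probability; only the top 2 are displayed.
policy = [(["a", "b", "c"], 0.5), (["c", "a", "b"], 0.5)]
rho = policy_aware_propensities(policy, exam_prob=[1.0, 0.5], k=2)
```

In this toy policy every document reaches the top 2 in at least one ranking, so all propensities are non-zero, which is the condition the abstract gives for unbiasedness.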
Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank
Unbiased CLTR requires click propensities to compensate for the difference
between user clicks and true relevance of search results via IPS. Current
propensity estimation methods assume that user click behavior follows the PBM
and estimate click propensities based on this assumption. However, in reality,
user clicks often follow the CM, where users scan search results from top to
bottom and where each next click depends on the previous one. In this cascade
scenario, PBM-based estimates of propensities are not accurate, which, in turn,
hurts CLTR performance. In this paper, we propose a propensity estimation
method for the cascade scenario, called CM-IPS. We show that CM-IPS keeps CLTR
performance close to the full-information performance in case the user clicks
follow the CM, while PBM-based CLTR has a significant gap towards the
full-information. The opposite is true if the user clicks follow PBM instead of
the CM. Finally, we suggest a way to select between CM- and PBM-based
propensity estimation methods based on historical user clicks.
Comment: 4 pages, 2 figures, 43rd International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR '20)
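The difference between the two click-model assumptions can be illustrated with a small Python sketch. It is not the CM-IPS estimator from the paper: it assumes a simplified cascade in which the user continues scanning with one probability after a click and another after a skip, and the names `cont_after_click` and `cont_after_skip` are hypothetical parameters.

```python
from typing import List, Sequence


def pbm_propensities(rank_exam: Sequence[float]) -> List[float]:
    """Position-based model: examination depends on the rank only."""
    return list(rank_exam)


def cascade_propensities(
    clicks: Sequence[int],
    cont_after_click: float,
    cont_after_skip: float,
) -> List[float]:
    """Simplified cascade: examination of a rank depends on what happened above it.

    The user scans top to bottom; after each result the scan continues with a
    probability that depends on whether that result was clicked.
    """
    props: List[float] = []
    p_reach = 1.0  # the first result is always examined
    for clicked in clicks:
        props.append(p_reach)
        p_reach *= cont_after_click if clicked else cont_after_skip
    return props


# Session with a click on the second result only.
session_props = cascade_propensities([0, 1, 0], cont_after_click=0.5, cont_after_skip=1.0)
```

Under the position-based model the same rank always gets the same propensity, whereas in this sketch the third result is examined less often because of the click directly above it.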
Unbiased Learning to Rank with Unbiased Propensity Estimation
Learning to rank with biased click data is a well-known challenge. A variety
of methods has been explored to debias click data for learning to rank such as
click models, result interleaving and, more recently, the unbiased
learning-to-rank framework based on inverse propensity weighting. Despite their
differences, most existing studies separate the estimation of click bias
(namely the propensity model) from the learning of ranking algorithms.
To estimate click propensities, they either conduct online result
randomization, which can negatively affect the user experience, or offline
parameter estimation, which has special requirements for click data and is
optimized for objectives (e.g. click likelihood) that are not directly related
to the ranking performance of the system. In this work, we address those
problems by unifying the learning of propensity models and ranking models. We
find that the problem of estimating a propensity model from click data is a
dual problem of unbiased learning to rank. Based on this observation, we
propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker
and an unbiased propensity model. DLA is an automatic unbiased
learning-to-rank framework as it directly learns unbiased ranking models from
biased click data without any preprocessing. It can adapt to the change of bias
distributions and is applicable to online learning. Our empirical experiments
with synthetic and real-world data show that the models trained with DLA
significantly outperformed the unbiased learning-to-rank algorithms based on
result randomization and the models trained with relevance signals extracted by
click models.
Balancing Speed and Quality in Online Learning to Rank for Information Retrieval
In Online Learning to Rank (OLTR) the aim is to find an optimal ranking model
by interacting with users. When learning from user behavior, systems must
interact with users while simultaneously learning from those interactions.
Unlike other Learning to Rank (LTR) settings, existing research in this field
has been limited to linear models. This is due to the speed-quality tradeoff
that arises when selecting models: complex models are more expressive and can
find the best rankings but need more user interactions to do so, a requirement
that risks frustrating users during training. Conversely, simpler models can be
optimized on fewer interactions and thus provide a better user experience, but
they will converge towards suboptimal rankings. This tradeoff creates a
deadlock, since novel models will not be able to improve either the user
experience or the final convergence point, without sacrificing the other. Our
contribution is twofold. First, we introduce a fast OLTR model called Sim-MGD
that addresses the speed aspect of the speed-quality tradeoff. Sim-MGD ranks
documents based on similarities with reference documents. It converges rapidly
and, hence, gives a better user experience but it does not converge towards the
optimal rankings. Second, we contribute Cascading Multileave Gradient Descent
(C-MGD) for OLTR that directly addresses the speed-quality tradeoff by using a
cascade that enables combinations of the best of two worlds: fast learning and
high quality final convergence. C-MGD can provide the better user experience of
Sim-MGD while maintaining the same convergence as the state-of-the-art MGD
model. This opens the door for future work to design new models for OLTR
without having to deal with the speed-quality tradeoff.
Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information
and Knowledge Management
Differentiable Unbiased Online Learning to Rank
Online Learning to Rank (OLTR) methods optimize rankers based on user
interactions. State-of-the-art OLTR methods are built specifically for linear
models. Their approaches do not extend well to non-linear models such as neural
networks. We introduce an entirely novel approach to OLTR that constructs a
weighted differentiable pairwise loss after each interaction: Pairwise
Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional
approach that relies on interleaving or multileaving and extensive sampling of
models to estimate gradients. Instead, its gradient is based on inferring
preferences between document pairs from user clicks and can optimize any
differentiable model. We prove that the gradient of PDGD is unbiased w.r.t.
user document pair preferences. Our experiments on the largest publicly
available Learning to Rank (LTR) datasets show considerable and significant
improvements under all levels of interaction noise. PDGD outperforms existing
OLTR methods both in terms of learning speed as well as final convergence.
Furthermore, unlike previous OLTR methods, PDGD also allows for non-linear
models to be optimized effectively. Our results show that using a neural
network leads to even better performance at convergence than a linear model. In
summary, PDGD is an efficient and unbiased OLTR approach that provides a better
user experience than previously possible.
Comment: Conference on Information and Knowledge Management 201
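Inferring document-pair preferences from clicks can be sketched in a few lines of Python. The sketch assumes that clicked documents are preferred over unclicked documents the user plausibly examined, taken here to be everything up to and including the first result after the last click. That examination heuristic and the function name are assumptions for illustration; the weighting of pairs and the gradient itself are omitted.

```python
from typing import List, Sequence, Set, Tuple


def infer_preferences(
    ranking: Sequence[str], clicked: Set[str]
) -> List[Tuple[str, str]]:
    """Return (preferred, non_preferred) document pairs inferred from clicks."""
    click_positions = [i for i, doc in enumerate(ranking) if doc in clicked]
    if not click_positions:
        return []  # no clicks on displayed documents, nothing to infer
    # Assumed examined: all results above the last click, the clicked result,
    # and the one result directly below it.
    considered = ranking[: max(click_positions) + 2]
    return [
        (c, u)
        for c in considered
        if c in clicked
        for u in considered
        if u not in clicked
    ]


pairs = infer_preferences(["a", "b", "c", "d"], {"b"})
```

Each inferred pair would then contribute a term to a differentiable pairwise loss on the model's scores, which is why any differentiable scoring model can be used.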
Equity of Attention: Amortizing Individual Fairness in Rankings
Rankings of people and items are at the heart of selection-making,
match-making, and recommender systems, ranging from employment sites to sharing
economy platforms. As ranking positions influence the amount of attention the
ranked subjects receive, biases in rankings can lead to unfair distribution of
opportunities and resources, such as jobs or income.
This paper proposes new measures and mechanisms to quantify and mitigate
unfairness from a bias inherent to all rankings, namely, the position bias,
which leads to disproportionately less attention being paid to low-ranked
subjects. Our approach differs from recent fair ranking approaches in two
important ways. First, existing works measure unfairness at the level of
subject groups while our measures capture unfairness at the level of individual
subjects, and as such subsume group unfairness. Second, as no single ranking
can achieve individual attention fairness, we propose a novel mechanism that
achieves amortized fairness, where attention accumulated across a series of
rankings is proportional to accumulated relevance.
We formulate the challenge of achieving amortized individual fairness subject
to constraints on ranking quality as an online optimization problem and show
that it can be solved as an integer linear program. Our experimental evaluation
reveals that unfair attention distribution in rankings can be substantial, and
demonstrates that our method can improve individual fairness while retaining
high ranking quality.
Comment: Accepted to SIGIR 201
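The notion of amortized fairness can be illustrated with a short Python sketch. It assumes unfairness is measured as the L1 distance between the normalized accumulated attention and the normalized accumulated relevance of each subject. The function name and this exact normalization are assumptions for illustration, and the integer linear program that chooses the rankings is not shown.

```python
from typing import Sequence


def amortized_unfairness(
    attention_per_ranking: Sequence[Sequence[float]],
    relevance_per_ranking: Sequence[Sequence[float]],
) -> float:
    """L1 gap between each subject's share of attention and share of relevance.

    Row r, column i holds the attention (or relevance) of subject i in the
    r-th ranking of the series.
    """
    n = len(attention_per_ranking[0])
    acc_att = [sum(row[i] for row in attention_per_ranking) for i in range(n)]
    acc_rel = [sum(row[i] for row in relevance_per_ranking) for i in range(n)]
    total_att, total_rel = sum(acc_att), sum(acc_rel)
    if total_att == 0 or total_rel == 0:
        raise ValueError("accumulated attention and relevance must be positive")
    return sum(abs(a / total_att - r / total_rel) for a, r in zip(acc_att, acc_rel))


# Two equally relevant subjects; attention goes only to the top position.
single = amortized_unfairness([[1.0, 0.0]], [[0.5, 0.5]])
alternating = amortized_unfairness(
    [[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]
)
```

A single ranking cannot be fair to two equally relevant subjects here, but alternating who is ranked first brings the accumulated gap to zero, which is the intuition behind amortizing over a series of rankings.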
Optimizing Ranking Models in an Online Setting
Online Learning to Rank (OLTR) methods optimize ranking models by directly
interacting with users, which allows them to be very efficient and responsive.
All OLTR methods introduced during the past decade have extended on the
original OLTR method: Dueling Bandit Gradient Descent (DBGD). Recently, a
fundamentally different approach was introduced with the Pairwise
Differentiable Gradient Descent (PDGD) algorithm. To date the only comparisons
of the two approaches are limited to simulations with cascading click models
and low levels of noise. The main outcome so far is that PDGD converges at
higher levels of performance and learns considerably faster than DBGD-based
methods. However, the PDGD algorithm assumes cascading user behavior,
potentially giving it an unfair advantage. Furthermore, the robustness of both
methods to high levels of noise has not been investigated. Therefore, it is
unclear whether the reported advantages of PDGD over DBGD generalize to
different experimental conditions. In this paper, we investigate whether the
previous conclusions about the PDGD and DBGD comparison generalize from ideal
to worst-case circumstances. We do so in two ways. First, we compare the
theoretical properties of PDGD and DBGD, by taking a critical look at
previously proven properties in the context of ranking. Second, we estimate an
upper and lower bound on the performance of methods by simulating both ideal
user behavior and extremely difficult behavior, i.e., almost-random
non-cascading user models. Our findings show that the theoretical bounds of
DBGD do not apply to any common ranking model and, furthermore, that the
performance of DBGD is substantially worse than PDGD in both ideal and
worst-case circumstances. These results reproduce previously published findings
about the relative performance of PDGD vs. DBGD and generalize them to
extremely noisy and non-cascading circumstances.
Comment: European Conference on Information Retrieval (ECIR) 201
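For readers unfamiliar with DBGD, a single update step can be sketched in Python as follows. The sketch assumes a linear model given as a weight list and a hypothetical `candidate_wins` callback that stands in for the interleaved comparison on user clicks. The step sizes `delta` and `alpha` are illustrative defaults, not values from the paper.

```python
import math
import random
from typing import Callable, List


def dbgd_step(
    weights: List[float],
    candidate_wins: Callable[[List[float], List[float]], bool],
    delta: float = 1.0,
    alpha: float = 0.01,
    rng=random,
) -> List[float]:
    """One Dueling Bandit Gradient Descent step for a linear ranker.

    A random unit direction is sampled, a candidate ranker is formed by
    stepping delta along it, and the current weights move by alpha in that
    direction only if the candidate wins the comparison.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in weights]
    norm = math.sqrt(sum(x * x for x in direction)) or 1.0
    direction = [x / norm for x in direction]
    candidate = [w + delta * u for w, u in zip(weights, direction)]
    if candidate_wins(weights, candidate):
        return [w + alpha * u for w, u in zip(weights, direction)]
    return list(weights)


# With a comparison the candidate always loses, the weights stay unchanged.
unchanged = dbgd_step([0.0, 0.0, 0.0], lambda current, cand: False)
```

Because the update direction is sampled at random rather than derived from the model's scores, the method learns only from which of the two rankers wins, in contrast to the pairwise gradient used by PDGD.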
Internet source evaluation: The role of implicit associations and psychophysiological self-regulation
This study focused on middle school students’ source evaluation skills as a key component of digital literacy. Specifically, it examined the role of two unexplored individual factors that may affect the evaluation of sources providing information about the controversial topic of the health risks associated with the use of mobile phones. The factors were the implicit association of mobile phone with health or no health, and psychophysiological self-regulation as reflected in basal Heart Rate Variability (HRV). Seventy-two seventh graders read six webpages that provided contrasting information on the unsettled topic of the potential health risks related to the use of mobile phones. Then they were asked to rank-order the six websites along the dimension of reliability (source evaluation). Findings revealed that students were able to discriminate between the most and least reliable websites, justifying their ranking in light of different criteria. However, overall, they were not very accurate in rank-ordering all six Internet sources. Both implicit associations and HRV correlated with source evaluation. The interaction between the two individual variables was a significant predictor of participants’ performance in rank-ordering the websites for reliability. A slope analysis revealed that when students had an average psychophysiological self-regulation, the stronger their association of the mobile phone with health, the better their performance on source evaluation. The theoretical and educational significance of the study is discussed.