1,273 research outputs found
Generalized Team Draft Interleaving
Interleaving is an online evaluation method that compares
two ranking functions by mixing their results and interpret-
ing the users' click feedback. An important property of
an interleaving method is its sensitivity, i.e. the ability to
obtain reliable comparison outcomes with few user interac-
tions. Several methods have been proposed so far to im-
prove interleaving sensitivity, which can be roughly divided
into two areas: (a) methods that optimize the credit assign-
ment function (how the click feedback is interpreted), and
(b) methods that achieve higher sensitivity by controlling
the interleaving policy (how often a particular interleaved
result page is shown).
In this paper, we propose an interleaving framework that
generalizes the previously studied interleaving methods in
two aspects. First, it achieves a higher sensitivity by per-
forming a joint data-driven optimization of the credit as-
signment function and the interleaving policy. Second, we
formulate the framework to be general w.r.t. the search do-
main where the interleaving experiment is deployed, so that
it can be applied in domains with grid-based presentation,
such as image search. In order to simplify the optimization,
we additionally introduce a stratifed estimate of the exper-
iment outcome. This stratifcation is also useful on its own,
as it reduces the variance of the outcome and thus increases
the interleaving sensitivity.
We perform an extensive experimental study using large-
scale document and image search datasets obtained from
a commercial search engine. The experiments show that
our proposed framework achieves marked improvements in
sensitivity over efective baselines on both datasets
Optimizing Ranking Models in an Online Setting
Online Learning to Rank (OLTR) methods optimize ranking models by directly
interacting with users, which allows them to be very efficient and responsive.
All OLTR methods introduced during the past decade have extended on the
original OLTR method: Dueling Bandit Gradient Descent (DBGD). Recently, a
fundamentally different approach was introduced with the Pairwise
Differentiable Gradient Descent (PDGD) algorithm. To date the only comparisons
of the two approaches are limited to simulations with cascading click models
and low levels of noise. The main outcome so far is that PDGD converges at
higher levels of performance and learns considerably faster than DBGD-based
methods. However, the PDGD algorithm assumes cascading user behavior,
potentially giving it an unfair advantage. Furthermore, the robustness of both
methods to high levels of noise has not been investigated. Therefore, it is
unclear whether the reported advantages of PDGD over DBGD generalize to
different experimental conditions. In this paper, we investigate whether the
previous conclusions about the PDGD and DBGD comparison generalize from ideal
to worst-case circumstances. We do so in two ways. First, we compare the
theoretical properties of PDGD and DBGD, by taking a critical look at
previously proven properties in the context of ranking. Second, we estimate an
upper and lower bound on the performance of methods by simulating both ideal
user behavior and extremely difficult behavior, i.e., almost-random
non-cascading user models. Our findings show that the theoretical bounds of
DBGD do not apply to any common ranking model and, furthermore, that the
performance of DBGD is substantially worse than PDGD in both ideal and
worst-case circumstances. These results reproduce previously published findings
about the relative performance of PDGD vs. DBGD and generalize them to
extremely noisy and non-cascading circumstances.Comment: European Conference on Information Retrieval (ECIR) 201
Sensitive and Scalable Online Evaluation with Theoretical Guarantees
Multileaved comparison methods generalize interleaved comparison methods to
provide a scalable approach for comparing ranking systems based on regular user
interactions. Such methods enable the increasingly rapid research and
development of search engines. However, existing multileaved comparison methods
that provide reliable outcomes do so by degrading the user experience during
evaluation. Conversely, current multileaved comparison methods that maintain
the user experience cannot guarantee correctness. Our contribution is two-fold.
First, we propose a theoretical framework for systematically comparing
multileaved comparison methods using the notions of considerateness, which
concerns maintaining the user experience, and fidelity, which concerns reliable
correct outcomes. Second, we introduce a novel multileaved comparison method,
Pairwise Preference Multileaving (PPM), that performs comparisons based on
document-pair preferences, and prove that it is considerate and has fidelity.
We show empirically that, compared to previous multileaved comparison methods,
PPM is more sensitive to user preferences and scalable with the number of
rankers being compared.Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information
and Knowledge Managemen
Differentiable Unbiased Online Learning to Rank
Online Learning to Rank (OLTR) methods optimize rankers based on user
interactions. State-of-the-art OLTR methods are built specifically for linear
models. Their approaches do not extend well to non-linear models such as neural
networks. We introduce an entirely novel approach to OLTR that constructs a
weighted differentiable pairwise loss after each interaction: Pairwise
Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional
approach that relies on interleaving or multileaving and extensive sampling of
models to estimate gradients. Instead, its gradient is based on inferring
preferences between document pairs from user clicks and can optimize any
differentiable model. We prove that the gradient of PDGD is unbiased w.r.t.
user document pair preferences. Our experiments on the largest publicly
available Learning to Rank (LTR) datasets show considerable and significant
improvements under all levels of interaction noise. PDGD outperforms existing
OLTR methods both in terms of learning speed as well as final convergence.
Furthermore, unlike previous OLTR methods, PDGD also allows for non-linear
models to be optimized effectively. Our results show that using a neural
network leads to even better performance at convergence than a linear model. In
summary, PDGD is an efficient and unbiased OLTR approach that provides a better
user experience than previously possible.Comment: Conference on Information and Knowledge Management 201
Unbiased Learning to Rank with Unbiased Propensity Estimation
Learning to rank with biased click data is a well-known challenge. A variety
of methods has been explored to debias click data for learning to rank such as
click models, result interleaving and, more recently, the unbiased
learning-to-rank framework based on inverse propensity weighting. Despite their
differences, most existing studies separate the estimation of click bias
(namely the \textit{propensity model}) from the learning of ranking algorithms.
To estimate click propensities, they either conduct online result
randomization, which can negatively affect the user experience, or offline
parameter estimation, which has special requirements for click data and is
optimized for objectives (e.g. click likelihood) that are not directly related
to the ranking performance of the system. In this work, we address those
problems by unifying the learning of propensity models and ranking models. We
find that the problem of estimating a propensity model from click data is a
dual problem of unbiased learning to rank. Based on this observation, we
propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker
and an \textit{unbiased propensity model}. DLA is an automatic unbiased
learning-to-rank framework as it directly learns unbiased ranking models from
biased click data without any preprocessing. It can adapt to the change of bias
distributions and is applicable to online learning. Our empirical experiments
with synthetic and real-world data show that the models trained with DLA
significantly outperformed the unbiased learning-to-rank algorithms based on
result randomization and the models trained with relevance signals extracted by
click models
Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking
Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences
between ranking systems based on historical interaction data, while mitigating
the effect of position bias and item-selection bias. We introduce the novel
Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for
logging data so that the counterfactual estimate has minimal variance. As
minimizing variance leads to faster convergence, LogOpt increases the
data-efficiency of counterfactual estimation. LogOpt turns the counterfactual
approach - which is indifferent to the logging policy - into an online
approach, where the algorithm decides what rankings to display. We prove that,
as an online evaluation method, LogOpt is unbiased w.r.t. position and
item-selection bias, unlike existing interleaving methods. Furthermore, we
perform large-scale experiments by simulating comparisons between thousands of
rankers. Our results show that while interleaving methods make systematic
errors, LogOpt is as efficient as interleaving without being biased.Comment: ICTIR 202
Learning from User Interactions with Rankings: A Unification of the Field
Ranking systems form the basis for online search engines and recommendation
services. They process large collections of items, for instance web pages or
e-commerce products, and present the user with a small ordered selection. The
goal of a ranking system is to help a user find the items they are looking for
with the least amount of effort. Thus the rankings they produce should place
the most relevant or preferred items at the top of the ranking. Learning to
rank is a field within machine learning that covers methods which optimize
ranking systems w.r.t. this goal. Traditional supervised learning to rank
methods utilize expert-judgements to evaluate and learn, however, in many
situations such judgements are impossible or infeasible to obtain. As a
solution, methods have been introduced that perform learning to rank based on
user clicks instead. The difficulty with clicks is that they are not only
affected by user preferences, but also by what rankings were displayed.
Therefore, these methods have to prevent being biased by other factors than
user preference. This thesis concerns learning to rank methods based on user
clicks and specifically aims to unify the different families of these methods.
As a whole, the second part of this thesis proposes a framework that bridges
many gaps between areas of online, counterfactual, and supervised learning to
rank. It has taken approaches, previously considered independent, and unified
them into a single methodology for widely applicable and effective learning to
rank from user clicks.Comment: PhD Thesis of Harrie Oosterhuis defended at the University of
Amsterdam on November 27th 202
- …