
    Position Bias Estimation for Unbiased Learning-to-Rank in eCommerce Search

    The Unbiased Learning-to-Rank framework has recently been proposed as a general approach to systematically remove biases, such as position bias, from learning-to-rank models. The method takes two steps: estimating click propensities and using them to train unbiased models. The most common methods proposed in the literature for estimating propensities involve some degree of intervention in the live search engine. An alternative approach proposed recently uses an Expectation Maximization (EM) algorithm to estimate propensities by using ranking features for estimating relevances. In this work we propose a novel method to directly estimate propensities which does not use any intervention in live search or rely on modeling relevance. Rather, we take advantage of the fact that the same query-document pair may naturally change ranks over time. This typically occurs in eCommerce search because of changes in the popularity of items over time, the existence of time-dependent ranking features, or the addition or removal of items in the index (an item getting sold or a new item being listed). However, our method is general and can be applied to any search engine for which the rank of the same document may naturally change over time for the same query. We derive a simple likelihood function that depends on propensities only, and by maximizing the likelihood we are able to get estimates of the propensities. We apply this method to eBay search data to estimate click propensities for web and mobile search and compare these with estimates using the EM method. We also use simulated data to show that the method gives reliable estimates of the "true" simulated propensities. Finally, we train an unbiased learning-to-rank model for eBay search using the estimated propensities and show that it outperforms both baselines: one without position bias correction and one with position bias correction using the EM method.
    Comment: 10 pages, 3 figures

    Differentiable Unbiased Online Learning to Rank

    Online Learning to Rank (OLTR) methods optimize rankers based on user interactions. State-of-the-art OLTR methods are built specifically for linear models. Their approaches do not extend well to non-linear models such as neural networks. We introduce an entirely novel approach to OLTR that constructs a weighted differentiable pairwise loss after each interaction: Pairwise Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional approach that relies on interleaving or multileaving and extensive sampling of models to estimate gradients. Instead, its gradient is based on inferring preferences between document pairs from user clicks and can optimize any differentiable model. We prove that the gradient of PDGD is unbiased w.r.t. user document pair preferences. Our experiments on the largest publicly available Learning to Rank (LTR) datasets show considerable and significant improvements under all levels of interaction noise. PDGD outperforms existing OLTR methods in terms of both learning speed and final convergence. Furthermore, unlike previous OLTR methods, PDGD also allows non-linear models to be optimized effectively. Our results show that using a neural network leads to even better performance at convergence than a linear model. In summary, PDGD is an efficient and unbiased OLTR approach that provides a better user experience than previously possible.
    Comment: Conference on Information and Knowledge Management 201

    Unbiased Learning to Rank with Unbiased Propensity Estimation

    Learning to rank with biased click data is a well-known challenge. A variety of methods have been explored to debias click data for learning to rank, such as click models, result interleaving, and, more recently, the unbiased learning-to-rank framework based on inverse propensity weighting. Despite their differences, most existing studies separate the estimation of click bias (namely the propensity model) from the learning of ranking algorithms. To estimate click propensities, they either conduct online result randomization, which can negatively affect the user experience, or offline parameter estimation, which has special requirements for click data and is optimized for objectives (e.g. click likelihood) that are not directly related to the ranking performance of the system. In this work, we address those problems by unifying the learning of propensity models and ranking models. We find that the problem of estimating a propensity model from click data is a dual problem of unbiased learning to rank. Based on this observation, we propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker and an unbiased propensity model. DLA is an automatic unbiased learning-to-rank framework as it directly learns unbiased ranking models from biased click data without any preprocessing. It can adapt to changes in bias distributions and is applicable to online learning. Our empirical experiments with synthetic and real-world data show that the models trained with DLA significantly outperformed the unbiased learning-to-rank algorithms based on result randomization and the models trained with relevance signals extracted by click models.
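The inverse propensity weighting mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DLA algorithm itself. It only shows the general idea of reweighting clicks by the examination propensity of the position where each document was displayed. The function names, the log format, the DCG-style weight, and the propensity values are all assumptions made for illustration.

```python
import math


def dcg_weight(rank):
    """DCG-style gain for a 1-based rank (an illustrative metric choice)."""
    return 1.0 / math.log2(rank + 1)


def ipw_dcg_estimate(logged, new_rank, propensity):
    """Inverse-propensity-weighted estimate of a DCG-like metric.

    logged:     list of (doc_id, displayed_position, clicked) tuples
                (hypothetical log format)
    new_rank:   dict doc_id -> rank assigned by the ranker being evaluated
    propensity: dict position -> assumed probability that a user examines
                that position
    """
    total = 0.0
    for doc_id, position, clicked in logged:
        if clicked:
            # Clicks at rarely examined positions are up-weighted.
            total += dcg_weight(new_rank[doc_id]) / propensity[position]
    return total


# Hypothetical log for one query: documents a and c were clicked.
logged = [("a", 1, True), ("b", 2, False), ("c", 3, True)]
propensity = {1: 1.0, 2: 0.5, 3: 0.25}
new_rank = {"a": 3, "b": 2, "c": 1}
estimate = ipw_dcg_estimate(logged, new_rank, propensity)
```

In this sketch the click on document c counts four times as much as it would unweighted, because position 3 is assumed to be examined only a quarter of the time.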

    Policy-Aware Unbiased Learning to Rank for Top-k Rankings

    Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking. There is currently no counterfactual unbiased LTR method for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability of appearing in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions of traditional LTR methods to perform counterfactual LTR and to optimize top-k metrics. Together, our contributions introduce the first policy-aware unbiased LTR approach that learns from top-k feedback and optimizes top-k metrics. As a result, counterfactual LTR is now applicable to the very prevalent top-k ranking setting in search and recommendation.
    Comment: SIGIR 2020 full conference paper

    Unbiased split selection for classification trees based on the Gini Index

    The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides formal support for the variable selection bias in favor of variables with a high number of missing values when the Gini gain is used as the split selection criterion, and we suggest using the resulting p-value as an unbiased split selection criterion in recursive partitioning algorithms. We demonstrate the efficiency of our novel method in simulation and real-data studies from veterinary gynecology in the context of binary classification and continuous predictor variables with different numbers of missing values. Our method is extendible to categorical and ordinal predictor variables and to other split selection criteria such as the cross-entropy criterion.
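For readers unfamiliar with the criterion, the maximally selected Gini gain for one continuous predictor can be sketched as follows. This is a plain textbook computation, not the authors' p-value method, and the function names and toy data are assumptions made for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def max_gini_gain(x, y):
    """Return (best_gain, best_cutpoint) over all cutpoints of predictor x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    labels = [label for _, label in pairs]
    parent = gini_impurity(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, n):
        # A cutpoint only exists between two distinct predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left, right = labels[:i], labels[i:]
        gain = (parent
                - (i / n) * gini_impurity(left)
                - ((n - i) / n) * gini_impurity(right))
        if gain > best_gain:
            best_gain = gain
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_gain, best_cut


# Toy data: the classes separate perfectly between x = 2 and x = 3.
gain, cut = max_gini_gain([1, 2, 3, 4], [0, 0, 1, 1])
```

Because the maximum is taken over every candidate cutpoint, its distribution depends on how many observations a predictor has, which is the source of the selection bias the abstract describes.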