341 research outputs found

    Counterfactual Estimation and Optimization of Click Metrics for Search Engines

    Optimizing an interactive system against a predefined online metric is particularly challenging when the metric is computed from user feedback such as clicks and payments. The key challenge is the counterfactual nature of the problem: in the case of Web search, any change to a component of the search engine may result in a different search result page for the same query, but we normally cannot reliably infer from search logs how users would react to the new result page. Consequently, it appears impossible to accurately estimate online metrics that depend on user feedback unless the new engine is run to serve users and compared with a baseline in an A/B test. This approach, while valid and successful, is unfortunately expensive and time-consuming. In this paper, we propose to address this problem using causal inference techniques under the contextual-bandit framework. This approach effectively allows one to run (potentially infinitely) many A/B tests offline from search logs, making it possible to estimate and optimize online metrics quickly and inexpensively. Focusing on an important component in a commercial search engine, we show how these ideas can be instantiated and applied, and obtain very promising results that suggest the wide applicability of these techniques.
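
    The idea of running A/B tests offline from logs can be illustrated with a generic inverse propensity scoring (IPS) estimator, a standard contextual-bandit technique. This is a minimal sketch, not necessarily the exact estimator used in the paper. It assumes each log record stores the action the production engine chose, the probability it assigned to that action, and the observed click; the names `LoggedEvent` and `ips_estimate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedEvent:
    context: str       # e.g. a query (hypothetical representation)
    action: str        # result-page variant the logging engine showed
    propensity: float  # probability the logging engine chose `action`
    reward: float      # observed feedback, e.g. 1.0 for a click


def ips_estimate(
    log: Sequence[LoggedEvent],
    target_prob: Callable[[str, str], float],
) -> float:
    """IPS estimate of the expected reward of a new (target) policy.

    target_prob(context, action) is the probability that the new policy
    picks `action` for `context`. The estimate is unbiased when the logging
    policy gave non-zero probability to every action the target may pick.
    """
    if not log:
        raise ValueError("log is empty")
    total = 0.0
    for e in log:
        if e.propensity <= 0.0:
            raise ValueError("logged propensities must be positive")
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        total += e.reward * target_prob(e.context, e.action) / e.propensity
    return total / len(log)


def always_b(context: str, action: str) -> float:
    return 1.0 if action == "B" else 0.0


def always_a(context: str, action: str) -> float:
    return 1.0 if action == "A" else 0.0


# Usage: variants A and B were each shown with probability 0.5.
log = [
    LoggedEvent("q1", "A", 0.5, 1.0),
    LoggedEvent("q1", "B", 0.5, 1.0),
    LoggedEvent("q2", "A", 0.5, 0.0),
    LoggedEvent("q2", "B", 0.5, 1.0),
]
print(ips_estimate(log, always_b))  # 1.0
print(ips_estimate(log, always_a))  # 0.5
```

    Because both variants were logged with non-zero probability, an engine that always shows variant B can be evaluated from the same log without serving it to users.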

    Policy-Aware Unbiased Learning to Rank for Top-k Rankings

    Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking. There is currently no counterfactual unbiased LTR method for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability of appearing in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions of traditional LTR methods to perform counterfactual LTR and to optimize top-k metrics. Together, our contributions introduce the first policy-aware unbiased LTR approach that learns from top-k feedback and optimizes top-k metrics. As a result, counterfactual LTR is now applicable to the very prevalent top-k ranking setting in search and recommendation.
    Comment: SIGIR 2020 full conference paper
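
    The policy-aware propensity can be sketched as the expected examination probability of an item under the stochastic logging policy, where items ranked below position k are never displayed and so contribute zero. This is a simplified sketch, assuming a position-based examination model with known per-position probabilities `theta` and a logging policy small enough to enumerate as (ranking, probability) pairs; the function names and the rank-weighted metric are hypothetical illustrations, not the paper's code.

```python
from typing import Callable, List, Sequence, Tuple

Ranking = Tuple[str, ...]


def policy_aware_propensity(
    doc: str,
    policy: List[Tuple[Ranking, float]],
    theta: Sequence[float],
) -> float:
    """Probability that `doc` is examined under a stochastic top-k logging policy.

    policy: (ranking, probability) pairs, probabilities summing to 1.
    theta:  examination probability for positions 0..k-1; k = len(theta).
            Items ranked at position >= k are never displayed.
    """
    k = len(theta)
    rho = 0.0
    for ranking, prob in policy:
        shown = ranking[:k]
        if doc in shown:
            rho += prob * theta[shown.index(doc)]
    return rho


def policy_aware_estimate(
    clicks: Sequence[str],
    new_ranking: Ranking,
    policy: List[Tuple[Ranking, float]],
    theta: Sequence[float],
    weight_at_rank: Callable[[int], float],
) -> float:
    """IPS-style estimate of a rank-weighted metric for `new_ranking`,
    computed from the clicks of one logged impression."""
    total = 0.0
    for doc in clicks:
        rho = policy_aware_propensity(doc, policy, theta)
        if rho == 0.0:
            # Under the examination model a clicked doc was displayed,
            # so rho > 0 unless `policy` is misspecified.
            raise ValueError(f"{doc!r} has zero propensity")
        total += weight_at_rank(new_ranking.index(doc)) / rho
    return total


def top1_weight(rank: int) -> float:
    return 1.0 if rank < 1 else 0.0


# Usage: k = 1 and the logger randomizes which doc fills the single slot.
policy = [(("a", "b", "c"), 0.5), (("b", "a", "c"), 0.5)]
theta = [1.0]
print(policy_aware_propensity("b", policy, theta))  # 0.5
print(policy_aware_propensity("c", policy, theta))  # 0.0 (never shown)
print(policy_aware_estimate(["b"], ("b", "a", "c"), policy, theta, top1_weight))  # 2.0
```

    With a deterministic logger that always shows "a" in the single slot, "b" would have zero propensity and its relevance could never be corrected for; randomizing the logging policy is what gives every relevant item a non-zero chance of appearing in the top-k.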

    Estimating Position Bias without Intrusive Interventions

    Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal. While it was recently shown how counterfactual learning-to-rank (LTR) approaches (Joachims et al., 2017) can provably overcome presentation bias when observation propensities are known, it remains to be shown how to effectively estimate these propensities. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive interventions, or restrictive relevance modeling assumptions. First, we show how to harvest a specific type of intervention data from historic feedback logs of multiple different ranking functions, and show that this data is sufficient for consistent propensity estimation in the position-based model. Second, we propose a new extremum estimator that makes effective use of this data. In an empirical evaluation, we find that the new estimator provides superior propensity estimates in two real-world systems: Arxiv Full-text Search and Google Drive Search. Beyond these two systems, simulation studies show that the method is robust to a wide range of settings.
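
    The intervention-harvesting idea can be illustrated with a simplified ratio estimator under the position-based model, in which the click probability of a document at position k is p_k times its relevance. This sketch is not the paper's extremum estimator; it assumes log records of the form (query, doc, position, clicked) pooled from several rankers, and `relative_propensities` is a hypothetical name. Propensities are estimated relative to position 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (query, doc, position, clicked); positions start at 1.
Record = Tuple[str, str, int, bool]


def relative_propensities(records: Iterable[Record], max_pos: int) -> Dict[int, float]:
    """Estimate p_k / p_1 for k = 2..max_pos from harvested interventions.

    A (query, doc) pair that different rankers placed at both position 1 and
    position k acts as a natural intervention. Under the position-based model,
    the expected click-through rate at position k is p_k * r(query, doc), so the
    ratio of summed CTRs over shared pairs cancels the unknown relevance in
    expectation.
    """
    shows: Dict[Tuple[str, str, int], int] = defaultdict(int)
    clicks: Dict[Tuple[str, str, int], int] = defaultdict(int)
    for query, doc, pos, clicked in records:
        shows[(query, doc, pos)] += 1
        clicks[(query, doc, pos)] += int(clicked)

    ctr = {key: clicks[key] / n for key, n in shows.items()}
    pairs = {(query, doc) for query, doc, _ in shows}

    estimates = {1: 1.0}  # propensities are only identified up to a scale
    for k in range(2, max_pos + 1):
        num = den = 0.0
        for query, doc in pairs:
            if (query, doc, 1) in ctr and (query, doc, k) in ctr:
                num += ctr[(query, doc, k)]
                den += ctr[(query, doc, 1)]
        if den > 0.0:
            estimates[k] = num / den
    return estimates


records = [
    # Ranker 1 showed (a, b) twice.
    ("q", "a", 1, True), ("q", "b", 2, False),
    ("q", "a", 1, True), ("q", "b", 2, False),
    # Ranker 2 showed (b, a) twice.
    ("q", "b", 1, False), ("q", "a", 2, True),
    ("q", "b", 1, False), ("q", "a", 2, False),
]
print(relative_propensities(records, max_pos=2))  # {1: 1.0, 2: 0.5}
```

    Real logs would need many impressions per shared pair for this ratio to be stable; the paper instead proposes an extremum estimator for this kind of harvested data.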

    Re-examining assumptions in fair and unbiased learning to rank

    In this thesis, we re-examine the assumptions of existing methods for bias correction and fairness optimization in ranking. Consequently, we propose methods that are more general than the existing ones, in the sense that they rely on fewer assumptions or are applicable in more situations. On the bias side, we first show that the click model assumption matters and propose cascade model-based inverse propensity scoring (IPS). Next, we prove that the unbiasedness of IPS relies on the assumption that the clicks do not suffer from trust bias. When trust bias exists, we extend IPS and propose the affine correction (AC) method and prove that, in contrast to IPS, it gives unbiased estimates of the relevance. Finally, we show that the unbiasedness proofs of IPS and AC are conditioned on an accurate estimation of the bias parameters, and propose a bias correction method that does not rely on relevance estimation. On the fairness side, we re-examine the implicit assumption that a fair distribution of exposure leads to fair treatment by the users. We argue that fairness of exposure is necessary but not sufficient for fair treatment and propose a correction method for this type of bias. Lastly, we observe that the existing general post-processing framework for optimizing fairness of ranking metrics is based on the Plackett-Luce distribution, whose optimization has room for improvement for queries with a small number of repeating sessions. To close this gap, we propose a new permutation distribution based on permutation graphs.
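
    The contrast between IPS and the affine correction can be sketched with a trust-bias click model, assumed here for illustration: a document at position k is examined with probability theta_k and then clicked with probability eps_pos if relevant and eps_neg otherwise. This gives P(click at k) = alpha_k * P(R = 1) + beta_k with alpha_k = theta_k * (eps_pos - eps_neg) and beta_k = theta_k * eps_neg. The function names and parameter values are hypothetical.

```python
def expected_click(relevance: float, theta_k: float, eps_pos: float, eps_neg: float) -> float:
    """Click probability under the assumed trust-bias model.

    relevance = P(R = 1); theta_k = examination probability at position k;
    eps_pos / eps_neg = click probability after examination for relevant /
    non-relevant documents.
    """
    return theta_k * (eps_pos * relevance + eps_neg * (1.0 - relevance))


def ips_correction(click: float, theta_k: float) -> float:
    """Standard IPS: divide the click (or click rate) by the examination probability."""
    return click / theta_k


def affine_correction(click: float, theta_k: float, eps_pos: float, eps_neg: float) -> float:
    """Affine correction: subtract beta_k, then divide by alpha_k."""
    alpha_k = theta_k * (eps_pos - eps_neg)
    beta_k = theta_k * eps_neg
    return (click - beta_k) / alpha_k


# Usage with illustrative (made-up) parameters.
true_relevance = 0.6
params = dict(theta_k=0.5, eps_pos=0.9, eps_neg=0.2)
c = expected_click(true_relevance, **params)
print(ips_correction(c, params["theta_k"]))  # about 0.62: biased by trust bias
print(affine_correction(c, **params))        # about 0.6: recovers the relevance
```

    Because the affine correction is linear in the click, applying it to individual binary clicks and averaging gives the same estimate in expectation. IPS coincides with it only when eps_pos = 1 and eps_neg = 0, that is, when there is no trust bias.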

    Learning from User Interactions with Rankings: A Unification of the Field

    Ranking systems form the basis for online search engines and recommendation services. They process large collections of items, for instance web pages or e-commerce products, and present the user with a small ordered selection. The goal of a ranking system is to help a user find the items they are looking for with the least amount of effort. Thus the rankings they produce should place the most relevant or preferred items at the top of the ranking. Learning to rank is a field within machine learning that covers methods which optimize ranking systems with respect to this goal. Traditional supervised learning to rank methods utilize expert judgements to evaluate and learn; however, in many situations such judgements are impossible or infeasible to obtain. As a solution, methods have been introduced that perform learning to rank based on user clicks instead. The difficulty with clicks is that they are affected not only by user preferences but also by which rankings were displayed. Therefore, these methods must avoid being biased by factors other than user preference. This thesis concerns learning to rank methods based on user clicks and specifically aims to unify the different families of these methods. As a whole, the second part of this thesis proposes a framework that bridges many gaps between areas of online, counterfactual, and supervised learning to rank. It takes approaches previously considered independent and unifies them into a single methodology for widely applicable and effective learning to rank from user clicks.
    Comment: PhD Thesis of Harrie Oosterhuis defended at the University of Amsterdam on November 27th 202