Bias-variance analysis in estimating true query model for information retrieval
The estimation of the query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimate is expected to be not only effective, in terms of high mean retrieval performance over all queries, but also stable, in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, namely the bias-variance tradeoff, a fundamental concept in statistics. We formulate the notion of bias and variance with respect to both retrieval performance and the estimation quality of query models. We then investigate several estimated query models, analyzing when and why the bias-variance tradeoff occurs, and how bias and variance can be reduced simultaneously. A series of experiments on four TREC collections systematically evaluates our bias-variance analysis. Our approach and results can potentially form an analysis framework and a novel evaluation strategy for query language modeling.
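The tradeoff the abstract describes rests on the standard statistical decomposition MSE = bias² + variance. A minimal sketch of that decomposition, treating one retrieval-performance estimate per query (function name and toy numbers are illustrative, not from the paper):

```python
def bias_variance(estimates, true_value):
    """Decompose the mean squared error of an estimator into bias^2 + variance.

    `estimates` are repeated estimates of the same target quantity (here,
    one per query); `true_value` is the ideal target.
    """
    mean_est = sum(estimates) / len(estimates)
    bias = mean_est - true_value
    variance = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return bias ** 2, variance, mse

# Toy check: MSE = bias^2 + variance (up to floating-point error).
b2, var, mse = bias_variance([0.42, 0.38, 0.50, 0.30], true_value=0.45)
```

An estimator with low bias but high variance is "effective on average but unstable across queries", which is exactly the tension the paper formalizes.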
Approximating true relevance model in relevance feedback
Relevance is an essential concept in information retrieval (IR), and relevance estimation is a fundamental IR task. It involves not only estimating document relevance, but also estimating the user's information need. The relevance-based language model aims to estimate a relevance model (i.e., a distribution over relevant query terms) from relevance feedback documents. The true relevance model should be generated from truly relevant documents. The ideal estimate of the true relevance model is expected to be not only effective, in terms of mean retrieval performance (e.g., Mean Average Precision) over all queries, but also stable, in the sense that performance is consistent across different individual queries. In practice, however, when approximating the true relevance model, improving retrieval effectiveness often sacrifices retrieval stability, and vice versa. In this thesis, we propose to explore and analyze this effectiveness-stability tradeoff from a new perspective, namely the bias-variance tradeoff, a fundamental concept in statistical estimation. We first formulate the bias, the variance and the tradeoff between them for retrieval performance as well as for query model estimation. We then analytically and empirically study a number of factors (e.g., query model complexity, query model combination, document weight smoothness and removal of irrelevant documents) that can affect the bias and variance. Our study shows that the proposed bias-variance tradeoff analysis can serve as an analytical framework for query model estimation. We then investigate two key factors in depth, document weight smoothness and removal of irrelevant documents, by proposing novel methods for document weight smoothing and for irrelevance distribution separation, respectively.
Systematic experimental evaluation on TREC collections shows that the proposed methods can improve both the retrieval effectiveness and the retrieval stability of query model estimation. In addition to these main contributions, we also carry out an initial exploration of two further directions: formulating bias and variance in personalization, and viewing query model estimation from a novel theoretical angle (i.e., quantum theory) that has partially inspired our research.
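The relevance model the thesis estimates is a query-term distribution built from feedback documents. A minimal RM1-style sketch of that estimation step, p(w|R) ≈ Σ_d p(w|d)·p(d|q) (the toy documents and weights are illustrative, not the thesis's smoothing or separation methods):

```python
from collections import Counter

def relevance_model(feedback_docs, doc_weights):
    """RM1-style estimate: p(w|R) ~ sum_d p(w|d) * p(d|q).

    feedback_docs: list of token lists; doc_weights: normalised p(d|q).
    """
    p_w_R = Counter()
    for doc, w_d in zip(feedback_docs, doc_weights):
        counts = Counter(doc)
        total = sum(counts.values())
        for term, c in counts.items():
            p_w_R[term] += w_d * c / total   # p(w|d) weighted by p(d|q)
    return dict(p_w_R)

docs = [["query", "model", "model"], ["query", "estimation"]]
rm = relevance_model(docs, doc_weights=[0.6, 0.4])
```

The document weights p(d|q) are the lever the thesis studies: smoothing them, or separating out weight assigned to irrelevant documents, changes the bias and variance of the resulting term distribution.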
Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using
logged user interactions that contain interaction biases. Existing methods are
only unbiased if users are presented with all relevant items in every ranking.
No existing counterfactual LTR method is unbiased for top-k rankings. We
introduce a novel policy-aware counterfactual estimator for LTR
metrics that can account for the effect of a stochastic logging policy. We
prove that the policy-aware estimator is unbiased if every relevant item has a
non-zero probability to appear in the top-k ranking. Our experimental results
show that the performance of our estimator is not affected by the size of k:
for any k, the policy-aware estimator reaches the same retrieval performance
while learning from top-k feedback as when learning from feedback on the full
ranking. Lastly, we introduce novel extensions of traditional LTR methods to
perform counterfactual LTR and to optimize top-k metrics. Together, our
contributions introduce the first policy-aware unbiased LTR approach that
learns from top-k feedback and optimizes top-k metrics. As a result,
counterfactual LTR is now applicable to the very prevalent top-k ranking
setting in search and recommendation.
Comment: SIGIR 2020 full conference paper.
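The core move in the abstract is to reweight clicks by a propensity that is marginalised over the stochastic logging policy, so that items with any non-zero chance of reaching the top-k stay correctable. A heavily simplified sketch of that idea (it ignores rank-dependent examination probabilities; names and toy rankings are illustrative):

```python
def exposure_propensity(item, logging_policy, k):
    """P(item appears in the top-k), marginalised over the stochastic
    logging policy, given as (probability, ranking) pairs."""
    return sum(p for p, ranking in logging_policy if item in ranking[:k])

def policy_aware_estimate(clicked_items, logging_policy, k):
    """Inverse-propensity-style estimate: each click is reweighted by the
    policy-aware propensity of the clicked item. Unbiased whenever every
    relevant item has a non-zero probability of appearing in the top-k."""
    return sum(1.0 / exposure_propensity(item, logging_policy, k)
               for item in clicked_items)

# Two rankings logged with equal probability, k = 2.
policy = [(0.5, ["a", "b", "c"]), (0.5, ["c", "a", "b"])]
prop_b = exposure_propensity("b", policy, k=2)   # shown only in the first ranking
est = policy_aware_estimate(["a", "b"], policy, k=2)
```

Item "b" is cut off from the top-2 half the time, so its click is upweighted by 1/0.5; a deterministic logging policy would give it zero propensity and break unbiasedness, which is why the estimator requires a stochastic policy.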
Surprisingly Rational: Probability theory plus noise explains biases in judgment
The systematic biases seen in people's probability judgments are typically
taken as evidence that people do not reason about probability using the rules
of probability theory, but instead use heuristics which sometimes yield
reasonable judgments and sometimes systematic biases. This view has had a major
impact in economics, law, medicine, and other fields; indeed, the idea that
people cannot reason with probabilities has become a widespread truism. We
present a simple alternative to this view, where people reason about
probability according to probability theory but are subject to random variation
or noise in the reasoning process. In this account the effect of noise is
cancelled for some probabilistic expressions: analysing data from two
experiments we find that, for these expressions, people's probability judgments
are strikingly close to those required by probability theory. For other
expressions this account produces systematic deviations in probability
estimates. These deviations explain four reliable biases in human probabilistic
reasoning (conservatism, subadditivity, conjunction and disjunction fallacies).
These results suggest that people's probability judgments embody the rules of
probability theory, and that biases in those judgments are due to the effects
of random noise.
Comment: 64 pages. Final preprint version. In press, Psychological Review.
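In the probability-plus-noise account, each instance consulted when judging a probability is misread with some small probability d, which pulls the expected judgment linearly toward 0.5: E[estimate] = (1 − 2d)·p + d. A minimal sketch of that prediction, reproducing conservatism (d and the probabilities are illustrative):

```python
def expected_noisy_estimate(p, d):
    """Expected probability judgment when each sampled instance is misread
    with probability d: E[estimate] = (1 - 2d) * p + d."""
    return (1 - 2 * d) * p + d

# Conservatism: judgments of extreme probabilities regress toward 0.5.
low = expected_noisy_estimate(0.1, d=0.2)    # overestimates a rare event
high = expected_noisy_estimate(0.9, d=0.2)   # underestimates a common one
```

For expressions in which the noise terms cancel, this model predicts judgments that match probability theory exactly, which is the paper's signature test.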
IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
This paper provides a unified account of two schools of thinking in
information retrieval modelling: the generative retrieval focusing on
predicting relevant documents given a query, and the discriminative retrieval
focusing on predicting relevancy given a query-document pair. We propose a
game-theoretic minimax framework to iteratively optimise both models. On one hand, the
discriminative model, aiming to mine signals from labelled and unlabelled data,
provides guidance to train the generative model towards fitting the underlying
relevance distribution over documents given the query. On the other hand, the
generative model, acting as an attacker to the current discriminative model,
generates difficult examples for the discriminative model in an adversarial way
by minimising its discrimination objective. With the competition between these
two models, we show that the unified framework takes advantage of both schools
of thinking: (i) the generative model learns to fit the relevance distribution
over documents via the signals from the discriminative model, and (ii) the
discriminative model is able to exploit the unlabelled data selected by the
generative model to achieve a better estimation for document ranking. Our
experimental results have demonstrated significant performance gains as much as
23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
applications including web search, item recommendation, and question answering.
Comment: 12 pages; appendix added.
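The minimax dynamic can be sketched with scalar "documents": a generator samples documents from a softmax over its scores and is trained by policy gradient on the reward log D(sample), while a discriminator learns to score the truly relevant document above the generator's samples. This is a self-contained toy, not IRGAN's neural implementation; the features, learning rate and iteration count are illustrative:

```python
import math
import random

random.seed(0)
docs = [0.2, 0.5, 1.0, 1.5]   # one scalar feature per candidate document
relevant = 3                   # index of the truly relevant document

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

theta_d = 0.0   # discriminator weight
theta_g = 0.0   # generator weight
lr = 0.1
for _ in range(200):
    # Generator samples a document from its current softmax distribution.
    probs = softmax([theta_g * x for x in docs])
    g = random.choices(range(len(docs)), weights=probs)[0]
    # Discriminator ascends its objective: score the true relevant
    # document above the generator's (adversarial) sample.
    theta_d += lr * ((1 - sigmoid(theta_d * docs[relevant])) * docs[relevant]
                     - sigmoid(theta_d * docs[g]) * docs[g])
    # Generator follows the policy gradient of its reward log D(sample),
    # with the expected reward as a variance-reducing baseline.
    reward = math.log(sigmoid(theta_d * docs[g]))
    baseline = sum(p * math.log(sigmoid(theta_d * x))
                   for p, x in zip(probs, docs))
    mean_x = sum(p * x for p, x in zip(probs, docs))
    theta_g += lr * (reward - baseline) * (docs[g] - mean_x)

final_probs = softmax([theta_g * x for x in docs])
```

After training, the generator concentrates its sampling distribution on the document the discriminator rewards, mirroring the paper's claim that the generator learns to fit the relevance distribution from the discriminator's signal.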
Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking
Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences
between ranking systems based on historical interaction data, while mitigating
the effect of position bias and item-selection bias. We introduce the novel
Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for
logging data so that the counterfactual estimate has minimal variance. As
minimizing variance leads to faster convergence, LogOpt increases the
data-efficiency of counterfactual estimation. LogOpt turns the counterfactual
approach - which is indifferent to the logging policy - into an online
approach, where the algorithm decides what rankings to display. We prove that,
as an online evaluation method, LogOpt is unbiased w.r.t. position and
item-selection bias, unlike existing interleaving methods. Furthermore, we
perform large-scale experiments by simulating comparisons between thousands of
rankers. Our results show that while interleaving methods make systematic
errors, LogOpt is as efficient as interleaving without being biased.
Comment: ICTIR 2020.
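LogOpt's objective, choosing the logging policy that minimises the variance of the counterfactual estimate, can be sketched in a one-slot setting where inverse-propensity weighting estimates the CTR of a target policy. The click rates are assumed known here purely to make the variance computable in closed form; LogOpt itself needs no such knowledge, and the candidate policies are illustrative:

```python
def ips_variance(q, p, r):
    """Variance of the inverse-propensity CTR estimate of target policy p
    when single-item impressions are logged under policy q, with (assumed)
    per-item click rates r. The estimate X = (p_g / q_g) * click has
    E[X] = sum_i p_i * r_i regardless of q: unbiased for any q > 0."""
    mean = sum(pi * ri for pi, ri in zip(p, r))
    second_moment = sum(qi * (pi / qi) ** 2 * ri
                        for qi, pi, ri in zip(q, p, r))
    return second_moment - mean ** 2

p = [0.8, 0.2]   # target policy's display probabilities
r = [0.3, 0.1]   # hypothetical click rates
candidates = [[0.5, 0.5], [0.8, 0.2], [0.95, 0.05]]
best = min(candidates, key=lambda q: ips_variance(q, p, r))
```

Every candidate yields an unbiased estimate; they differ only in variance, and picking the minimiser is what turns the indifferent counterfactual approach into an online one that decides what to display.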