2 research outputs found

    Bias-variance analysis in estimating true query model for information retrieval

    The estimation of the query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimate is expected to be not only effective, in terms of high mean retrieval performance over all queries, but also stable, in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, namely the bias-variance tradeoff, a fundamental concept in statistics. We formulate the notion of bias and variance with respect to both the retrieval performance and the estimation quality of query models. We then investigate several estimated query models, analyzing when and why the bias-variance tradeoff occurs and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections has been conducted to systematically evaluate our bias-variance analysis. Our approach and results can potentially form an analysis framework and a novel evaluation strategy for query language modeling.
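
    As a rough illustration of the bias-variance notion above, the sketch below decomposes per-query retrieval performance into a squared-bias term (how far the mean falls from an ideal target) and a variance term (instability across queries). The performance measure (per-query average precision), the ideal target of 1.0, and the decomposition itself are simplifying assumptions for illustration, not the paper's exact formulation.

        import numpy as np

        def bias_variance(per_query_perf, ideal_perf=1.0):
            """Decompose per-query retrieval performance into squared bias
            and variance. Both the ideal target and the use of per-query
            average precision are illustrative assumptions."""
            perf = np.asarray(per_query_perf, dtype=float)
            bias_sq = (ideal_perf - perf.mean()) ** 2  # effectiveness gap
            variance = perf.var()                      # stability across queries
            return bias_sq, variance

        # Two hypothetical query-model estimators over the same five queries:
        model_a = [0.42, 0.40, 0.44, 0.41, 0.43]  # stable, moderately effective
        model_b = [0.70, 0.10, 0.65, 0.15, 0.60]  # slightly higher mean, unstable

        for name, perf in [("A", model_a), ("B", model_b)]:
            b2, v = bias_variance(perf)
            print(f"model {name}: bias^2={b2:.4f}, variance={v:.4f}")

    Here model B attains a marginally lower bias but a far higher variance, which is exactly the effectiveness-versus-stability tradeoff the abstract describes.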

    On modeling rank-independent risk in estimating probability of relevance

    Estimating the probability of relevance for a document is fundamental in information retrieval. From a theoretical point of view, risk exists in the estimation process, in the sense that the estimated probabilities may not precisely match the actual ones. This estimation risk is often considered to be rank-dependent. For example, the probability ranking principle assumes that ranking documents in decreasing order of probability of relevance optimizes ranking effectiveness. This implies that a precise estimation yields an optimal ranking. However, an optimal (or even ideal) ranking does not guarantee that the estimated probabilities are precise, which means that part of the estimation risk is rank-independent. This poses practical risks in applications such as pseudo-relevance feedback, where different estimated probabilities of relevance in the first-round retrieval make a difference even when the two rankings are identical. In this paper, we explore the effect and the modeling of such rank-independent risk. A risk management method is proposed to adaptively adjust the rank-independent risk. Experimental results on several TREC collections demonstrate the effectiveness of the proposed models for both pseudo-relevance feedback and relevance feedback.
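
    The rank-independence point can be made concrete with a small sketch: two first-round estimates of the probability of relevance that induce the identical ranking but distribute feedback weight very differently. The numbers and the normalization-based weighting are hypothetical illustrations, not the paper's risk model.

        import numpy as np

        # Two hypothetical first-round estimates of P(relevance) for the same
        # five documents. Their induced rankings are identical, yet the values
        # differ, so any feedback step that weights documents by these
        # probabilities behaves differently: the risk is rank-independent.
        p_run1 = np.array([0.90, 0.70, 0.50, 0.30, 0.10])
        p_run2 = np.array([0.55, 0.52, 0.50, 0.48, 0.45])

        assert (np.argsort(-p_run1) == np.argsort(-p_run2)).all()  # same ranking

        def feedback_weights(p):
            """Toy pseudo-relevance-feedback weighting (an assumption, not the
            paper's method): normalize estimated probabilities of the top-ranked
            documents into mixture weights for the feedback model."""
            return p / p.sum()

        print(feedback_weights(p_run1))  # sharply peaked on the top document
        print(feedback_weights(p_run2))  # nearly uniform across the five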