Location of Repository

Score Distribution Models: Assumptions, Intuition, and Robustness to Score Manipulation

By Evangelos Kanoulas, Keshi Dai, Virgil Pavlu and Javed A. Aslam

Abstract

Inferring the score distribution of relevant and non-relevant documents is an essential task for many IR applications (e.g. information filtering, recall-oriented IR, meta-search, distributed IR). Modeling score distributions in an accurate manner is the basis of any inference. Thus, numerous score distribution models have been proposed in the literature. Most of the models were proposed on the basis of empirical evidence and goodness-of-fit. In this work, we model score distributions in a rather different, systematic manner. We start with a basic assumption on the distribution of terms in a document. Following the transformations applied on term frequencies by two basic ranking functions, BM25 and Language Models, we derive the distribution of the produced scores for all documents. Then we focus on the relevant documents. We detach our analysis from particular ranking functions. Instead, we consider a model for precision-recall curves, and given this model, we present a general mathematical framework which, given any score distribution for all retrieved documents, produces an analytical formula for the score distribution of relevant documents that is consistent with the precision-recall curves that follow the aforementioned model. In particular, assuming a Gamma distribution for all retrieved documents, we show that the derived distribution for the relevant documents resembles a Gaussian distribution with a heavy right tail

Topics: Categories and Subject Descriptors, H.3.3 [Information Search and Retrieval] Retrieval models General Terms, Theory, Measurement Keywords, information retrieval, score distribution, density functions, recall-precision curve
Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.190.9000
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://kanoulas.staff.shef.ac.... (external link)
  • Suggested articles


    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.