On the Estimation and Use of Statistical Modelling in Information Retrieval
Several tasks in information retrieval (IR) rely on assumptions regarding the
distribution of some property (such as term frequency) in the data being
processed. This thesis argues that such distributional assumptions can lead to
incorrect conclusions and proposes a statistically principled method for
determining the "true" distribution. This thesis further applies this method to
derive a new family of ranking models that adapt their computations to the
statistics of the data being processed. Experimental evaluation shows results
on par with or better than multiple strong baselines on several TREC collections.
Overall, this thesis concludes that distributional assumptions can be replaced
with an effective, efficient and principled method for determining the "true"
distribution and that using the "true" distribution can lead to improved
retrieval performance.
Comment: PhD thesis