An Active Learning Approach to Efficiently Ranking Retrieval Engines

Torrey, Lisa A

Authors: Lisa A Torrey
Publication date: 1 May 2003
Publisher: Dartmouth Digital Commons

Abstract

Evaluating retrieval systems, such as those submitted to the annual TREC competition, usually requires a large number of documents to be read and judged for relevance to query topics. Test collections are far too big to be exhaustively judged, so only a subset of documents is selected to form the judgment "pool." The selection method that TREC uses produces pools that are still quite large. Research has indicated that it is possible to rank the retrieval systems correctly using substantially smaller pools. This paper introduces an active learning algorithm whose goal is to reach the correct rankings using the smallest possible number of relevance judgments. It adds one document to the pool at a time, always trying to select the document with the highest information gain. Several variants of this algorithm are described, each with improvements on the one before. Results from experiments are included for comparison with the traditional TREC pooling method. The best version of the algorithm reliably outperforms the traditional method, although its degree of improvement varies.
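
The one-document-at-a-time loop described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not define its information-gain measure, so the sketch substitutes a simple disagreement proxy (the number of system pairs that differ on whether they retrieved a document). The names `select_next`, `build_pool`, `rank_systems`, and the `judge` callback are hypothetical, and ranking by a count of relevant retrieved documents stands in for a real evaluation measure.

```python
from typing import Callable, Dict, List, Optional, Set


def select_next(runs: Dict[str, List[str]], judged: Set[str]) -> Optional[str]:
    """Pick the unjudged document that the systems disagree on most.

    ASSUMPTION: m * (n - m), the number of system pairs where one system
    retrieved the document and the other did not, is used as a stand-in
    for information gain. The source does not specify its measure.
    """
    n = len(runs)
    counts: Dict[str, int] = {}
    for docs in runs.values():
        for doc in set(docs):
            if doc not in judged:
                counts[doc] = counts.get(doc, 0) + 1
    if not counts:
        return None
    # Iterate in sorted order so ties are broken deterministically.
    return max(sorted(counts), key=lambda d: counts[d] * (n - counts[d]))


def build_pool(
    runs: Dict[str, List[str]],
    judge: Callable[[str], bool],
    budget: int,
) -> Dict[str, bool]:
    """Grow the judgment pool one document at a time, up to `budget`."""
    judgments: Dict[str, bool] = {}
    while len(judgments) < budget:
        doc = select_next(runs, set(judgments))
        if doc is None:  # every retrieved document has been judged
            break
        judgments[doc] = judge(doc)  # one human relevance judgment
    return judgments


def rank_systems(runs: Dict[str, List[str]], judgments: Dict[str, bool]) -> List[str]:
    """Rank systems by how many judged-relevant documents they retrieved."""
    def score(name: str) -> int:
        return sum(1 for doc in set(runs[name]) if judgments.get(doc, False))
    # Highest score first; system name breaks ties.
    return sorted(runs, key=lambda name: (-score(name), name))
```

In this sketch each call to `judge` represents one relevance judgment, so `budget` caps the human effort directly; a fuller treatment would stop once the system ranking stabilizes rather than at a fixed budget.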
