Cheap IR Evaluation:
Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments

Roitero, Kevin

Authors: Kevin Roitero
Publication date: 19 March 2020
Publisher: Università degli Studi di Udine

Abstract

To evaluate Information Retrieval (IR) effectiveness, one possible approach is to use test collections, which are composed of a collection of documents, a set of descriptions of information needs (called topics), and a set of relevant documents for each topic. Test collections are modelled on a competition scenario: for example, in the well-known TREC initiative, participants run their own retrieval systems over a set of topics and provide a ranked list of retrieved documents; some of the retrieved documents (usually the top-ranked ones) constitute the so-called pool, and their relevance is evaluated by human assessors; the document list is then used to compute effectiveness metrics and rank the participant systems. Private Web search companies also run their own in-house evaluation exercises; although the details are mostly unknown and the aims are somewhat different, the overall approach shares several issues with the test collection approach. The aim of this work is to: (i) develop and improve some state-of-the-art work on the evaluation of IR effectiveness while saving resources, and (ii) propose a novel, more principled and engineered, overall approach to test collection based effectiveness evaluation. [...
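
The pooling workflow described in the abstract can be illustrated with a minimal sketch for a single topic. Everything here is an assumption made for illustration and is not taken from the thesis: the system names, document identifiers, pool depth, relevance judgements, and the choice of precision at k as the effectiveness metric are all hypothetical, and unjudged documents are treated as non-relevant.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Union of the top-`depth` documents from every system's ranked list."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that were judged relevant."""
    return sum(doc in relevant for doc in ranking[:k]) / k


# Hypothetical ranked lists submitted by three systems for one topic.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d3", "d5", "d1", "d6"],
    "sysC": ["d7", "d8", "d2", "d1"],
}

# Only pooled documents are shown to the human assessors.
pool = build_pool(runs, depth=2)

# Hypothetical assessor output: the pooled documents judged relevant.
judged_relevant = {"d1", "d2", "d3"}

# Score each system, then rank systems by decreasing effectiveness.
scores = {name: precision_at_k(r, judged_relevant, k=2) for name, r in runs.items()}
leaderboard = sorted(scores, key=scores.get, reverse=True)
```

In a real evaluation exercise the scores would typically be averaged over many topics before the systems are ranked; the sketch keeps one topic to stay short.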

Available Versions

Institutional Research Information System

oai:air.uniud.it:11390/1185502

Last time updated on 29/11/2020

Archivio istituzionale della ricerca - Università degli Studi di Udine

oai:air.uniud.it:11390/1185502

Last time updated on 20/11/2020