s-AWARE: Measure-based Supervised Merging Algorithms for Crowd Assessors in Information Retrieval
In this thesis we develop a new approach to exploit crowd assessors' relevance judgements for IR evaluation. We compute evaluation measures based on each assessor's ground truth. These measures are then merged, weighting each assessor according to their expertise level, estimated on a training set as the closeness between the assessor's measures and the gold-standard measures. The results highlight the better performance of the s-AWARE approach with respect to the majority of the tested approaches.
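The merging step can be illustrated with a minimal sketch. It assumes per-assessor measure values (e.g., AP) have already been computed for each system; all function and variable names are illustrative rather than taken from the thesis, and the inverse mean absolute error against the gold standard is just one plausible way to estimate expertise-based weights.

```python
import numpy as np

def estimate_weights(assessor_scores_train, gold_scores_train):
    """Weight each assessor by how close their measure values are to the
    gold-standard measure values on a training set of systems."""
    weights = []
    for scores in assessor_scores_train:  # one row of per-system scores per assessor
        error = np.mean(np.abs(np.asarray(scores) - np.asarray(gold_scores_train)))
        weights.append(1.0 / (error + 1e-9))  # smaller error -> larger weight
    weights = np.asarray(weights)
    return weights / weights.sum()

def merge_scores(assessor_scores_test, weights):
    """Merged measure for each system: weighted average over assessors."""
    return weights @ np.asarray(assessor_scores_test)
```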
Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments
To evaluate Information Retrieval (IR) effectiveness, a possible approach is to use test collections, which are composed of a collection of documents, a set of descriptions of information needs (called topics), and a set of relevant documents for each topic. Test collections are modelled in a competition scenario: for example, in the well-known TREC initiative, participants run their own retrieval systems over a set of topics and provide a ranked list of retrieved documents; some of the retrieved documents (usually the top-ranked ones) constitute the so-called pool, and their relevance is judged by human assessors; the ranked lists and the relevance judgements are then used to compute effectiveness metrics and rank the participant systems. Private Web search companies also run their in-house evaluation exercises; although the details are mostly unknown and the aims are somewhat different, the overall approach shares several issues with the test collection approach.
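As an illustration of the pool-based procedure described above, the following sketch builds a pool from the top-ranked documents of each run and computes a standard effectiveness metric (average precision) from the resulting relevance judgements. The function names and the pool depth are assumptions for illustration, not part of the original abstract.

```python
def build_pool(runs, depth=100):
    """Union of the top-`depth` documents of every run: the pool sent to assessors."""
    return {doc for run in runs for doc in run[:depth]}

def average_precision(run, relevant):
    """Effectiveness of one ranked list given the set of judged relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(run, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / max(len(relevant), 1)
```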
The aim of this work is to: (i) develop and improve some state-of-the-art work on the evaluation of IR effectiveness while saving resources, and (ii) propose a novel, more principled and engineered overall approach to test collection-based effectiveness evaluation.
[...