Research in Information Retrieval has benefited significantly from the availability of standard test collections and their use in comparative evaluations of the effectiveness of different retrieval system configurations in controlled laboratory experiments. When designing large and reliable test collections, decisions regarding the assembly of the document corpus, the selection of topics, the formation of relevance judgments, and the development of evaluation measures are particularly critical: they affect both the cost of the constructed test collections and their effectiveness in evaluating retrieval systems. Furthermore, building retrieval systems has recently been viewed as a machine learning task, resulting in a learning-to-rank methodology that has been widely adopted by the community. The design and construction methodology of learning collections, along with the choice of the evaluation measure to be optimized, significantly affects the quality of the resulting retrieval system. In this work we consider the construction of reliable and efficient test and training collections to be used in the evaluation of retrieval systems and in the development of new and effective ranking functions. In the process of building such collections we investigate methods of selecting the appropriate document
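The abstract refers to evaluation measures whose choice shapes both test-collection construction and learning-to-rank optimization. As one concrete illustration (not a measure singled out by this work), the sketch below computes average precision, a standard effectiveness measure built from binary relevance judgments; the document identifiers and judgments are invented for the example.

```python
def average_precision(ranking, relevant):
    """Average precision of a ranked list, given the set of relevant doc ids.

    Sums precision at each rank where a relevant document appears,
    then normalizes by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

# Hypothetical system output and relevance judgments, for illustration only.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}
print(round(average_precision(ranking, relevant), 4))  # relevant docs at ranks 2 and 4
```

Averaging this quantity over the topics of a test collection yields mean average precision (MAP), one of the measures commonly optimized in learning-to-rank settings.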