Dockerising Terrier for The Open-Source IR Replicability Challenge
Reproducibility and replicability are key concepts in science, and it is
therefore important for information retrieval (IR) platforms to aid in
reproducing and replicating experiments. In this paper, we describe the
creation of a Docker container for Terrier within the framework of the
OSIRRC 2019 challenge, which allows typical runs to be reproduced on TREC
test collections such as Robust04, GOV2, and Core2018. In doing so, it is
hoped that the resulting Docker image can aid others in (re)producing
baseline experiments on these test collections. Initiatives like OSIRRC are
instrumental in advancing reproducibility and replicability in the IR field.
By making available not only the source code but also the exact environment,
and by standardising inputs and outputs, approaches can be compared easily,
thereby improving the quality of Information Retrieval research.
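
To make the standardised-environment idea concrete, below is a minimal sketch
of driving such a Docker image from Python. The image name (osirrc2019/terrier),
the "search" hook, and its flags are assumptions modelled on OSIRRC
conventions, not the exact challenge contract:

    import subprocess

    # Sketch: run a dockerised retrieval system against a mounted test
    # collection and collect the run file it writes to /output. The image
    # name and the "search" hook with its flags are assumed for
    # illustration; the OSIRRC jig defines the real contract.
    subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", "/data/robust04:/input/collection:ro",  # test collection (host path assumed)
            "-v", "/data/output:/output",                 # run files appear here
            "osirrc2019/terrier",                         # hypothetical image name
            "search", "--collection", "robust04",
        ],
        check=True,
    )

Standardising the mount points and hooks like this is what lets runs from
different groups be compared like-for-like.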
Living Lab Evaluation for Life and Social Sciences Search Platforms -- LiLAS at CLEF 2021
Meta-evaluation studies of system performance in controlled offline
evaluation campaigns, like TREC and CLEF, show a need for innovation in
evaluating IR systems. The field of academic search is no exception.
This may be related to the fact that relevance in academic search is
multilayered, which makes user-centric evaluation increasingly important.
The Living Labs for Academic Search (LiLAS) lab aims to strengthen the
concept of user-centric living labs for the domain of academic search by
allowing participants to evaluate their retrieval approaches in two
real-world academic search systems from the life sciences and the social
sciences. To this end, we provide participants with metadata on the systems'
content as well as candidate lists, with the task of ranking the most
relevant candidates to the top. Using the STELLA infrastructure, we allow
participants to easily integrate their approaches into the real-world
systems and make it possible to compare different approaches at the same time.
Comment: 8 pages. Advances in Information Retrieval - 43rd European Conference
on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021
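
As a rough illustration of how a participant system could plug into such an
infrastructure, here is a minimal ranking micro-service in Python. The
/ranking route, its query parameter, and the response schema are assumptions
for illustration, not the official STELLA API:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Toy candidate descriptions; a real system would load the provided
    # metadata on the platforms' content.
    CANDIDATES = {
        "doc1": "reusable test collections for academic search",
        "doc2": "social science survey on labour mobility",
        "doc3": "life science study of protein folding",
    }

    @app.route("/ranking", methods=["GET"])
    def ranking():
        """Rank candidates for a query by naive term overlap. Route and
        response fields are illustrative, not the STELLA contract."""
        query = request.args.get("query", "")
        terms = set(query.lower().split())
        ranked = sorted(
            CANDIDATES,
            key=lambda doc_id: len(terms & set(CANDIDATES[doc_id].split())),
            reverse=True,
        )
        return jsonify({"query": query, "itemlist": ranked})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)

Packaged in a Docker image, a service of this shape can answer live ranking
requests from the platform while the same code is reused for offline runs.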
Overview of LiLAS 2020 -- Living Labs for Academic Search
Academic search is a timeless challenge that the field of Information
Retrieval has been dealing with for many years. Even today, the search for
academic material is a broad field of research that has recently turned to
problems like the COVID-19 pandemic. However, test collections and
specialized data sets like CORD-19 only allow for system-oriented
experiments, while the evaluation of algorithms in real-world environments
is only available to researchers from industry. In LiLAS, we open up two
academic search platforms to allow participating researchers to evaluate
their systems in a Docker-based research environment. This overview paper
describes the motivation, the infrastructure, and the two systems, LIVIVO
and GESIS Search, that are part of this CLEF lab.
Comment: Manuscript version of the CLEF 2020 proceedings paper
Cross-Domain Sentence Modeling for Relevance Transfer with BERT
Standard bag-of-words term-matching techniques in document retrieval fail to exploit rich semantic information embedded in the document texts. One promising recent trend in facilitating context-aware semantic matching has been the development of massively pretrained deep transformer models, culminating in BERT as their most popular example today. In this work, we propose adapting BERT as a neural re-ranker for document retrieval to achieve large improvements on news articles. Two fundamental issues arise in applying BERT to "ad hoc" document retrieval on newswire collections: relevance judgments in existing test collections are provided only at the document level, and documents often exceed the length that BERT was designed to handle. To overcome these challenges, we compute and aggregate sentence-level evidence to rank documents. The lack of appropriate relevance judgments in test collections is addressed by leveraging sentence-level and passage-level relevance judgments fortuitously available in collections from other domains to capture cross-domain notions of relevance. Our experiments demonstrate that models of relevance can be transferred across domains. By leveraging semantic cues learned across various domains, we propose a model that achieves state-of-the-art results on three standard TREC newswire collections. We explore the effects of cross-domain relevance transfer, and trade-offs between using document and sentence scores for document ranking. We also present an end-to-end document retrieval system that integrates the open-source Anserini information retrieval toolkit, discussing the related technical challenges and design decisions.
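
The sentence-level aggregation described here can be sketched as
interpolating the original document score with the top BERT sentence scores.
A minimal sketch, with illustrative interpolation weights (the actual model
tunes them on held-out data):

    from typing import Sequence

    def aggregate_score(
        doc_score: float,
        sentence_scores: Sequence[float],
        alpha: float = 0.5,                            # weight on the document score (assumed)
        weights: Sequence[float] = (0.3, 0.15, 0.05),  # weights for top sentences (assumed)
    ) -> float:
        """Blend a document-level retrieval score with the strongest
        sentence-level BERT scores, in the spirit of the aggregation
        described above. All weights are placeholders, not tuned values."""
        top = sorted(sentence_scores, reverse=True)[:len(weights)]
        return alpha * doc_score + sum(w * s for w, s in zip(weights, top))

    # Re-rank an initial run: (retrieval doc score, per-sentence BERT scores).
    run = {"d1": (12.3, [0.90, 0.20, 0.10]), "d2": (10.1, [0.95, 0.80, 0.40])}
    reranked = sorted(run, key=lambda d: aggregate_score(*run[d]), reverse=True)

Scoring only the top few sentences is what sidesteps BERT's input-length
limit while still using document-level judgments for evaluation.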
Evaluating Research Dataset Recommendations in a Living Lab
The search for research datasets is as important as it is laborious. Because
the choice of research data shapes subsequent research, this decision must be
made carefully. Additionally, with the growing amounts of data in almost all
areas, research data has become a central artifact in the empirical sciences.
Consequently, research dataset recommendations can beneficially supplement
searches for scientific publications. We formulated the recommendation task
as a retrieval problem by focusing on broad similarities between research
datasets and scientific publications. In a multistage approach, initial
recommendations were retrieved with the BM25 ranking function and dynamic
queries. Subsequently, the initial ranking was re-ranked utilizing click
feedback and document embeddings. The proposed system was evaluated live on
real user interaction data using the STELLA infrastructure in the LiLAS Lab
at CLEF 2021. Our experimental system could be fine-tuned efficiently before
the live evaluation by pre-testing it with a pseudo test collection based
on prior user interaction data from the live system. The results indicate
that the experimental system outperforms the other participating systems.
Comment: Best of 2021 Labs: LiLAS
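
A minimal sketch of the two-stage approach: BM25 candidate retrieval followed
by a re-ranking step that blends click feedback and embedding similarity. The
corpus, click counts, embeddings, and blending weights below are toy
placeholders; the real system derives them from the live platform:

    import numpy as np
    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    # Placeholder dataset descriptions, click counts, and embeddings.
    docs = [
        "survey data on social mobility in germany",
        "gene expression measurements in mouse liver",
        "longitudinal panel study of labour markets",
    ]
    clicks = np.array([5.0, 0.0, 2.0])        # past click feedback (assumed)
    doc_emb = np.random.rand(len(docs), 16)   # stand-in document embeddings

    bm25 = BM25Okapi([d.split() for d in docs])

    def recommend(query: str, query_emb: np.ndarray, k: int = 3):
        """Stage 1: BM25 scores for the query terms. Stage 2: blend in
        click feedback and cosine similarity; the 0.6/0.2/0.2 weights
        are illustrative, not the tuned values from the paper."""
        lexical = bm25.get_scores(query.split())
        cosine = doc_emb @ query_emb / (
            np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(query_emb) + 1e-9
        )
        final = 0.6 * lexical + 0.2 * cosine + 0.2 * np.log1p(clicks)
        return sorted(zip(docs, final), key=lambda pair: -pair[1])[:k]

    print(recommend("social mobility survey", np.random.rand(16)))

Because the click-feedback term only re-weights an already lexical ranking,
the system can be pre-tested offline on logged interactions before going live.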
A Living Lab Architecture for Reproducible Shared Task Experimentation
No existing evaluation infrastructure for shared tasks currently supports both reproducible online and offline experiments. In this work, we present an architecture that ties together both types of experiments with a focus on reproducibility. Readers are provided with a technical description of the infrastructure and details of how to contribute their own experiments to upcoming evaluation tasks.