Dockerising Terrier for The Open-Source IR Replicability Challenge
Reproducibility and replicability are key concepts in science, and it is
therefore important for information retrieval (IR) platforms to aid in
reproducing and replicating experiments. In this paper, we describe the
creation of a Docker container for Terrier within the framework of the
OSIRRC 2019 challenge, which allows typical runs to be reproduced on TREC
test collections such as Robust04, GOV2, and Core2018. In doing so, it is
hoped that the resulting Docker image can aid others in (re)producing
baseline experiments on these test collections. Initiatives like OSIRRC are
instrumental in advancing reproducibility and replicability in the IR field.
By making available not only the source code but also the exact environment,
and by standardising inputs and outputs, approaches can be compared easily,
thereby improving the quality of Information Retrieval research.
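
To make the standardised-environment idea concrete, below is a minimal sketch
of driving such a Docker image from Python. The image name (osirrc2019/terrier),
the "search" hook, and its flags are assumptions modelled on OSIRRC
conventions, not the exact challenge contract:

    import subprocess

    # Sketch: run a dockerised retrieval system against a mounted test
    # collection and collect the run file it writes to /output. The image
    # name and the "search" hook with its flags are assumed for
    # illustration; the OSIRRC jig defines the real contract.
    subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", "/data/robust04:/input/collection:ro",  # test collection (host path assumed)
            "-v", "/data/output:/output",                 # run files appear here
            "osirrc2019/terrier",                         # hypothetical image name
            "search", "--collection", "robust04",
        ],
        check=True,
    )

Standardising the mount points and hooks like this is what lets runs from
different groups be compared like-for-like.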
Living Lab Evaluation for Life and Social Sciences Search Platforms -- LiLAS at CLEF 2021
Meta-evaluation studies of system performance in controlled offline
evaluation campaigns, like TREC and CLEF, show a need for innovation in
evaluating IR systems. The field of academic search is no exception.
This may be related to the fact that relevance in academic search is
multilayered, which makes user-centric evaluation increasingly important.
The Living Labs for Academic Search (LiLAS) lab aims to strengthen the
concept of user-centric living labs for the domain of academic search by
allowing participants to evaluate their retrieval approaches in two
real-world academic search systems from the life sciences and the social
sciences. To this end, we provide participants with metadata on the systems'
content as well as candidate lists, with the task of ranking the most
relevant candidates to the top. Using the STELLA infrastructure, we allow
participants to easily integrate their approaches into the real-world
systems and make it possible to compare different approaches at the same time.
Comment: 8 pages. Advances in Information Retrieval - 43rd European Conference
on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021
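
As a rough illustration of how a participant system could plug into such an
infrastructure, here is a minimal ranking micro-service in Python. The
/ranking route, its query parameter, and the response schema are assumptions
for illustration, not the official STELLA API:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Toy candidate descriptions; a real system would load the provided
    # metadata on the platforms' content.
    CANDIDATES = {
        "doc1": "reusable test collections for academic search",
        "doc2": "social science survey on labour mobility",
        "doc3": "life science study of protein folding",
    }

    @app.route("/ranking", methods=["GET"])
    def ranking():
        """Rank candidates for a query by naive term overlap. Route and
        response fields are illustrative, not the STELLA contract."""
        query = request.args.get("query", "")
        terms = set(query.lower().split())
        ranked = sorted(
            CANDIDATES,
            key=lambda doc_id: len(terms & set(CANDIDATES[doc_id].split())),
            reverse=True,
        )
        return jsonify({"query": query, "itemlist": ranked})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)

Packaged in a Docker image, a service of this shape can answer live ranking
requests from the platform while the same code is reused for offline runs.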
Overview of LiLAS 2020 -- Living Labs for Academic Search
Academic search is a timeless challenge that the field of Information
Retrieval has been dealing with for many years. Even today, the search for
academic material is a broad field of research that has recently turned to
problems like the COVID-19 pandemic. However, test collections and
specialized data sets like CORD-19 only allow for system-oriented
experiments, while the evaluation of algorithms in real-world environments
is only available to researchers from industry. In LiLAS, we open up two
academic search platforms to allow participating researchers to evaluate
their systems in a Docker-based research environment. This overview paper
describes the motivation, the infrastructure, and the two systems, LIVIVO
and GESIS Search, that are part of this CLEF lab.
Comment: Manuscript version of the CLEF 2020 proceedings paper
Cross-Domain Sentence Modeling for Relevance Transfer with BERT
Standard bag-of-words term-matching techniques in document retrieval fail to exploit rich semantic information embedded in the document texts. One promising recent trend in facilitating context-aware semantic matching has been the development of massively pretrained deep transformer models, culminating in BERT as their most popular example today. In this work, we propose adapting BERT as a neural re-ranker for document retrieval to achieve large improvements on news articles. Two fundamental issues arise in applying BERT to "ad hoc" document retrieval on newswire collections: relevance judgments in existing test collections are provided only at the document level, and documents often exceed the length that BERT was designed to handle. To overcome these challenges, we compute and aggregate sentence-level evidence to rank documents. The lack of appropriate relevance judgments in test collections is addressed by leveraging sentence-level and passage-level relevance judgments fortuitously available in collections from other domains to capture cross-domain notions of relevance. Our experiments demonstrate that models of relevance can be transferred across domains. By leveraging semantic cues learned across various domains, we propose a model that achieves state-of-the-art results on three standard TREC newswire collections. We explore the effects of cross-domain relevance transfer, and trade-offs between using document and sentence scores for document ranking. We also present an end-to-end document retrieval system that integrates the open-source Anserini information retrieval toolkit, discussing the related technical challenges and design decisions.
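
The sentence-level aggregation described here can be sketched as
interpolating the original document score with the top BERT sentence scores.
A minimal sketch, with illustrative interpolation weights (the actual model
tunes them on held-out data):

    from typing import Sequence

    def aggregate_score(
        doc_score: float,
        sentence_scores: Sequence[float],
        alpha: float = 0.5,                            # weight on the document score (assumed)
        weights: Sequence[float] = (0.3, 0.15, 0.05),  # weights for top sentences (assumed)
    ) -> float:
        """Blend a document-level retrieval score with the strongest
        sentence-level BERT scores, in the spirit of the aggregation
        described above. All weights are placeholders, not tuned values."""
        top = sorted(sentence_scores, reverse=True)[:len(weights)]
        return alpha * doc_score + sum(w * s for w, s in zip(weights, top))

    # Re-rank an initial run: (retrieval doc score, per-sentence BERT scores).
    run = {"d1": (12.3, [0.90, 0.20, 0.10]), "d2": (10.1, [0.95, 0.80, 0.40])}
    reranked = sorted(run, key=lambda d: aggregate_score(*run[d]), reverse=True)

Scoring only the top few sentences is what sidesteps BERT's input-length
limit while still using document-level judgments for evaluation.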
Evaluating Research Dataset Recommendations in a Living Lab
The search for research datasets is as important as it is laborious. Because
the choice of research data shapes subsequent research, this decision must be
made carefully. Additionally, with the growing amounts of data in almost all
areas, research data has become a central artifact in the empirical sciences.
Consequently, research dataset recommendations can beneficially supplement
searches for scientific publications. We formulated the recommendation task
as a retrieval problem by focusing on broad similarities between research
datasets and scientific publications. In a multistage approach, initial
recommendations were retrieved with the BM25 ranking function and dynamic
queries. Subsequently, the initial ranking was re-ranked utilizing click
feedback and document embeddings. The proposed system was evaluated live on
real user interaction data using the STELLA infrastructure in the LiLAS Lab
at CLEF 2021. Our experimental system could be fine-tuned efficiently before
the live evaluation by pre-testing it with a pseudo test collection based
on prior user interaction data from the live system. The results indicate
that the experimental system outperforms the other participating systems.
Comment: Best of 2021 Labs: LiLAS
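
A minimal sketch of the two-stage approach: BM25 candidate retrieval followed
by a re-ranking step that blends click feedback and embedding similarity. The
corpus, click counts, embeddings, and blending weights below are toy
placeholders; the real system derives them from the live platform:

    import numpy as np
    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    # Placeholder dataset descriptions, click counts, and embeddings.
    docs = [
        "survey data on social mobility in germany",
        "gene expression measurements in mouse liver",
        "longitudinal panel study of labour markets",
    ]
    clicks = np.array([5.0, 0.0, 2.0])        # past click feedback (assumed)
    doc_emb = np.random.rand(len(docs), 16)   # stand-in document embeddings

    bm25 = BM25Okapi([d.split() for d in docs])

    def recommend(query: str, query_emb: np.ndarray, k: int = 3):
        """Stage 1: BM25 scores for the query terms. Stage 2: blend in
        click feedback and cosine similarity; the 0.6/0.2/0.2 weights
        are illustrative, not the tuned values from the paper."""
        lexical = bm25.get_scores(query.split())
        cosine = doc_emb @ query_emb / (
            np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(query_emb) + 1e-9
        )
        final = 0.6 * lexical + 0.2 * cosine + 0.2 * np.log1p(clicks)
        return sorted(zip(docs, final), key=lambda pair: -pair[1])[:k]

    print(recommend("social mobility survey", np.random.rand(16)))

Because the click-feedback term only re-weights an already lexical ranking,
the system can be pre-tested offline on logged interactions before going live.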
A Living Lab Architecture for Reproducible Shared Task Experimentation
No existing evaluation infrastructure for shared tasks currently supports both reproducible online and offline experiments. In this work, we present an architecture that ties together both types of experiments with a focus on reproducibility. Readers are provided with a technical description of the infrastructure and details of how to contribute their own experiments to upcoming evaluation tasks.