1 research outputs found
ir_metadata: An Extensible Metadata Schema for IR Experiments
The information retrieval (IR) community has a strong tradition of making the
computational artifacts and resources available for future reuse, allowing the
validation of experimental results. Besides the actual test collections, the
underlying run files are often hosted in data archives as part of conferences
like TREC, CLEF, or NTCIR. Unfortunately, the run data itself does not provide
much information about the underlying experiment. For instance, the single run
file is not of much use without the context of the shared task's website or the
run data archive. In other domains, like the social sciences, it is good
practice to annotate research data with metadata. In this work, we introduce
ir_metadata - an extensible metadata schema for TREC run files based on the
PRIMAD model. We propose to align the metadata annotations to PRIMAD, which
considers components of computational experiments that can affect
reproducibility. Furthermore, we outline important components and information
that should be reported in the metadata and give evidence from the literature.
To demonstrate the usefulness of these metadata annotations, we implement new
features in repro_eval that support the outlined metadata schema for the use
case of reproducibility studies. Additionally, we curate a dataset with run
files derived from experiments with different instantiations of PRIMAD
components and annotate these with the corresponding metadata. In the
experiments, we cover reproducibility experiments that are identified by the
metadata and classified by PRIMAD. With this work, we enable IR researchers to
annotate TREC run files and improve the reuse value of experimental artifacts
even further.Comment: Resource pape