ExaRanker: Explanation-Augmented Neural Ranker
Recent work has shown that inducing a large language model (LLM) to generate
explanations prior to outputting an answer is an effective strategy to improve
performance on a wide range of reasoning tasks. In this work, we show that
neural rankers also benefit from explanations. We use LLMs such as GPT-3.5 to
augment retrieval datasets with explanations and train a sequence-to-sequence
ranking model to output a relevance label and an explanation for a given
query-document pair. Our model, dubbed ExaRanker, finetuned on a few thousand
examples with synthetic explanations, performs on par with models finetuned on
3x more examples without explanations. Furthermore, the ExaRanker model incurs
no additional computational cost during ranking and allows explanations to be
requested on demand.
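For concreteness, the sketch below shows what an explanation-augmented training example and a label-token scoring function could look like, assuming a monoT5-style prompt template and a generic T5 checkpoint from Hugging Face; the template wording, label tokens, and model name are illustrative assumptions rather than the exact ExaRanker configuration.

# Minimal sketch: explanation-augmented targets and label-token scoring for a
# seq2seq ranker. Prompt template, label words, and checkpoint are assumptions.
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def build_example(query, document, relevant, explanation):
    # monoT5-style input; the target prepends the relevance label to the
    # synthetic explanation so the ranker learns to emit both.
    source = f"Query: {query} Document: {document} Relevant:"
    target = ("true" if relevant else "false") + f". Explanation: {explanation}"
    return source, target

def relevance_score(query, document):
    # At ranking time only the first generated token is scored, so skipping the
    # explanation adds no extra cost per query-document pair.
    inputs = tokenizer(f"Query: {query} Document: {document} Relevant:",
                       return_tensors="pt", truncation=True)
    start = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**inputs, decoder_input_ids=start).logits[0, 0]
    true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
    false_id = tokenizer("false", add_special_tokens=False).input_ids[0]
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()

In this reading, the (source, target) pairs from build_example would feed a standard seq2seq fine-tuning loop, while relevance_score is all that is needed at inference unless an explanation is explicitly generated on demand.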
InRanker: Distilled Rankers for Zero-shot Information Retrieval
Despite multi-billion parameter neural rankers being common components of
state-of-the-art information retrieval pipelines, they are rarely used in
production due to the enormous amount of compute required for inference. In
this work, we propose a new method for distilling large rankers into their
smaller versions focusing on out-of-domain effectiveness. We introduce
InRanker, a version of monoT5 distilled from monoT5-3B with increased
effectiveness on out-of-domain scenarios. Our key insight is to use language
models and rerankers to generate as much synthetic "in-domain" training data as
possible, i.e., data that closely resembles the data that will be seen at
retrieval time. The pipeline consists of two distillation phases that do not
require additional user queries or manual annotations: (1) training on existing
supervised soft teacher labels, and (2) training on teacher soft labels for
synthetic queries generated using a large language model. Consequently, models
like monoT5-60M and monoT5-220M improved their effectiveness by using the
teacher's knowledge, despite being 50x and 13x smaller, respectively. Models
and code are available at https://github.com/unicamp-dl/InRanker.
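Below is a minimal sketch of the soft-label distillation step that both phases share, assuming monoT5-style teacher and student models that express relevance through the logits of the "true"/"false" tokens; the checkpoint names, prompt template, and loss choice are illustrative assumptions, and the repository above documents the actual recipe.

# Minimal sketch of one distillation step on teacher soft labels.
import torch
import torch.nn.functional as F
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
# The paper distills from monoT5-3B; a smaller checkpoint can be swapped in to test.
teacher = T5ForConditionalGeneration.from_pretrained("castorini/monot5-3b-msmarco-10k").eval()
student = T5ForConditionalGeneration.from_pretrained("t5-small")

true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
false_id = tokenizer("false", add_special_tokens=False).input_ids[0]

def soft_label_loss(query, document):
    # Both phases reduce to this step: the student matches the teacher's
    # true/false distribution, whether the (query, document) pair comes from
    # existing supervised data (phase 1) or from synthetic queries generated
    # by a large language model (phase 2).
    inputs = tokenizer(f"Query: {query} Document: {document} Relevant:",
                       return_tensors="pt", truncation=True)
    start = torch.tensor([[student.config.decoder_start_token_id]])
    with torch.no_grad():
        t_logits = teacher(**inputs, decoder_input_ids=start).logits[0, 0, [true_id, false_id]]
    s_logits = student(**inputs, decoder_input_ids=start).logits[0, 0, [true_id, false_id]]
    return F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1), reduction="sum")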
BLUEX: A benchmark based on Brazilian Leading Universities Entrance eXams
One common trend in recent studies of language models (LMs) is the use of
standardized tests for evaluation. However, despite Portuguese being the fifth
most spoken language worldwide, few such evaluations have been conducted for it.
This is mainly due to the lack of high-quality datasets available to the
community for carrying out evaluations in Portuguese. To address this gap, we
introduce the Brazilian Leading Universities Entrance eXams (BLUEX), a dataset
of entrance exams from the two leading universities in Brazil: UNICAMP and USP.
The dataset includes annotated metadata for evaluating the performance of NLP
models on a variety of subjects. Furthermore, BLUEX includes a collection of
recently administered exams that are unlikely to be included in the training
data of many popular LMs as of 2023. The dataset is also annotated to indicate
the position of images in each question, providing a valuable resource for
advancing the state-of-the-art in multimodal language understanding and
reasoning. We describe the creation and characteristics of BLUEX and establish
a benchmark through experiments with state-of-the-art LMs, demonstrating its
potential for advancing research on natural language understanding and
reasoning in Portuguese. The data and relevant code can be found at
https://github.com/Portuguese-Benchmark-Datasets/BLUE
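As a rough illustration of how BLUEX-style questions could be used to benchmark an LM, the sketch below runs a multiple-choice evaluation loop; the JSON field names (question, alternatives, answer, subject), the Portuguese prompt wording, and the answer-extraction heuristic are assumptions made for illustration and may not match the dataset's actual schema.

# Minimal sketch of a multiple-choice evaluation loop over exam-style questions.
import json
import re

def build_prompt(item):
    # Assumed schema: item["alternatives"] maps option letters to option text.
    options = "\n".join(f"{letter}) {text}" for letter, text in item["alternatives"].items())
    return (f"Questão de {item['subject']}:\n{item['question']}\n\n"
            f"{options}\n\nResposta (apenas a letra):")

def extract_choice(generation):
    # Take the first standalone option letter found in the model's output.
    match = re.search(r"\b([A-E])\b", generation.upper())
    return match.group(1) if match else None

def evaluate(items, generate_fn):
    # generate_fn is any callable mapping a prompt string to the model's text output.
    correct = 0
    for item in items:
        prediction = extract_choice(generate_fn(build_prompt(item)))
        correct += int(prediction == item["answer"])
    return correct / len(items)

# Example usage, assuming questions exported to a local JSON file:
# items = json.load(open("bluex_sample.json", encoding="utf-8"))
# accuracy = evaluate(items, my_model_generate)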