Fine-tuned language models use greedy decoding to answer reading
comprehension questions with relative success. However, this approach does not
ensure that the answer is a span in the given passage, nor does it guarantee
that it is the most probable one. Does greedy decoding actually perform worse
than an algorithm that does adhere to these properties? To study the
performance and optimality of greedy decoding, we present exact-extract, a
decoding algorithm that efficiently finds the most probable answer span in the
context. We compare the performance of T5 with both decoding algorithms on
zero-shot and few-shot extractive question answering. When no training examples
are available, exact-extract significantly outperforms greedy decoding.
However, greedy decoding quickly converges towards the performance of
exact-extract with the introduction of a few training examples, becoming more
extractive and increasingly likely to generate the most probable span as the
training set grows. We also show that self-supervised training can bias the
model towards extractive behavior, increasing performance in the zero-shot
setting without resorting to annotated examples. Overall, our results suggest
that pretrained language models are so good at adapting to extractive question
answering that it is often enough to fine-tune on a small training set for the
greedy algorithm to emulate the optimal decoding strategy.
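
For concreteness, the objective that exact-extract optimizes can be sketched with a brute-force span search under a seq2seq model: score every span of the passage by its summed token log-probability and return the argmax. This is only an illustration of the objective, not the paper's efficient algorithm; the model checkpoint, prompt format, and maximum span length below are assumptions.

```python
# Brute-force sketch: enumerate passage spans, score each under a T5 model,
# and return the most probable one. O(n^2) model calls; the paper's
# exact-extract computes the same argmax efficiently.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "t5-base"  # assumption: any T5 checkpoint works for this sketch
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).eval()

def span_log_prob(input_ids, span_ids):
    """Summed log-probability of decoding `span_ids` given the encoded input."""
    labels = torch.tensor([span_ids])
    with torch.no_grad():
        logits = model(input_ids=input_ids, labels=labels).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    token_lp = log_probs[0, torch.arange(len(span_ids)), span_ids]
    return token_lp.sum().item()

def most_probable_span(question, passage, max_span_len=10):
    prompt = f"question: {question} context: {passage}"  # assumed prompt format
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    passage_ids = tokenizer(passage, add_special_tokens=False).input_ids
    best_score, best_span = float("-inf"), None
    for start in range(len(passage_ids)):
        for end in range(start + 1, min(start + max_span_len, len(passage_ids)) + 1):
            span_ids = passage_ids[start:end] + [tokenizer.eos_token_id]
            score = span_log_prob(input_ids, span_ids)
            if score > best_score:
                best_score, best_span = score, span_ids
    return tokenizer.decode(best_span, skip_special_tokens=True)
```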