MetaICL: Learning to Learn In Context
We introduce MetaICL (Meta-training for In-Context Learning), a new
meta-training framework for few-shot learning where a pretrained language model
is tuned to do in-context learning on a large set of training tasks. This
meta-training enables the model to more effectively learn a new task in context
at test time, by simply conditioning on a few training examples with no
parameter updates or task-specific templates. We experiment on a large, diverse
collection of tasks consisting of 142 NLP datasets including classification,
question answering, natural language inference, paraphrase detection and more,
across seven different meta-training/target splits. MetaICL outperforms a range
of baselines including in-context learning without meta-training and multi-task
learning followed by zero-shot transfer. We find that the gains are
particularly significant for target tasks that have domain shifts from the
meta-training tasks, and that using a diverse set of the meta-training tasks is
key to improvements. We also show that MetaICL approaches (and sometimes beats)
the performance of models fully finetuned on the target task, and outperforms
much bigger models with nearly 8x as many parameters. Finally, we show that MetaICL is
complementary to human-written instructions, and the best performance can be
achieved by combining both approaches.
Comment: 19 pages, 2 figures. Published as a conference paper at NAACL 2022
(long). Code available at https://github.com/facebookresearch/MetaIC
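The abstract describes conditioning on a few training examples with no task-specific templates. As a minimal sketch of what that could look like, the snippet below concatenates k (input, output) pairs with the test input, and draws one meta-training instance by sampling k + 1 examples from a single task. The function names, the newline separator, and the sampling details are illustrative assumptions, not taken from the MetaICL codebase.

```python
import random


def build_icl_input(demos, test_input, sep="\n"):
    """Concatenate k (input, output) pairs followed by the test input.

    No template is applied; the separator is an assumption for illustration.
    """
    parts = []
    for x, y in demos:
        parts.extend([x, y])
    parts.append(test_input)
    return sep.join(parts)


def sample_meta_training_instance(tasks, k, rng):
    """Pick one task and draw k + 1 examples from it.

    The first k examples serve as in-context demonstrations; the last one
    supplies the input to condition on and the output to predict.
    (Hypothetical helper; the sampling scheme is assumed.)
    """
    name = rng.choice(sorted(tasks))
    examples = rng.sample(tasks[name], k + 1)
    demos, (x, y) = examples[:k], examples[k]
    return build_icl_input(demos, x), y
```

At test time the same `build_icl_input` would be called with k labeled examples from the unseen target task, and the model would score each candidate output given that sequence, with no parameter updates.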
Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
Although large language models can be prompted for both zero- and few-shot
learning, performance drops significantly when no demonstrations are available.
In this paper, we introduce Z-ICL, a new zero-shot method that closes the gap
by constructing pseudo-demonstrations for a given test input using a raw text
corpus. Concretely, pseudo-demonstrations are constructed by (1) finding the
nearest neighbors to the test input from the corpus and pairing them with
random task labels, and (2) applying a set of techniques to reduce the amount
of direct copying the model does from the resulting demonstrations. Evaluation
on nine classification datasets shows that Z-ICL outperforms previous zero-shot
methods by a significant margin, and is on par with in-context learning with
labeled training data in the few-shot setting. Overall, Z-ICL provides a
significantly higher estimate of the zero-shot performance levels of a model,
and supports future efforts to develop better pseudo-demonstrations that
further improve zero-shot results.
Comment: 11 pages; 9 figures
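Step (1) of the pseudo-demonstration construction described above, retrieving nearest neighbors from a raw corpus and pairing them with random task labels, can be sketched as follows. This is only an illustration under stated assumptions: the bag-of-words cosine similarity stands in for whatever sentence encoder is actually used, all function names are hypothetical, and the copy-reduction techniques of step (2) are not shown because the abstract does not specify them.

```python
import math
import random
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (a stand-in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pseudo_demonstrations(test_input, corpus, labels, k, rng):
    """Return the k corpus sentences most similar to test_input,
    each paired with a label drawn uniformly at random from `labels`."""
    query = bow(test_input)
    ranked = sorted(corpus, key=lambda s: cosine(query, bow(s)), reverse=True)
    return [(sentence, rng.choice(labels)) for sentence in ranked[:k]]
```

The resulting (sentence, label) pairs would then be placed in front of the test input in the same way as ordinary labeled demonstrations.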