With a handful of demonstration examples, large language models can perform a wide variety of tasks via in-context learning, without any fine-tuning. We demonstrate that in-context learning performance can be highly unstable across samples of examples, revealing idiosyncrasies in how language models acquire information. We formulate example
selection for in-context learning as a sequential decision problem, and propose
a reinforcement learning algorithm for identifying generalizable policies to
select demonstration examples. For GPT-2, our learned policies generalize strongly to tasks unseen during training, improving performance by 5.8% on average. Examples selected by our learned policies can even
achieve a small improvement on GPT-3 Ada. However, the improvement diminishes
on larger GPT-3 models, suggesting emergent capabilities of these larger models.

Comment: EMNLP 2022. Code is available at https://github.com/ChicagoHAI/active-example-selection
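To make the sequential-decision framing above concrete, here is a minimal, illustrative sketch of how demonstration-example selection can be cast as an episodic decision problem and optimized with tabular Q-learning. This is not the paper's algorithm: the pool, hyperparameters, and the `evaluate_prompt` reward function are all hypothetical stand-ins (a real reward would query the language model on a validation set with the selected prompt).

```python
# Illustrative sketch only: example selection as a sequential decision problem.
# `evaluate_prompt` is a hypothetical stand-in for scoring an LM prompt on a
# validation set; tabular Q-learning is used purely for simplicity.
import random

random.seed(0)

POOL = [f"example_{i}" for i in range(6)]  # candidate demonstration examples
K = 3          # number of demonstrations to select per prompt
EPISODES = 500
ALPHA, GAMMA, EPS = 0.1, 1.0, 0.2

def evaluate_prompt(examples):
    """Hypothetical reward: validation accuracy of the LM prompted with
    `examples`. Stubbed here with a fixed pseudo-random utility per example."""
    utility = {e: random.Random(e).random() for e in POOL}
    return sum(utility[e] for e in examples) / K

Q = {}  # maps (state, action) -> value; state = tuple of examples chosen so far

def q(state, action):
    return Q.get((state, action), 0.0)

for _ in range(EPISODES):
    state = ()
    while len(state) < K:
        actions = [e for e in POOL if e not in state]
        if random.random() < EPS:
            action = random.choice(actions)                   # explore
        else:
            action = max(actions, key=lambda a: q(state, a))  # exploit
        next_state = state + (action,)
        # Reward arrives only at episode end, when the full prompt is scored.
        if len(next_state) == K:
            target = evaluate_prompt(next_state)
        else:
            remaining = [e for e in POOL if e not in next_state]
            target = GAMMA * max(q(next_state, a) for a in remaining)
        Q[(state, action)] = q(state, action) + ALPHA * (target - q(state, action))
        state = next_state

# Greedy rollout of the learned selection policy.
state = ()
while len(state) < K:
    actions = [e for e in POOL if e not in state]
    state += (max(actions, key=lambda a: q(state, a)),)
print("selected demonstrations:", state)
```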