Scaling large language models (LLMs) leads to an emergent capacity to learn
in-context from example demonstrations. Despite progress, theoretical
understanding of this phenomenon remains limited. We argue that in-context
learning relies on recombination of compositional operations found in natural
language data. We derive an information-theoretic bound showing how in-context
learning abilities arise from generic next-token prediction when the
pretraining distribution has sufficient amounts of compositional structure,
under linguistically motivated assumptions. A second bound provides a
theoretical justification for the empirical success of prompting LLMs to output
intermediate steps towards an answer. To validate theoretical predictions, we
introduce a controlled setup for inducing in-context learning; unlike previous
approaches, it accounts for the compositional nature of language. Trained
transformers can perform in-context learning for a range of tasks, in a manner
consistent with the theoretical results. As in real-world LLMs, in-context
learning emerges in this miniature setup when parameters and data are scaled,
and models perform better when prompted to output intermediate steps. Probing
shows that in-context learning is supported by a representation of the input's
compositional structure. Taken together, these results provide a step towards a
theoretical understanding of emergent behavior in large language models.