The Learnability of In-Context Learning
In-context learning is a surprising and important phenomenon that emerged
when modern language models were scaled to billions of learned parameters.
Without modifying its weights, a large language model can be tuned to perform
various downstream natural language tasks simply by including concatenated
training examples of these tasks in its input. Though disruptive for many
practical applications of large language models, this emergent learning
paradigm is not well understood from a theoretical perspective. In this paper,
we propose a first-of-its-kind PAC-based framework for in-context learnability,
and use it to provide the first finite-sample complexity results for the
in-context learning setup. Our framework includes an initial pretraining phase,
which fits a function to the pretraining distribution, and then a second
in-context learning phase, which keeps this function constant and concatenates
training examples of the downstream task in its input. We use our framework to
prove that, under mild assumptions, when the pretraining distribution
is a mixture of latent tasks (a model often considered for natural language
pretraining), these tasks can be efficiently learned via in-context learning,
even though the model's weights are unchanged and the input significantly
diverges from the pretraining distribution. Our theoretical analysis reveals
that in this setting, in-context learning is more about identifying the task
than about learning it, a result which is in line with a series of recent
empirical findings. We hope that the in-context learnability framework
presented in this paper will facilitate future progress towards a deeper
understanding of this important new learning paradigm.
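The setup this abstract describes, in which the model's weights stay fixed and "learning" happens only through concatenated demonstrations in the input, can be sketched as follows. The function name and prompt format are illustrative assumptions, not taken from the paper:

```python
def build_icl_prompt(train_examples, query):
    """Concatenate (input, label) demonstrations, then append the query.

    The model is never updated; the prompt alone carries the task.
    """
    demos = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in train_examples)
    return f"{demos}\nInput: {query}\nLabel:"

prompt = build_icl_prompt(
    [("great movie", "positive"), ("dull plot", "negative")],
    "superb acting",
)
print(prompt)
```

Feeding such a prompt to a frozen pretrained model and reading off its next-token prediction is the in-context learning phase the paper formalizes.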
Compositional Exemplars for In-context Learning
Large pretrained language models (LMs) have shown impressive In-Context
Learning (ICL) ability, where the model learns to do an unseen task via a
prompt consisting of input-output examples as the demonstration, without any
parameter updates. The performance of ICL is largely determined by the quality of
the selected in-context examples. However, previous selection methods are
mostly based on simple heuristics, leading to sub-optimal performance. In this
work, we formulate in-context example selection as a subset selection problem.
We propose CEIL (Compositional Exemplars for In-context Learning), which is
instantiated by Determinantal Point Processes (DPPs) to model the interaction
between the given input and in-context examples, and optimized through a
carefully designed contrastive learning objective to obtain preferences from
LMs. We validate CEIL on 12 classification and generation datasets from 7
distinct NLP tasks, including sentiment analysis, paraphrase detection, natural
language inference, commonsense reasoning, open-domain question answering, code
generation, and semantic parsing. Extensive experiments demonstrate not only
state-of-the-art performance but also the transferability and
compositionality of CEIL, shedding new light on effective and efficient
in-context learning. Our code is released at
https://github.com/HKUNLP/icl-ceil.
Comment: Accepted in ICML 202
Link-Context Learning for Multimodal LLMs
The ability to learn novel concepts from context and deliver appropriate
responses is essential in human conversations. Despite current
Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) being
trained on mega-scale datasets, recognizing unseen images or understanding
novel concepts in a training-free manner remains a challenge. In-Context
Learning (ICL) explores training-free few-shot learning, where models are
encouraged to "learn to learn" from limited tasks and generalize to unseen
tasks. In this work, we propose link-context learning (LCL), which emphasizes
"reasoning from cause and effect" to augment the learning capabilities of
MLLMs. LCL goes beyond traditional ICL by explicitly strengthening the causal
relationship between the support set and the query set. By providing
demonstrations with causal links, LCL guides the model to discern not only the
analogy but also the underlying causal associations between data points, which
empowers MLLMs to recognize unseen images and understand novel concepts more
effectively. To facilitate the evaluation of this novel approach, we introduce
the ISEKAI dataset, comprising exclusively unseen generated image-label
pairs designed for link-context learning. Extensive experiments show that our
LCL-MLLM exhibits stronger link-context learning capabilities on novel concepts
than vanilla MLLMs. Code and data will be released at
https://github.com/isekai-portal/Link-Context-Learning.
Comment: 10 pages, 8 figures
Understanding In-Context Learning from Repetitions
This paper explores the elusive mechanism underpinning in-context learning in
Large Language Models (LLMs). Our work provides a novel perspective by
examining in-context learning via the lens of surface repetitions. We
quantitatively investigate the role of surface features in text generation, and
empirically establish the existence of "token co-occurrence
reinforcement", a principle that strengthens the relationship between two
tokens based on their contextual co-occurrences. By investigating the dual
impacts of these features, our research illuminates the internal workings of
in-context learning and expounds on the reasons for its failures. This paper
makes an essential contribution to the understanding of in-context learning
and its potential limitations, offering a fresh perspective on this exciting
capability.
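As a toy illustration of the surface statistic involved, the snippet below counts how often ordered token pairs co-occur within a sliding context window. This is an assumption-laden proxy for the co-occurrence signal the abstract refers to, not the paper's actual analysis:

```python
from collections import Counter

def cooccurrence_counts(tokens, window=3):
    """Count ordered token pairs (a, b) where b follows a
    within `window - 1` positions."""
    counts = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + window]:
            counts[(a, b)] += 1
    return counts

toks = "the cat sat on the mat the cat sat".split()
c = cooccurrence_counts(toks, window=3)
print(c[("cat", "sat")])  # the pair reinforced by repeated co-occurrence
```

Under the token co-occurrence reinforcement principle, pairs with high counts like ("cat", "sat") would see their contextual association strengthened during generation, which can help (pattern completion) or hurt (spurious repetition) depending on the task.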