Schema-learning and rebinding as mechanisms of in-context learning and emergence
In-context learning (ICL) is one of the most powerful and most unexpected
capabilities to emerge in recent transformer-based large language models
(LLMs). Yet the mechanisms that underlie it are poorly understood. In this
paper, we demonstrate that comparable ICL capabilities can be acquired by an
alternative sequence prediction learning method using clone-structured causal
graphs (CSCGs). Moreover, a key property of CSCGs is that, unlike
transformer-based LLMs, they are interpretable, which considerably
simplifies the task of explaining how ICL works. Specifically, we show that it
uses a combination of (a) learning template (schema) circuits for pattern
completion, (b) retrieving relevant templates in a context-sensitive manner,
and (c) rebinding of novel tokens to appropriate slots in the templates. We go
on to marshal evidence for the hypothesis that similar mechanisms underlie ICL
in LLMs. For example, we find that, with CSCGs as with LLMs, different
capabilities emerge at different levels of overparameterization, suggesting
that overparameterization helps in learning more complex template (schema)
circuits. By showing how ICL can be achieved with small models and datasets, we
open up a path to novel architectures, and take a vital step towards a more
general understanding of the mechanics behind this important capability.
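The three ingredients named in the abstract above (a template circuit, context-sensitive retrieval, and rebinding of novel tokens to slots) can be illustrated with a toy example. This is only a sketch under strong assumptions, not the paper's method: the schema is a hand-written four-state cycle, the emission maps are set by hand rather than learned, and the names `predict_next`, `emit_old`, and `emit_new` are hypothetical.

```python
import numpy as np

def predict_next(T, emit, context, vocab):
    """Forward-filter a context through a fixed latent transition graph.

    T       : (n, n) row-stochastic transition matrix over latent states (the schema)
    emit    : dict mapping each latent state to the single token it emits
    context : list of observed tokens
    vocab   : tokens to score as the next prediction
    """
    n = T.shape[0]
    belief = np.ones(n) / n                      # start with no knowledge of the state
    for step, token in enumerate(context):
        if step > 0:
            belief = belief @ T                  # advance one step along the schema
        mask = np.array([1.0 if emit[s] == token else 0.0 for s in range(n)])
        belief = belief * mask                   # keep only states that emit this token
        belief = belief / belief.sum()           # assumes the context is possible under emit
    nxt = belief @ T
    return {tok: float(sum(nxt[s] for s in range(n) if emit[s] == tok)) for tok in vocab}

# Hand-written schema (an assumption): a 4-state cycle 0 -> 1 -> 2 -> 3 -> 0.
T = np.zeros((4, 4))
for s in range(4):
    T[s, (s + 1) % 4] = 1.0

# States 0 and 2 are two "clones" of the same token, distinguished only by context.
emit_old = {0: "a", 1: "b", 2: "a", 3: "c"}

# Rebinding: keep T untouched and attach novel tokens to the same slots.
emit_new = {0: "x", 1: "y", 2: "x", 3: "z"}

old_prediction = predict_next(T, emit_old, ["a", "b", "a"], ["a", "b", "c"])
new_prediction = predict_next(T, emit_new, ["x", "y", "x"], ["x", "y", "z"])
print(old_prediction)
print(new_prediction)
```

In this toy setting the transition structure carries all of the pattern-completion ability, so swapping the emission map is enough to complete a sequence of previously unseen tokens.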
Graph schemas as abstractions for transfer learning, inference, and planning
We propose schemas as a model for abstractions that can be used for rapid
transfer learning, inference, and planning. Common structured representations
of concepts and behaviors -- schemas -- have been proposed as a powerful way to
encode abstractions. Latent graph learning is emerging as a new computational
model of the hippocampus to explain map learning and transitive inference. We
build on this work to show that learned latent graphs in these models have a
slot structure -- schemas -- that allows for quick knowledge transfer across
environments. In a new environment, an agent can rapidly learn new bindings
between the sensory stream and multiple latent schemas and select the best
fitting one to guide behavior. To evaluate these graph schemas, we use two
previously published challenging tasks: the memory & planning game and one-shot
StreetLearn, which are designed to test rapid task solving in novel
environments. Graph schemas can be learned in far fewer episodes than previous
baselines, and can model and plan in a few steps in novel variations of these
tasks. We further demonstrate learning, matching, and reusing graph schemas in
navigation tasks in more challenging environments with aliased observations and
size variations, and show how different schemas can be composed to model larger
2D and 3D environments.
Comment: 12 pages, 5 figures in main paper, 12 pages and 8 figures in appendix
Inference by Reparameterization using Neural Population Codes
Behavioral experiments on humans and animals suggest that the brain performs probabilistic inference to interpret its environment. Here we present a general-purpose, biologically plausible implementation of approximate inference based on Probabilistic Population Codes (PPCs). PPCs are distributed neural representations of probability distributions that are capable of implementing marginalization and cue-integration in a biologically plausible way. By connecting multiple PPCs together, we can naturally represent multivariate probability distributions, and capture the conditional dependency structure by setting those connections as in a probabilistic graphical model. To perform inference in general graphical models, one convenient and often accurate algorithm is Loopy Belief Propagation (LBP), a ‘message-passing’ algorithm that uses local marginalization and integration operations to perform approximate inference efficiently even for complex models. In LBP, a message from one node to a neighboring node is a function of incoming messages from all neighboring nodes, except the recipient. This exception renders it neurally implausible because neurons cannot readily send many different signals to many different target neurons. Interestingly, however, LBP can be reformulated as a sequence of Tree-based Re-Parameterization (TRP) updates on the graphical model which re-factorizes a portion of the probability distribution. Although this formulation still implicitly has the message exclusion problem, we show this can be circumvented by converting the algorithm to a nonlinear dynamical system with auxiliary variables and a separation of time-scales. By combining these ideas, we show that a network of PPCs can represent multivariate probability distributions and implement the TRP updates for the graphical model to perform probabilistic inference. 
Simulations with Gaussian graphical models demonstrate that the performance of the PPC-based neural network implementation of TRP updates for probabilistic inference is comparable to the direct evaluation of LBP, and thus provides a compelling substrate for general, probabilistic inference in the brain.
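The message rule described in the abstract above, where a message from one node to a neighbor combines the incoming messages from all neighbors except the recipient, can be sketched as follows. This is a minimal illustration of standard sum-product belief propagation, not the PPC or TRP implementation from the paper. It assumes discrete binary variables instead of the Gaussian models used in the paper's simulations, and a three-node chain so that the result can be checked against brute-force marginals. The same update runs unchanged on graphs with loops, where it is only approximate. The function names and the potentials are hypothetical.

```python
import itertools
import numpy as np

def belief_propagation(unary, pairwise, n_iters=20):
    """Synchronous sum-product BP on a pairwise model with discrete variables.

    unary    : {node: array of shape (k,)} local potentials
    pairwise : {(i, j): array of shape (k_i, k_j)} edge potentials
    """
    # Store each edge in both directions so psi[(i, j)][x_i, x_j] is always valid.
    psi = {}
    for (i, j), p in pairwise.items():
        psi[(i, j)] = p
        psi[(j, i)] = p.T
    nbrs = {n: [j for (i, j) in psi if i == n] for n in unary}
    msgs = {(i, j): np.ones(len(unary[j])) / len(unary[j]) for (i, j) in psi}

    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in psi:
            incoming = unary[i].copy()
            for k in nbrs[i]:
                if k != j:                       # message exclusion: skip the recipient
                    incoming = incoming * msgs[(k, i)]
            m = incoming @ psi[(i, j)]           # marginalize out x_i
            new_msgs[(i, j)] = m / m.sum()
        msgs = new_msgs

    beliefs = {}
    for n in unary:
        b = unary[n].copy()
        for k in nbrs[n]:
            b = b * msgs[(k, n)]
        beliefs[n] = b / b.sum()
    return beliefs

def exact_marginals(unary, pairwise):
    """Brute-force marginals by enumerating every joint state (tiny models only)."""
    nodes = sorted(unary)
    marg = {n: np.zeros(len(unary[n])) for n in nodes}
    for states in itertools.product(*[range(len(unary[n])) for n in nodes]):
        x = dict(zip(nodes, states))
        w = 1.0
        for n in nodes:
            w *= unary[n][x[n]]
        for (i, j), p in pairwise.items():
            w *= p[x[i], x[j]]
        for n in nodes:
            marg[n][x[n]] += w
    return {n: m / m.sum() for n, m in marg.items()}

# Hypothetical potentials on a chain 0 - 1 - 2 of binary variables.
unary = {0: np.array([0.7, 0.3]), 1: np.array([0.5, 0.5]), 2: np.array([0.2, 0.8])}
coupling = np.array([[2.0, 1.0], [1.0, 2.0]])    # favors neighbors taking equal values
pairwise = {(0, 1): coupling, (1, 2): coupling}

bp = belief_propagation(unary, pairwise)
exact = exact_marginals(unary, pairwise)
for n in sorted(unary):
    print(n, bp[n], exact[n])
```

The `if k != j` branch is the exclusion step that the abstract identifies as neurally implausible, since it forces node `i` to compute a different outgoing signal for each neighbor.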