Predictive auxiliary objectives in deep RL mimic learning in the brain
The ability to predict upcoming events has been hypothesized to comprise a
key aspect of natural and machine cognition. This is supported by trends in
deep reinforcement learning (RL), where self-supervised auxiliary objectives
such as prediction are widely used to support representation learning and
improve task performance. Here, we study the effects predictive auxiliary
objectives have on representation learning across different modules of an RL
system and how these mimic representational changes observed in the brain. We
find that predictive objectives improve and stabilize learning particularly in
resource-limited architectures, and we identify settings where longer
predictive horizons better support representational transfer. Furthermore, we
find that representational changes in this RL system bear a striking
resemblance to changes in neural activity observed in the brain across various
experiments. Specifically, we draw a connection between the auxiliary
predictive model of the RL system and hippocampus, an area thought to learn a
predictive model to support memory-guided behavior. We also connect the encoder
network and the value learning network of the RL system to visual cortex and
striatum in the brain, respectively. This work demonstrates how representation
learning in deep RL systems can provide an interpretable framework for modeling
multi-region interactions in the brain. The deep RL perspective taken here also
suggests an additional role of the hippocampus in the brain -- that of an
auxiliary learning system that benefits representation learning in other
regions.
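The core mechanism described here, a shared encoder trained jointly by a value objective and a self-supervised predictive objective, can be sketched in a toy linear setting. Everything below (dimensions, learning rates, the contractive dynamics, the stop-gradient targets) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_lat, gamma, lr = 8, 4, 0.9, 0.05

W_enc = 0.1 * rng.normal(size=(d_lat, d_obs))   # shared encoder
w_val = 0.1 * rng.normal(size=d_lat)            # value head (RL objective)
W_pred = 0.1 * rng.normal(size=(d_lat, d_lat))  # latent predictor (auxiliary)

def train_step(obs, next_obs, reward):
    """One semi-gradient step on the joint objective."""
    global W_enc, w_val, W_pred
    z, z_next = W_enc @ obs, W_enc @ next_obs
    # TD error for the value head; the next-state value is a fixed target
    td_err = reward + gamma * (w_val @ z_next) - (w_val @ z)
    # Auxiliary error: predict the next latent from the current one
    # (next latent also treated as a fixed target)
    pred_err = z_next - W_pred @ z
    w_val += lr * td_err * z
    W_pred += lr * np.outer(pred_err, z)
    # Both objectives send gradients into the shared encoder
    W_enc += lr * (td_err * np.outer(w_val, obs)
                   + np.outer(W_pred.T @ pred_err, obs))
    return float(np.linalg.norm(pred_err))

# Toy environment: contractive linear dynamics, reward along one axis
A = 0.9 * np.eye(d_obs)
pred_norms = []
for _ in range(2000):
    obs = rng.normal(size=d_obs)
    next_obs = A @ obs + 0.01 * rng.normal(size=d_obs)
    pred_norms.append(train_step(obs, next_obs, 0.1 * obs[0]))
```

Because the predictive head sends gradients into the shared encoder, the latent space is shaped to be predictable even before the value head has learned anything, which is the sense in which an auxiliary objective "supports representation learning".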
Probabilistic Successor Representations with Kalman Temporal Differences
The effectiveness of Reinforcement Learning (RL) depends on an animal's
ability to assign credit for rewards to the appropriate preceding stimuli. One
aspect of understanding the neural underpinnings of this process involves
understanding what sorts of stimulus representations support generalisation.
The Successor Representation (SR), which enforces generalisation over states
that predict similar outcomes, has become an increasingly popular model in this
space of inquiries. Another dimension of credit assignment involves
understanding how animals handle uncertainty about learned associations, using
probabilistic methods such as Kalman Temporal Differences (KTD). Combining
these approaches, we propose using KTD to estimate a distribution over the SR.
KTD-SR captures uncertainty about the estimated SR as well as covariances
between different long-term predictions. We show that because of this, KTD-SR
exhibits partial transition revaluation, as humans do in revaluation experiments, without
additional replay, unlike the standard TD-SR algorithm. We conclude by
discussing future applications of the KTD-SR as a model of the interaction
between predictive and probabilistic animal reasoning.
Comment: Conference on Cognitive Computational Neuroscience
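For reference, the standard TD-SR baseline that KTD-SR extends maintains a point estimate of the SR matrix and updates it with a temporal-difference rule. A minimal tabular sketch (the ring environment, step sizes, and run length are arbitrary illustrative choices, not the paper's setup):

```python
import numpy as np

n, gamma = 5, 0.9
rng = np.random.default_rng(1)

# Symmetric random walk on a ring of n states
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5

I = np.eye(n)
M = np.zeros((n, n))        # tabular successor representation estimate
s = 0
for t in range(100_000):
    alpha = 0.1 if t < 50_000 else 0.01        # crude step-size annealing
    s_next = (s + (1 if rng.random() < 0.5 else -1)) % n
    M[s] += alpha * (I[s] + gamma * M[s_next] - M[s])   # TD-SR update
    s = s_next

M_true = np.linalg.inv(I - gamma * T)          # analytic SR for comparison
```

KTD-SR would replace the point estimate M with a mean and covariance tracked by a Kalman filter, which, per the abstract, is what supports partial transition revaluation without additional replay.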
Rapid learning of predictive maps with STDP and theta phase precession
The predictive map hypothesis is a promising candidate principle for hippocampal function. A favoured formalisation of this hypothesis, called the successor representation, proposes that each place cell encodes the expected state occupancy of its target location in the near future. This predictive framework is supported by behavioural as well as electrophysiological evidence and has desirable consequences for both the generalisability and efficiency of reinforcement learning algorithms. However, it is unclear how the successor representation might be learnt in the brain. Error-driven temporal difference learning, commonly used to learn successor representations in artificial agents, is not known to be implemented in hippocampal networks. Instead, we demonstrate that spike-timing dependent plasticity (STDP), a form of Hebbian learning, acting on temporally compressed trajectories known as 'theta sweeps', is sufficient to rapidly learn a close approximation to the successor representation. The model is biologically plausible: it uses spiking neurons modulated by theta-band oscillations, diffuse and overlapping place cell-like state representations, and experimentally matched parameters. We show how this model maps onto known aspects of hippocampal circuitry and explains substantial variance in the temporal difference successor matrix, consequently giving rise to place cells that demonstrate experimentally observed successor representation-related phenomena including backwards expansion on a 1D track and elongation near walls in 2D. Finally, our model provides insight into the observed topographical ordering of place field sizes along the dorsal-ventral axis by showing that this is necessary to prevent the detrimental mixing of larger place fields, which encode longer timescale successor representations, with more fine-grained predictions of spatial location.
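Setting the spiking details aside, the reason a Hebbian rule can approximate the SR can be caricatured in a rate-based sketch: potentiate weights between the current state and an exponentially decaying trace of recently visited states. On a reversible chain, the discounted predecessor occupancy this accumulates equals the discounted successor occupancy. The environment, trace decay, and normalization below are illustrative; none of the paper's theta or STDP machinery appears:

```python
import numpy as np

n, gamma, steps = 5, 0.9, 200_000
rng = np.random.default_rng(2)

I = np.eye(n)
W = np.zeros((n, n))     # Hebbian weights: row = current state, col = trace
trace = np.zeros(n)      # exponentially decaying memory of visited states
visits = np.zeros(n)

s = 0
for _ in range(steps):   # symmetric random walk on a ring
    trace = gamma * trace + I[s]
    W[s] += trace        # Hebbian co-activation of current state and trace
    visits[s] += 1
    s = (s + (1 if rng.random() < 0.5 else -1)) % n

M_hebb = W / visits[:, None]             # normalize by state occupancy
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5
M_true = np.linalg.inv(I - gamma * T)    # the TD successor representation
```

This only matches the SR exactly on reversible dynamics; part of the paper's contribution is showing how theta-compressed sweeps let a causal, biologically grounded rule achieve a close approximation more generally.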
A probabilistic successor representation for context-dependent learning
Two of the main impediments to learning complex tasks are that relationships between different stimuli, including rewards, can be uncertain and context-dependent. Reinforcement learning (RL) provides a framework for learning, by predicting total future reward directly (model-free RL), or via predictions of future states (model-based RL). Within this framework, the "successor representation" (SR) predicts total future occupancy of all states. A recent theoretical proposal suggests that the hippocampus encodes the SR in order to facilitate prediction of future reward. However, this proposal does not take into account how learning should adapt under uncertainty and switches of context. Here, we introduce a theory of learning SRs using prediction errors which includes optimally balancing uncertainty in new observations versus existing knowledge. We then generalize that approach to a multicontext setting, allowing the model to learn and maintain multiple task-specific SRs and infer which one to use at any moment based on the accuracy of its predictions. Thus, the context used for predictions can be determined by both the contents of the states themselves and the distribution of transitions between them. This probabilistic SR model captures animal behavior in tasks which require contextual memory and generalization, and unifies previous SR theory with hippocampal-dependent contextual decision-making.
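The idea of selecting among task-specific SRs by prediction accuracy can be sketched with two deterministic toy contexts: each candidate SR is scored by its temporal-difference prediction error on observed transitions, and a posterior over contexts is accumulated. The Gaussian error model and noise scale sigma below are hypothetical choices for illustration, not the paper's inference scheme:

```python
import numpy as np

n, gamma = 5, 0.9
I = np.eye(n)

def sr(T):
    return np.linalg.inv(I - gamma * T)

# Two contexts: clockwise vs counterclockwise deterministic walks on a ring
T_cw = np.roll(I, 1, axis=1)    # next state = s + 1
T_ccw = np.roll(I, -1, axis=1)  # next state = s - 1
Ms = [sr(T_cw), sr(T_ccw)]

log_post = np.log(np.array([0.5, 0.5]))  # uniform prior over contexts
sigma = 0.5                              # assumed observation-noise scale

s = 0
for _ in range(5):                       # observe clockwise transitions
    s_next = int(np.argmax(T_cw[s]))
    for c, M in enumerate(Ms):
        # SR prediction error: zero iff M satisfies this transition's
        # Bellman relation, so the true context accrues no penalty
        delta = I[s] + gamma * M[s_next] - M[s]
        log_post[c] += -np.dot(delta, delta) / (2 * sigma**2)
    s = s_next

post = np.exp(log_post - log_post.max())
post /= post.sum()                       # posterior over contexts
```

After a handful of transitions the posterior concentrates on the context whose SR keeps making accurate predictions, which is the selection principle the abstract describes.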
Prioritized memory access explains planning and hippocampal replay.
To make decisions, animals must evaluate candidate choices by accessing memories of relevant experiences. Yet little is known about which experiences are considered or ignored during deliberation, which ultimately governs choice. We propose a normative theory predicting which memories should be accessed at each moment to optimize future decisions. Using nonlocal 'replay' of spatial locations in hippocampus as a window into memory access, we simulate a spatial navigation task in which an agent accesses memories of locations sequentially, ordered by utility: how much extra reward would be earned due to better choices. This prioritization balances two desiderata: the need to evaluate imminent choices versus the gain from propagating newly encountered information to preceding locations. Our theory offers a simple explanation for numerous findings about place cells; unifies seemingly disparate proposed functions of replay including planning, learning, and consolidation; and posits a mechanism whose dysfunction may underlie pathologies like rumination and craving.
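The prioritization described above scores each candidate memory by utility, the product of a "gain" term (how much applying that backup would improve the choice at its state) and a "need" term (discounted expected future occupancy of that state). A heavily simplified sketch on a four-state chain, with a uniform-random policy for the need term and a single newly discovered reward (all illustrative assumptions, not the paper's full simulation):

```python
import numpy as np

n, gamma = 4, 0.9
Q = np.zeros((n, 2))          # agent starts knowing nothing
agent_state = 0

def next_state(s, a):         # deterministic chain, actions left/right
    return max(s - 1, 0) if a == 0 else min(s + 1, n - 1)

def greedy(q_row):
    """Greedy policy with ties broken uniformly."""
    best = q_row == q_row.max()
    return best / best.sum()

# Need: a row of the successor representation under a uniform-random policy
T = np.zeros((n, n))
for s in range(n):
    for a in (0, 1):
        T[s, next_state(s, a)] += 0.5
need = np.linalg.inv(np.eye(n) - gamma * T)[agent_state]

def evb(s, a, r, s2):
    """Expected value of backup = gain x need."""
    q_new = Q[s].copy()
    q_new[a] = r + gamma * Q[s2].max()             # the candidate backup
    gain = greedy(q_new) @ q_new - greedy(Q[s]) @ q_new
    return gain * need[s]

# Candidate memories: one experience per state-action; reward only on 2 -> 3
backups = [(s, a, 1.0 if (s, a) == (2, 1) else 0.0, next_state(s, a))
           for s in range(n) for a in (0, 1)]
scores = [evb(*b) for b in backups]
best = backups[int(np.argmax(scores))]
```

Here only the backup adjacent to the newly discovered reward changes the greedy choice, so it alone has positive gain and is replayed first; iterating the process propagates value backwards along the chain, producing the reverse-replay-like orderings the theory emphasizes.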
Compositional Sequence Generation in the Entorhinal–Hippocampal System
Neurons in the medial entorhinal cortex exhibit multiple, periodically organized, firing fields which collectively appear to form an internal representation of space. Neuroimaging data suggest that this grid coding is also present in other cortical areas such as the prefrontal cortex, indicating that it may be a general principle of neural functionality in the brain. In a recent analysis through the lens of dynamical systems theory, we showed how grid coding can lead to the generation of a diversity of empirically observed sequential reactivations of hippocampal place cells corresponding to traversals of cognitive maps. Here, we extend this sequence generation model by describing how the synthesis of multiple dynamical systems can support compositional cognitive computations. To empirically validate the model, we simulate two experiments demonstrating compositionality in space or in time during sequence generation. Finally, we describe several neural network architectures supporting various types of compositionality based on grid coding and highlight connections to recent work in machine learning leveraging analogous techniques.
A probabilistic approach to discovering dynamic full-brain functional connectivity patterns
Recent work indicates that the covariance structure of functional magnetic resonance imaging (fMRI) data, commonly described as functional connectivity, can change as a function of the participant's cognitive state (for review see [32]). Here we present a technique, termed hierarchical topographic factor analysis (HTFA), for efficiently discovering full-brain networks in large multi-subject neuroimaging datasets. HTFA approximates each subject's network by first re-representing each brain image in terms of the activations of a set of localized nodes, and then computing the covariance of the activation time series of these nodes. The number of nodes, along with their locations, sizes, and activations (over time), is learned from the data. Because the number of nodes is typically substantially smaller than the number of fMRI voxels, HTFA can be orders of magnitude more efficient than traditional voxel-based functional connectivity approaches. In one case study, we show that HTFA recovers the known connectivity patterns underlying a synthetic dataset. In a second case study, we illustrate how HTFA may be used to discover dynamic full-brain activity and connectivity patterns in real fMRI data, collected as participants listened to a story. In a third case study, we carried out a similar series of analyses on fMRI data collected as participants viewed an episode of a television show. In these latter case studies, we found that both the HTFA-derived activity and connectivity patterns may be used to reliably decode which moments in the story or show the participants were experiencing. Further, we found that these two classes of patterns contained partially non-overlapping information, such that classifiers trained on combinations of activity-based and dynamic connectivity-based features performed better than classifiers trained on activity or connectivity patterns alone.
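To make the pipeline concrete, here is a deliberately simplified, fixed-node version of the idea: re-represent each image as activations of a few localized radial basis "nodes", then correlate the node time series. Unlike HTFA, nothing here is learned hierarchically across subjects, and the node locations and widths are fixed by hand; all names and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n_vox, n_nodes, n_t = 60, 3, 500

# Fixed RBF "nodes" on a 1-D voxel line (HTFA also learns centers/widths)
voxel_pos = np.linspace(0, 1, n_vox)
centers = np.array([0.2, 0.5, 0.8])
width = 0.08
F = np.exp(-((centers[:, None] - voxel_pos[None, :]) ** 2)
           / (2 * width**2))                           # (nodes, voxels)

# Synthetic data: nodes 0 and 1 share a latent signal, node 2 independent
shared = rng.normal(size=n_t)
A_true = np.stack([shared + 0.3 * rng.normal(size=n_t),
                   shared + 0.3 * rng.normal(size=n_t),
                   rng.normal(size=n_t)], axis=1)      # (time, nodes)
Y = A_true @ F + 0.05 * rng.normal(size=(n_t, n_vox))  # (time, voxels)

# Recover node activations by least squares, then node-level connectivity
A_hat, *_ = np.linalg.lstsq(F.T, Y.T, rcond=None)      # (nodes, time)
C = np.corrcoef(A_hat)                                 # node connectivity
```

Because connectivity is computed over 3 node time series rather than 60 voxel time series, the covariance estimate is both cheaper and better conditioned, which is the efficiency argument the abstract makes for node-based representations.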