Predictive auxiliary objectives in deep RL mimic learning in the brain
The ability to predict upcoming events has been hypothesized to comprise a
key aspect of natural and machine cognition. This is supported by trends in
deep reinforcement learning (RL), where self-supervised auxiliary objectives
such as prediction are widely used to support representation learning and
improve task performance. Here, we study the effects predictive auxiliary
objectives have on representation learning across different modules of an RL
system and how these mimic representational changes observed in the brain. We
find that predictive objectives improve and stabilize learning particularly in
resource-limited architectures, and we identify settings where longer
predictive horizons better support representational transfer. Furthermore, we
find that representational changes in this RL system bear a striking
resemblance to changes in neural activity observed in the brain across various
experiments. Specifically, we draw a connection between the auxiliary
predictive model of the RL system and hippocampus, an area thought to learn a
predictive model to support memory-guided behavior. We also connect the encoder
network and the value learning network of the RL system to visual cortex and
striatum in the brain, respectively. This work demonstrates how representation
learning in deep RL systems can provide an interpretable framework for modeling
multi-region interactions in the brain. The deep RL perspective taken here also
suggests an additional role of the hippocampus in the brain -- that of an
auxiliary learning system that benefits representation learning in other
regions.
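The core mechanism described here, a shared encoder trained jointly by a value objective and a self-supervised predictive objective, can be sketched in a toy linear setting. Everything below (dimensions, learning rates, the contractive dynamics, the stop-gradient targets) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_lat, gamma, lr = 8, 4, 0.9, 0.05

W_enc = 0.1 * rng.normal(size=(d_lat, d_obs))   # shared encoder
w_val = 0.1 * rng.normal(size=d_lat)            # value head (RL objective)
W_pred = 0.1 * rng.normal(size=(d_lat, d_lat))  # latent predictor (auxiliary)

def train_step(obs, next_obs, reward):
    """One semi-gradient step on the joint objective."""
    global W_enc, w_val, W_pred
    z, z_next = W_enc @ obs, W_enc @ next_obs
    # TD error for the value head; the next-state value is a fixed target
    td_err = reward + gamma * (w_val @ z_next) - (w_val @ z)
    # Auxiliary error: predict the next latent from the current one
    # (next latent also treated as a fixed target)
    pred_err = z_next - W_pred @ z
    w_val += lr * td_err * z
    W_pred += lr * np.outer(pred_err, z)
    # Both objectives send gradients into the shared encoder
    W_enc += lr * (td_err * np.outer(w_val, obs)
                   + np.outer(W_pred.T @ pred_err, obs))
    return float(np.linalg.norm(pred_err))

# Toy environment: contractive linear dynamics, reward along one axis
A = 0.9 * np.eye(d_obs)
pred_norms = []
for _ in range(2000):
    obs = rng.normal(size=d_obs)
    next_obs = A @ obs + 0.01 * rng.normal(size=d_obs)
    pred_norms.append(train_step(obs, next_obs, 0.1 * obs[0]))
```

Because the predictive head sends gradients into the shared encoder, the latent space is shaped to be predictable even before the value head has learned anything, which is the sense in which an auxiliary objective "supports representation learning".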
Probabilistic Successor Representations with Kalman Temporal Differences
The effectiveness of Reinforcement Learning (RL) depends on an animal's
ability to assign credit for rewards to the appropriate preceding stimuli. One
aspect of understanding the neural underpinnings of this process involves
understanding what sorts of stimulus representations support generalisation.
The Successor Representation (SR), which enforces generalisation over states
that predict similar outcomes, has become an increasingly popular model in this
space of inquiries. Another dimension of credit assignment involves
understanding how animals handle uncertainty about learned associations, using
probabilistic methods such as Kalman Temporal Differences (KTD). Combining
these approaches, we propose using KTD to estimate a distribution over the SR.
KTD-SR captures uncertainty about the estimated SR as well as covariances
between different long-term predictions. We show that because of this, KTD-SR
exhibits partial transition revaluation, as humans do in revaluation experiments, without
additional replay, unlike the standard TD-SR algorithm. We conclude by
discussing future applications of the KTD-SR as a model of the interaction
between predictive and probabilistic animal reasoning.
Comment: Conference on Cognitive Computational Neuroscience
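For reference, the standard TD-SR baseline that KTD-SR extends maintains a point estimate of the SR matrix and updates it with a temporal-difference rule. A minimal tabular sketch (the ring environment, step sizes, and run length are arbitrary illustrative choices, not the paper's setup):

```python
import numpy as np

n, gamma = 5, 0.9
rng = np.random.default_rng(1)

# Symmetric random walk on a ring of n states
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5

I = np.eye(n)
M = np.zeros((n, n))        # tabular successor representation estimate
s = 0
for t in range(100_000):
    alpha = 0.1 if t < 50_000 else 0.01        # crude step-size annealing
    s_next = (s + (1 if rng.random() < 0.5 else -1)) % n
    M[s] += alpha * (I[s] + gamma * M[s_next] - M[s])   # TD-SR update
    s = s_next

M_true = np.linalg.inv(I - gamma * T)          # analytic SR for comparison
```

KTD-SR would replace the point estimate M with a mean and covariance tracked by a Kalman filter, which, per the abstract, is what supports partial transition revaluation without additional replay.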
Rapid learning of predictive maps with STDP and theta phase precession
The predictive map hypothesis is a promising candidate principle for hippocampal function. A favoured formalisation of this hypothesis, called the successor representation, proposes that each place cell encodes the expected state occupancy of its target location in the near future. This predictive framework is supported by behavioural as well as electrophysiological evidence and has desirable consequences for both the generalisability and efficiency of reinforcement learning algorithms. However, it is unclear how the successor representation might be learnt in the brain. Error-driven temporal difference learning, commonly used to learn successor representations in artificial agents, is not known to be implemented in hippocampal networks. Instead, we demonstrate that spike-timing dependent plasticity (STDP), a form of Hebbian learning, acting on temporally compressed trajectories known as 'theta sweeps', is sufficient to rapidly learn a close approximation to the successor representation. The model is biologically plausible: it uses spiking neurons modulated by theta-band oscillations, diffuse and overlapping place cell-like state representations, and experimentally matched parameters. We show how this model maps onto known aspects of hippocampal circuitry and explains substantial variance in the temporal difference successor matrix, consequently giving rise to place cells that demonstrate experimentally observed successor representation-related phenomena including backwards expansion on a 1D track and elongation near walls in 2D. Finally, our model provides insight into the observed topographical ordering of place field sizes along the dorsal-ventral axis by showing that this is necessary to prevent the detrimental mixing of larger place fields, which encode longer timescale successor representations, with more fine-grained predictions of spatial location.
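Setting the spiking details aside, the reason a Hebbian rule can approximate the SR can be caricatured in a rate-based sketch: potentiate weights between the current state and an exponentially decaying trace of recently visited states. On a reversible chain, the discounted predecessor occupancy this accumulates equals the discounted successor occupancy. The environment, trace decay, and normalization below are illustrative; none of the paper's theta or STDP machinery appears:

```python
import numpy as np

n, gamma, steps = 5, 0.9, 200_000
rng = np.random.default_rng(2)

I = np.eye(n)
W = np.zeros((n, n))     # Hebbian weights: row = current state, col = trace
trace = np.zeros(n)      # exponentially decaying memory of visited states
visits = np.zeros(n)

s = 0
for _ in range(steps):   # symmetric random walk on a ring
    trace = gamma * trace + I[s]
    W[s] += trace        # Hebbian co-activation of current state and trace
    visits[s] += 1
    s = (s + (1 if rng.random() < 0.5 else -1)) % n

M_hebb = W / visits[:, None]             # normalize by state occupancy
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5
M_true = np.linalg.inv(I - gamma * T)    # the TD successor representation
```

This only matches the SR exactly on reversible dynamics; part of the paper's contribution is showing how theta-compressed sweeps let a causal, biologically grounded rule achieve a close approximation more generally.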
A probabilistic successor representation for context-dependent learning
Two of the main impediments to learning complex tasks are that relationships between different stimuli, including rewards, can be uncertain and context-dependent. Reinforcement learning (RL) provides a framework for learning, by predicting total future reward directly (model-free RL), or via predictions of future states (model-based RL). Within this framework, the "successor representation" (SR) predicts total future occupancy of all states. A recent theoretical proposal suggests that the hippocampus encodes the SR in order to facilitate prediction of future reward. However, this proposal does not take into account how learning should adapt under uncertainty and switches of context. Here, we introduce a theory of learning SRs using prediction errors which includes optimally balancing uncertainty in new observations versus existing knowledge. We then generalize that approach to a multicontext setting, allowing the model to learn and maintain multiple task-specific SRs and infer which one to use at any moment based on the accuracy of its predictions. Thus, the context used for predictions can be determined by both the contents of the states themselves and the distribution of transitions between them. This probabilistic SR model captures animal behavior in tasks which require contextual memory and generalization, and unifies previous SR theory with hippocampal-dependent contextual decision-making.
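The idea of selecting among task-specific SRs by prediction accuracy can be sketched with two deterministic toy contexts: each candidate SR is scored by its temporal-difference prediction error on observed transitions, and a posterior over contexts is accumulated. The Gaussian error model and noise scale sigma below are hypothetical choices for illustration, not the paper's inference scheme:

```python
import numpy as np

n, gamma = 5, 0.9
I = np.eye(n)

def sr(T):
    return np.linalg.inv(I - gamma * T)

# Two contexts: clockwise vs counterclockwise deterministic walks on a ring
T_cw = np.roll(I, 1, axis=1)    # next state = s + 1
T_ccw = np.roll(I, -1, axis=1)  # next state = s - 1
Ms = [sr(T_cw), sr(T_ccw)]

log_post = np.log(np.array([0.5, 0.5]))  # uniform prior over contexts
sigma = 0.5                              # assumed observation-noise scale

s = 0
for _ in range(5):                       # observe clockwise transitions
    s_next = int(np.argmax(T_cw[s]))
    for c, M in enumerate(Ms):
        # SR prediction error: zero iff M satisfies this transition's
        # Bellman relation, so the true context accrues no penalty
        delta = I[s] + gamma * M[s_next] - M[s]
        log_post[c] += -np.dot(delta, delta) / (2 * sigma**2)
    s = s_next

post = np.exp(log_post - log_post.max())
post /= post.sum()                       # posterior over contexts
```

After a handful of transitions the posterior concentrates on the context whose SR keeps making accurate predictions, which is the selection principle the abstract describes.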
Prioritized memory access explains planning and hippocampal replay.
To make decisions, animals must evaluate candidate choices by accessing memories of relevant experiences. Yet little is known about which experiences are considered or ignored during deliberation, which ultimately governs choice. We propose a normative theory predicting which memories should be accessed at each moment to optimize future decisions. Using nonlocal 'replay' of spatial locations in hippocampus as a window into memory access, we simulate a spatial navigation task in which an agent accesses memories of locations sequentially, ordered by utility: how much extra reward would be earned due to better choices. This prioritization balances two desiderata: the need to evaluate imminent choices versus the gain from propagating newly encountered information to preceding locations. Our theory offers a simple explanation for numerous findings about place cells; unifies seemingly disparate proposed functions of replay including planning, learning, and consolidation; and posits a mechanism whose dysfunction may underlie pathologies like rumination and craving.
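The prioritization described above scores each candidate memory by utility, the product of a "gain" term (how much applying that backup would improve the choice at its state) and a "need" term (discounted expected future occupancy of that state). A heavily simplified sketch on a four-state chain, with a uniform-random policy for the need term and a single newly discovered reward (all illustrative assumptions, not the paper's full simulation):

```python
import numpy as np

n, gamma = 4, 0.9
Q = np.zeros((n, 2))          # agent starts knowing nothing
agent_state = 0

def next_state(s, a):         # deterministic chain, actions left/right
    return max(s - 1, 0) if a == 0 else min(s + 1, n - 1)

def greedy(q_row):
    """Greedy policy with ties broken uniformly."""
    best = q_row == q_row.max()
    return best / best.sum()

# Need: a row of the successor representation under a uniform-random policy
T = np.zeros((n, n))
for s in range(n):
    for a in (0, 1):
        T[s, next_state(s, a)] += 0.5
need = np.linalg.inv(np.eye(n) - gamma * T)[agent_state]

def evb(s, a, r, s2):
    """Expected value of backup = gain x need."""
    q_new = Q[s].copy()
    q_new[a] = r + gamma * Q[s2].max()             # the candidate backup
    gain = greedy(q_new) @ q_new - greedy(Q[s]) @ q_new
    return gain * need[s]

# Candidate memories: one experience per state-action; reward only on 2 -> 3
backups = [(s, a, 1.0 if (s, a) == (2, 1) else 0.0, next_state(s, a))
           for s in range(n) for a in (0, 1)]
scores = [evb(*b) for b in backups]
best = backups[int(np.argmax(scores))]
```

Here only the backup adjacent to the newly discovered reward changes the greedy choice, so it alone has positive gain and is replayed first; iterating the process propagates value backwards along the chain, producing the reverse-replay-like orderings the theory emphasizes.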
Compositional Sequence Generation in the Entorhinal–Hippocampal System
Neurons in the medial entorhinal cortex exhibit multiple, periodically organized, firing fields which collectively appear to form an internal representation of space. Neuroimaging data suggest that this grid coding is also present in other cortical areas such as the prefrontal cortex, indicating that it may be a general principle of neural functionality in the brain. In a recent analysis through the lens of dynamical systems theory, we showed how grid coding can lead to the generation of a diversity of empirically observed sequential reactivations of hippocampal place cells corresponding to traversals of cognitive maps. Here, we extend this sequence generation model by describing how the synthesis of multiple dynamical systems can support compositional cognitive computations. To empirically validate the model, we simulate two experiments demonstrating compositionality in space or in time during sequence generation. Finally, we describe several neural network architectures supporting various types of compositionality based on grid coding and highlight connections to recent work in machine learning leveraging analogous techniques.
A probabilistic approach to discovering dynamic full-brain functional connectivity patterns
Recent work indicates that the covariance structure of functional magnetic resonance imaging (fMRI) data, commonly described as functional connectivity, can change as a function of the participant's cognitive state (for review see [32]). Here we present a technique, termed hierarchical topographic factor analysis (HTFA), for efficiently discovering full-brain networks in large multi-subject neuroimaging datasets. HTFA approximates each subject's network by first re-representing each brain image in terms of the activations of a set of localized nodes, and then computing the covariance of the activation time series of these nodes. The number of nodes, along with their locations, sizes, and activations (over time), is learned from the data. Because the number of nodes is typically substantially smaller than the number of fMRI voxels, HTFA can be orders of magnitude more efficient than traditional voxel-based functional connectivity approaches. In one case study, we show that HTFA recovers the known connectivity patterns underlying a synthetic dataset. In a second case study, we illustrate how HTFA may be used to discover dynamic full-brain activity and connectivity patterns in real fMRI data, collected as participants listened to a story. In a third case study, we carried out a similar series of analyses on fMRI data collected as participants viewed an episode of a television show. In these latter case studies, we found that both the HTFA-derived activity and connectivity patterns may be used to reliably decode which moments in the story or show the participants were experiencing. Further, we found that these two classes of patterns contained partially non-overlapping information, such that classifiers trained on combinations of activity-based and dynamic connectivity-based features performed better than classifiers trained on activity or connectivity patterns alone.
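To make the pipeline concrete, here is a deliberately simplified, fixed-node version of the idea: re-represent each image as activations of a few localized radial basis "nodes", then correlate the node time series. Unlike HTFA, nothing here is learned hierarchically across subjects, and the node locations and widths are fixed by hand; all names and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n_vox, n_nodes, n_t = 60, 3, 500

# Fixed RBF "nodes" on a 1-D voxel line (HTFA also learns centers/widths)
voxel_pos = np.linspace(0, 1, n_vox)
centers = np.array([0.2, 0.5, 0.8])
width = 0.08
F = np.exp(-((centers[:, None] - voxel_pos[None, :]) ** 2)
           / (2 * width**2))                           # (nodes, voxels)

# Synthetic data: nodes 0 and 1 share a latent signal, node 2 independent
shared = rng.normal(size=n_t)
A_true = np.stack([shared + 0.3 * rng.normal(size=n_t),
                   shared + 0.3 * rng.normal(size=n_t),
                   rng.normal(size=n_t)], axis=1)      # (time, nodes)
Y = A_true @ F + 0.05 * rng.normal(size=(n_t, n_vox))  # (time, voxels)

# Recover node activations by least squares, then node-level connectivity
A_hat, *_ = np.linalg.lstsq(F.T, Y.T, rcond=None)      # (nodes, time)
C = np.corrcoef(A_hat)                                 # node connectivity
```

Because connectivity is computed over 3 node time series rather than 60 voxel time series, the covariance estimate is both cheaper and better conditioned, which is the efficiency argument the abstract makes for node-based representations.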