51 research outputs found
A neurally plausible model learns successor representations in partially observable environments
Animals need to devise strategies to maximize returns while interacting with
their environment based on incoming noisy sensory observations. Task-relevant
states, such as the agent's location within an environment or the presence of a
predator, are often not directly observable but must be inferred using
available sensory information. Successor representations (SR) have been
proposed as a middle-ground between model-based and model-free reinforcement
learning strategies, allowing for fast value computation and rapid adaptation
to changes in the reward function or goal locations. Indeed, recent studies
suggest that features of neural responses are consistent with the SR framework.
However, it is not clear how such representations might be learned and computed
in partially observed, noisy environments. Here, we introduce a neurally
plausible model using distributional successor features, which builds on the
distributed distributional code for the representation and computation of
uncertainty, and which allows for efficient value function computation in
partially observed environments via the successor representation. We show that
distributional successor features can support reinforcement learning in noisy
environments in which direct learning of successful policies is infeasible
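As a concrete illustration of the "middle ground" the abstract describes, a tabular successor representation separates discounted state occupancy from reward, so value is a single matrix-vector product and a change of reward or goal needs no re-learning of the dynamics. A minimal sketch on a toy three-state chain (all values illustrative, not from the paper):

```python
import numpy as np

# Tabular successor representation for a 3-state chain (illustrative values).
# M[s, s'] = expected discounted future occupancy of s' starting from s.
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],   # deterministic chain 0 -> 1 -> 2 -> 2
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
M = np.linalg.inv(np.eye(3) - gamma * P)  # closed form: M = (I - gamma P)^-1

# Value is a single matrix-vector product, so changing the reward vector
# (e.g. moving the goal) re-prices every state instantly.
R = np.array([0.0, 0.0, 1.0])
V = M @ R
R_new = np.array([1.0, 0.0, 0.0])  # goal moved: no re-learning of M needed
V_new = M @ R_new
```

This decoupling is what gives the SR its "fast value computation and rapid adaptation to changes in the reward function" mentioned above.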
The effect of synaptic weight initialization in feature-based successor representation learning
After discovering place cells, the idea of the hippocampal (HPC) function to
represent geometric spaces has been extended to predictions, imaginations, and
conceptual cognitive maps. Recent research argues that the HPC represents a
predictive map and shows that the HPC predicts visits to specific
locations. This predictive map theory is based on successor representation (SR)
from reinforcement learning. Feature-based SR (SF), which uses a neural network
as a function approximator to learn the SR, seems a more neurobiologically
plausible model. However, it is not well understood how different methods of
weight (W) initialization affect SF learning.
In this study, SF learners were exposed to simple maze environments to
analyze SF learning efficiency and W pattern changes. Three kinds of W
initialization pattern were used: identity matrix, zero matrix, and small
random matrix. The SF learner initialized with a small random weight matrix
showed better performance than the other agents. We will discuss the
neurobiological meaning of the SF weight matrix. Through this approach, this
paper aims to increase our understanding of intelligence from neuroscientific
and artificial intelligence perspectives.
Comment: 11 pages, 8 figures including 2 supplementary figures
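A minimal sketch of feature-based SF learning with the three initializations listed above. The maze is reduced to a hypothetical one-hot corridor, and the learning rate and discount are placeholders, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                      # hypothetical corridor length (one-hot states)
gamma, alpha = 0.95, 0.1   # illustrative discount and learning rate

def phi(s):                # one-hot state features
    e = np.zeros(n)
    e[s] = 1.0
    return e

# The three W initializations compared in the abstract.
inits = {
    "identity": np.eye(n),
    "zero": np.zeros((n, n)),
    "random": 0.01 * rng.standard_normal((n, n)),
}

def sf_td_step(W, s, s_next):
    # Successor features psi(s) = W phi(s); TD target phi(s) + gamma psi(s').
    psi, psi_next = W @ phi(s), W @ phi(s_next)
    delta = phi(s) + gamma * psi_next - psi       # SF prediction error
    return W + alpha * np.outer(delta, phi(s))    # update column for state s

results = {}
for name, W in inits.items():
    for s in range(n - 1):                        # one pass along the corridor
        W = sf_td_step(W, s, s + 1)
    results[name] = W
```

With one-hot features the learned W plays the role of the tabular SR matrix; the different initializations change only the starting point of the same TD update.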
Rapid learning of predictive maps with STDP and theta phase precession
The predictive map hypothesis is a promising candidate principle for hippocampal function. A favoured formalisation of this hypothesis, called the successor representation, proposes that each place cell encodes the expected state occupancy of its target location in the near future. This predictive framework is supported by behavioural as well as electrophysiological evidence and has desirable consequences for both the generalisability and efficiency of reinforcement learning algorithms. However, it is unclear how the successor representation might be learnt in the brain. Error-driven temporal difference learning, commonly used to learn successor representations in artificial agents, is not known to be implemented in hippocampal networks. Instead, we demonstrate that spike-timing dependent plasticity (STDP), a form of Hebbian learning, acting on temporally compressed trajectories known as 'theta sweeps', is sufficient to rapidly learn a close approximation to the successor representation. The model is biologically plausible - it uses spiking neurons modulated by theta-band oscillations, diffuse and overlapping place cell-like state representations, and experimentally matched parameters. We show how this model maps onto known aspects of hippocampal circuitry and explains substantial variance in the temporal difference successor matrix, consequently giving rise to place cells that demonstrate experimentally observed successor representation-related phenomena including backwards expansion on a 1D track and elongation near walls in 2D. Finally, our model provides insight into the observed topographical ordering of place field sizes along the dorsal-ventral axis by showing this is necessary to prevent the detrimental mixing of larger place fields, which encode longer timescale successor representations, with more fine-grained predictions of spatial location
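For reference, the error-driven temporal-difference update mentioned above, the standard artificial-agent baseline that the STDP model is shown to approximate, can be sketched in tabular form. Track length, learning rate, and discount here are illustrative, not the paper's experimentally matched parameters:

```python
import numpy as np

n = 4                       # illustrative 1D track length
M = np.eye(n)               # successor matrix, initialized to identity

def td_sr_update(M, s, s_next, alpha=0.1, gamma=0.95):
    # Standard tabular TD update for the SR: the row for s moves toward
    # immediate occupancy of s plus the discounted row of its successor.
    target = np.eye(M.shape[0])[s] + gamma * M[s_next]
    M[s] += alpha * (target - M[s])
    return M

# Repeated traversals of the track 0 -> 1 -> 2 -> 3; earlier states come to
# predict discounted occupancy of later ones (the backwards-expansion effect
# on a 1D track described in the abstract).
for _ in range(500):
    for s in range(n - 1):
        M = td_sr_update(M, s, s + 1)
```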
Successor Feature Sets: Generalizing Successor Representations Across Policies
Successor-style representations have many advantages for reinforcement
learning: for example, they can help an agent generalize from past experience
to new goals, and they have been proposed as explanations of behavioral and
neural data from human and animal learners. They also form a natural bridge
between model-based and model-free RL methods: like the former they make
predictions about future experiences, and like the latter they allow efficient
prediction of total discounted rewards. However, successor-style
representations are not optimized to generalize across policies: typically, we
maintain a limited-length list of policies, and share information among them by
representation learning or GPI. Successor-style representations also typically
make no provision for gathering information or reasoning about latent
variables. To address these limitations, we bring together ideas from
predictive state representations, belief space value iteration, successor
features, and convex analysis: we develop a new, general successor-style
representation, together with a Bellman equation that connects multiple sources
of information within this representation, including different latent states,
policies, and reward functions. The new representation is highly expressive:
for example, it lets us efficiently read off an optimal policy for a new reward
function, or a policy that imitates a new demonstration. For this paper, we
focus on exact computation of the new representation in small, known
environments, since even this restricted setting offers plenty of interesting
questions. Our implementation does not scale to large, unknown environments --
nor would we expect it to, since it generalizes POMDP value iteration, which is
difficult to scale. However, we believe that future work will allow us to
extend our ideas to approximate reasoning in large, unknown environments
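For contrast with the policy-list approach the abstract describes, the standard successor-feature value decomposition combined with generalized policy improvement (GPI) can be sketched as follows; the feature dimension, action count, and stored psi values are purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_actions, n_policies = 3, 2, 4   # hypothetical sizes

# psi[i, a]: successor features of stored policy i at the current state for
# action a, so Q_i(a) = psi[i, a] . w for any reward with feature weights w.
psi = rng.standard_normal((n_policies, n_actions, d))
w_new = np.array([1.0, -0.5, 2.0])   # feature weights of a new task

# Generalized policy improvement over the "limited-length list" of policies:
# act greedily with respect to the best stored Q-value for each action.
Q = psi @ w_new                      # shape (n_policies, n_actions)
best_action = int(np.argmax(Q.max(axis=0)))
```

The successor feature *sets* of the paper generalize exactly this construction so that information is shared across all policies at once rather than a finite stored list.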
Learning neural codes for perceptual uncertainty
Perception is an inferential process, in which the state of the immediate environment must be estimated from sensory input. Inference in the face of noise and ambiguity requires reasoning with uncertainty, and much animal behaviour appears close to Bayes optimal. This observation has inspired hypotheses for how the activity of neurons in the brain might represent the distributional beliefs necessary to implement explicit Bayesian computation. While previous work has focused on the sufficiency of these hypothesised codes for computation, relatively little consideration has been given to optimality in the representation itself. Here, we adopt an encoder-decoder approach to study representational optimisation within one hypothesised belief encoding framework: the distributed distributional code (DDC). We consider a setting in which typical belief distribution functions take the form of a sparse combination of an underlying set of basis functions, and the corresponding DDC signals are corrupted by neural variability. We estimate the conditional entropy over beliefs induced by these DDC signals using an appropriate decoder. Like other hypothesised frameworks, a DDC representation of a belief depends on a set of fixed encoding functions that are usually set arbitrarily. Our approach allows us to seek the encoding functions that minimise the decoder conditional entropy and thus optimise representational accuracy in an information theoretic sense. We apply the approach to show how optimal encoding properties may adapt to represent beliefs in new environments, relating the results to experimentally reported neural responses
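A minimal sketch of the DDC idea described above: a belief is encoded as the vector of expectations of fixed encoding functions, and the expectation of any function approximately in the span of that basis can be decoded linearly. The Gaussian basis, widths, and example belief are assumptions for illustration, not the paper's optimized encoding functions:

```python
import numpy as np

# A belief p(z) over a 1D latent variable, encoded as a DDC: the vector of
# expectations of K fixed encoding functions under the belief.
z = np.linspace(-3, 3, 200)
centers = np.linspace(-2, 2, 8)                           # K = 8 bumps
Psi = np.exp(-0.5 * ((z[:, None] - centers) / 0.5) ** 2)  # Gaussian basis

p = np.exp(-0.5 * (z - 1.0) ** 2)   # example belief centred at z = 1
p /= p.sum()

r = p @ Psi   # DDC signal: r_k = E_p[psi_k(z)]

# Linear readout: if f is approximated as Psi @ beta, then E_p[f] ~= r . beta.
f = z                                # read out the posterior mean
beta, *_ = np.linalg.lstsq(Psi, f, rcond=None)
mean_est = float(r @ beta)
```

The paper's contribution is to optimize the encoding functions themselves (here fixed Gaussian bumps) so that such readouts remain accurate under neural variability.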
Probabilistic learning and computation in brains and machines
Humans and animals are able to solve a wide variety of perceptual, decision making and motor tasks with great flexibility. Moreover, behavioural evidence shows that this flexibility extends to situations where accuracy requires the correct treatment of uncertainty induced by noise and ambiguity in the available sensory information as well as noise internal to the brain. It has been suggested that this adequate handling of uncertainty is based on a learned internal model, e.g. in the case of perception, a generative model of sensory observations. Learning latent variable models and performing inference in them is a key challenge for both biological and artificial learning systems. Here, we introduce a new approach to learning in hierarchical latent variable models called the Distributed Distributional Code Helmholtz Machine (DDC-HM), which emphasises flexibility and accuracy in the inferential process. The approximate posterior over unobserved variables is represented implicitly as a set of expectations, corresponding to mean parameters of an exponential family distribution. To train the generative and recognition models we develop an extended wake-sleep algorithm inspired by the original Helmholtz Machine. As a result, the DDC-HM is able to learn hierarchical latent models without having to propagate gradients across different stochastic layers, making our approach biologically appealing. In the second part of the thesis, we review existing proposals for neural representations of uncertainty with a focus on representational and computational flexibility as well as experimental support. Finally, we consider inference and learning in dynamical environment models using Distributed Distributional Codes to represent both the stochastic latent transition model and the inferred posterior distributions. We show that this model makes it possible to generalise successor representations to biologically more realistic, partially observed settings
Hippocampal predictive maps of an uncertain world
Humans and other animals can solve a wide variety of decision-making problems with remarkable flexibility. This flexibility is thought to derive from an internal model of the world, or 'cognitive map', used to predict the future and plan actions accordingly. A recent theoretical proposal suggests that the hippocampus houses a representation of long-run state expectancies. These 'successor representations' (SRs) occupy a middle ground between model-free and model-based reinforcement learning strategies. However, it is not clear whether SRs can explain hippocampal contributions to spatial and model-based behaviour, nor how a putative hippocampal SR might interface with striatal learning mechanisms. More generally, it is not clear how the predictive map should encode uncertainty, and how an uncertainty-augmented predictive map modifies our experimental predictions for animal behaviour. In the first part of this thesis, I investigated whether viewing the hippocampus as an SR can explain experiments contrasting hippocampal and dorsolateral striatal contributions to behaviour in spatial and non-spatial tasks. To do this, I modelled the hippocampus as an SR and DLS as model-free reinforcement learning, combining their outputs via their relative reliability as a proxy for uncertainty. Current SR models do not formally address uncertainty. Therefore I extended the learning of SRs by temporal differences to include managing uncertainty in new observations versus existing knowledge. I generalise this approach to a multi-task setting using a Bayesian nonparametric switching Kalman Filter, allowing the model to learn and maintain multiple task-specific SR maps and infer which one to use at any moment based on the observations. I show that this Bayesian SR model captures animal behaviour in tasks which require contextual memory and generalisation.
In conclusion, I consider how the hippocampal contribution to behaviour can be considered as a predictive map when adapted to take account of uncertainty and combined with other behavioural controllers
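One simple way to read "combining their outputs via their relative reliability" is precision weighting of the two value estimates. This sketch is an assumption about the functional form of the combination, not the thesis's exact model:

```python
# Precision-weighted arbitration between a hippocampal SR-based value estimate
# and a striatal model-free one, using inverse variance as the reliability
# proxy. The functional form is an illustrative assumption.
def combine(v_sr, var_sr, v_mf, var_mf):
    w_sr = (1.0 / var_sr) / (1.0 / var_sr + 1.0 / var_mf)
    return w_sr * v_sr + (1.0 - w_sr) * v_mf

# When the SR estimate is more reliable (lower variance), it dominates.
v = combine(v_sr=1.0, var_sr=0.1, v_mf=0.0, var_mf=0.4)
```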
An introduction to reinforcement learning for neuroscience
Reinforcement learning has a rich history in neuroscience, from early work on
dopamine as a reward prediction error signal for temporal difference learning
(Schultz et al., 1997) to recent work suggesting that dopamine could implement
a form of 'distributional reinforcement learning' popularized in deep learning
(Dabney et al., 2020). Throughout this literature, there has been a tight link
between theoretical advances in reinforcement learning and neuroscientific
experiments and findings. As a result, the theories describing our experimental
data have become increasingly complex and difficult to navigate. In this
review, we cover the basic theory underlying classical work in reinforcement
learning and build up to an introductory overview of methods used in modern
deep reinforcement learning that have found applications in systems
neuroscience. We start with an overview of the reinforcement learning problem
and classical temporal difference algorithms, followed by a discussion of
'model-free' and 'model-based' reinforcement learning together with methods
such as DYNA and successor representations that fall in between these two
categories. Throughout these sections, we highlight the close parallels between
the machine learning methods and related work in both experimental and
theoretical neuroscience. We then provide an introduction to deep reinforcement
learning with examples of how these methods have been used to model different
learning phenomena in the systems neuroscience literature, such as
meta-reinforcement learning (Wang et al., 2018) and distributional
reinforcement learning (Dabney et al., 2020). Code that implements the methods
discussed in this work and generates the figures is also provided.
Comment: Code available at:
https://colab.research.google.com/drive/1kWOz2Uxn0cf2c4YizqIXQKWyxeYd6wvL?usp=sharin
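The classical temporal-difference step that the review builds from, with the prediction error delta that the dopamine literature (Schultz et al., 1997) identifies with phasic dopamine responses, can be sketched as (parameters illustrative):

```python
import numpy as np

# One temporal-difference learning step; delta is the reward prediction error
# that the review links to phasic dopamine (Schultz et al., 1997).
def td_step(V, s, r, s_next, alpha=0.1, gamma=0.99):
    delta = r + gamma * V[s_next] - V[s]   # reward prediction error
    V[s] += alpha * delta                  # move V(s) toward the TD target
    return V, delta

V = np.zeros(3)
V, delta = td_step(V, s=0, r=1.0, s_next=1)   # unexpected reward at state 0
```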
- …