51 research outputs found

    A neurally plausible model learns successor representations in partially observable environments

    Animals need to devise strategies to maximize returns while interacting with their environment based on incoming noisy sensory observations. Task-relevant states, such as the agent's location within an environment or the presence of a predator, are often not directly observable but must be inferred using available sensory information. Successor representations (SR) have been proposed as a middle-ground between model-based and model-free reinforcement learning strategies, allowing for fast value computation and rapid adaptation to changes in the reward function or goal locations. Indeed, recent studies suggest that features of neural responses are consistent with the SR framework. However, it is not clear how such representations might be learned and computed in partially observed, noisy environments. Here, we introduce a neurally plausible model using distributional successor features, which builds on the distributed distributional code for the representation and computation of uncertainty, and which allows for efficient value function computation in partially observed environments via the successor representation. We show that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible.
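
    The SR-based value computation mentioned in this abstract is easiest to see in the fully observed, tabular case, which the paper then generalises to partially observed settings. Below is a minimal sketch under that simplifying assumption; the transition matrix, discount, and rewards are illustrative, not taken from the paper.

        # Tabular, fully observed sketch of SR-based value computation (assumption:
        # this is the textbook case, not the paper's distributional successor features).
        import numpy as np

        n_states, gamma = 5, 0.95
        rng = np.random.default_rng(0)

        # Policy-conditioned transition matrix P[s, s'] (rows sum to 1).
        P = rng.random((n_states, n_states))
        P /= P.sum(axis=1, keepdims=True)

        # SR matrix: expected discounted future occupancy, M = (I - gamma * P)^-1.
        M = np.linalg.inv(np.eye(n_states) - gamma * P)

        reward = rng.normal(size=n_states)
        V = M @ reward            # values of all states from one matrix product

        new_reward = rng.normal(size=n_states)
        V_new = M @ new_reward    # instant re-evaluation when the reward function changes

    The last two lines illustrate why the SR supports rapid adaptation to changed rewards or goals: the predictive part (M) is reusable, and only the reward vector needs to be relearned.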

    The effect of synaptic weight initialization in feature-based successor representation learning

    Since the discovery of place cells, the idea that the hippocampus (HPC) functions to represent geometric space has been extended to predictions, imaginations, and conceptual cognitive maps. Recent research argues that the HPC represents a predictive map and has shown that the HPC predicts visits to specific locations. This predictive map theory is based on the successor representation (SR) from reinforcement learning. Feature-based SR (SF), which uses a neural network as a function approximator to learn the SR, seems a more neurobiologically plausible model. However, it is not well known how different methods of weight (W) initialization affect SF learning. In this study, SF learners were exposed to simple maze environments to analyze SF learning efficiency and changes in W patterns. Three kinds of W initialization were used: an identity matrix, a zero matrix, and a small random matrix. The SF learner initialized with a random weight matrix showed better performance than the other three RL agents. We discuss the neurobiological meaning of the SF weight matrix. Through this approach, this paper aims to increase our understanding of intelligence from neuroscientific and artificial intelligence perspectives. Comment: 11 pages, 8 figures including 2 supplementary figures
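
    As a rough illustration of the comparison described above, the sketch below learns successor features by temporal differences under three W initializations (identity, zero, small random). The one-hot features, ring environment, and learning parameters are assumptions for illustration only and do not reproduce the paper's maze experiments.

        # Hedged sketch: feature-based SR (SF) learning by temporal differences with
        # three different weight initializations. Columns of W approximate the
        # successor features of each state.
        import numpy as np

        n_states, gamma, alpha, n_steps = 10, 0.95, 0.1, 5000
        phi = np.eye(n_states)                  # one-hot state features (assumption)
        rng = np.random.default_rng(1)

        def learn_sf(W):
            s = 0
            for _ in range(n_steps):
                s_next = (s + rng.choice([-1, 1])) % n_states   # random walk on a ring
                # TD error on successor features: target = phi(s) + gamma * W phi(s')
                delta = phi[s] + gamma * W @ phi[s_next] - W @ phi[s]
                W += alpha * np.outer(delta, phi[s])
                s = s_next
            return W

        inits = {
            "identity": np.eye(n_states),
            "zero": np.zeros((n_states, n_states)),
            "random": 0.01 * rng.standard_normal((n_states, n_states)),
        }
        for name, W0 in inits.items():
            W = learn_sf(W0.copy())
            print(name, "mean successor weight:", float(W.mean()))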

    Rapid learning of predictive maps with STDP and theta phase precession

    The predictive map hypothesis is a promising candidate principle for hippocampal function. A favoured formalisation of this hypothesis, called the successor representation, proposes that each place cell encodes the expected state occupancy of its target location in the near future. This predictive framework is supported by behavioural as well as electrophysiological evidence and has desirable consequences for both the generalisability and efficiency of reinforcement learning algorithms. However, it is unclear how the successor representation might be learnt in the brain. Error-driven temporal difference learning, commonly used to learn successor representations in artificial agents, is not known to be implemented in hippocampal networks. Instead, we demonstrate that spike-timing dependent plasticity (STDP), a form of Hebbian learning, acting on temporally compressed trajectories known as 'theta sweeps', is sufficient to rapidly learn a close approximation to the successor representation. The model is biologically plausible - it uses spiking neurons modulated by theta-band oscillations, diffuse and overlapping place cell-like state representations, and experimentally matched parameters. We show how this model maps onto known aspects of hippocampal circuitry and explains substantial variance in the temporal difference successor matrix, consequently giving rise to place cells that demonstrate experimentally observed successor representation-related phenomena including backwards expansion on a 1D track and elongation near walls in 2D. Finally, our model provides insight into the observed topographical ordering of place field sizes along the dorsal-ventral axis by showing this is necessary to prevent the detrimental mixing of larger place fields, which encode longer timescale successor representations, with more fine-grained predictions of spatial location.
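
    The abstract contrasts the biologically plausible STDP mechanism with the temporal-difference baseline usually used in artificial agents. The toy sketch below is only a discrete-time caricature of that STDP idea: an asymmetric, lag-dependent Hebbian update applied to compressed look-ahead sequences ('theta sweeps') accumulates a weight matrix whose columns resemble SR columns. The spiking neurons, phase precession, and matched parameters of the paper are not modelled here; all values are illustrative assumptions.

        # Toy caricature (not the paper's spiking model): STDP-like potentiation on
        # compressed look-ahead sequences builds SR-like predictive weights.
        import numpy as np

        n_states, sweep_len, a_plus, tau, n_steps = 20, 6, 0.1, 2.0, 20000
        W = np.zeros((n_states, n_states))       # W[post, pre]: pre fires before post
        rng = np.random.default_rng(2)

        traj = [0]
        for _ in range(n_steps):                 # random walk on a ring (stand-in behaviour)
            traj.append((traj[-1] + rng.choice([-1, 1])) % n_states)

        for t in range(len(traj) - sweep_len):
            sweep = traj[t:t + sweep_len]        # compressed look-ahead within one theta cycle
            pre = sweep[0]
            for lag, post in enumerate(sweep[1:], start=1):
                # pre-before-post potentiation decaying with the spike-time lag,
                # which plays the role of the discount factor
                W[post, pre] += a_plus * np.exp(-lag / tau)

        W /= W.sum(axis=0, keepdims=True) + 1e-12    # normalise each presynaptic column
        # Columns of W now roughly encode discounted future occupancy, like SR columns.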

    Successor Feature Sets: Generalizing Successor Representations Across Policies

    Successor-style representations have many advantages for reinforcement learning: for example, they can help an agent generalize from past experience to new goals, and they have been proposed as explanations of behavioral and neural data from human and animal learners. They also form a natural bridge between model-based and model-free RL methods: like the former they make predictions about future experiences, and like the latter they allow efficient prediction of total discounted rewards. However, successor-style representations are not optimized to generalize across policies: typically, we maintain a limited-length list of policies, and share information among them by representation learning or generalized policy improvement (GPI). Successor-style representations also typically make no provision for gathering information or reasoning about latent variables. To address these limitations, we bring together ideas from predictive state representations, belief space value iteration, successor features, and convex analysis: we develop a new, general successor-style representation, together with a Bellman equation that connects multiple sources of information within this representation, including different latent states, policies, and reward functions. The new representation is highly expressive: for example, it lets us efficiently read off an optimal policy for a new reward function, or a policy that imitates a new demonstration. For this paper, we focus on exact computation of the new representation in small, known environments, since even this restricted setting offers plenty of interesting questions. Our implementation does not scale to large, unknown environments -- nor would we expect it to, since it generalizes POMDP value iteration, which is difficult to scale. However, we believe that future work will allow us to extend our ideas to approximate reasoning in large, unknown environments.
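
    As background for the generalization the paper proposes, the sketch below shows the standard successor-features baseline it builds on: each stored policy has successor features psi(s, a), a new task is a reward-weight vector w, and generalized policy improvement acts greedily over the whole policy library. Array shapes and values are illustrative assumptions.

        # Standard successor features + GPI baseline (not the paper's successor
        # feature sets): evaluate every stored policy on a new reward in one step.
        import numpy as np

        n_policies, n_states, n_actions, d = 3, 8, 4, 8
        rng = np.random.default_rng(3)

        # psi[k, s, a, :] = expected discounted feature sum under stored policy k from (s, a)
        psi = rng.random((n_policies, n_states, n_actions, d))
        w_new = rng.normal(size=d)                  # reward weights of a new task

        Q = psi @ w_new                             # Q[k, s, a] for every stored policy
        gpi_action = Q.max(axis=0).argmax(axis=1)   # per state: best action over all policies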

    Learning neural codes for perceptual uncertainty

    Perception is an inferential process, in which the state of the immediate environment must be estimated from sensory input. Inference in the face of noise and ambiguity requires reasoning with uncertainty, and much animal behaviour appears close to Bayes optimal. This observation has inspired hypotheses for how the activity of neurons in the brain might represent the distributional beliefs necessary to implement explicit Bayesian computation. While previous work has focused on the sufficiency of these hypothesised codes for computation, relatively little consideration has been given to optimality in the representation itself. Here, we adopt an encoder-decoder approach to study representational optimisation within one hypothesised belief encoding framework: the distributed distributional code (DDC). We consider a setting in which typical belief distribution functions take the form of a sparse combination of an underlying set of basis functions, and the corresponding DDC signals are corrupted by neural variability. We estimate the conditional entropy over beliefs induced by these DDC signals using an appropriate decoder. Like other hypothesised frameworks, a DDC representation of a belief depends on a set of fixed encoding functions that are usually set arbitrarily. Our approach allows us to seek the encoding functions that minimise the decoder conditional entropy and thus optimise representational accuracy in an information theoretic sense. We apply the approach to show how optimal encoding properties may adapt to represent beliefs in new environments, relating the results to experimentally reported neural responses.
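
    For readers unfamiliar with the DDC, the sketch below illustrates the basic encoding assumed throughout: a belief over a latent variable is represented by the expectations of a fixed set of encoding functions, and expectations of other functions can be read out linearly. The Gaussian belief, random radial basis encoders, and least-squares readout are illustrative assumptions, not the optimised encoders studied in the paper.

        # Minimal DDC sketch: represent a belief by expected values of fixed encoders.
        import numpy as np

        rng = np.random.default_rng(4)
        n_units = 50

        # Fixed encoding functions: radial basis functions with random centres and widths.
        centres = rng.uniform(-3, 3, n_units)
        widths = rng.uniform(0.3, 1.0, n_units)

        def phi(z):                              # (n_samples,) -> (n_samples, n_units)
            return np.exp(-(z[:, None] - centres) ** 2 / (2 * widths ** 2))

        # A belief over z (a Gaussian here), represented by samples for simplicity.
        samples = rng.normal(loc=0.5, scale=0.8, size=10000)

        # DDC representation: one 'firing rate' per unit, the expected encoder value.
        r = phi(samples).mean(axis=0)

        # Readout: if f(z) ~ alpha . phi(z), then E[f(z)] ~ alpha . r.
        f = lambda z: z ** 2
        alpha, *_ = np.linalg.lstsq(phi(samples), f(samples), rcond=None)
        print("decoded E[z^2]:", float(alpha @ r), " sample mean:", float((samples ** 2).mean()))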

    Replay in minds and machines

    Probabilistic learning and computation in brains and machines

    Humans and animals are able to solve a wide variety of perceptual, decision making and motor tasks with great flexibility. Moreover, behavioural evidence shows that this flexibility extends to situations where accuracy requires the correct treatment of uncertainty induced by noise and ambiguity in the available sensory information as well as noise internal to the brain. It has been suggested that this adequate handling of uncertainty is based on a learned internal model, e.g. in the case of perception, a generative model of sensory observations. Learning latent variable models and performing inference in them is a key challenge for both biological and artificial learning systems. Here, we introduce a new approach to learning in hierarchical latent variable models called the Distributed Distributional Code Helmholtz Machine (DDC-HM), which emphasises flexibility and accuracy in the inferential process. The approximate posterior over unobserved variables is represented implicitly as a set of expectations, corresponding to mean parameters of an exponential family distribution. To train the generative and recognition models we develop an extended wake-sleep algorithm inspired by the original Helmholtz Machine. As a result, the DDC-HM is able to learn hierarchical latent models without having to propagate gradients across different stochastic layers, making our approach biologically appealing. In the second part of the thesis, we review existing proposals for neural representations of uncertainty with a focus on representational and computational flexibility as well as experimental support. Finally, we consider inference and learning in dynamical environment models using Distributed Distributional Codes to represent both the stochastic latent transition model and the inferred posterior distributions. We show that this model makes it possible to generalise successor representations to biologically more realistic, partially observed settings.
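
    One ingredient of the DDC Helmholtz Machine described above can be caricatured in a few lines: in a sleep-like phase, latent-observation pairs dreamed from the generative model are used to fit a recognition mapping from observations to posterior expectations of fixed encoding functions, i.e. a DDC for the posterior. The linear-Gaussian generative model and simple polynomial readout below are assumptions for illustration, not the thesis's hierarchical model or its full extended wake-sleep algorithm.

        # Sleep-phase caricature: regress encoder values of dreamed latents onto
        # observation features, so the readout returns posterior expectations (a DDC).
        import numpy as np

        rng = np.random.default_rng(5)
        n_dreams, n_units = 20000, 30

        # Simple generative model (assumption): z ~ N(0, 1), x = 2 z + noise.
        z = rng.normal(size=n_dreams)
        x = 2.0 * z + rng.normal(scale=0.5, size=n_dreams)

        # Fixed DDC encoding functions for z.
        centres = rng.uniform(-3, 3, n_units)
        phi_z = np.exp(-(z[:, None] - centres) ** 2 / 2.0)     # (n_dreams, n_units)

        # Recognition model: map observation features to expected encoder values.
        X = np.stack([np.ones_like(x), x, x ** 2], axis=1)     # simple basis in x
        W, *_ = np.linalg.lstsq(X, phi_z, rcond=None)          # (3, n_units)

        # For a new observation, the readout yields a DDC of the inferred posterior.
        x_new = np.array([1.0])
        r_posterior = np.stack([np.ones_like(x_new), x_new, x_new ** 2], axis=1) @ W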

    Hippocampal predictive maps of an uncertain world

    Humans and other animals can solve a wide variety of decision-making problems with remarkable flexibility. This flexibility is thought to derive from an internal model of the world, or ‘cognitive map’, used to predict the future and plan actions accordingly. A recent theoretical proposal suggests that the hippocampus houses a representation of long-run state expectancies. These “successor representations” (SRs) occupy a middle ground between model-free and model-based reinforcement learning strategies. However, it is not clear whether SRs can explain hippocampal contributions to spatial and model-based behaviour, nor how a putative hippocampal SR might interface with striatal learning mechanisms. More generally, it is not clear how the predictive map should encode uncertainty, and how an uncertainty-augmented predictive map modifies our experimental predictions for animal behaviour. In the first part of this thesis, I investigated whether viewing the hippocampus as an SR can explain experiments contrasting hippocampal and dorsolateral striatal contributions to behaviour in spatial and non-spatial tasks. To do this, I modelled the hippocampus as an SR and the dorsolateral striatum (DLS) as model-free reinforcement learning, combining their outputs via their relative reliability as a proxy for uncertainty. Current SR models do not formally address uncertainty. Therefore, I extended the learning of SRs by temporal differences to include managing uncertainty in new observations versus existing knowledge. I generalise this approach to a multi-task setting using a Bayesian nonparametric switching Kalman Filter, allowing the model to learn and maintain multiple task-specific SR maps and infer which one to use at any moment based on the observations. I show that this Bayesian SR model captures animal behaviour in tasks which require contextual memory and generalisation. In conclusion, I consider how the hippocampal contribution to behaviour can be considered as a predictive map when adapted to take account of uncertainty and combined with other behavioural controllers.
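
    The reliability-weighted combination of hippocampal (SR) and dorsolateral striatal (model-free) controllers described above can be sketched as follows. The running-average reliability proxy and the toy numbers are assumptions for illustration, not the thesis's exact arbitration scheme or its Bayesian SR machinery.

        # Hedged sketch: arbitrate between an SR-based and a model-free controller
        # by weighting their action values with their relative reliabilities.
        import numpy as np

        def combine(q_sr, q_mf, rel_sr, rel_mf):
            """Mix action values in proportion to each controller's reliability."""
            w_sr = rel_sr / (rel_sr + rel_mf)
            return w_sr * q_sr + (1.0 - w_sr) * q_mf

        def update_reliability(rel, pred_error, eta=0.1):
            """Track reliability as a running average of recent prediction accuracy."""
            return (1 - eta) * rel + eta * np.exp(-abs(pred_error))

        q_sr = np.array([1.0, 0.2, 0.4])       # SR-derived action values (illustrative)
        q_mf = np.array([0.6, 0.9, 0.1])       # model-free action values (illustrative)

        rel_sr = update_reliability(0.5, pred_error=0.1)   # SR controller predicted well
        rel_mf = update_reliability(0.5, pred_error=1.5)   # MF controller predicted poorly

        q = combine(q_sr, q_mf, rel_sr, rel_mf)
        action = int(q.argmax())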

    An introduction to reinforcement learning for neuroscience

    Reinforcement learning has a rich history in neuroscience, from early work on dopamine as a reward prediction error signal for temporal difference learning (Schultz et al., 1997) to recent work suggesting that dopamine could implement a form of 'distributional reinforcement learning' popularized in deep learning (Dabney et al., 2020). Throughout this literature, there has been a tight link between theoretical advances in reinforcement learning and neuroscientific experiments and findings. As a result, the theories describing our experimental data have become increasingly complex and difficult to navigate. In this review, we cover the basic theory underlying classical work in reinforcement learning and build up to an introductory overview of methods used in modern deep reinforcement learning that have found applications in systems neuroscience. We start with an overview of the reinforcement learning problem and classical temporal difference algorithms, followed by a discussion of 'model-free' and 'model-based' reinforcement learning together with methods such as DYNA and successor representations that fall in between these two categories. Throughout these sections, we highlight the close parallels between the machine learning methods and related work in both experimental and theoretical neuroscience. We then provide an introduction to deep reinforcement learning with examples of how these methods have been used to model different learning phenomena in the systems neuroscience literature, such as meta-reinforcement learning (Wang et al., 2018) and distributional reinforcement learning (Dabney et al., 2020). Code that implements the methods discussed in this work and generates the figures is also provided. Comment: Code available at: https://colab.research.google.com/drive/1kWOz2Uxn0cf2c4YizqIXQKWyxeYd6wvL?usp=sharin
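
    As a pointer to the kind of classical algorithm the review starts from, the sketch below implements tabular TD(0) value learning on a small chain, with the TD error playing the role of the dopamine reward prediction error. The environment and parameters are illustrative assumptions; the review's own code is in the linked Colab notebook.

        # Classical TD(0) on a deterministic chain: the TD error 'delta' is the
        # reward prediction error associated with dopamine in the review.
        import numpy as np

        n_states, alpha, gamma, n_episodes = 6, 0.1, 0.9, 500
        V = np.zeros(n_states)

        for _ in range(n_episodes):
            s = 0
            while s < n_states - 1:
                s_next = s + 1                                # always step right
                r = 1.0 if s_next == n_states - 1 else 0.0    # reward on reaching the end
                v_next = 0.0 if s_next == n_states - 1 else V[s_next]
                delta = r + gamma * v_next - V[s]             # reward prediction error
                V[s] += alpha * delta
                s = s_next

        print(np.round(V, 3))   # values decay with distance from the rewarded end state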