38 research outputs found

    Assessment of a regional physical-biogeochemical stochastic ocean model. Part 2: Empirical consistency

    This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record.
    In this Part 2 article of a two-part series, observations from satellite missions were used to evaluate the empirical consistency of model ensembles generated via stochastic modelling of ocean physics and biogeochemistry. A high-resolution Bay of Biscay configuration was used as a case study to explore the model error subspace in both the open and coastal ocean. In Part 1 of this work, three experiments were carried out to generate model ensembles by perturbing only physics, only biogeochemistry, and both simultaneously. In Part 2, empirical consistency was checked, first by means of rank histograms projecting the data onto the model ensemble classes, and second, by pattern-selective consistency criteria in the space of “array modes”, defined as eigenvectors of the representer matrix. Rank histograms showed a large dependency on geographical region and on season for sea surface temperature (SST), sea-level anomaly (SLA), and phytoplankton functional types (PFT), shifting from consistent model-data configurations to large biases caused by model ensemble underspread. Consistency for SST array modes was verified at large, small, and coastal scales soon after the ensemble spin-up. Array modes for the along-track sea level carried useful consistent information at large scales and at the mesoscale; consistency for the gridded SLA was verified only at large scales. Array modes showed that biogeochemical model uncertainties generated by stochastic physics were effectively detected by PFT measurements at large scales, as well as at the mesoscale and small scales. By contrast, perturbing only biogeochemistry, with identical physical forcing across the ensemble, limits the potential of PFT measurements for detecting and possibly correcting small-scale biogeochemical model errors.
    When an ensemble was found to be inconsistent with observations along a particular direction (here, an array mode), a plausible explanation is that other error processes were active in the model in addition to those at work across the ensemble.
    Centre National de la Recherche Scientifique (CNRS); European Union
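The rank-histogram diagnostic described above can be illustrated with a short sketch: each observation is ranked within the sorted ensemble values at its location, and the ranks are binned. A flat histogram suggests a consistent ensemble, while a U shape indicates underspread. A minimal NumPy version, using synthetic data in place of satellite observations (not the study's code):

```python
import numpy as np

def rank_histogram(obs, ensemble):
    """Counts of observation ranks within the ensemble.

    obs:      (n_obs,) observed values
    ensemble: (n_members, n_obs) ensemble values at the same points
    Returns n_members + 1 bin counts; flat = consistent ensemble,
    U-shaped = underspread (observations often fall outside it)."""
    ranks = np.sum(ensemble < obs[None, :], axis=0)  # rank in 0..n_members
    return np.bincount(ranks, minlength=ensemble.shape[0] + 1)

# With observations drawn from the same distribution as the members,
# the histogram is approximately flat.
rng = np.random.default_rng(0)
ens = rng.normal(size=(20, 5000))
obs = rng.normal(size=5000)
hist = rank_histogram(obs, ens)
```

An underspread ensemble (e.g. `ens * 0.3`) would instead pile most ranks into the first and last bins.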

    An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

    An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence supporting the hypothesis that the mammalian brain performs TD learning includes the resemblance of the phasic activity of midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine and pre- and postsynaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired relative to the traditional algorithm on a task with both positive and negative rewards, and it breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when it is driven by negative rewards.
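The asymmetry the abstract describes can be imitated in the simplest possible setting: tabular TD(0) on a linear chain, with an optional floor on negative TD errors as a crude, hypothetical stand-in for a dopaminergic signal that cannot dip far below baseline (an illustrative toy, not the paper's spiking model):

```python
import numpy as np

def td_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9, clip_neg=None):
    """Tabular TD(0) on a deterministic chain ending in reward +1.

    clip_neg: if set, negative TD errors are floored at this value,
    a crude stand-in for the limited negative dynamic range of a
    phasic dopamine signal (hypothetical simplification)."""
    V = np.zeros(n_states + 1)               # value of terminal state stays 0
    for _ in range(episodes):
        for s in range(n_states):            # walk left to right each episode
            r = 1.0 if s == n_states - 1 else 0.0
            delta = r + gamma * V[s + 1] - V[s]   # TD error
            if clip_neg is not None:
                delta = max(delta, clip_neg)
            V[s] += alpha * delta
    return V[:n_states]

V_plain = td_chain()                 # symmetric TD error
V_asym = td_chain(clip_neg=-0.05)    # asymmetric, dopamine-like error
# With purely positive rewards the two agree; the asymmetry only
# matters once large negative TD errors (punishments) must be learned.
```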

    Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail

    Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, and the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.

    Motor primitives in space and time via targeted gain modulation in cortical networks

    Motor cortex (M1) exhibits a rich repertoire of activities to support the generation of complex movements. Recent network models capture many qualitative aspects of M1 dynamics, but they can generate only a few distinct movements (all of the same duration). We demonstrate that simple modulation of neuronal input–output gains in recurrent neuronal network models with fixed connectivity can dramatically reorganize neuronal activity and consequently downstream muscle outputs. We show that a relatively small number of modulatory control units provide sufficient flexibility to adjust high-dimensional network activity using a simple reward-based learning rule. Furthermore, novel movements can be assembled from previously learned primitives, and we can separately change movement speed while preserving movement shape. Our results provide a new perspective on the role of modulatory systems in controlling recurrent cortical activity.
    Our work was supported by grants from the Wellcome Trust (TPV and JPS, WT100000; GH, 202111/Z/16/Z) and the Engineering and Physical Sciences Research Council (JPS).
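The core idea, fixed connectivity reshaped by per-neuron multiplicative gains, can be sketched with a small rate network; the network size, gain range, and parameters below are illustrative assumptions, not the fitted M1 model from the paper:

```python
import numpy as np

def simulate(W, gains, h0, steps=200, dt=0.1):
    """Leaky rate network with fixed weights W; per-neuron
    multiplicative gains reshape the dynamics without changing
    the connectivity itself."""
    h = h0.copy()
    traj = np.empty((steps, h0.size))
    for i in range(steps):
        h = h + dt * (-h + gains * np.tanh(W @ h))  # Euler step
        traj[i] = h
    return traj

rng = np.random.default_rng(1)
n = 50
W = rng.normal(scale=1.5 / np.sqrt(n), size=(n, n))   # fixed connectivity
h0 = rng.normal(size=n)
traj_base = simulate(W, np.ones(n), h0)               # baseline gains
traj_mod = simulate(W, rng.uniform(0.5, 2.0, n), h0)  # modulated gains
# Same weights and same initial state, but a different gain pattern
# yields a different activity trajectory (and hence muscle readout).
```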

    Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
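The continuous-time TD error of Doya (2000) that this model builds on can be written as delta(t) = r(t) - V(t)/tau + dV/dt, where tau is the discounting time constant. A discretized sketch (illustrative only, not the spiking implementation):

```python
import numpy as np

def continuous_td_error(r, V, dt, tau=1.0):
    """Discretized continuous-time TD error (Doya, 2000):
        delta(t) = r(t) - V(t)/tau + dV/dt
    r, V: arrays sampled on a regular time grid with spacing dt."""
    dVdt = np.gradient(V, dt)
    return r - V / tau + dVdt

# Sanity check: for a constant reward stream r0, the correct value
# function is the constant V = tau * r0, and the TD error vanishes.
t = np.linspace(0.0, 10.0, 1001)
dt = t[1] - t[0]
tau = 1.0
r = np.full_like(t, 0.5)
V = np.full_like(t, tau * 0.5)
delta = continuous_td_error(r, V, dt, tau)   # ~0 everywhere
```

A nonzero delta would signal a mismatch between the critic's prediction and the reward stream, which is the quantity broadcast to the actor.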

    Functional requirements for reward-modulated spike-timing-dependent plasticity.

    Recent experiments have shown that spike-timing-dependent plasticity is influenced by neuromodulation. We derive theoretical conditions for successful learning of reward-related behavior for a large class of learning rules where Hebbian synaptic plasticity is conditioned on a global modulatory factor signaling reward. We show that all learning rules in this class can be separated into a term that captures the covariance of neuronal firing and reward and a second term that represents the influence of unsupervised learning. The unsupervised term, which is, in general, detrimental for reward-based learning, can be suppressed if the neuromodulatory signal encodes the difference between the reward and the expected reward, but only if the expected reward is calculated for each task and stimulus separately. If several tasks are to be learned simultaneously, the nervous system needs an internal critic that is able to predict the expected reward for arbitrary stimuli. We show that, with a critic, reward-modulated spike-timing-dependent plasticity is capable of learning motor trajectories with a temporal resolution of tens of milliseconds. The relation to temporal difference learning, the relevance of block-based learning paradigms, and the limitations of learning with a critic are discussed.
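The decomposition stated above, a reward-modulated update splitting into a covariance term plus an unsupervised term, and its suppression by subtracting the expected reward, can be checked numerically; the eligibility traces and rewards below are synthetic stand-ins, not the paper's simulations:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
e = rng.normal(loc=0.3, size=n)            # eligibility trace, nonzero mean
R = 0.5 * e + rng.normal(loc=2.0, size=n)  # reward correlated with e

# Raw reward-modulated update = covariance term + unsupervised term
raw = np.mean(e * R)
cov_term = np.mean((e - e.mean()) * (R - R.mean()))
unsup_term = e.mean() * R.mean()           # the detrimental unsupervised part

# Subtracting the expected reward suppresses the unsupervised term,
# leaving only the covariance of activity and reward.
corrected = np.mean(e * (R - R.mean()))
```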

    Perceptual learning, roving and the unsupervised bias

    Get PDF
    Perceptual learning improves perception through training. Perceptual learning improves with most stimulus types but fails when certain stimulus types are mixed during training (roving). This result is surprising because classical supervised and unsupervised neural network models can cope easily with roving conditions. What makes humans so inferior compared to these models? As experimental and conceptual work has shown, human perceptual learning is neither supervised nor unsupervised but reward-based learning. Reward-based learning suffers from the so-called unsupervised bias: to prevent synaptic "drift", the average reward has to be estimated exactly. However, this is impossible when two or more stimulus types with different rewards are presented during training (and the reward is estimated by a running average). For this reason, we propose that no learning occurs in roving conditions. However, roving hinders perceptual learning only for combinations of similar stimulus types, not for dissimilar ones. In this latter case, we propose that a critic can estimate the reward for each stimulus type separately. One implication of our analysis is that the critic cannot be located in the visual system. © 2011 Elsevier Ltd
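The running-average argument can be made concrete with a toy simulation: when two stimulus types with different expected rewards are roved, a single running-average baseline converges to the grand mean and so misestimates both, whereas a critic with one estimate per stimulus type does not (the reward values and learning rate below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
trials = 10_000
stim = rng.integers(0, 2, size=trials)       # two roved stimulus types
true_mean = np.array([0.2, 0.8])             # different expected rewards
R = true_mean[stim] + rng.normal(scale=0.05, size=trials)

# A single running average across both types (roving): it converges
# to the grand mean, so each type's reward prediction stays biased.
shared = 0.0
for reward in R:
    shared += 0.01 * (reward - shared)

# A critic with one estimate per stimulus type removes the bias.
per_type = np.zeros(2)
for s, reward in zip(stim, R):
    per_type[s] += 0.01 * (reward - per_type[s])
```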

    Ensemble downscaling of a regional ocean model

    We downscaled a free ensemble of a regional parent model to a high-resolution coastal child ensemble in the Bay of Biscay. The child ensemble was forced at the open boundaries by the parent ensemble, and locally by perturbing the winds. By comparing ensembles generated by each of these forcing perturbations, separately and combined, we were able to consider the ensemble from either of two paradigms: (1) characterising high-resolution coastal model errors using local and non-local forcing perturbations, or (2) downscaling regional model errors into the coastal domain. We found that most of the spread in the child ensembles was generated from the ensemble of open boundary conditions, with the local wind perturbations on their own generating substantially less ensemble spread. Together, the two sources of error increased the ensemble spread by only a small amount over the non-local perturbations alone. In general, the spread in sea surface height was greater in the child ensembles than in the parent ensemble, probably due to the more refined dynamics, while the spread in sea surface temperature was lower, likely due to the way the open boundary conditions were averaged. Deep below the surface, though, the child ensemble featured a large spread even where the parent model's spread was very weak. This enhanced error response is a promising result for an ensemble data assimilation system, as it could be exploited to correct the model deep below the surface. © 2019 Elsevier Ltd
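As a minimal illustration of the spread metric used in such comparisons, ensemble spread is conventionally the standard deviation across members at each grid point; the field sizes and amplitudes below are arbitrary toy values, not output from the parent or child models:

```python
import numpy as np

def ensemble_spread(fields):
    """Spread = standard deviation across ensemble members (axis 0)
    at each grid point."""
    return fields.std(axis=0, ddof=1)

# Toy "parent" and "child" ensembles of a 2-D field, 10 members each;
# the child members are made to diverge more, as reported at depth.
rng = np.random.default_rng(4)
parent = rng.normal(scale=0.1, size=(10, 50, 50))
child = rng.normal(scale=0.3, size=(10, 50, 50))
parent_spread = ensemble_spread(parent)
child_spread = ensemble_spread(child)
```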
