18 research outputs found

    Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison

    A confusingly wide variety of temporally asymmetric learning rules exists related to reinforcement learning and/or to spike-timing dependent plasticity, many of which look exceedingly similar while displaying strongly different behavior. These rules are often used in control tasks, for example in robotics, where rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them to provide a better overview of their different properties. Two main classes are discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, along with some transition cases. In general we focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine learning (non-neuronal) context, a solid mathematical theory of TD learning has existed for several years. This can partly be transferred to a neuronal framework, too. For differential Hebbian rules, on the other hand, a more complete theory has emerged only now. In general, the rules differ in their convergence conditions and their numerical stability, which can lead to very undesirable behavior when applying them. For TD, convergence can be enforced with a certain output condition assuring that the δ-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time. Thus, it is necessary to remember the first stimulus to be able to relate it to the later-occurring second one. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here.
Thus, this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, offering some guidance for possible applications.
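
To make the contrast between the two rule classes concrete, here is a minimal discrete-time sketch. It is not the formulation used in the article, which works with a time-continuous representation. The function names, the linear output `v = w . x`, and the parameters `alpha`, `gamma`, `lam`, `mu`, and `dt` are all assumptions chosen for illustration. The TD step scales the weight change by the δ-error, while the differential Hebbian step correlates each input with the change of the output.

```python
def dot(w, x):
    """Linear output v = sum_i w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def td_step(w, x_now, x_next, reward, trace,
            alpha=0.1, gamma=0.9, lam=0.8):
    """One TD(lambda) update with an accumulating eligibility trace.

    The weight change is proportional to the delta-error, so learning
    stops when delta is zero on average (an output condition).
    """
    delta = reward + gamma * dot(w, x_next) - dot(w, x_now)
    # Eligibility trace: decaying memory of earlier inputs.
    trace = [gamma * lam * e + xi for e, xi in zip(trace, x_now)]
    w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w, trace, delta


def diff_hebb_step(w, u_now, v_prev, mu=0.1, dt=1.0):
    """One differential Hebbian update: dw_i = mu * u_i * dv/dt * dt.

    Each input is correlated with the temporal derivative of the
    output, approximated here by a finite difference.
    """
    v_now = dot(w, u_now)
    dv_dt = (v_now - v_prev) / dt
    w = [wi + mu * ui * dv_dt * dt for wi, ui in zip(w, u_now)]
    return w, v_now
```

In this sketch the inputs `u_now` are assumed to be already low-pass filtered, so that an earlier stimulus still overlaps in time with a later one; that filtering plays the role the abstract assigns to eligibility traces.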

    The Effects of NMDA Subunit Composition on Calcium Influx and Spike Timing-Dependent Plasticity in Striatal Medium Spiny Neurons

    Calcium influx through NMDA receptors (NMDARs) is necessary for the long-term potentiation (LTP) of synaptic strength; however, NMDARs differ in several properties that can influence the amount of calcium influx into the spine. These properties, such as sensitivity to magnesium block and conductance decay kinetics, change the receptor's response to spike-timing dependent plasticity (STDP) protocols, and thereby shape synaptic integration and information processing. This study investigates the role of GluN2 subunit differences on spine calcium concentration during several STDP protocols in a model of a striatal medium spiny projection neuron (MSPN). The multi-compartment, multi-channel model exhibits firing frequency, spike width, and latency to first spike similar to current clamp data from mouse dorsal striatum MSPNs. We find that NMDAR-mediated calcium is dependent on GluN2 subunit type, action potential timing, duration of somatic depolarization, and number of action potentials. Furthermore, the model demonstrates that in MSPNs, GluN2A and GluN2B control which STDP intervals allow for substantial calcium elevation in spines. The model predicts that blocking GluN2B subunits would modulate the range of intervals that cause long-term potentiation. We confirmed this prediction experimentally, demonstrating that blocking GluN2B in the striatum narrows the range of STDP intervals that cause long-term potentiation. This ability of the GluN2 subunit to modulate the shape of the STDP curve could underlie the role that GluN2 subunits play in learning and development.

    Phenomenological models of synaptic plasticity based on spike timing

    Synaptic plasticity is considered to be the biological substrate of learning and memory. In this document we review phenomenological models of short-term and long-term synaptic plasticity, in particular spike-timing dependent plasticity (STDP). The aim of the document is to provide a framework for classifying and evaluating different models of plasticity. We focus on phenomenological synaptic models that are compatible with integrate-and-fire type neuron models where each neuron is described by a small number of variables. This implies that synaptic update rules for short-term or long-term plasticity can only depend on spike timing and, potentially, on membrane potential, as well as on the value of the synaptic weight, or on low-pass filtered (temporally averaged) versions of the above variables. We examine the ability of the models to account for experimental data and to fulfill expectations derived from theoretical considerations. We further discuss their relations to teacher-based rules (supervised learning) and reward-based rules (reinforcement learning). All models discussed in this paper are suitable for large-scale network simulations.
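
As an illustration of an update rule that depends only on spike timing and on low-pass filtered spike trains, the following is a minimal sketch of a pair-based STDP rule with exponential traces. It is one common textbook form and is not claimed to be any specific model from the review. The function name and the parameter values (`a_plus`, `a_minus`, `tau_plus`, `tau_minus`, in arbitrary units and milliseconds) are assumptions.

```python
import math


def stdp_pair_based(pre_spikes, post_spikes, w=0.5,
                    a_plus=0.01, a_minus=0.012,
                    tau_plus=20.0, tau_minus=20.0):
    """Event-driven pair-based STDP (all spike times in ms).

    x is a low-pass filtered copy of the presynaptic spike train and
    y the same for the postsynaptic one. A post spike potentiates in
    proportion to x; a pre spike depresses in proportion to y.
    """
    events = sorted([(t, "pre") for t in pre_spikes] +
                    [(t, "post") for t in post_spikes])
    x = y = 0.0
    t_last = None
    for t, kind in events:
        if t_last is not None:
            # Let both traces decay over the elapsed interval.
            x *= math.exp(-(t - t_last) / tau_plus)
            y *= math.exp(-(t - t_last) / tau_minus)
        if kind == "pre":
            w -= a_minus * y   # post-before-pre: depression
            x += 1.0
        else:
            w += a_plus * x    # pre-before-post: potentiation
            y += 1.0
        t_last = t
    return w
```

For example, `stdp_pair_based([0.0], [10.0])` returns a weight above the initial 0.5, while `stdp_pair_based([10.0], [0.0])` returns one below it. The sketch applies additive updates without weight bounds; a weight-dependent variant would scale `a_plus` and `a_minus` by functions of `w`.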

    Modelling human choices: MADeM and decision‑making

    Research supported by FAPESP 2015/50122-0 and DFG-GRTK 1740/2. RP and AR are also part of the Research, Innovation and Dissemination Center for Neuromathematics FAPESP grant (2013/07699-0). RP is supported by a FAPESP scholarship (2013/25667-8). ACR is partially supported by a CNPq fellowship (grant 306251/2014-0)

    Vehicle’s Steering Signal Predictions Using Neural Networks

    Back-propagation-trained neural networks, as well as extreme learning machines (ELM), were used to predict a car driver's steering behavior based on road curvature, velocity, and acceleration of the car. Predictions were performed using real-road data, obtained with a test car in a country-road scenario. As a simplification, we used gyroscopically measured curvature of the road instead of visually extracted curvature measures. It was found that an optimum exists for how far one has to look along the curvature signal, as judged by neural network prediction accuracy. Velocity and acceleration did not improve steering signal prediction accuracy in our framework. Traditional neural networks and ELM performed similarly in terms of prediction errors.

    Behavioral analysis of differential hebbian learning in closed-loop systems

    Understanding closed-loop behavioral systems is a non-trivial problem, especially when they change during learning. Descriptions of closed-loop systems in terms of information theory date back to the 1950s; however, there have been only a few attempts that take learning into account, mostly by measuring the information of inputs. In this study we analyze a specific type of closed-loop system by looking at the input as well as the output space. For this, we investigate simulated agents that perform differential Hebbian learning (STDP). In the first part we show that analytical solutions can be found for the temporal development of such systems for relatively simple cases. In the second part of this study we try to answer the following question: How can we predict which system from a given class would be the best for a particular scenario? This question is addressed using energy, input/output ratio, and entropy measures and investigating their development during learning. In this way we can show that within well-specified scenarios there are indeed agents which are optimal with respect to their structure and adaptive properties.

    Stabilising Hebbian learning with a third factor in a food retrieval task

    When neurons fire together they wire together. This is Donald Hebb's famous postulate. However, Hebbian learning is inherently unstable because synaptic weights self-amplify: the more a synapse is able to drive a postsynaptic cell, the more the synaptic weight will grow. We present a new, biologically realistic way to stabilise synaptic weights by introducing a third factor which switches learning on or off so that self-amplification is minimised. The third factor can be identified with the activity of dopaminergic neurons in the VTA, which fire when a reward has been encountered. This leads to a new interpretation of the dopamine signal which goes beyond the classical prediction error hypothesis. The model is tested in a real-world task where a robot has to find "food disks" in an environment.
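
The self-amplification problem and the effect of a gating signal can be sketched in a few lines. This is a generic gated Hebbian toy model and not the rule proposed in the paper. The function names, the single linear synapse with `post = w * pre`, and the learning rate `mu` are assumptions for illustration.

```python
def gated_hebb_step(w, pre, post, third_factor, mu=0.05):
    """Hebbian update multiplied by a third, gating factor.

    With third_factor = 0 the weight is frozen; with third_factor = 1
    this reduces to plain Hebbian learning.
    """
    return w + mu * pre * post * third_factor


def run(gate_sequence, w=0.1, pre=1.0, mu=0.05):
    """Drive one linear synapse (post = w * pre) through a gate sequence."""
    for gate in gate_sequence:
        post = w * pre  # the synapse drives the cell, which feeds back into dw
        w = gated_hebb_step(w, pre, post, gate, mu)
    return w
```

With the gate always open, `run([1] * 50)` grows geometrically by a factor of `1 + mu` per step, which is the instability described above. With the gate open for only the first five steps, `run([1] * 5 + [0] * 45)` stops at the value reached after step five.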