49 research outputs found

    Temporal-Difference Reinforcement Learning with Distributed Representations

    Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially-discounting “micro-Agents”, each of which has a separate discounting factor (γ). Each µAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (δ) signal within the model matches dopamine signals recorded from animals in standard conditioning reward-paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each µAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments
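    The link between distributed exponential discounting and hyperbolic discounting can be checked numerically. The sketch below is not the authors' model: it assumes, purely for illustration, that the discount factors γ are spread uniformly over (0, 1), in which case the average of γ^d over all µAgents equals ∫₀¹ γ^d dγ = 1/(1 + d), a hyperbolic function of the delay d. The function names and the number of agents are hypothetical.

```python
def mean_discount(delay, n_agents=10_000):
    """Average exponential discount gamma**delay over many agents.

    ASSUMPTION (not from the abstract): the agents' gammas are spread
    uniformly over (0, 1); here they sit at the midpoints of n_agents
    equal-width bins.
    """
    total = 0.0
    for i in range(n_agents):
        gamma = (i + 0.5) / n_agents
        total += gamma ** delay
    return total / n_agents


def hyperbolic(delay):
    """Hyperbolic discount function 1 / (1 + delay)."""
    return 1.0 / (1.0 + delay)


if __name__ == "__main__":
    # Compare the population average with the hyperbolic curve.
    for d in (0, 1, 2, 5, 10, 20):
        print(d, mean_discount(d), hyperbolic(d))
```

    Under this uniform assumption the two columns agree closely at every delay; a different distribution of γ would give a different aggregate curve.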

    Nonspecific synaptic plasticity improves the recognition of sparse patterns degraded by local noise

    Safaryan, K. et al. Nonspecific synaptic plasticity improves the recognition of sparse patterns degraded by local noise. Sci. Rep. 7, 46550; doi: 10.1038/srep46550 (2017). This work is licensed under a Creative Commons Attribution 4.0 International License; to view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ © The Author(s) 2017.
    Many forms of synaptic plasticity require the local production of volatile or rapidly diffusing substances such as nitric oxide. The nonspecific plasticity these neuromodulators may induce at neighboring non-active synapses is thought to be detrimental for the specificity of memory storage. We show here that memory retrieval may benefit from this nonspecific plasticity when the applied sparse binary input patterns are degraded by local noise. Simulations of a biophysically realistic model of a cerebellar Purkinje cell in a pattern recognition task show that, in the absence of noise, leakage of plasticity to adjacent synapses degrades the recognition of sparse static patterns. However, above a local noise level of 20%, the model with nonspecific plasticity outperforms the standard, specific model. The gain in performance is greatest when the spatial distribution of noise in the input matches the range of diffusion-induced plasticity. Hence nonspecific plasticity may offer a benefit in noisy environments or when the pressure to generalize is strong.

    Effects of a robot-assisted training of grasp and pronation/supination in chronic stroke: a pilot study

    Background: Rehabilitation of hand function is challenging, and only a few studies have investigated robot-assisted rehabilitation focusing on distal joints of the upper limb. This paper investigates the feasibility of using the HapticKnob, a table-top end-effector device, for robot-assisted rehabilitation of grasping and forearm pronation/supination, two functions that are important for activities of daily living involving the hand and are often impaired in chronic stroke patients. It evaluates the effectiveness of this device for improving hand function and the transfer of improvement to arm function. Methods: A single group of fifteen chronic stroke patients with impaired arm and hand functions (Fugl-Meyer motor assessment scale (FM) 10-45/66) participated in a 6-week, 3-hour/week rehabilitation program with the HapticKnob. Outcome measures consisted primarily of the FM and Motricity Index (MI) and their respective subsections related to distal and proximal arm function, and were assessed at the beginning of treatment, at the end of treatment, and at a 6-week follow-up. Results: Thirteen subjects successfully completed robot-assisted therapy, with significantly improved hand and arm motor functions, demonstrated by an average increase of 3.00 points on the FM and 4.55 on the MI at the completion of the therapy (4.85 FM and 6.84 MI six weeks post-therapy). Improvements were observed both in distal and proximal components of the clinical scales at the completion of the study (2.00 FM wrist/hand, 2.55 FM shoulder/elbow, 2.23 MI hand and 4.23 MI shoulder/elbow). In addition, improvements in hand function were observed, as measured by the Motor Assessment Scale, grip force, and a decrease in arm muscle spasticity. These results were confirmed by motion data collected by the robot. Conclusions: The results of this study show the feasibility of this robot-assisted therapy with patients presenting a large range of impairment levels. A significant homogeneous improvement in both hand and arm function was observed, which was maintained 6 weeks after the end of the therapy.

    Interaction between Purkinje Cells and Inhibitory Interneurons May Create Adjustable Output Waveforms to Generate Timed Cerebellar Output

    We develop a new model that explains how the cerebellum may generate the timing in classical delay eyeblink conditioning. Recent studies show that both Purkinje cells (PCs) and inhibitory interneurons (INs) have parallel signal processing streams with two time scales: an AMPA receptor-mediated fast process and a metabotropic glutamate receptor (mGluR)-mediated slow process. Moreover, one consistent finding is an increased excitability of PC dendrites (in Larsell's lobule HVI) in animals when they acquire the classical delay eyeblink conditioning naturally, in contrast to in vitro studies, where learning involves long-term depression (LTD). Our model proposes that the delayed response comes from the slow dynamics of mGluR-mediated IP3 activation, and the ensuing calcium concentration change, and not from LTP/LTD. The conditioned stimulus (tone), arriving on the parallel fibers, triggers this slow activation in INs and PC spines. These excitatory (from PC spines) and inhibitory (from INs) signals then interact at the PC dendrites to generate variable waveforms of PC activation. When the unconditioned stimulus (puff), arriving on the climbing fibers, is coupled frequently with this slow activation the waveform is amplified (due to an increased excitability) and leads to a timed pause in the PC population. The disinhibition of deep cerebellar nuclei by this timed pause causes the delayed conditioned response. This suggested PC-IN interaction emphasizes a richer role of the INs in learning and also conforms to the recent evidence that mGluR in the cerebellar cortex may participate in slow motor execution. We show that the suggested mechanism can endow the cerebellar cortex with the versatility to learn almost any temporal pattern, in addition to those that arise in classical conditioning

    Structural Analysis of a Peptide Fragment of Transmembrane Transporter Protein Bilitranslocase

    The combined use of genomic and post-genomic approaches is rapidly altering the number of identified human influx carriers. A transmembrane protein, bilitranslocase (TCDB 2.A.65), has long attracted attention because of its function as an organic anion carrier. It has also been identified as a potential membrane transporter for the cellular uptake of several drugs, and because of this role in drug uptake it is important to advance knowledge of its structure. However, at present, only the primary structure of bilitranslocase is known. In our work, transmembrane subunits of bilitranslocase were predicted by a previously developed chemometrics model, and the stability of these polypeptide chains was studied by molecular dynamics (MD) simulation. Furthermore, sodium dodecyl sulfate (SDS) micelles were used as a model of the cell membrane, and we present a high-resolution 3D structure of an 18-residue peptide corresponding to the third transmembrane part of bilitranslocase, obtained by multidimensional NMR spectroscopy. It has been experimentally confirmed that one of the transmembrane segments of bilitranslocase has an alpha-helical structure with hydrophilic amino acid residues oriented towards one side, and is thus capable of forming a channel in the membrane.

    Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules

    It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys.

    The degree of segmental aneuploidy measured by total copy number abnormalities predicts survival and recurrence in superficial gastroesophageal adenocarcinoma

    Background: Prognostic biomarkers are needed for superficial gastroesophageal adenocarcinoma (EAC) to predict clinical outcomes and select therapy. Although recurrent mutations have been characterized in EAC, little is known about their clinical and prognostic significance. Aneuploidy is predictive of clinical outcome in many malignancies but has not been evaluated in superficial EAC. Methods: We quantified copy number changes in 41 superficial EAC using Affymetrix SNP 6.0 arrays. We identified recurrent chromosomal gains and losses and calculated the total copy number abnormality (CNA) count for each tumor as a measure of aneuploidy. We correlated CNA count with overall survival and time to first recurrence in univariate and multivariate analyses. Results: Recurrent segmental gains and losses involved multiple genes, including: HER2, EGFR, MET, CDK6, KRAS (recurrent gains); and FHIT, WWOX, CDKN2A/B, SMAD4, RUNX1 (recurrent losses). There was a 40-fold variation in CNA count across all cases. Tumors with the lowest and highest quartile CNA count had significantly better overall survival (p = 0.032) and time to first recurrence (p = 0.010) compared to those with intermediate CNA counts. These associations persisted when controlling for other prognostic variables. Significance: SNP arrays facilitate the assessment of recurrent chromosomal gain and loss and allow high resolution, quantitative assessment of segmental aneuploidy (total CNA count). The non-monotonic association of segmental aneuploidy with survival has been described in other tumors. The degree of aneuploidy is a promising prognostic biomarker in a potentially curable form of EAC. © 2014 Davison et al

    Determinants of synaptic integration and heterogeneity in rebound firing explored with data-driven models of deep cerebellar nucleus cells

    Significant inroads have been made to understand cerebellar cortical processing but neural coding at the output stage of the cerebellum in the deep cerebellar nuclei (DCN) remains poorly understood. The DCN are unlikely to just present a relay nucleus because Purkinje cell inhibition has to be turned into an excitatory output signal, and DCN neurons exhibit complex intrinsic properties. In particular, DCN neurons exhibit a range of rebound spiking properties following hyperpolarizing current injection, raising the question of how this could contribute to signal processing in behaving animals. Computer modeling presents an ideal tool to investigate how intrinsic voltage-gated conductances in DCN neurons could generate the heterogeneous firing behavior observed, and what input conditions could result in rebound responses. To enable such an investigation we built a compartmental DCN neuron model with a full dendritic morphology and appropriate active conductances. We generated a good match of our simulations with DCN current clamp data we recorded in acute slices, including the heterogeneity in the rebound responses. We then examined how inhibitory and excitatory synaptic input interacted with these intrinsic conductances to control DCN firing. We found that the output spiking of the model reflected the ongoing balance of excitatory and inhibitory input rates and that changing the level of inhibition performed an additive operation. Rebound firing following strong Purkinje cell input bursts was also possible, but only if the chloride reversal potential was more negative than −70 mV to allow de-inactivation of rebound currents. Fast rebound bursts due to T-type calcium current and slow rebounds due to persistent sodium current could be differentially regulated by synaptic input, and the pattern of these rebounds was further influenced by HCN current. Our findings suggest that active properties of DCN neurons could play a crucial role in signal processing in the cerebellum.

    An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

    An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards
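    For reference, the classical discrete-time TD algorithm that the abstract compares against can be sketched in a few lines. This is a minimal tabular TD(0) illustration, not the spiking actor-critic model described above; the chain task, the function name `td0_chain`, and all parameter values are hypothetical choices made for the example. The agent walks deterministically along a chain of states and receives a single positive reward on reaching the end, a simple case of a task with sparse positive rewards.

```python
def td0_chain(n_states=5, gamma=0.9, alpha=0.1, episodes=2000):
    """Tabular TD(0) value learning on a deterministic chain.

    ASSUMPTIONS (illustrative only): states 0..n_states-1 are visited in
    order, and a reward of 1.0 is given on the transition into the
    terminal state, whose value is fixed at 0.
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal state
    for _ in range(episodes):
        for s in range(n_states):
            s_next = s + 1
            reward = 1.0 if s_next == n_states else 0.0
            # TD error: delta = r + gamma * V(s') - V(s)
            delta = reward + gamma * values[s_next] - values[s]
            values[s] += alpha * delta
    return values[:n_states]


if __name__ == "__main__":
    # Learned values approach gamma ** (steps remaining before the reward).
    print(td0_chain())
```

    In this toy task the learned value of each state approaches γ raised to the number of steps remaining before the rewarded transition, which is what the TD error δ drives toward zero.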