
    A neural network model for the orbitofrontal cortex and task space acquisition during reinforcement learning

    Reinforcement learning has been widely used to explain animal behavior. In reinforcement learning, the agent learns the values of the states in the task, which collectively constitute the task state space, and uses this knowledge to choose actions and acquire desired outcomes. It has been proposed that the orbitofrontal cortex (OFC) encodes the task state space during reinforcement learning. However, it is not well understood how the OFC acquires and stores task state information. Here, we propose a neural network model based on reservoir computing. Reservoir networks exhibit heterogeneous and dynamic activity patterns that are suitable for encoding task states. The information can be extracted by a linear readout trained with reinforcement learning. We demonstrate how the network acquires and stores task structures. The network exhibits reinforcement learning behavior, and aspects of it resemble experimental findings on the OFC. Our study provides a theoretical explanation of how the OFC may contribute to reinforcement learning and a new approach to understanding the neural mechanism underlying reinforcement learning.
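    The architecture described in the abstract is a fixed random recurrent reservoir whose activity is decoded by a linear readout trained with reinforcement learning. The sketch below illustrates that idea only; the network sizes, time constants, softmax policy, and simple reward-prediction-error update rule are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a reservoir network with a plastic linear readout.
# All parameter values and the learning rule are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

N_RES, N_IN, N_OUT = 200, 4, 2       # reservoir, input, and readout sizes (assumed)
TAU, DT, G = 50.0, 1.0, 1.2          # time constant (ms), step (ms), recurrent gain

W_rec = G * rng.normal(0, 1 / np.sqrt(N_RES), (N_RES, N_RES))  # fixed random recurrence
W_in = rng.normal(0, 1.0, (N_RES, N_IN))                        # fixed input weights
W_out = np.zeros((N_OUT, N_RES))                                # plastic readout, trained by RL

def run_trial(inputs, x=None):
    """Integrate reservoir dynamics over a sequence of input vectors; return final rates."""
    x = np.zeros(N_RES) if x is None else x
    for u in inputs:
        r = np.tanh(x)
        x += (DT / TAU) * (-x + W_rec @ r + W_in @ u)
    return np.tanh(x)

def choose(r, beta=3.0):
    """Softmax policy over the linear readout of the reservoir state."""
    q = W_out @ r
    p = np.exp(beta * q - np.max(beta * q))
    p /= p.sum()
    return rng.choice(N_OUT, p=p), q

def update_readout(r, action, reward, q, lr=0.05):
    """Reward-prediction-error update of the readout row for the chosen action."""
    W_out[action] += lr * (reward - q[action]) * r

# Tiny usage example (hypothetical two-alternative trial with a 4-unit cue):
r = run_trial([np.array([1.0, 0.0, 0.0, 0.0])] * 50)
action, q = choose(r)
update_readout(r, action, reward=1.0, q=q)
```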

    Network analyses for the Two-stage Markov decision task.

    A. Factorial analysis of choice behavior. The network is more likely to repeat its choice under the common-rewarded (CR) and rare-unrewarded (RN) conditions than under the common-unrewarded (CU) and rare-rewarded (RR) conditions. B. The task structure index keeps growing in the intact network (blue line) but stays at a low level when the reward input is missing (red line). Stars indicate a significant difference (one-way ANOVA, p < 0.05). C. Fitting the behavioral performance with a mixture of task-agnostic and task-aware algorithms. The weight parameter w for learning with knowledge of the task structure is significantly larger for the intact network (blue data points) than for the network without the reward input (red data points). Each data point represents a simulation run. A one-way ANOVA is used to determine significance (p < 0.05). D. PCA on the network population activity. The network states are plotted in the space spanned by the first 3 PCA components. The network can distinguish all 8 different states. E. The differences between the SEL neurons' connection weights to DML unit A1 and to DML unit A2. The gray and white areas indicate the blocks in which intermediate outcome B1 is more likely to lead to a reward and the blocks in which B2 is more likely to lead to a reward, respectively. F. Logistic regression shows that only the last trial's state affects the choice. The regression includes four different states (intermediate outcome x reward outcome) for each trial, up to 10 trials before the current trial. Error bars show s.e.m. across simulation runs. G. Logistic regression reveals that only the combination of the intermediate state and the reward outcome in the last trial affects the decision. The factors evaluated are: Correct, a tendency to choose the better option in the current block; Reward, a tendency to repeat the previous choice if it was rewarded; Stay, a tendency to repeat the previous choice; Transition, a tendency to repeat the same choice following common intermediate outcomes and switch following rare intermediate outcomes; Trans x Out, a tendency to repeat the same choice if a common intermediate outcome is rewarded or a rare intermediate outcome is unrewarded, and to switch if a common intermediate outcome is unrewarded or a rare intermediate outcome is rewarded.
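    The factorial analysis in panel A amounts to computing the probability of repeating the previous choice, split by transition type (common or rare) and reward outcome. The sketch below shows one way to compute such stay probabilities from trial-by-trial records; the array names and boolean encoding are assumptions, not the paper's analysis code.

```python
# Sketch of a stay-probability analysis for a two-stage task.
import numpy as np

def stay_probabilities(choices, common, rewards):
    """choices, common, rewards: 1-D arrays over trials (choice id, bool, 0/1).
    Returns P(repeat previous choice) for each transition-by-outcome condition."""
    choices, common, rewards = map(np.asarray, (choices, common, rewards))
    stay = choices[1:] == choices[:-1]                    # did the agent repeat its choice?
    cond = {
        "common-rewarded":   common[:-1] & (rewards[:-1] == 1),
        "rare-rewarded":     ~common[:-1] & (rewards[:-1] == 1),
        "common-unrewarded": common[:-1] & (rewards[:-1] == 0),
        "rare-unrewarded":   ~common[:-1] & (rewards[:-1] == 0),
    }
    return {name: stay[mask].mean() for name, mask in cond.items()}
```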

    Two-stage Markov decision task.

    A. Task structure of the two-stage Markov decision task. Two options, A1 and A2, are available; they lead to two intermediate outcomes, B1 and B2, with different probabilities. The width of the arrows indicates the transition probability. Intermediate outcomes B1 and B2 lead to rewards with different probabilities, and the reward contingency of the intermediate outcomes is reversed between blocks. B. Schematic diagram of the model. It is similar to the model in Fig 1A; the only difference is that there are more input units. C. The event sequence. Units in the input layer are activated sequentially. In the example trial, option A1 is chosen, B1 is presented, and a reward is obtained.
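    For concreteness, the sketch below implements a generic two-stage Markov task environment of the kind described in panel A: each option leads to one of two intermediate outcomes with common or rare transitions, and the reward contingency of the intermediate outcomes reverses between blocks. The specific probabilities are assumptions for illustration, not the values used in the paper.

```python
# Illustrative two-stage Markov decision task environment.
import numpy as np

class TwoStageTask:
    def __init__(self, p_common=0.8, p_reward_good=0.8, p_reward_bad=0.2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.p_common = p_common
        self.p_rew = [p_reward_good, p_reward_bad]   # reward probabilities for B1, B2

    def reverse_block(self):
        """Swap which intermediate outcome is more likely to be rewarded."""
        self.p_rew.reverse()

    def step(self, action):
        """action 0 -> A1, 1 -> A2; returns (intermediate outcome index, reward)."""
        common = self.rng.random() < self.p_common
        outcome = action if common else 1 - action    # A1 commonly leads to B1, A2 to B2
        reward = int(self.rng.random() < self.p_rew[outcome])
        return outcome, reward

# Usage: outcome, reward = TwoStageTask().step(action=0)   # choose A1
```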

    Value-based decision-making task.

    A. Schematic diagram of the model. B. The event sequence. The stimuli are presented between 300 ms and 1300 ms after the trial onset. The decision is computed from the neural activity at 1400 ms after the trial onset. The input neurons' activity profiles mimic those of real neurons (see Methods). C. Choice pattern. The relative value preference calculated from the network's behavior is indicated at the top left; the actual relative value preference used in the simulation is 1A = 2B.
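    A relative value preference such as 1A = 2B can be read out from a choice pattern by fitting a psychometric curve to the log offer ratio and taking the indifference point. The sketch below is a generic version of that fit, not necessarily the procedure used in the paper; the offer quantities and choice fractions are synthetic.

```python
# Generic psychometric fit for recovering a relative value preference.
import numpy as np
from scipy.optimize import curve_fit

def choice_curve(log_ratio, bias, slope):
    """P(choose B) as a logistic function of log(#B offered / #A offered)."""
    return 1.0 / (1.0 + np.exp(-(bias + slope * log_ratio)))

# Synthetic example: quantities of juice B offered against 2 units of juice A,
# and the fraction of trials on which B was chosen.
log_ratio = np.log(np.array([1, 2, 3, 4, 6]) / 2.0)
p_choose_b = np.array([0.02, 0.10, 0.30, 0.50, 0.90])

(bias, slope), _ = curve_fit(choice_curve, log_ratio, p_choose_b, p0=[0.0, 1.0])
relative_value = np.exp(-bias / slope)    # units of B worth one A at indifference
print(f"1A = {relative_value:.2f}B")      # close to 2 for this synthetic data
```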

    Network analyses for the reversal learning task.

    A. Selectivity of three example neurons in the reservoir network. Input units are set to 1 from 200 ms to 700 ms. Left panel: an example neuron that encodes choice options; middle panel: an example neuron that encodes reward outcomes; right panel: an example neuron with mixed selectivity. B. PCA on the network population activity. The network states are plotted in the space spanned by the first 3 PCA components. The activities in different conditions become differentiated after the cue onset. C. The difference between the SEL neurons' connection weights to DML unit A and DML unit B. The SEL neurons are grouped according to their selectivities. For example, AR represents the group of neurons that respond most strongly when input units A and R are both activated. The gray and white areas indicate the blocks in which option A and option B lead to the reward, respectively. D. Left: the proportion of blocks in which the network does not reach the performance criterion within a block after we remove 50 neurons that are randomly chosen (control), A-selective, or AR-selective. Right: the number of errors the network makes before reaching the criterion with the same 3 types of inactivation. Only the data from the A-rewarding blocks are analyzed. The error bars are s.e.m. based on 10 simulation runs. A one-way ANOVA is used to determine significance (p < 0.05). E. The number of errors needed to reach the performance criterion is maintained after training stops at the 50th reversal. The error bars are s.e.m. calculated based on 10 simulation runs.
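    Panel B projects the population activity onto its first three principal components. The sketch below shows a standard way to compute such a projection from a firing-rate matrix; the data layout is an assumption, and the paper works with its own simulated activity.

```python
# Sketch of PCA on population activity for visualizing state trajectories.
import numpy as np

def population_pca(rates, n_components=3):
    """rates: (n_samples, n_neurons) firing-rate matrix, where samples are
    time points and/or task conditions. Returns the top-PC projections."""
    centered = rates - rates.mean(axis=0, keepdims=True)
    # Principal axes from the SVD of the mean-centered activity matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T    # (n_samples, n_components) trajectories
```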

    Value selectivity of the network neurons.

    A. Three example neurons in the SEL. Left panel: a neuron that encodes chosen value; middle panel: a neuron that encodes offer value; right panel: a neuron that encodes chosen juice. B. The proportions of neurons with different selectivities from a previous experimental study [11]. C. The proportions of neurons in the reservoir network with different selectivities.
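    Classifying neurons as chosen-value, offer-value, or chosen-juice coding is commonly done by regressing each neuron's firing rate on each candidate variable and labeling the neuron by the best-fitting one. The sketch below is a generic version of such a selectivity analysis; the R-squared threshold is an assumption, and the criteria in the cited study [11] may differ.

```python
# Generic selectivity classification by linear regression.
import numpy as np

def classify_neuron(rate, variables):
    """rate: (n_trials,) firing rates; variables: dict name -> (n_trials,) regressor.
    Returns the name of the best-fitting variable, or None if all fits are poor."""
    best_name, best_r2 = None, 0.1          # minimum R^2 threshold (assumed)
    for name, x in variables.items():
        X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
        beta, *_ = np.linalg.lstsq(X, rate, rcond=None)
        resid = rate - X @ beta
        r2 = 1.0 - resid.var() / rate.var()
        if r2 > best_r2:
            best_name, best_r2 = name, r2
    return best_name
```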

    Reversal learning task.

    A. Schematic diagram of the model. The network is composed of three parts: the input layer (IL), the state encoding layer (SEL), and the decision-making output layer (DML). B. The event sequence. The stimulus and reward inputs are given concurrently at 200 ms after the trial onset and last for 500 ms. After a 200 ms delay, the decision is computed from the neural activity at 900 ms after the trial onset. C. The number of error trials made before the network reaches the performance threshold. The dark line indicates the performance of the network with the reward input; the light line indicates the performance of the network without the reward input, as a model of animals with OFC lesions. Stars indicate a significant difference (one-way ANOVA, p < 0.05).
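    The sketch below implements a generic reversal-learning environment of the kind described here: one of two options is rewarded within a block, errors are counted until a performance criterion is met, and the contingency then reverses. The criterion of 8 correct in the last 10 trials is an assumption for illustration, not the paper's threshold.

```python
# Illustrative reversal-learning task environment with error counting.
import numpy as np
from collections import deque

class ReversalTask:
    def __init__(self, criterion=(8, 10), seed=0):
        self.rng = np.random.default_rng(seed)
        self.rewarded_option = 0                      # option A rewarded first
        self.n_correct, self.window = criterion
        self.recent = deque(maxlen=self.window)
        self.errors_this_block = 0

    def step(self, choice):
        """Deliver the reward, track errors, and reverse the block at criterion."""
        reward = int(choice == self.rewarded_option)
        self.recent.append(reward)
        self.errors_this_block += 1 - reward
        if len(self.recent) == self.window and sum(self.recent) >= self.n_correct:
            self.rewarded_option = 1 - self.rewarded_option   # block reversal
            self.recent.clear()
            self.errors_this_block = 0
        return reward
```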