2,229 research outputs found
Mutually Dependent Decision Processes Models
We introduce a new framework for dynamic programming called mutually dependent decision processes (MDDPs). Each MDDPs model is constructed from two or more finite-stage deterministic decision processes. At each stage, the reward in one process depends on the optimal values of the other processes, whose initial state is determined by the current state and decision of the original process. We formulate the MDDPs models and derive their mutually dependent recursive equations by dynamic programming
Impairments in reinforcement learning do not explain enhanced habit formation in cocaine use disorder
Rationale Drug addiction has been suggested to develop through drug-induced changes in learning and memory processes. Whilst the initiation of drug use is typically goal-directed and hedonically motivated, over time, drug-taking may develop into a stimulus-driven habit, characterised by persistent use of the drug irrespective of the consequences. Converging lines of evidence suggest that stimulant drugs facilitate the transition of goal-directed into habitual drug-taking, but their contribution to goal-directed learning is less clear. Computational modelling may provide an elegant means for elucidating changes during instrumental learning that may explain enhanced habit formation. Objectives We used formal reinforcement learning algorithms to deconstruct the process of appetitive instrumental learning and to explore potential associations between goal-directed and habitual actions in patients with cocaine use disorder (CUD). Methods We re-analysed appetitive instrumental learning data in 55 healthy control volunteers and 70 CUD patients by applying a reinforcement learning model within a hierarchical Bayesian framework. We used a regression model to determine the influence of learning parameters and variations in brain structure on subsequent habit formation. Results Poor instrumental learning performance in CUD patients was largely determined by difficulties with learning from feedback, as reflected by a significantly reduced learning rate. Subsequent formation of habitual response patterns was partly explained by group status and individual variation in reinforcement sensitivity. White matter integrity within goal-directed networks was only associated with performance parameters in controls but not in CUD patients. Conclusions Our data indicate that impairments in reinforcement learning are insufficient to account for enhanced habitual responding in CUD
Converging Decision Processes with Multiplicative Reward System
Converging decision process is a decision process model with converging transition, which is one of the nonserial branch systems proposed by Nemhauser. This paper deals with multiplicative reward system on a finite-stage deterministic converging decision process. The purpose of this work is to give a recursive method to solve our model by bidecision approach
Reinforcement Learning
Brains rule the world, and brain-like computation is increasingly used in computers and electronic devices. Brain-like computation is about processing and interpreting data or directly putting forward and performing actions. Learning is a very important aspect. This book is on reinforcement learning which involves performing actions to achieve a goal. The first 11 chapters of this book describe and extend the scope of reinforcement learning. The remaining 11 chapters show that there is already wide usage in numerous fields. Reinforcement learning can tackle control tasks that are too complex for traditional, hand-designed, non-learning controllers. As learning computers can deal with technical complexities, the tasks of human operators remain to specify goals on increasingly higher levels. This book shows that reinforcement learning is a very dynamic area in terms of theory and applications and it shall stimulate and encourage new research in this field
Recommended from our members
Impairments in reinforcement learning do not explain enhanced habit formation in cocaine use disorder
Abstract: Rationale: Drug addiction has been suggested to develop through drug-induced changes in learning and memory processes. Whilst the initiation of drug use is typically goal-directed and hedonically motivated, over time, drug-taking may develop into a stimulus-driven habit, characterised by persistent use of the drug irrespective of the consequences. Converging lines of evidence suggest that stimulant drugs facilitate the transition of goal-directed into habitual drug-taking, but their contribution to goal-directed learning is less clear. Computational modelling may provide an elegant means for elucidating changes during instrumental learning that may explain enhanced habit formation. Objectives: We used formal reinforcement learning algorithms to deconstruct the process of appetitive instrumental learning and to explore potential associations between goal-directed and habitual actions in patients with cocaine use disorder (CUD). Methods: We re-analysed appetitive instrumental learning data in 55 healthy control volunteers and 70 CUD patients by applying a reinforcement learning model within a hierarchical Bayesian framework. We used a regression model to determine the influence of learning parameters and variations in brain structure on subsequent habit formation. Results: Poor instrumental learning performance in CUD patients was largely determined by difficulties with learning from feedback, as reflected by a significantly reduced learning rate. Subsequent formation of habitual response patterns was partly explained by group status and individual variation in reinforcement sensitivity. White matter integrity within goal-directed networks was only associated with performance parameters in controls but not in CUD patients. Conclusions: Our data indicate that impairments in reinforcement learning are insufficient to account for enhanced habitual responding in CUD
Intentions and Creative Insights: a Reinforcement Learning Study of Creative Exploration in Problem-Solving
Insight is perhaps the cognitive phenomenon most closely associated with creativity. People engaged in problem-solving sometimes experience a sudden transformation: they see the problem in a radically different manner, and simultaneously feel with great certainty that they have found the right solution. The change of problem representation is called "restructuring", and the affective changes associated with sudden progress are called the "Aha!" experience. Together, restructuring and the "Aha!" experience characterize insight.
Reinforcement Learning is both a theory of biological learning and a subfield of machine learning. In its psychological and neuroscientific guise, it is used to model habit formation, and, increasingly, executive function. In its artificial intelligence guise, it is currently the favored paradigm for modeling agents interacting with an environment. Reinforcement learning, I argue, can serve as a model of insight: its foundation in learning coincides with the role of experience in insight problem-solving; its use of an explicit "value" provides the basis for the "Aha!" experience; and finally, in a hierarchical form, it can achieve a sudden change of representation resembling restructuring.
An experiment helps confirm some parallels between reinforcement learning and insight. It shows how transfer from prior tasks results in considerably accelerated learning, and how the value function increase resembles the sense of progress corresponding to the "Aha!"-moment. However, a model of insight on the basis of hierarchical reinforcement learning did not display the expected "insightful" behavior.
A second model of insight is presented, in which temporal abstraction is based on self-prediction: by predicting its own future decisions, an agent adjusts its course of action on the basis of unexpected events. This kind of temporal abstraction, I argue, corresponds to what we call "intentions", and offers a promising model for biological insight. It explains the "Aha!" experience as resulting from a temporal difference error, whereas restructuring results from an adjustment of the agent's internal state on the basis of either new information or a stochastic interpretation of stimuli. The model is called the actor-critic-intention (ACI) architecture.
Finally, the relationship between intentions, insight, and creativity is extensively discussed in light of these models: other works in the philosophical and scientific literature are related to, and sometimes illuminated by the ACI architecture
Recommended from our members
Impairments in reinforcement learning do not explain enhanced habit formation in cocaine use disorder.
RATIONALE: Drug addiction has been suggested to develop through drug-induced changes in learning and memory processes. Whilst the initiation of drug use is typically goal-directed and hedonically motivated, over time, drug-taking may develop into a stimulus-driven habit, characterised by persistent use of the drug irrespective of the consequences. Converging lines of evidence suggest that stimulant drugs facilitate the transition of goal-directed into habitual drug-taking, but their contribution to goal-directed learning is less clear. Computational modelling may provide an elegant means for elucidating changes during instrumental learning that may explain enhanced habit formation. OBJECTIVES: We used formal reinforcement learning algorithms to deconstruct the process of appetitive instrumental learning and to explore potential associations between goal-directed and habitual actions in patients with cocaine use disorder (CUD). METHODS: We re-analysed appetitive instrumental learning data in 55 healthy control volunteers and 70 CUD patients by applying a reinforcement learning model within a hierarchical Bayesian framework. We used a regression model to determine the influence of learning parameters and variations in brain structure on subsequent habit formation. RESULTS: Poor instrumental learning performance in CUD patients was largely determined by difficulties with learning from feedback, as reflected by a significantly reduced learning rate. Subsequent formation of habitual response patterns was partly explained by group status and individual variation in reinforcement sensitivity. White matter integrity within goal-directed networks was only associated with performance parameters in controls but not in CUD patients. CONCLUSIONS: Our data indicate that impairments in reinforcement learning are insufficient to account for enhanced habitual responding in CUD.This research was funded by the Medical Research Council (MR/J012084/1) and the NIHR Cambridge Biomedical Research Centre and was conducted at the NIHR Cambridge Clinical Research Facility. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. This research was also supported in part by a Medical Research Council (MRC) Clinical Research Infrastructure award (MR/M009041/1). R.N.C. consults for Campden Instruments and receives royalties from Cambridge Enterprise, Routledge, and Cambridge University Press. RNC’s research is supported by the UK Medical Research Council (MC_PC_17213). T.W.R. discloses consultancy with Cambridge Cognition, Lundbeck, Mundipharma and Unilever; he receives royalties for CANTAB from Cambridge Cognition and editorial honoraria from Springer Verlag and Elsevier. T.V.L., G.S. P.S.J., A.A.M. and K.D.E. declare to have no potential conflict of interest
- …