A reinforcement learning theory for homeostatic regulation
Reinforcement learning models address an animal’s behavioral adaptation to its changing “external” environment, and are based on the assumption that Pavlovian, habitual, and goal-directed responses seek to maximize reward acquisition. Negative-feedback models of homeostatic regulation, on the other hand, are concerned with behavioral adaptation in response to the “internal” state of the animal, and assume that the animal’s behavioral objective is to minimize deviations of key physiological variables from their hypothetical setpoints. Building upon the drive-reduction theory of reward, we propose a new analytical framework that integrates learning and regulatory systems, such that the two seemingly unrelated objectives of reward maximization and physiological stability prove to be identical. The proposed theory accounts for behavioral adaptation to both internal and external states in a principled way. We further show that the proposed framework allows for a unified explanation of several behavioral patterns, such as the motivational sensitivity of different associative learning mechanisms, anticipatory responses, interaction among competing motivational systems, and risk aversion.
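To make the drive-reduction idea concrete, a minimal sketch of the framework's core definitions (the exponents n and m are free parameters of the drive function, assumed here for illustration rather than fixed by the abstract): given an internal state H_t = (h_{1,t}, ..., h_{N,t}) with setpoints h_i^*, drive and reward can be written as

    D(H_t) = \left( \sum_{i=1}^{N} \left| h_i^* - h_{i,t} \right|^n \right)^{1/m}

    r(H_t, K_t) = D(H_t) - D(H_t + K_t)

where K_t is the impact of an outcome on the internal state. An outcome is rewarding exactly to the extent that it reduces drive, i.e., moves the animal toward its setpoint, which is what makes reward maximization and physiological stability the same objective.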
Optimizing the depth and the direction of prospective planning using information values
Evaluating the future consequences of actions is achievable by simulating a mental search tree into the future. Expanding deep trees, however, is computationally taxing. Therefore, machines and humans use a plan-until-habit scheme that simulates the environment up to a limited depth and then exploits habitual values as proxies for consequences that may arise further in the future. Two outstanding questions in this scheme are “in which directions should the search tree be expanded?” and “when should the expansion stop?”. Here we propose a principled solution to these questions based on a speed/accuracy tradeoff: deeper expansion in the appropriate directions leads to more accurate planning, but at the cost of slower decision-making. Our simulation results show how this algorithm expands the search tree effectively and efficiently in a grid-world environment. We further show that our algorithm can explain several behavioral patterns in animals and humans, namely the effect of time pressure on the depth of planning, the effect of reward magnitudes on the direction of planning, and the gradual shift from goal-directed to habitual behavior over the course of training. The algorithm also provides several predictions testable in animal/human experiments.
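As a rough illustration of the plan-until-habit scheme (a sketch only; the toy chain environment, the habit table, and the fixed planning depth are assumptions for concreteness, not the paper's grid-world or its information-value stopping rule):

    # Plan-until-habit sketch: expand a search tree to a fixed depth, then
    # back up cached habitual values as proxies for deeper consequences.
    ACTIONS = ("left", "right")

    def step(state, action):
        """Deterministic toy chain of states 0..3; reward only at state 3."""
        nxt = min(3, max(0, state + (1 if action == "right" else -1)))
        return nxt, (1.0 if nxt == 3 else 0.0)

    HABIT = {0: 0.2, 1: 0.4, 2: 0.7, 3: 1.0}  # cached habitual value estimates

    def plan_value(state, depth, gamma=0.9):
        if depth == 0:  # expansion stops here: fall back on the habit proxy
            return HABIT[state]
        return max(r + gamma * plan_value(nxt, depth - 1, gamma)
                   for nxt, r in (step(state, a) for a in ACTIONS))

    # Deeper expansion is more accurate but simulates more transitions,
    # which is the speed/accuracy tradeoff the paper formalizes.
    for depth in range(4):
        print(depth, plan_value(0, depth))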
Cocaine Addiction as a Homeostatic Reinforcement Learning Disorder
Drug addiction implicates both the reward-learning and homeostatic-regulation mechanisms of the brain. This has stimulated two partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the two mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction in which cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in the homeostatic setpoint. Simulations show that the new theory accounts for key behavioral and neurobiological features of addiction, most notably escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction.
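The two time scales in the proposed theory can be caricatured in a few lines (purely illustrative; the decay, drift, and dose constants below are assumptions, not fitted parameters from the paper):

    def simulate(n_doses=50, dose=1.0, decay=0.5, drift=0.05):
        """Fast corrective effect of each dose vs. slow setpoint escalation."""
        state, setpoint = 0.0, 1.0
        for t in range(n_doses):
            state += dose                  # rapid homeostatic correction: reinforcing
            setpoint += drift * dose       # slow, long-lasting setpoint shift
            state *= decay                 # the drug's effect wears off between doses
            drive = abs(setpoint - state)  # the growing deficit drives escalation
            print(t, round(drive, 2))

    simulate()

Because the setpoint drifts upward while each dose's corrective effect stays bounded, the between-dose deficit grows over time, reproducing escalation of use in this toy rendering.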
Midbrain Dopamine Neurons Signal Belief in Choice Accuracy during a Perceptual Decision
Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect a prediction was (“reward prediction error,” RPE). In several reinforcement learning contexts, such as Pavlovian conditioning and decisions guided by reward history, this RPE signal is provided by midbrain dopamine neurons. In many situations, however, the stimuli predictive of outcomes are perceptually ambiguous. Perceptual uncertainty is known to influence choices, but it has been unclear whether or how dopamine neurons factor it into their teaching signal. To cope with uncertainty, we extended a reinforcement learning model with a belief state about the perceptually ambiguous stimulus; this model generates an estimate of the probability of choice correctness, termed decision confidence. We show that dopamine responses in monkeys performing a perceptually ambiguous decision task comply with the model’s predictions. Consequently, dopamine responses did not simply reflect a stimulus’ average expected reward value but were predictive of the trial-to-trial fluctuations in perceptual accuracy. These confidence-dependent dopamine responses emerged prior to monkeys’ choice initiation, raising the possibility that dopamine impacts impending decisions, in addition to encoding a post-decision teaching signal. Finally, by manipulating reward size, we found that dopamine neurons reflect both the upcoming reward size and the confidence in achieving it. Together, our results show that dopamine responses convey teaching signals that are also appropriate for perceptual decisions.
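A minimal sketch of the belief-state idea (the equal-variance Gaussian categories with means +/-mu are an assumption made for concreteness, not the paper's exact task model):

    from math import exp

    def confidence(percept, mu=1.0, sigma=1.0):
        """Posterior that the choice implied by `percept` is correct, for two
        equiprobable Gaussian stimulus categories with means +/-mu, s.d. sigma."""
        p_right = 1.0 / (1.0 + exp(-2.0 * mu * percept / sigma ** 2))
        return max(p_right, 1.0 - p_right)  # belief that the chosen side is correct

    # Under this sketch, the anticipatory value signal scales with
    # confidence * reward_size, and the outcome prediction error with
    # reward - confidence * reward_size, so it varies trial to trial
    # even for a fixed stimulus.
    print(confidence(0.1), confidence(1.5))  # ambiguous vs. clear percept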
Drugs, police inefficiencies and gangsterism in violently impoverished communities like Overcome
This research establishes an understanding of the relationship between gangsterism, the drug commodity and inefficiencies in the state’s policing institution, as well as the consequences of this relationship, in the context of the Overcome squatter area in Cape Town. Overcome is representative of other violently impoverished Cape Town communities, with its high rate of unemployment, low quality of education, domestic abuse, stagnant housing crisis, lack of access to intellectual and material resources or opportunities for personal growth, gangsterism, inefficient policing, substance dependency, and violence. This research demonstrates that the current relationship between the gangs, drugs and the police fosters an unpredictable, violent environment, leaving residents in a constant state of vulnerability. The argument is developed around three key historical junctures in the development of organized crime in South Africa, starting with the growth of the mining industry in the Witwatersrand after 1886, followed by forced removals and prohibition-like policies in Cape Town circa 1970, and finally the upheaval created around the transition away from apartheid in 1994.
Research for this paper was both quantitative and qualitative in nature, and included expert interviews on the subjects of police criminality, narcotic sales, and gangsterism. Newspaper articles, crime statistics, books, census figures, and a host of journals were also utilized.
Upon reviewing a host of police inefficiencies and criminal collusions, the research concludes that public criminals related to the state, such as police, and private criminals, such as gangsters, work together in a multitude of ways in a bid to acquire wealth, most notably through an illicit drug market today dominated by ‘tik’. It is shown that this violent narcotics market binds police and gangsters together at the expense of creating a state of insecurity for those living in poor drug markets.
Value Driven Representation for Human-in-the-Loop Reinforcement Learning
Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer who is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundations of how to help the system designer choose the set of sensors or features that define the observation space used by the reinforcement learning agent. We present an algorithm, value-driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that it is sufficient to capture a (near-)optimal policy. To do so, we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
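Schematically, the loop described in the abstract might look as follows (a sketch inferred from the abstract alone; `train_policy`, `optimistic_value`, and the greedy acceptance rule are placeholders, not the paper's API):

    def vdr(base_features, candidates, train_policy, optimistic_value, tol=1e-3):
        """Iteratively augment the observation space while the optimistically
        estimated value of the learned policy keeps improving."""
        features = list(base_features)
        policy = train_policy(features)
        value = optimistic_value(policy, features)  # offline Monte Carlo rollouts
        for f in candidates:
            trial = features + [f]
            trial_policy = train_policy(trial)
            trial_value = optimistic_value(trial_policy, trial)
            if trial_value > value + tol:           # the new sensor/feature pays off
                features, policy, value = trial, trial_policy, trial_value
        return features, policy

The optimism in the value estimate is what justifies trying a richer observation space: a feature is kept only when the best policy expressible with it is estimated to outperform the current one.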
Homeostatic reinforcement learning for integrating reward collection and physiological stability
Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for the interaction between the hypothalamus and the brain reward system.
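The claimed equivalence between reward seeking and physiological stability can be sketched in one line for the undiscounted case, using a drive-reduction reward r_t = D(H_t) - D(H_{t+1}) (the notation is carried over from the drive-function sketch above, an assumption for illustration):

    \sum_{t=0}^{T} r_t = \sum_{t=0}^{T} \left[ D(H_t) - D(H_{t+1}) \right] = D(H_0) - D(H_{T+1})

The sum telescopes, so maximizing accumulated reward is exactly minimizing the final deviation from the setpoint. With discounting (gamma < 1), earlier drive reductions are weighted more heavily, which is what motivates taking the shortest path through the space of physiological variables toward the setpoint.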