A reinforcement learning theory for homeostatic regulation
Reinforcement learning models address an animal’s behavioral adaptation to its changing “external” environment, and are based on the assumption that Pavlovian, habitual, and goal-directed responses seek to maximize reward acquisition. Negative-feedback models of homeostatic regulation, on the other hand, are concerned with behavioral adaptation in response to the “internal” state of the animal, and assume that the animal’s behavioral objective is to minimize deviations of key physiological variables from their hypothetical setpoints. Building upon the drive-reduction theory of reward, we propose a new analytical framework that integrates learning and regulatory systems, such that the two seemingly unrelated objectives of reward maximization and physiological stability prove to be identical. The proposed theory accounts for behavioral adaptation to both internal and external states in a principled way. We further show that the proposed framework allows for a unified explanation of several behavioral patterns, such as the motivational sensitivity of different associative learning mechanisms, anticipatory responses, interaction among competing motivational systems, and risk aversion.
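To make the drive-reduction idea concrete, a minimal sketch of the framework's core definitions (the exponents n and m are free parameters of the drive function, assumed here for illustration rather than fixed by the abstract): given an internal state H_t = (h_{1,t}, ..., h_{N,t}) with setpoints h_i^*, drive and reward can be written as

    D(H_t) = \left( \sum_{i=1}^{N} \left| h_i^* - h_{i,t} \right|^n \right)^{1/m}

    r(H_t, K_t) = D(H_t) - D(H_t + K_t)

where K_t is the impact of an outcome on the internal state. An outcome is rewarding exactly to the extent that it reduces drive, i.e., moves the animal toward its setpoint, which is what makes reward maximization and physiological stability the same objective.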
Optimizing the depth and the direction of prospective planning using information values
Evaluating the future consequences of actions is achievable by simulating a mental search tree into the future. Expanding deep trees, however, is computationally taxing. Therefore, machines and humans use a plan-until-habit scheme that simulates the environment up to a limited depth and then exploits habitual values as proxies for consequences that may arise further in the future. Two outstanding questions in this scheme are “in which directions should the search tree be expanded?” and “when should the expansion stop?”. Here we propose a principled solution to these questions based on a speed/accuracy tradeoff: deeper expansion in the appropriate directions leads to more accurate planning, but at the cost of slower decision-making. Our simulation results show how this algorithm expands the search tree effectively and efficiently in a grid-world environment. We further show that our algorithm can explain several behavioral patterns in animals and humans, namely the effect of time pressure on the depth of planning, the effect of reward magnitudes on the direction of planning, and the gradual shift from goal-directed to habitual behavior over the course of training. The algorithm also provides several predictions testable in animal/human experiments.
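As a rough illustration of the plan-until-habit scheme (a sketch only; the toy chain environment, the habit table, and the fixed planning depth are assumptions for concreteness, not the paper's grid-world or its information-value stopping rule):

    # Plan-until-habit sketch: expand a search tree to a fixed depth, then
    # back up cached habitual values as proxies for deeper consequences.
    ACTIONS = ("left", "right")

    def step(state, action):
        """Deterministic toy chain of states 0..3; reward only at state 3."""
        nxt = min(3, max(0, state + (1 if action == "right" else -1)))
        return nxt, (1.0 if nxt == 3 else 0.0)

    HABIT = {0: 0.2, 1: 0.4, 2: 0.7, 3: 1.0}  # cached habitual value estimates

    def plan_value(state, depth, gamma=0.9):
        if depth == 0:  # expansion stops here: fall back on the habit proxy
            return HABIT[state]
        return max(r + gamma * plan_value(nxt, depth - 1, gamma)
                   for nxt, r in (step(state, a) for a in ACTIONS))

    # Deeper expansion is more accurate but simulates more transitions,
    # which is the speed/accuracy tradeoff the paper formalizes.
    for depth in range(4):
        print(depth, plan_value(0, depth))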
Cocaine Addiction as a Homeostatic Reinforcement Learning Disorder
Drug addiction implicates both the reward-learning and homeostatic-regulation mechanisms of the brain. This has stimulated two partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the two mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction in which cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in the homeostatic setpoint. Simulations show that the new theory accounts for key behavioral and neurobiological features of addiction, most notably escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction.
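The two time scales in the proposed theory can be caricatured in a few lines (purely illustrative; the decay, drift, and dose constants below are assumptions, not fitted parameters from the paper):

    def simulate(n_doses=50, dose=1.0, decay=0.5, drift=0.05):
        """Fast corrective effect of each dose vs. slow setpoint escalation."""
        state, setpoint = 0.0, 1.0
        for t in range(n_doses):
            state += dose                  # rapid homeostatic correction: reinforcing
            setpoint += drift * dose       # slow, long-lasting setpoint shift
            state *= decay                 # the drug's effect wears off between doses
            drive = abs(setpoint - state)  # the growing deficit drives escalation
            print(t, round(drive, 2))

    simulate()

Because the setpoint drifts upward while each dose's corrective effect stays bounded, the between-dose deficit grows over time, reproducing escalation of use in this toy rendering.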
Midbrain Dopamine Neurons Signal Belief in Choice Accuracy during a Perceptual Decision
Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect a prediction was (“reward prediction error,” RPE). In several reinforcement learning contexts, such as Pavlovian conditioning and decisions guided by reward history, this RPE signal is provided by midbrain dopamine neurons. In many situations, however, the stimuli predictive of outcomes are perceptually ambiguous. Perceptual uncertainty is known to influence choices, but it has been unclear whether or how dopamine neurons factor it into their teaching signal. To cope with uncertainty, we extended a reinforcement learning model with a belief state about the perceptually ambiguous stimulus; this model generates an estimate of the probability of choice correctness, termed decision confidence. We show that dopamine responses in monkeys performing a perceptually ambiguous decision task comply with the model’s predictions. Consequently, dopamine responses did not simply reflect a stimulus’ average expected reward value but were predictive of the trial-to-trial fluctuations in perceptual accuracy. These confidence-dependent dopamine responses emerged prior to monkeys’ choice initiation, raising the possibility that dopamine impacts impending decisions, in addition to encoding a post-decision teaching signal. Finally, by manipulating reward size, we found that dopamine neurons reflect both the upcoming reward size and the confidence in achieving it. Together, our results show that dopamine responses convey teaching signals that are also appropriate for perceptual decisions.
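A minimal sketch of the belief-state idea (the equal-variance Gaussian categories with means +/-mu are an assumption made for concreteness, not the paper's exact task model):

    from math import exp

    def confidence(percept, mu=1.0, sigma=1.0):
        """Posterior that the choice implied by `percept` is correct, for two
        equiprobable Gaussian stimulus categories with means +/-mu, s.d. sigma."""
        p_right = 1.0 / (1.0 + exp(-2.0 * mu * percept / sigma ** 2))
        return max(p_right, 1.0 - p_right)  # belief that the chosen side is correct

    # Under this sketch, the anticipatory value signal scales with
    # confidence * reward_size, and the outcome prediction error with
    # reward - confidence * reward_size, so it varies trial to trial
    # even for a fixed stimulus.
    print(confidence(0.1), confidence(1.5))  # ambiguous vs. clear percept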
Drugs, police inefficiencies and gangsterism in violently impoverished communities like Overcome
This research establishes an understanding of the relationship between gangsterism, the drug commodity and inefficiencies in the state’s policing institution, as well as the consequences of this relationship, in the context of the Overcome squatter area in Cape Town. Overcome is representative of other violently impoverished Cape Town communities, with its high rate of unemployment, low quality of education, domestic abuse, stagnant housing crisis, lack of access to intellectual and material resources or opportunities for personal growth, gangsterism, inefficient policing, substance dependency, and violence. This research demonstrates that the current relationship between the gangs, drugs and the police fosters an unpredictable, violent environment, leaving residents in a constant state of vulnerability. The argument is developed around three key historical junctures in the development of organized crime in South Africa, starting with the growth of the mining industry in the Witwatersrand after 1886, followed by forced removals and prohibition-like policies in Cape Town circa 1970, and finally the upheaval created around the transition away from apartheid in 1994.
Research for this paper was both quantitative and qualitative in nature, and included expert interviews on the subjects of police criminality, narcotic sales, and gangsterism. Newspaper articles, crime statistics, books, census figures, and a host of journals were also utilized.
Upon reviewing a host of police inefficiencies and criminal collusions, the research concludes that public criminals related to the state, such as police, and private criminals, such as gangsters, work together in a multitude of ways in a bid to acquire wealth, most notably through an illicit drug market today dominated by ‘tik’. It is shown that this violent narcotics market binds police and gangsters together at the expense of creating a state of insecurity for those living in poor drug markets.
Value Driven Representation for Human-in-the-Loop Reinforcement Learning
Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer who is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundations of how to help the system designer choose the set of sensors or features that define the observation space used by the reinforcement learning agent. We present an algorithm, value-driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that it is sufficient to capture a (near-)optimal policy. To do so, we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
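Schematically, the loop described in the abstract might look as follows (a sketch inferred from the abstract alone; `train_policy`, `optimistic_value`, and the greedy acceptance rule are placeholders, not the paper's API):

    def vdr(base_features, candidates, train_policy, optimistic_value, tol=1e-3):
        """Iteratively augment the observation space while the optimistically
        estimated value of the learned policy keeps improving."""
        features = list(base_features)
        policy = train_policy(features)
        value = optimistic_value(policy, features)  # offline Monte Carlo rollouts
        for f in candidates:
            trial = features + [f]
            trial_policy = train_policy(trial)
            trial_value = optimistic_value(trial_policy, trial)
            if trial_value > value + tol:           # the new sensor/feature pays off
                features, policy, value = trial, trial_policy, trial_value
        return features, policy

The optimism in the value estimate is what justifies trying a richer observation space: a feature is kept only when the best policy expressible with it is estimated to outperform the current one.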
Homeostatic reinforcement learning for integrating reward collection and physiological stability
Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for the interaction between the hypothalamus and the brain reward system.
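The claimed equivalence between reward seeking and physiological stability can be sketched in one line for the undiscounted case, using a drive-reduction reward r_t = D(H_t) - D(H_{t+1}) (the notation is carried over from the drive-function sketch above, an assumption for illustration):

    \sum_{t=0}^{T} r_t = \sum_{t=0}^{T} \left[ D(H_t) - D(H_{t+1}) \right] = D(H_0) - D(H_{T+1})

The sum telescopes, so maximizing accumulated reward is exactly minimizing the final deviation from the setpoint. With discounting (gamma < 1), earlier drive reductions are weighted more heavily, which is what motivates taking the shortest path through the space of physiological variables toward the setpoint.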