Anterior Prefrontal Cortex Contributes to Action Selection through Tracking of Recent Reward Trends
The functions of prefrontal cortex remain enigmatic, especially for its anterior sectors, with putative roles ranging from planning and self-initiated behavior to social cognition, task switching, and memory. A predominant current theory regarding the most anterior sector, the frontopolar cortex (FPC), is that it is involved in exploring alternative courses of action, but the detailed causal mechanisms remain unknown. Here we investigated this issue using the lesion method, together with a novel model-based analysis. Eight patients with anterior prefrontal brain lesions including the FPC performed a four-armed bandit task known from neuroimaging studies to activate the FPC. Model-based analyses of learning demonstrated a selective deficit in the ability to extrapolate the most recent trend, despite an intact general ability to learn from past rewards. Whereas both brain-damaged and healthy controls used comparisons between the two most recent choice outcomes to infer trends that influenced their decision about the next choice, the group with anterior prefrontal lesions showed a complete absence of this component and instead based their choice entirely on the cumulative reward history. Given that the FPC is thought to be the most evolutionarily recent expansion of primate prefrontal cortex, we suggest that its function may reflect uniquely human adaptations to select and update models of reward contingency in dynamic environments.
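The dissociation the abstract describes can be sketched as a bandit learner whose choice combines a cumulative value (delta-rule learning from past rewards) with a trend term comparing the two most recent outcomes on each arm. This is a minimal illustrative sketch, not the paper's fitted model; the function names and parameters (`alpha`, `beta`, `tau`) are assumptions. Setting `tau = 0` reproduces the lesion pattern, where choice rests on reward history alone.

```python
import math
import random

def choose(q, trend, beta=3.0, tau=1.0):
    """Softmax choice over arms.

    q[a]     -- value learned from the cumulative reward history
    trend[a] -- difference between the two most recent outcomes on arm a
    tau      -- weight on the trend term; tau = 0 models the lesion group
    """
    prefs = [beta * (q[a] + tau * trend[a]) for a in range(len(q))]
    m = max(prefs)
    weights = [math.exp(p - m) for p in prefs]
    r = random.random() * sum(weights)
    acc = 0.0
    for a, w in enumerate(weights):
        acc += w
        if r < acc:
            return a
    return len(q) - 1

def update(q, last, trend, a, reward, alpha=0.3):
    """Delta-rule update of cumulative value, plus trend tracking."""
    q[a] += alpha * (reward - q[a])      # intact learning from past rewards
    if last[a] is not None:
        trend[a] = reward - last[a]      # extrapolation of the recent trend
    last[a] = reward
```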
Independent neural computation of value from other people's confidence
Expectation of reward can be shaped by the observation of actions and expressions of other people in one's environment. A person's apparent confidence in the likely reward of an action, for instance, makes qualities of their evidence, not observed directly, socially accessible. This strategy is computationally distinguished from associative learning methods that rely on direct observation, by its use of inference from indirect evidence. In twenty-three healthy human subjects, we isolated effects of first-hand experience, other people's choices, and the mediating effect of their confidence, on decision-making and neural correlates of value within ventromedial prefrontal cortex (vmPFC). Values derived from first-hand experience and from other people's choices (regardless of confidence) were represented indiscriminately across vmPFC. However, value computed from agent choices weighted by their associated confidence was represented with specificity for ventromedial area 10. This pattern corresponds to shifts of connectivity and overlapping cognitive processes along a posterior-anterior vmPFC axis. Task behavior correlated with self-reported self-reliance for decision-making in other social contexts. The tendency to conform in other social contexts corresponded to increased activation in cortical regions previously shown to respond to social conflict in proportion to subsequent conformity (Campbell-Meiklejohn et al., 2010). The tendency to self-monitor predicted a selectively enhanced response to accordance with others in the right temporoparietal junction (rTPJ). The findings anatomically decompose vmPFC value representations according to computational requirements and provide biological insight into the social transmission of preference and reassurance gained from the confidence of others.
Significance Statement: Decades of research have provided evidence that the ventromedial prefrontal cortex (vmPFC) signals the satisfaction we expect from imminent actions. However, we have a surprisingly modest understanding of the organization of value across this substantial and varied region. This study finds that using cues of the reliability of other people's knowledge to enhance expectation of personal success generates value correlates that are anatomically distinct from those concurrently computed from direct, personal experience. This suggests that representation of decision values in vmPFC is suborganized according to the underlying computation, consistent with what we know about the anatomical heterogeneity of the region. These results also provide insight into the observational learning process by which someone else's confidence can sway and reassure our choices.
Boredom, Information-Seeking and Exploration
Any adaptive organism faces the choice between taking actions with known benefits (exploitation), and sampling new actions to check for other, more valuable opportunities available (exploration). The latter involves information-seeking, a drive so fundamental to learning and long-term reward that it can reasonably be considered, through evolution or development, to have acquired its own value, independent of immediate reward. Similarly, behaviors that fail to yield information may have come to be associated with aversive experiences such as boredom, demotivation, and task disengagement. In accord with these suppositions, we propose that boredom reflects an adaptive signal for managing the exploration-exploitation tradeoff, in the service of optimizing information acquisition and long-term reward. We tested participants in three experiments, manipulating the information content in their immediate task environment, and showed that increased perceptions of boredom arise in environments in which there is little useful information, and that higher boredom correlates with higher exploration. These findings are the first step toward a model formalizing the relationship between exploration, exploitation, and boredom.
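One way the proposed signal could enter a decision rule is by letting a boredom variable raise the exploration rate of an otherwise greedy policy. The sketch below is a hypothetical formalization under that assumption; the function and parameter names (`eps_base`, `gain`) are illustrative, not the authors' model.

```python
import random

def boredom_greedy(q_values, boredom, eps_base=0.05, gain=0.5):
    """Epsilon-greedy choice whose exploration rate grows with boredom.

    `boredom` is assumed to rise when the environment yields little new
    information, so low-information settings tilt the agent away from
    exploitation and toward exploration.
    """
    eps = min(1.0, eps_base + gain * boredom)
    if random.random() < eps:
        return random.randrange(len(q_values))                  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```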
Model-based control can give rise to devaluation-insensitive choice
Influential recent work aims to ground psychiatric dysfunction in the brain's basic computational mechanisms. For instance, the compulsive symptoms that feature prominently in drug abuse and addiction have been argued to arise from over-reliance on a habitual "model-free" system in contrast to a more laborious "model-based" system. Support for this account comes in part from failures to appropriately change behavior in light of new events. Notably, instrumental responding can, in some circumstances, persist despite reinforcer devaluation, perhaps reflecting control by model-free mechanisms that are driven by past reinforcement rather than knowledge of the (now devalued) outcome. However, another line of theory posits a different mechanism, latent cause inference, that can modulate behavioral change. It concerns how animals identify different contingencies that apply in different circumstances, by covertly clustering experiences into distinct groups. Here we combine both lines of theory to investigate the consequences of latent cause inference on instrumental sensitivity to reinforcer devaluation. We show that instrumental insensitivity to reinforcer devaluation can arise in this theory even using only model-based planning, and does not require or imply any habitual, model-free component. These ersatz habits (like laboratory ones) emerge after overtraining, interact with contextual cues, and show preserved sensitivity to reinforcer devaluation on a separate consumption test, a standard control. Together, this work highlights the need for caution in using reinforcer devaluation procedures to rule in (or out) the contribution of different learning mechanisms and offers a new perspective on the neurocomputational substrates of drug abuse.
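The covert clustering step can be illustrated with a toy rule: assign each observation (a set of contextual features) to an existing latent cause if it overlaps enough, otherwise posit a new cause. This is a deliberately simplified stand-in for the Bayesian clustering the account uses; the feature names and the Jaccard threshold are assumptions for illustration.

```python
def infer_cause(features, causes, threshold=0.5):
    """Assign a feature set to an existing latent cause if its Jaccard
    overlap with that cause meets the threshold; otherwise create a new
    cause and return its index."""
    for i, cause in enumerate(causes):
        overlap = len(features & cause) / len(features | cause)
        if overlap >= threshold:
            return i
    causes.append(set(features))
    return len(causes) - 1
```

If the devaluation test context maps onto a new latent cause, the values learned under the training cause still drive instrumental responding, mimicking a habit even though every component of the agent is model-based.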
Model-based learning protects against forming habits.
Studies in humans and rodents have suggested that behavior can at times be "goal-directed" (that is, planned and purposeful) and at times "habitual" (that is, inflexible and automatically evoked by stimuli). This distinction is central to conceptions of pathological compulsion, as in drug abuse and obsessive-compulsive disorder. Evidence for the distinction has primarily come from outcome devaluation studies, in which the sensitivity of a previously learned behavior to motivational change is used to assay the dominance of habits versus goal-directed actions. However, little is known about how habits and goal-directed control arise. Specifically, in the present study we sought to reveal the trial-by-trial dynamics of instrumental learning that would promote, and protect against, developing habits. In two complementary experiments with independent samples, participants completed a sequential decision task that dissociated two computational-learning mechanisms, model-based and model-free. We then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants' subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals.

This work was funded by a Sir Henry Wellcome Postdoctoral Fellowship (101521/Z/12/Z) awarded to C.M.G. N.D. is supported by a Scholar Award from the McDonnell Foundation. The authors report no conflicts of interest and declare no competing financial interests. This is the final published version. It first appeared at http://link.springer.com/article/10.3758%2Fs13415-015-0347-6
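The model-based/model-free dissociation in sequential tasks of this kind is often analyzed with a hybrid learner whose choice values mix a planned (model-based) component with a cached (model-free) one. The sketch below illustrates that mixture; the two-stage structure and the weight `w` follow the standard hybrid account, but the specific numbers and function names are illustrative assumptions.

```python
def model_based_q(transitions, q_stage2):
    """First-stage values computed by planning: the expected best
    second-stage value under known transition probabilities.

    transitions[a][s] -- probability that first-stage action a leads to
                         second stage s; q_stage2[s] -- values there.
    """
    return [sum(p * max(q2) for p, q2 in zip(row, q_stage2))
            for row in transitions]

def hybrid_q(q_mb, q_mf, w):
    """Mixture that drives choice: w = 1 is purely model-based control,
    w = 0 purely model-free. A higher fitted w is the individual
    difference that predicted devaluation sensitivity."""
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]
```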
Characterizing a psychiatric symptom dimension related to deficits in goal-directed control.
Prominent theories suggest that compulsive behaviors, characteristic of obsessive-compulsive disorder and addiction, are driven by shared deficits in goal-directed control, which confers vulnerability for developing rigid habits. However, recent studies have shown that deficient goal-directed control accompanies several disorders, including those without an obvious compulsive element. Reasoning that this lack of clinical specificity might reflect broader issues with psychiatric diagnostic categories, we investigated whether a dimensional approach would better delineate the clinical manifestations of goal-directed deficits. Using large-scale online assessment of psychiatric symptoms and neurocognitive performance in two independent general-population samples, we found that deficits in goal-directed control were most strongly associated with a symptom dimension comprising compulsive behavior and intrusive thought. This association was highly specific when compared to other non-compulsive aspects of psychopathology. These data showcase a powerful new methodology and highlight the potential of a dimensional, biologically grounded approach to psychiatry research.

Funded by a Sir Henry Wellcome Postdoctoral Fellowship (101521/Z/12/Z) awarded to CM Gillan.
Claire M Gillan: Wellcome Trust 101521/Z/12/Z
Nathaniel D Daw: National Institute on Drug Abuse 1R01DA038891
Nathaniel D Daw: James S. McDonnell Foundation Scholar Award

This is the final version of the article. It first appeared from eLife Sciences Publications via http://dx.doi.org/10.7554/eLife.1130
Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning
The ability to acquire abstract knowledge is a hallmark of human intelligence and is believed by many to be one of the core differences between humans and neural network models. Agents can be endowed with an inductive bias towards abstraction through meta-learning, where they are trained on a distribution of tasks that share some abstract structure that can be learned and applied. However, because neural networks are hard to interpret, it can be difficult to tell whether agents have learned the underlying abstraction, or alternatively statistical patterns that are characteristic of that abstraction. In this work, we compare the performance of humans and agents in a meta-reinforcement learning paradigm in which tasks are generated from abstract rules. We define a novel methodology for building "task metamers" that closely match the statistics of the abstract tasks but use a different underlying generative process, and evaluate performance on both abstract and metamer tasks. In our first set of experiments, we found that humans perform better at abstract tasks than metamer tasks, whereas a widely used meta-reinforcement learning agent performs worse on the abstract tasks than the matched metamers. In a second set of experiments, we base the tasks on abstractions derived directly from empirically identified human priors. We utilize the same procedure to generate corresponding metamer tasks, and see the same double dissociation between humans and agents. This work provides a foundation for characterizing differences between humans and machine learning that can be used in future work towards developing machines with human-like behavior.
Tonic Dopamine Modulates Exploitation of Reward Learning
The impact of dopamine on adaptive behavior in a naturalistic environment is largely unexamined. Experimental work suggests that phasic dopamine is central to reinforcement learning whereas tonic dopamine may modulate performance without altering learning per se; however, this idea has not been developed formally or integrated with computational models of dopamine function. We quantitatively evaluate the role of tonic dopamine in these functions by studying the behavior of hyperdopaminergic DAT knockdown mice in an instrumental task in a semi-naturalistic homecage environment. In this "closed economy" paradigm, subjects earn all of their food by pressing either of two levers, but the relative cost for food on each lever shifts frequently. Compared to wild-type mice, hyperdopaminergic mice allocate more lever presses on high-cost levers, thus working harder to earn a given amount of food and maintain their body weight. However, both groups show a similarly quick reaction to shifts in lever cost, suggesting that the hyperdopaminergic mice are not slower at detecting changes, as would be expected with a learning deficit. We fit the lever choice data using reinforcement learning models to assess the distinction between acquisition and expression that the models formalize. In these analyses, hyperdopaminergic mice displayed normal learning from recent reward history but diminished capacity to exploit this learning: a reduced coupling between choice and reward history. These data suggest that dopamine modulates the degree to which prior learning biases action selection and consequently alters the expression of learned, motivated behavior.
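In reinforcement learning models of this kind, the "coupling between choice and reward history" is typically captured by the inverse-temperature parameter of a softmax choice rule. The sketch below illustrates that idea under the common softmax assumption; it is not the paper's fitted model. Lowering `beta` leaves the learned values intact but weakens their expression in choice, the pattern described for the hyperdopaminergic mice.

```python
import math

def softmax_policy(q, beta):
    """Choice probabilities from learned values.

    beta scales how strongly prior learning biases action selection:
    high beta exploits the learned values; low beta yields near-random
    choice despite intact learning.
    """
    m = max(q)
    weights = [math.exp(beta * (v - m)) for v in q]
    z = sum(weights)
    return [w / z for w in weights]
```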
Humans decompose tasks by trading off utility and computational cost
Human behavior emerges from planning over elaborate decompositions of tasks into goals, subgoals, and low-level actions. How are these decompositions created and used? Here, we propose and evaluate a normative framework for task decomposition based on the simple idea that people decompose tasks to reduce the overall cost of planning while maintaining task performance. Analyzing 11,117 distinct graph-structured planning tasks, we find that our framework justifies several existing heuristics for task decomposition and makes predictions that can be distinguished from two alternative normative accounts. We report a behavioral study of task decomposition that uses 30 randomly sampled graphs, a larger and more diverse set than that of any previous behavioral study on this topic. We find that human responses are more consistent with our framework for task decomposition than alternative normative accounts and are most consistent with a heuristic, betweenness centrality, that is justified by our approach. Taken together, our results provide new theoretical insight into the computational principles underlying the intelligent structuring of goal-directed behavior.
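As a concrete reference point, betweenness centrality scores a node by the fraction of pairwise shortest paths that pass through it; high-betweenness nodes are natural subgoal candidates. The brute-force sketch below is suitable only for small graphs (Brandes' algorithm is the standard efficient method) and is an illustration of the heuristic, not the authors' code.

```python
from collections import deque
from itertools import combinations

def shortest_paths(graph, s, t):
    """All shortest s-t paths in an unweighted graph given as a dict
    mapping each node to a set of neighbors."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if t not in dist:
        return []
    paths = []
    def extend(path):
        u = path[-1]
        if u == t:
            paths.append(path)
            return
        for v in graph[u]:
            if dist.get(v) == dist[u] + 1:   # only follow shortest routes
                extend(path + [v])
    extend([s])
    return paths

def betweenness(graph):
    """Per-node sum, over all pairs, of the fraction of shortest paths
    passing through the node (endpoints excluded)."""
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        paths = shortest_paths(graph, s, t)
        if not paths:
            continue
        for v in graph:
            if v in (s, t):
                continue
            through = sum(1 for p in paths if v in p)
            score[v] += through / len(paths)
    return score
```

On a three-node path a-b-c, only the middle node lies on any interior shortest path, so it is the one a betweenness-based decomposition would promote to a subgoal.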