14 research outputs found

    Optimal utility and probability functions for agents with finite computational precision

    When making economic choices, such as those between goods or gambles, humans act as if their internal representation of the value and probability of a prospect is distorted away from its true value. These distortions give rise to decisions which apparently fail to maximize reward, and preferences that reverse without reason. Why would humans have evolved to encode value and probability in a distorted fashion, in the face of selective pressure for reward-maximizing choices? Here, we show that under the simple assumption that humans make decisions with finite computational precision, in other words, that decisions are irreducibly corrupted by noise, the distortions of value and probability displayed by humans are approximately optimal in that they maximize reward and minimize uncertainty. In two empirical studies, we manipulate factors that change the reward-maximizing form of distortion, and find that in each case, humans adapt optimally to the manipulation. This work suggests an answer to the longstanding question of why humans make "irrational" economic choices.
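    The core idea can be pictured with a short simulation (a minimal sketch; the `choice_accuracy` helper, the log-normal value distribution, and all parameter values are illustrative assumptions, not the paper's actual model): when values are encoded on a bounded internal scale and then corrupted by fixed-precision noise, a compressive ("distorted") encoding can produce more reward-maximizing choices than a veridical one, because it allocates resolution where values are common.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def choice_accuracy(encode, n_trials=100_000, sigma=0.05):
        # Draw pairs of prospect values from a skewed (log-normal) distribution.
        a = rng.lognormal(mean=0.0, sigma=1.0, size=n_trials)
        b = rng.lognormal(mean=0.0, sigma=1.0, size=n_trials)
        xmax = max(a.max(), b.max())
        # Encode each value onto a bounded internal scale, then corrupt it with
        # fixed-precision Gaussian noise (the "finite computational precision").
        ea = encode(a / xmax) + rng.normal(0.0, sigma, n_trials)
        eb = encode(b / xmax) + rng.normal(0.0, sigma, n_trials)
        # The agent picks the option with the larger noisy internal value;
        # accuracy is how often that is also the objectively larger value.
        picked_a = ea > eb
        return float(np.mean(np.where(picked_a, a >= b, b >= a)))

    linear = choice_accuracy(lambda v: v)              # veridical encoding
    compressive = choice_accuracy(lambda v: v ** 0.3)  # concave "utility" distortion
    print(f"linear: {linear:.3f}  compressive: {compressive:.3f}")
    ```

    Because most sampled values are small, the concave encoding spreads them across the internal scale, so the same noise level corrupts fewer choices; the linear encoding squashes them together and accuracy drops toward chance.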

    Where does value come from?

    The computational framework of reinforcement learning (RL) has allowed us to both understand biological brains and build successful artificial agents. However, in this opinion, we highlight open challenges for RL as a model of animal behaviour in natural environments. We ask how the external reward function is designed for biological systems, and how we can account for the context sensitivity of valuation. We summarise both old and new theories proposing that animals track current and desired internal states and seek to minimise the distance to a goal across multiple value dimensions. We suggest that this framework readily accounts for canonical phenomena observed in the fields of psychology, behavioural ecology, and economics, and recent findings from brain-imaging studies of value-guided decision-making.
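    The "minimise the distance to a goal across multiple value dimensions" idea can be sketched as drive reduction, in the spirit of homeostatic RL (a hedged illustration: the setpoint values, the two dimensions, and the Euclidean distance are assumptions, not the authors' specification):

    ```python
    import numpy as np

    # Hypothetical desired internal state across two value dimensions
    # (e.g. energy and hydration); the numbers are illustrative.
    setpoint = np.array([50.0, 50.0])

    def drive(state):
        # Drive = distance from the homeostatic setpoint across all dimensions.
        return float(np.linalg.norm(setpoint - np.asarray(state)))

    def reward(state, next_state):
        # Reward as drive reduction: an outcome is valuable only insofar as it
        # moves the animal toward its setpoint, making valuation context sensitive.
        return drive(state) - drive(next_state)

    # The same outcome is rewarding when depleted but aversive when sated:
    print(reward([40.0, 50.0], [50.0, 50.0]))   # depleted -> setpoint: 10.0
    print(reward([50.0, 50.0], [60.0, 50.0]))   # sated -> overshoot: -10.0
    ```

    This makes the context sensitivity mentioned above fall out directly: the reward attached to an external outcome is not fixed but depends on where the current internal state sits relative to the setpoint.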

    Model sharing in the human medial temporal lobe

    Effective planning involves knowing where different actions take us. However, natural environments are rich and complex, leading to an exponential increase in memory demand as a plan grows in depth. One potential solution is to filter out features of the environment irrelevant to the task at hand. This enables a shared model of transition dynamics to be used for planning over a range of different input features. Here, we asked human participants (13 male, 16 female) to perform a sequential decision-making task, designed so that knowledge should be integrated independently of the input features (visual cues) present in one case but not in another. Participants efficiently switched between using a low (cue independent) and a high (cue specific) dimensional representation of state transitions. fMRI data identified the medial temporal lobe as a locus for learning state transitions. Within this region, multivariate patterns of BOLD responses as state associations changed (via trial-by-trial learning) were less correlated between trials with differing input features in the high compared to the low dimensional case, suggesting that these patterns switched between separable (specific to input features) and shared (invariant to input features) transition models. Finally, we show that transition models are updated more strongly following the receipt of positive compared to negative outcomes, a finding that challenges conventional theories of planning. Together, these findings propose a computational and neural account of how information relevant for planning can be shared and segmented in response to the vast array of contextual features we encounter in our world. SIGNIFICANCE STATEMENT: Effective planning involves maintaining an accurate model of which actions take us to which locations. But in a world awash with information, mapping actions to states with the right level of complexity is critical.
    Using a new decision-making "heist task" in conjunction with computational modelling and fMRI, we show that patterns of BOLD responses in the medial temporal lobe - a brain region key for prospective planning - become less sensitive to the presence of visual features when these are irrelevant to the task at hand. By flexibly adapting the complexity of task state representations in this way, state-action mappings learned under one set of features can be used to plan in the presence of others.
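    One way to picture the low- versus high-dimensional representations is a count-based transition model whose lookup key either ignores or includes the visual cue (a toy sketch: `make_learner`, the state names, and the task structure are invented for illustration and are not the study's actual model):

    ```python
    from collections import defaultdict

    def make_learner(cue_specific):
        """Count-based transition model; keys optionally include the visual cue."""
        counts = defaultdict(int)   # (key, next_state) -> observation count

        def observe(cue, state, action, next_state):
            key = (cue, state, action) if cue_specific else (state, action)
            counts[(key, next_state)] += 1

        def estimate(cue, state, action, next_state):
            key = (cue, state, action) if cue_specific else (state, action)
            total = sum(n for (k, _), n in counts.items() if k == key)
            return counts[(key, next_state)] / total if total else 0.0

        return observe, estimate

    shared_obs, shared_est = make_learner(cue_specific=False)
    split_obs, split_est = make_learner(cue_specific=True)

    # The same underlying dynamics are experienced under two different cues:
    for cue in ("red", "blue"):
        for nxt in ("s1", "s1", "s1", "s2"):
            shared_obs(cue, "s0", "go", nxt)
            split_obs(cue, "s0", "go", nxt)

    # The shared (cue-independent) model pools across cues (8 observations);
    # the cue-specific model sees only half the data per cue (4 each).
    print(shared_est("red", "s0", "go", "s1"))   # 0.75, from 8 observations
    print(split_est("red", "s0", "go", "s1"))    # 0.75, from 4 observations
    ```

    The trade-off in the abstract is visible here: the shared key learns faster by pooling experience, but only the cue-specific key can represent dynamics that genuinely differ between cues.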

    Ventromedial prefrontal cortex encodes a latent estimate of cumulative reward.

    Humans and other animals accumulate resources, or wealth, by making successive risky decisions. If and how risk attitudes vary with wealth remains an open question. Here, humans accumulated reward by accepting or rejecting successive monetary gambles within arbitrarily defined temporal contexts. Risk preferences changed substantially toward risk aversion as reward accumulated within a context, and blood oxygen level dependent (BOLD) signals in the ventromedial prefrontal cortex (PFC) tracked the latent growth of cumulative economic outcomes. Risky behavior was captured by a computational model in which reward prompts an adaptive update to the function that links utilities to choices. These findings can be understood if humans have evolved economic decision policies that fail to maximize overall expected value but reduce variance in cumulative outcomes, thereby ensuring that resources remain above a critical survival threshold.
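    A toy version of the modelling idea (everything here, including the power-utility form, the linear curvature update, and all numbers, is an illustrative assumption rather than the paper's fitted model): as reward accumulates within a context, utility curvature increases, and the agent switches from accepting to rejecting the very same gamble.

    ```python
    def simulate_context(n_trials=50, sure=10.0, gamble=25.0, eta=0.002):
        """Accept/reject a 50/50 gamble versus a sure amount as wealth accumulates.

        Deterministic "mean-path" simulation: wealth grows by the chosen
        option's expected value, so the qualitative switch is reproducible.
        """
        wealth, accepts = 0.0, []
        for _ in range(n_trials):
            # Utility curvature adapts to cumulative reward: more wealth,
            # more concave utility, more risk aversion (clamped for stability).
            alpha = max(0.1, 1.0 - eta * wealth)
            # Expected-utility rule for u(x) = x ** alpha (gamble pays 0 on a loss).
            accept = 0.5 * gamble ** alpha > sure ** alpha
            accepts.append(accept)
            wealth += 0.5 * gamble if accept else sure
        return accepts

    accepts = simulate_context()
    # Early in the context the agent gambles; once cumulative reward passes a
    # threshold it switches, once and for all, to the sure option.
    print(accepts.count(True), accepts.count(False))   # 10 40
    ```

    Rejecting the higher-expected-value gamble once wealth is high reduces variance in cumulative outcomes, which is exactly the survival-threshold interpretation offered in the abstract.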

    Training discrimination diminishes maladaptive avoidance of innocuous stimuli in a fear conditioning paradigm

    Anxiety disorders are the most common mental disorder worldwide. Although anxiety disorders differ in the nature of feared objects or situations, they share a common mechanism by which fear generalizes to related but innocuous objects, eliciting avoidance of objects and situations that pose no objective risk. This overgeneralization appears to be a crucial mechanism in the persistence of anxiety psychopathology. In this study, we test whether an intervention that promotes discrimination learning reduces generalization of fear, in particular harm expectancy and avoidance, compared to an irrelevant (control) training. Healthy participants (N = 80) were randomly allocated to a training condition. Using a fear conditioning paradigm, participants first learned visual danger and safety signals (set 1). Baseline level of stimulus generalization was tested with ambiguous stimuli on a spectrum between the danger and safety signals. There were no differences between the training groups. Participants then received the stimulus discrimination training or a control training. After training, participants learned a new set of danger and safety signals (set 2), and the level of harm expectancy generalization and behavioural avoidance of ambiguous stimuli was tested. Although the training groups did not differ in fear generalization on a cognitive level (harm expectancy), the results showed a different pattern of avoidance of ambiguous stimuli, with the discrimination training group showing less avoidance of stimuli that resembled the safety signals. These results support the potential of interventions that promote discrimination learning in the treatment of anxiety disorders.
