
    Reduced risk-seeking in chimpanzees in a zero-outcome game

    A key component of economic decisions is the integration of information about reward outcomes and probabilities when selecting between competing options. In many species, risky choice is influenced by the magnitude of available outcomes, the probability of success, and the possibility of extreme outcomes. Chimpanzees are generally regarded as risk-seeking. In this study, we examined two aspects of chimpanzees' risk preferences: first, whether setting the value of the non-preferred outcome of a risky option to zero changes chimpanzees' risk preferences, and second, whether individual risk preferences are stable across two different measures. Across two experiments, we found chimpanzees (Pan troglodytes, n = 23) as a group to be risk-neutral to risk-avoidant, with highly stable individual risk preferences. We discuss how the possibility of going empty-handed might reduce chimpanzees' risk-seeking relative to previous studies. This malleability of risk preferences as a function of experimental parameters and individual differences raises interesting questions about whether it is appropriate or helpful to categorize a species as a whole as risk-seeking or risk-avoidant. This article is part of the theme issue ‘Existence and prevalence of economic behaviours among non-human primates’.
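The zero-outcome manipulation described above can be made concrete with a small expected-value sketch; the payoff magnitudes and win probability below are illustrative assumptions, not the study's actual stimuli.

```python
# Illustrative sketch of the zero-outcome manipulation: two gambles with
# the same structure, one of which can leave the chooser empty-handed.
# All numbers are assumptions for illustration only.

def expected_value(p_win, win, lose):
    """Expected payoff of a two-outcome gamble."""
    return p_win * win + (1 - p_win) * lose

safe = 4                                   # guaranteed medium reward
risky_nonzero = expected_value(0.5, 8, 2)  # worse outcome still pays 2
risky_zero = expected_value(0.5, 8, 0)     # worse outcome pays nothing

# Both gambles pit variance against a sure thing, but only the second
# carries the possibility of going empty-handed.
```

With these illustrative numbers the non-zero gamble has higher expected value than the sure thing, while the zero-outcome gamble matches it, so any remaining preference between them reflects risk attitude rather than expected value.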

    Impact of the "when the fun stops, stop" gambling message on online gambling behaviour: a randomised, online experimental study

    Background: Safer gambling messages are a common freedom-preserving method of protecting individuals from gambling-related harm. Yet there is little independent and rigorous evidence assessing the effectiveness of safer gambling messages. In our study, we aimed to test the effect of the historically most commonly used UK safer gambling message on the concurrent gambling behaviour of people who gamble in the UK.
    Methods: Three preregistered, incentivised, and randomised online experiments testing the UK's “when the fun stops, stop” message were carried out via the crowdsourcing platform Prolific. Adults based in the UK who had previously participated in the gambling activities relevant to each experiment were eligible to participate. Experiments 1 and 3 involved bets on real football events, and experiment 2 used a commercially available online roulette game. The presence of the safer gambling message was varied between participants in each experiment. In experiment 2, exposed participants could be shown either a yellow or a black-and-white version of the safer gambling message. Participants were provided with a monetary endowment with which they were allowed to bet. Any of this money not bet was afterwards paid to participants as a bonus, in addition to the payouts from any winning bets. In experiment 2, participants had the opportunity to re-wager any winnings from the roulette game. The primary outcome in experiment 1 was participants' decisions to accept (or reject) a series of football bets, which varied in their specificity (and payoffs); the primary outcome of experiments 2 and 3 was the proportion of available funds bet, defined as the total amount of money bet by a participant out of the total that could have been bet.
    Findings: Participants for all three experiments were recruited between May 17, 2019, and Oct 17, 2020. Of the 506 participants in experiment 1, the 254 in the gambling message condition made 41·3% of available bets, which was not significantly different (p=0·15, odds ratio 1·22 [95% CI 0·93 to 1·61]) from the 37·8% of available bets made by the 252 participants in the control condition. In experiment 2, the only credible difference between conditions was that the 501 participants shown the yellow version of the gambling message bet 3·64% (95% Bayesian credibility interval 0·00% to 7·27%) more of the available funds than the 499 participants in the control condition. There were no credible differences between the bets made by the 500 participants in the black-and-white gambling message condition and the other conditions. In experiment 3, there were no credible differences between the 502 participants in the gambling message condition and the 501 participants in the control condition, with the largest effect being a 5·87% (95% Bayesian credibility interval –1·44% to 13·20%) increase in the probability of betting everything in the gambling message condition.
    Interpretation: In our study, no evidence was found for a protective effect of the most common UK safer gambling message. Alternative interventions should be considered as part of an evidence-based public health approach to reducing gambling-related harm.
    Funding: University of Warwick, British Academy and Leverhulme, Swiss National Science Foundation.

    Belief formation in a signaling game without common prior: an experiment

    Using belief elicitation, the paper investigates the process of belief formation and evolution in a signaling game in which a common prior is not induced. Both prior and posterior beliefs of Receivers about Senders' types are elicited, as well as beliefs of Senders about Receivers' strategies. In the experiment, subjects often start with diffuse uniform beliefs and update them in view of observations. However, the speed of updating is influenced by the strength of initial beliefs. An interesting result is that beliefs about the prior distribution of types are updated more slowly than posterior beliefs, which incorporate Senders' strategies. In the medium run, for some specifications of game parameters, this leads to outcomes significantly different from the outcomes of the game in which a common prior is induced. It is also shown that elicitation of beliefs does not considerably change the pattern of play in this game.
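The belief-updating process the abstract describes is, at its core, Bayes' rule applied to Sender types. A minimal sketch, assuming a hypothetical two-type game with made-up strategy probabilities (the type labels, signals, and likelihoods are illustrative, not the paper's design):

```python
# Hypothetical sketch of Bayesian belief updating by a Receiver in a
# two-type signaling game. Type labels, signals, and strategy
# probabilities are illustrative assumptions.

def posterior(prior, likelihood, signal):
    """Update beliefs over Sender types after observing one signal."""
    unnorm = {t: prior[t] * likelihood[t][signal] for t in prior}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

# Diffuse (uniform) prior over two Sender types, as subjects often start with.
prior = {"high": 0.5, "low": 0.5}

# Assumed Sender strategies: P(signal | type).
likelihood = {"high": {"s1": 0.8, "s2": 0.2},
              "low":  {"s1": 0.3, "s2": 0.7}}

beliefs = posterior(prior, likelihood, "s1")
# P(high | s1) = 0.4 / (0.4 + 0.15) ≈ 0.727
```

Without a common prior, subjects must learn both the likelihood (Senders' strategies) and the prior itself from experience, which is one reason the abstract's two kinds of beliefs can update at different speeds.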

    Biases in the Explore-Exploit Tradeoff in Addictions: The Role of Avoidance of Uncertainty.

    We focus on exploratory decisions across disorders of compulsivity, a potential dimensional construct for the classification of mental disorders. Behaviors associated with the pathological use of alcohol or food, in alcohol use disorders (AUD) or binge-eating disorder (BED), suggest a disturbance in explore-exploit decision-making, whereby strategic exploratory decisions made in an attempt to improve long-term outcomes may diminish in favor of more repetitive or exploitative choices. We compare exploration vs exploitation across disorders of natural rewards (obesity with and without BED) and drug rewards (AUD). We separately acquired resting-state functional MRI data, using a novel multi-echo planar imaging sequence and independent components analysis, from healthy individuals to assess the neural correlates underlying exploration. Participants with AUD showed reduced exploratory behavior across gain and loss environments, leading to lower-yielding exploitative choices. Obese subjects with and without BED did not differ from healthy volunteers, but when compared with each other or with AUD subjects, those with BED showed enhanced exploratory behaviors, particularly in the loss domain. All subject groups had decreased exploration, or greater uncertainty avoidance, for losses compared with rewards. More exploratory decisions in the context of reward were associated with frontal polar and ventral striatal connectivity. For losses, exploration was associated with frontal polar and precuneus connectivity. We further implicate the relevance and dimensionality of constructs of compulsivity across disorders of both natural and drug rewards.
    The study was funded by a Wellcome Trust Fellowship grant for VV (093705/Z/10/Z) and the Cambridge NIHR Biomedical Research Centre. VV and NAH are Wellcome Trust (WT) Intermediate Clinical Fellows. LSM is in receipt of an MRC studentship. The BCNI is supported by a WT and MRC grant. MF is funded by NIMH and NSF grants and is a consultant for Hoffman LaRoche pharmaceuticals. The remaining authors declare no competing financial interests. This is the final version of the article. It first appeared from NPG via http://dx.doi.org/10.1038/npp.2015.20
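The explore-exploit tradeoff the abstract studies can be illustrated with a standard uncertainty-bonus (UCB-style) choice rule; the bandit setup and bonus weight below are illustrative assumptions, not the task used in the study. Shrinking the bonus weight mimics greater uncertainty avoidance, i.e. more exploitative choice.

```python
import math

# A minimal sketch of an explore-exploit choice rule with an uncertainty
# bonus (UCB-style). The bandit setup and bonus weight are illustrative
# assumptions; a smaller bonus weight yields more exploitative choice.

def choose(values, counts, t, bonus_weight=1.0):
    """Pick the arm maximizing estimated value plus an uncertainty bonus."""
    def score(i):
        if counts[i] == 0:
            return float("inf")          # always sample untried arms once
        bonus = bonus_weight * math.sqrt(math.log(t) / counts[i])
        return values[i] + bonus
    return max(range(len(values)), key=score)

# With a full uncertainty bonus, the rarely sampled (uncertain) arm wins;
# with the bonus switched off, choice is purely exploitative.
explore = choose([1.0, 0.5], [10, 1], t=11)                    # picks arm 1
exploit = choose([1.0, 0.5], [10, 1], t=11, bonus_weight=0.0)  # picks arm 0
```

The same knob illustrates the abstract's finding: turning the bonus down (as in uncertainty avoidance for losses) drives the agent toward repetitive, lower-yielding choices even when an unexplored option might be better.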

    Temporal-Difference Reinforcement Learning with Distributed Representations

    Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially discounting “micro-Agents”, each of which has a separate discounting factor (γ). Each µAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (δ) signal within the model matches dopamine signals recorded from animals in standard conditioning reward paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each µAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
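The abstract's central mechanism, that a population of exponential discounters yields hyperbolic discounting in aggregate, can be sketched in a few lines; the particular grid of γ values is an illustrative assumption.

```python
# Sketch of the abstract's key point: averaging exponential discount
# curves gamma**t over a population of micro-agents, each with its own
# gamma, yields a hyperbolic-looking discount function. The gamma grid
# is an illustrative assumption.

def distributed_discount(t, gammas):
    """Mean of exponential discount factors across micro-agents."""
    return sum(g ** t for g in gammas) / len(gammas)

gammas = [i / 100 for i in range(1, 100)]   # micro-agent discount factors
curve = [distributed_discount(t, gammas) for t in range(50)]

# Unlike any single gamma**t, whose per-step decay ratio is constant,
# this mean decays fast early and slowly later -- the signature of
# hyperbolic discounting.
```

The intuition: at long delays the mixture is dominated by the most patient micro-agents (largest γ), so the aggregate curve flattens over time in a way no single exponential can.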

    Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning

    Reinforcement learning (RL) provides an influential characterization of the brain's mechanisms for learning to make advantageous choices. An important problem, though, is how complex tasks can be represented in a way that enables efficient learning. We consider this problem through the lens of spatial navigation, examining how two of the brain's location representations—hippocampal place cells and entorhinal grid cells—are adapted to serve as basis functions for approximating value over space for RL. Although much previous work has focused on these systems' roles in combining upstream sensory cues to track location, revisiting these representations with a focus on how they support this downstream decision function offers complementary insights into their characteristics. Rather than localization, the key problem in learning is generalization between past and present situations, which may not match perfectly. Accordingly, although neural populations collectively offer a precise representation of position, our simulations of navigational tasks verify the suggestion that RL gains efficiency from the more diffuse tuning of individual neurons, which allows learning about rewards to generalize over longer distances given fewer training experiences. However, work on generalization in RL suggests the underlying representation should respect the environment's layout. In particular, although it is often assumed that neurons track location in Euclidean coordinates (that a place cell's activity declines “as the crow flies” away from its peak), the relevant metric for value is geodesic: the distance along a path, around any obstacles. We formalize this intuition and present simulations showing how Euclidean, but not geodesic, representations can interfere with RL by generalizing inappropriately across barriers. 
Our proposal that place and grid responses should be modulated by geodesic distances suggests novel predictions about how obstacles should affect spatial firing fields, which provides a new viewpoint on data concerning both spatial codes.
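The Euclidean-versus-geodesic distinction can be illustrated with a breadth-first search on a small gridworld; the grid layout here is an illustrative assumption. Two cells a short Euclidean distance apart can be far apart geodesically when a wall lies between them, which is exactly where a Euclidean place-cell kernel would generalize value inappropriately.

```python
from collections import deque

# Sketch contrasting Euclidean with geodesic (around-obstacle) distance
# on a small grid with a wall; the layout is an illustrative assumption.

def geodesic(grid, start, goal):
    """Shortest path length (4-neighbour BFS) around blocked cells."""
    rows, cols = len(grid), len(grid[0])
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        (r, c), d = queue.popleft()
        if (r, c) == goal:
            return d
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), d + 1))
    return None  # goal unreachable

# 0 = open, 1 = wall; the wall separates two Euclidean-close cells.
grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]

# (0, 0) and (0, 2) are only 2 cells apart "as the crow flies", but the
# geodesic distance must detour around the wall.
d = geodesic(grid, (0, 0), (0, 2))
```

A value function learned with geodesic-respecting basis functions would keep these two cells distinct, whereas Euclidean tuning would leak learned reward straight through the barrier.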

    An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

    An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards, and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards.
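The asymmetry argument can be illustrated with a tabular TD(0) sketch in which negative prediction errors are clipped, loosely mimicking a dopaminergic signal that cannot dip far below its low baseline firing rate. The two-state chain, the clipping rule, and the learning parameters are illustrative assumptions, not the paper's spiking model.

```python
# Tabular TD(0) on a two-step chain (s0 -> s1 -> terminal), comparing a
# symmetric TD error with a fully clipped (asymmetric) one. All numbers
# are illustrative assumptions.

def td0(rewards, alpha=0.1, gamma=0.9, floor=None, episodes=500):
    """Learn state values on the chain; optionally clip the TD error."""
    v = [0.0, 0.0]
    for _ in range(episodes):
        for s in (0, 1):
            next_v = v[1] if s == 0 else 0.0   # terminal value is 0
            delta = rewards[s] + gamma * next_v - v[s]
            if floor is not None:
                delta = max(delta, floor)      # asymmetric error signal
            v[s] += alpha * delta
    return v

# Symmetric error: values of a purely negative-reward task are learned.
v_sym = td0(rewards=[-1.0, -1.0])
# Asymmetric error with negative dips removed entirely: no update ever
# fires, so learning from purely negative rewards breaks down.
v_asym = td0(rewards=[-1.0, -1.0], floor=0.0)
```

With the symmetric error the values converge toward −1 and −1.9 (reward plus discounted successor value), while the fully clipped variant never moves off zero, echoing the abstract's finding that the asymmetric signal supports reward-driven but not punishment-driven TD learning.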