1,749 research outputs found

    Vicarious Reinforcement in Rhesus Macaques (Macaca Mulatta)

    What happens to others profoundly influences our own behavior. Such other-regarding outcomes can drive observational learning, as well as motivate cooperation, charity, empathy, and even spite. Vicarious reinforcement may serve as one of the critical mechanisms mediating the influence of other-regarding outcomes on behavior and decision-making in groups. Here we show that rhesus macaques spontaneously derive vicarious reinforcement from observing rewards given to another monkey, and that this reinforcement can motivate them to subsequently deliver or withhold rewards from the other animal. We exploited Pavlovian and instrumental conditioning to associate rewards to self (M1) and/or rewards to another monkey (M2) with visual cues. M1s made more errors in the instrumental trials when cues predicted reward to M2 compared to when cues predicted reward to M1, but made even more errors when cues predicted reward to no one. In subsequent preference tests between pairs of conditioned cues, M1s preferred cues paired with reward to M2 over cues paired with reward to no one. By contrast, M1s preferred cues paired with reward to self over cues paired with reward to both monkeys simultaneously. Rates of attention to M2 strongly predicted the strength and valence of vicarious reinforcement. These patterns of behavior, which were absent in non-social control trials, are consistent with vicarious reinforcement based upon sensitivity to observed, or counterfactual, outcomes with respect to another individual. Vicarious reward may play a critical role in shaping cooperation and competition, as well as motivating observational learning and group coordination in rhesus macaques, much as it does in humans. We propose that vicarious reinforcement signals mediate these behaviors via homologous neural circuits involved in reinforcement learning and decision-making.

    Optimizing the depth and the direction of prospective planning using information values

    Evaluating the future consequences of actions is achievable by simulating a mental search tree into the future. Expanding deep trees, however, is computationally taxing. Therefore, machines and humans use a plan-until-habit scheme that simulates the environment up to a limited depth and then exploits habitual values as proxies for consequences that may arise in the future. Two outstanding questions in this scheme are “in which directions should the search tree be expanded?” and “when should the expansion stop?”. Here we propose a principled solution to these questions based on a speed/accuracy tradeoff: deeper expansion in the appropriate directions leads to more accurate planning, but at the cost of slower decision-making. Our simulation results show how this algorithm expands the search tree effectively and efficiently in a grid-world environment. We further show that our algorithm can explain several behavioral patterns in animals and humans, namely the effect of time pressure on the depth of planning, the effect of reward magnitudes on the direction of planning, and the gradual shift from goal-directed to habitual behavior over the course of training. The algorithm also provides several predictions testable in animal/human experiments.
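
The plan-until-habit idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it expands every action to a fixed depth, whereas the abstract describes choosing depth and direction from information values. The toy environment, the names `MODEL`, `HABIT_Q`, `action_values`, and `choose`, and all reward numbers are assumptions made up for illustration.

```python
# Hypothetical deterministic environment: state -> action -> (next_state, reward).
MODEL = {
    "A": {"left": ("B", 0.0), "right": ("C", 1.0)},
    "B": {"left": ("D", 10.0), "right": ("D", 0.0)},
    "C": {"left": ("D", 0.0), "right": ("D", 0.0)},
    "D": {},  # terminal state, no actions
}

# Hypothetical cached ("habitual") action values. They know about the small
# immediate reward at A but have not yet learned the large reward behind B.
HABIT_Q = {
    "A": {"left": 0.0, "right": 1.0},
    "B": {"left": 0.0, "right": 0.0},
    "C": {"left": 0.0, "right": 0.0},
    "D": {},
}


def action_values(state, depth, model, habit_q, gamma=1.0):
    """Estimate action values by simulating `depth` steps, then using habits."""
    if depth == 0:
        # Planning budget exhausted: fall back on habitual values.
        return dict(habit_q[state])
    values = {}
    for action, (next_state, reward) in model[state].items():
        future = action_values(next_state, depth - 1, model, habit_q, gamma)
        # A terminal state has no actions, so its future value is zero.
        best_future = max(future.values()) if future else 0.0
        values[action] = reward + gamma * best_future
    return values


def choose(state, depth, model, habit_q, gamma=1.0):
    """Pick the action with the highest estimated value."""
    values = action_values(state, depth, model, habit_q, gamma)
    return max(values, key=values.get)


if __name__ == "__main__":
    for depth in (0, 1, 2):
        print(depth, choose("A", depth, MODEL, HABIT_Q))
```

In this toy example, shallow planning follows the habit toward the small reward, and only a depth of 2 reveals the larger reward behind `B`. That is the accuracy side of the speed/accuracy tradeoff the abstract mentions; the cost side (time spent simulating) is not modeled here.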

    Intrinsically motivated oculomotor exploration guided by uncertainty reduction and conditioned reinforcement in non-human primates

    Intelligent animals have a high degree of curiosity – the intrinsic desire to know – but the mechanisms of curiosity are poorly understood. A key open question pertains to the internal valuation systems that drive curiosity. What are the cognitive and emotional factors that motivate animals to seek information when this is not reinforced by instrumental rewards? Using a novel oculomotor paradigm, combined with reinforcement learning (RL) simulations, we show that monkeys are intrinsically motivated to search for and look at reward-predictive cues, and that their intrinsic motivation is shaped by a desire to reduce uncertainty, a desire to obtain conditioned reinforcement from positive cues, and individual variations in decision strategy and the cognitive costs of acquiring information. The results suggest that free-viewing oculomotor behavior reveals cognitive and emotional factors underlying the curiosity-driven sampling of information.

    A Temporal Information-Theoretic Model of Suboptimal Choice

    Humans and animals often make decisions not in their long-term best interest. In one example, called suboptimal choice, pigeons sacrifice food for food-predictive stimuli. The study of suboptimal choice can reveal insights into the role of reward-predictive stimuli in the maladaptive decision-making that characterizes numerous behavioral disorders. However, there is currently little evidence that rats engage in suboptimal choice, thereby raising questions about the species-generality of suboptimal choice. According to the temporal information-theoretic model, developed in Chapter 2, suboptimal choice emerges when pigeons pay more attention to the bits of temporal information conveyed by food-predictive stimuli than to the rate of food delivery while making decisions. When there is a long delay to food, more attention is paid to food-predictive stimuli and suboptimal choice emerges in pigeons. Chapter 3 found that rats also engaged in suboptimal choice provided a sufficiently long delay to food. Further, when there is also a long delay to food-predictive stimuli, more attention is paid to the rate of food delivery and optimal choice emerges in pigeons. Chapter 4 found that suboptimal choice in rats was unaffected by delays to food-predictive stimuli. Thus, the processes that govern suboptimal choice are well described by the temporal information-theoretic model of suboptimal choice for both rats and pigeons, though there might be species differences in the variables that govern attention to food-predictive stimuli and food itself.

    Trial Spacing and the Conditioned Motivational Effects of a Food-Predictive Cue

    Stimuli in the environment can come to influence motivation and behavior through a process known as Pavlovian conditioning. During Pavlovian conditioning, stimuli in the environment come to predict the availability of a reward. Two procedures, Pavlovian-instrumental transfer and potentiated feeding, are used to investigate how stimuli can modify ongoing behavior and reward consumption, respectively. In other procedures that investigate how stimuli modify behavior, certain time intervals during Pavlovian training can influence how much a stimulus can modify behavior. One of those intervals is the time between the presentation of a stimulus and the associated reward. This interval has been shown to influence both Pavlovian-instrumental transfer and potentiated feeding. The other interval, the time between the end of one stimulus and the beginning of the next, has not been investigated with regard to the aforementioned procedures. The current study assessed how differences in this interval altered Pavlovian-instrumental transfer and potentiated feeding. This interval did not affect Pavlovian-instrumental transfer in the current experiment. Potentiated feeding could not be assessed.

    Effects of Experimentally Induced Pain on Value-Based Decision-Making in Healthy Adults

    Every day we make decisions that influence our well-being. Because of this, it is crucial that we make the best decisions possible in order to minimize unnecessary loss or suffering. Some groups might be more vulnerable to making maladaptive choices, such as those suffering from chronic pain, which is associated with various cognitive impairments. The literature does not currently establish a clear link between pain and decision-making strategies. However, research shows that other aversive stimuli do affect our reliance on Pavlovian bias in decision-making, suggesting that pain might influence it in a similar fashion. In this study we tested whether pain would modulate the degree of Pavlovian bias in healthy Norwegian-speaking adults (N = 50). We developed a protocol for safely and effectively inducing tonic heat pain and used this protocol in parallel with an orthogonalized Go/NoGo value-based decision-making card game. The game consisted of 5 blocks, where blocks 2 and 4 were paired with either painful or warm stimulation. We found that pain overall had no effect on task performance accuracy, but there was some indication that pain increased Pavlovian bias in the aversive domain. Although this effect was not very strong, it could be stronger in patients suffering from long-term (chronic) pain, leading them to make more maladaptive decisions in everyday life. Future studies should try to replicate the findings detailed in this thesis with a larger and more diverse sample.