12 research outputs found

    Learning to Allocate Limited Time to Decisions with Different Expected Outcomes

    No full text
    The goal of this article is to investigate how human participants allocate their limited time to decisions with different properties. We report the results of two behavioral experiments. In each trial of the experiments, the participant had to accumulate noisy information to make a decision. The participants received positive and negative rewards for their correct and incorrect decisions, respectively. The stimulus was designed such that decisions based on more accumulated information were more accurate but took longer. Therefore, the total outcome that a participant could achieve during the experiment's limited time depended on her "decision threshold": the amount of information she needed to make a decision. In the first experiment, two types of trials were intermixed randomly: hard and easy. Crucially, the hard trials were associated with smaller positive and negative rewards than the easy trials. A cue presented at the beginning of each trial indicated the type of the upcoming trial. The optimal strategy was to adopt a small decision threshold for hard trials. The results showed that several of the participants did not learn this simple strategy. We then investigated how the participants adjusted their decision threshold based on the feedback they received in each trial. To this end, we developed and compared 10 computational models for adjusting the decision threshold. The models differ in their assumptions about the shape of the decision threshold and the way feedback is used to adjust it. Bayesian model comparison showed that a model with time-varying thresholds whose parameters are updated by a reinforcement learning algorithm is the most likely model. In the second experiment, the cues were not presented. We showed that the optimal strategy is to use a single time-decreasing decision threshold for all trials. The results of the computational modeling showed that the participants did not use this optimal strategy. Instead, they attempted to detect the difficulty of the trial first and then set their decision threshold accordingly.
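    The threshold-adjustment idea described above lends itself to a brief illustration. The Python sketch below is a minimal, hypothetical version of the setup, not the authors' fitted model: noisy evidence is accumulated to a decision threshold on each trial, hard and easy trials carry different rewards, and the threshold for each cued trial type is nudged by a simple feedback-driven rule. All parameter values, the per-trial reward-rate signal, and the update rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_trial(threshold, drift, noise=1.0, dt=0.1, max_t=30.0):
    """Accumulate noisy evidence until it crosses +threshold or -threshold."""
    x, t = 0.0, 0.0
    while abs(x) < threshold and t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return x > 0, t                        # positive drift encodes the correct choice

# One threshold per cued trial type, nudged by the feedback received on that type.
thresholds = {"easy": 2.0, "hard": 2.0}
baselines = {"easy": 0.0, "hard": 0.0}
lr = 0.05

for trial in range(1000):
    cond = "hard" if rng.random() < 0.5 else "easy"    # cue shown at trial onset
    drift = 0.1 if cond == "hard" else 0.5             # hard trials: weaker evidence
    gain, loss = (1.0, -1.0) if cond == "hard" else (4.0, -4.0)  # and smaller rewards
    correct, rt = run_trial(thresholds[cond], drift)
    reward = (gain if correct else loss) / rt          # crude per-trial reward rate
    baselines[cond] += 0.02 * (reward - baselines[cond])
    # Raise the threshold after better-than-average feedback, lower it otherwise.
    thresholds[cond] = max(0.1, thresholds[cond] + lr * (reward - baselines[cond]))

print({c: round(v, 2) for c, v in thresholds.items()})
```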

    The detour problem in a stochastic environment: Tolman revisited

    No full text
    We designed a grid world task to study human planning and re-planning behavior in an unknown stochastic environment. In our grid world, participants were asked to travel from a random starting point to a random goal position while maximizing their reward. Because they were not familiar with the environment, they needed to learn its characteristics from experience in order to plan optimally. Later in the task, we randomly blocked the optimal path to investigate whether and how people adjusted their original plans to find a detour. To this end, we developed and compared 12 different models. The models differed in how they learned and represented the environment and how they planned to reach the goal. The majority of our participants were able to plan optimally. We also showed that people were capable of revising their plans when an unexpected event occurred. The results of the model comparison showed that the model-based reinforcement learning approach provided the best account of the data and outperformed heuristics in explaining behavior in the re-planning trials.
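    As a rough illustration of model-based planning and re-planning of the kind described above (this is not one of the paper's 12 models), the sketch below plans a path through a small grid world with value iteration and then replans when cells on the original route become blocked. The grid size, reward values, and blockage pattern are made up for the example.

```python
import numpy as np

ROWS, COLS = 5, 5
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right

def plan(blocked, goal, gamma=0.95, iters=100):
    """Value iteration over the (learned) grid layout."""
    V = np.zeros((ROWS, COLS))
    for _ in range(iters):
        for r in range(ROWS):
            for c in range(COLS):
                if (r, c) == goal or (r, c) in blocked:
                    continue
                vals = []
                for dr, dc in ACTIONS:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < ROWS and 0 <= nc < COLS) or (nr, nc) in blocked:
                        nr, nc = r, c                  # bump into a wall: stay put
                    reward = 1.0 if (nr, nc) == goal else -0.04
                    vals.append(reward + gamma * V[nr, nc])
                V[r, c] = max(vals)
    return V

def greedy_path(start, goal, blocked, V, gamma=0.95):
    """Follow the greedy policy implied by the value function."""
    path, pos = [start], start
    while pos != goal and len(path) < ROWS * COLS:
        best, best_q = pos, -np.inf
        for dr, dc in ACTIONS:
            nxt = (pos[0] + dr, pos[1] + dc)
            if 0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS and nxt not in blocked:
                q = (1.0 if nxt == goal else -0.04) + gamma * V[nxt]
                if q > best_q:
                    best, best_q = nxt, q
        pos = best
        path.append(pos)
    return path

start, goal = (4, 0), (0, 4)
V = plan(blocked=set(), goal=goal)
print("original plan:", greedy_path(start, goal, set(), V))

# An unexpected blockage forces a detour: replan with the updated model.
blocked = {(0, 3), (1, 3), (2, 3)}
V = plan(blocked=blocked, goal=goal)
print("detour plan:  ", greedy_path(start, goal, blocked, V))
```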

    Learning to maximize reward rate: a model based on semi-Markov decision processes

    No full text
    When animals have to make a number of decisions during a limited time interval, they face a fundamental problem: how much time should they spend on each decision in order to achieve the maximum possible total outcome? Deliberating more on one decision usually leads to a better outcome, but less time remains for other decisions. In the framework of sequential sampling models, the question is how animals learn to set their decision threshold such that the total expected outcome achieved during a limited time is maximized. The aim of this paper is to provide a theoretical framework for answering this question. To this end, we consider an experimental design in which each trial can come from one of several possible "conditions". A condition specifies the difficulty of the trial, the reward, the penalty, and so on. We show that to maximize the expected reward during a limited time, the subject should set a separate value of the decision threshold for each condition. We propose a model of learning the optimal values of the decision thresholds based on the theory of semi-Markov decision processes (SMDP). In our model, the experimental environment is modeled as an SMDP, with each "condition" being a "state" and the values of the decision thresholds being the "actions" taken in those states. The problem of finding the optimal decision thresholds is then cast as the stochastic optimal control problem of taking actions in each state of the corresponding SMDP such that the average reward rate is maximized. Our model utilizes a biologically plausible learning algorithm to solve this problem. The simulation results show that at the beginning of learning the model chooses high values of the decision threshold, which lead to sub-optimal performance. With experience, however, the model learns to lower the decision thresholds until it finally finds the optimal values.
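    The SMDP formulation can be illustrated with a short, hypothetical sketch: each trial condition acts as a state, candidate decision thresholds act as the actions, and an average-reward (R-learning-style) update penalizes each decision's reward by the opportunity cost of the time it consumed. The learning rule and all parameters below are illustrative assumptions, not the paper's biologically plausible algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

CONDITIONS = {"easy": 0.6, "hard": 0.15}      # drift strength per condition
THRESHOLDS = [0.5, 1.0, 2.0, 4.0]             # candidate decision thresholds (actions)

def simulate(drift, threshold, noise=1.0, dt=0.1, max_t=20.0):
    """Accumulate evidence to a bound; return (reward, elapsed time)."""
    x, t = 0.0, 0.0
    while abs(x) < threshold and t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return (1.0 if x > 0 else -1.0), t + 0.5  # 0.5 s inter-trial interval

Q = {c: np.zeros(len(THRESHOLDS)) for c in CONDITIONS}
rho, alpha, beta, eps = 0.0, 0.1, 0.01, 0.1   # reward rate, learning rates, exploration

for trial in range(5000):
    cond = rng.choice(list(CONDITIONS))
    a = rng.integers(len(THRESHOLDS)) if rng.random() < eps else int(np.argmax(Q[cond]))
    reward, tau = simulate(CONDITIONS[cond], THRESHOLDS[a])
    # SMDP-style update: the reward is penalized by the opportunity cost rho * tau.
    Q[cond][a] += alpha * (reward - rho * tau - Q[cond][a])
    rho += beta * (reward - rho * tau)        # track the average reward rate

for cond in CONDITIONS:
    print(cond, "-> preferred threshold:", THRESHOLDS[int(np.argmax(Q[cond]))])
```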

    Effects of Methadone Maintenance Treatment on Decision-Making Processes in Heroin-Abusers: A Cognitive Modeling Analysis

    No full text
    Introduction: Although decision-making processes have become a principal target of study among addiction researchers, few studies to date have examined the effects of different treatment methods on the cognitive processes underlying decision making. Using a cognitive modeling approach, in this paper we examine the effects of methadone maintenance treatment (MMT) on the cognitive processes underlying decision-making disorders in heroin abusers. Methods: For this purpose, for the first time, we use the balloon analogue risk task (BART) to assess the decision-making ability of heroin abusers before and after treatment and compare it to that of non-heroin-dependent subjects. Results: The results demonstrate that heroin abusers show more risky behavior than the other groups. However, there is no difference between the performance of heroin abusers after 6 months of MMT and that of the control group. Modeling the subjects' behavior in the BART reveals that the poor performance of heroin abusers is due to reward dependency and insensitivity to evaluation. Discussion: The results show that 6 months of MMT decreases reward dependency and increases sensitivity to evaluation.
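    To make the two constructs concrete, the sketch below simulates BART pumping behavior with a reward-dependency parameter and an evaluation-sensitivity (learning-rate) parameter. It is an illustrative generative model, not the cognitive model fitted in the paper; the parameter names, update rule, and values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_bart(reward_dependency, evaluation_lr, n_balloons=30, burst_p=0.1):
    """Return the mean number of pumps on balloons that were cashed out."""
    belief = 0.15                              # subjective per-pump burst probability
    cashed = []
    for _ in range(n_balloons):
        target = reward_dependency / max(belief, 1e-3)   # more pumps if reward-hungry
        pumps, burst = 0, False
        while pumps < target:
            pumps += 1
            if rng.random() < burst_p:
                burst = True
                break
        # Evaluation: move the belief toward this balloon's observed per-pump burst rate.
        observed = (1.0 / pumps) if burst else 0.0
        belief += evaluation_lr * (observed - belief)
        if not burst:
            cashed.append(pumps)
    return float(np.mean(cashed)) if cashed else 0.0

# Higher reward dependency with lower evaluation sensitivity produces riskier pumping.
print("reward-dependent, evaluation-insensitive:", simulate_bart(1.5, 0.02))
print("less reward-dependent, evaluation-sensitive:", simulate_bart(0.5, 0.30))
```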

    Quantum Inspired Reinforcement Learning in Changing Environment

    No full text
    Inspired by quantum theory and reinforcement learning, a new framework for learning in an unknown probabilistic environment is proposed. Several simulated experiments are presented; the results demonstrate the robustness of the new algorithm on some complex problems. We also generalized the Grover algorithm to improve the rate of convergence to an optimal path; in other words, the generalized algorithm increases the probability of selecting good actions through better weight adjustments. © 2013 World Scientific Publishing Company
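    A minimal sketch of the quantum-inspired idea, assuming a simple bandit-style setting (not the paper's algorithm): action-selection probabilities are squared amplitudes, and a Grover-like amplification step boosts the amplitude of an action that just yielded reward. The softened update and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def grover_amplify(amplitudes, target):
    """One Grover iteration: flip the target's sign, then reflect about the mean."""
    a = amplitudes.copy()
    a[target] = -a[target]                  # oracle marks the rewarded action
    a = 2 * a.mean() - a                    # diffusion (inversion about the mean)
    return a / np.linalg.norm(a)

n_actions = 4
amps = np.ones(n_actions) / np.sqrt(n_actions)        # uniform superposition
true_reward_probs = np.array([0.2, 0.8, 0.4, 0.3])    # action 1 is actually best

for step in range(300):
    probs = 0.9 * amps**2 + 0.1 / n_actions           # squared amplitudes + exploration floor
    action = rng.choice(n_actions, p=probs / probs.sum())
    if rng.random() < true_reward_probs[action]:
        # Softened ("generalized") amplification so one lucky reward does not
        # collapse the whole distribution onto a single action.
        amps = 0.7 * amps + 0.3 * grover_amplify(amps, action)
        amps /= np.linalg.norm(amps)

print("final selection probabilities:", np.round(amps**2, 3))
```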