
Bandit Models of Human Behavior: Reward Processing in Mental Disorders

Author: A Dezfouli
AD Redish
AM Taylor
D Bouneffouf
DC Perry
LE Hess
M Luman
MJ Frank
P Auer
TL Lai
TU Hauser
W Thompson
WR Thompson
WW Seeley
Publication date: 07/06/2017

Drawing inspiration from behavioral studies of human decision making, we propose here a general parametric framework for the multi-armed bandit problem, which extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. We demonstrate empirically that the proposed parametric approach can often outperform the baseline Thompson Sampling on a variety of datasets. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions.

Comment: Conference on Artificial General Intelligence, AGI-1
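
The baseline named in this abstract, standard Thompson Sampling, can be sketched for Bernoulli rewards as below. This is only an illustration of the baseline, not the authors' parametric extension, which the abstract does not specify. The function name, the arm probabilities, and the uniform Beta(1, 1) prior are assumptions made for the example.

```python
import random


def thompson_sampling(true_probs, horizon, seed=0):
    """Run Bernoulli Thompson Sampling (hypothetical helper for illustration).

    true_probs: assumed success probability of each arm (unknown to the agent).
    Returns (total_reward, successes, failures), with per-arm counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior (uniform prior assumed).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total_reward += reward
    return total_reward, successes, failures
```

Calling `thompson_sampling([0.2, 0.8], 500)` returns the accumulated reward together with the per-arm success and failure counts, which show how the pulls were split between the arms.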

arXiv.org e-Print Archive

Crossref

Simple threshold rules solve explore/exploit trade‐offs in a resource accumulation search task

Author: Baumann C.
Christian B.
Sang K.
Sutton R. S.
Todd P. M.
Zhang S.
Publication venue: Wiley
Publication date: 01/02/2020

How, and how well, do people switch between exploration and exploitation to search for and accumulate resources? We study the decision processes underlying such exploration/exploitation trade‐offs using a novel card selection task that captures the common situation of searching among multiple resources (e.g., jobs) that can be exploited without depleting. With experience, participants learn to switch appropriately between exploration and exploitation and approach optimal performance. We model participants' behavior on this task with random, threshold, and sampling strategies, and find that a linear decreasing threshold rule best fits participants' results. Further evidence that participants use decreasing threshold‐based strategies comes from reaction time differences between exploration and exploitation; however, participants themselves report non‐decreasing thresholds. Decreasing threshold strategies that “front‐load” exploration and switch quickly to exploitation are particularly effective in resource accumulation tasks, in contrast to optimal stopping problems like the Secretary Problem requiring longer exploration
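
A linear decreasing threshold rule of the kind described in this abstract might look like the sketch below. The task details are assumptions, since the abstract does not give them: card values are drawn uniformly from 0 to 100, both exploring and exploiting pay out the selected card's value, and the threshold falls from `start` to `end` over the horizon. All names and parameter values are hypothetical.

```python
import random


def linear_threshold(t, horizon, start, end):
    """Threshold at turn t, falling linearly from start (t=0) to end (last turn)."""
    if horizon <= 1:
        return start
    return start + (end - start) * t / (horizon - 1)


def run_threshold_agent(horizon=20, start=90.0, end=50.0, seed=0):
    """Explore while the best known card is below the current threshold."""
    rng = random.Random(seed)
    best = None
    total = 0.0
    actions = []
    for t in range(horizon):
        threshold = linear_threshold(t, horizon, start, end)
        if best is None or best < threshold:
            # Explore: turn over a new card (assumed uniform on [0, 100]).
            value = rng.uniform(0.0, 100.0)
            best = value if best is None else max(best, value)
            total += value  # assumption: an explored card pays its value
            actions.append("explore")
        else:
            # Exploit: reuse the best known card, which does not deplete.
            total += best
            actions.append("exploit")
    return total, actions
```

Because the threshold is highest at the start, the agent explores mostly in early turns and settles on a card once its value clears the falling threshold, which is the "front-loaded" pattern the abstract describes.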

Crossref

Warwick Research Archives Portal Repository

Satisficing in multi-armed bandit problems

Author: Leonard Naomi Ehrich
Reverdy Paul
Srivastava Vaibhav
Publication date: 19/12/2016

Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty. We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold. We show that these new problems are equivalent to various standard multi-armed bandit problems with maximizing objectives and use the equivalence to find bounds on performance. The different objectives can result in qualitatively different behavior; for example, agents explore their options continually in one case and only a finite number of times in another. For the case of Gaussian rewards we show an additional equivalence between the two sets of satisficing objectives that allows algorithms developed for one set to be applied to the other. We then develop variants of the Upper Credible Limit (UCL) algorithm that solve the problems with satisficing objectives and show that these modified UCL algorithms achieve efficient satisficing performance.

Comment: To appear in IEEE Transactions on Automatic Contro

arXiv.org e-Print Archive

Princeton University Open Access Repository

Parameter estimation in softmax decision-making models with linear objective functions

Author: Leonard Naomi E.
Reverdy Paul
Publication date: 29/08/2015

With an eye towards human-centered automation, we contribute to the development of a systematic means to infer features of human decision-making from behavioral data. Motivated by the common use of softmax selection in models of human decision-making, we study the maximum likelihood parameter estimation problem for softmax decision-making models with linear objective functions. We present conditions under which the likelihood function is convex. These allow us to provide sufficient conditions for convergence of the resulting maximum likelihood estimator and to construct its asymptotic distribution. In the case of models with nonlinear objective functions, we show how the estimator can be applied by linearizing about a nominal parameter value. We apply the estimator to fit the stochastic UCL (Upper Credible Limit) model of human decision-making to human subject data. We show statistically significant differences in behavior across related, but distinct, tasks.

Comment: In pres
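
The estimation problem in this abstract can be illustrated with a generic softmax choice model whose objective is linear in the parameters, fitted by plain gradient ascent on the log-likelihood. This is a sketch under assumptions, not the estimator analyzed in the paper: the function names, the feature encoding, the step size, and the iteration count are all hypothetical.

```python
import math


def softmax_probs(theta, features):
    """Choice probabilities when option i has score theta . features[i]."""
    scores = [sum(t * x for t, x in zip(theta, f)) for f in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def log_likelihood(theta, trials):
    """Sum of log-probabilities of the observed choices.

    trials: list of (features, chosen_index) pairs.
    """
    return sum(math.log(softmax_probs(theta, f)[c]) for f, c in trials)


def fit(trials, dim, lr=0.5, steps=500):
    """Gradient ascent on the average log-likelihood, starting from zero."""
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for features, choice in trials:
            probs = softmax_probs(theta, features)
            for d in range(dim):
                expected = sum(p * f[d] for p, f in zip(probs, features))
                # Gradient term: observed feature minus model-expected feature.
                grad[d] += features[choice][d] - expected
        theta = [t + lr * g / len(trials) for t, g in zip(theta, grad)]
    return theta
```

With two options whose single feature is 1 and 0, and the first option chosen in three of four trials, the fitted parameter approaches ln 3, the value at which the model assigns the first option probability 0.75.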

arXiv.org e-Print Archive

Princeton University Open Access Repository


A Cognitive Modeling Analysis of Risk in Sequential Choice Tasks

Author: Guan Maime
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

There exists a variety of instruments that assess risk propensity, or an individual's intrinsic tendency to be risk seeking. This thesis looks at four widely-studied cognitive tasks (the optimal stopping problem, the Balloon Analogue Risk Task, bandit problems, and a preferential choice gambling task) and three commonly used risk questionnaires (Risk Propensity Scale, Risk Taking Index, and Domain-Specific Risk-Taking Scale). Although these decision-making tasks and risk questionnaires have been studied extensively in isolation, there has been less research comparing measures of risk propensity across them. The motivation for examining the relationships between the tasks is that if an individual has a fundamental propensity to take risks, then this trait should be reflected in various questionnaires and cognitive tasks in which behavior is sensitive to risk. Within-subjects data was collected through Amazon Mechanical Turk from 56 participants. As measures of risk from the decision-making tasks, four cognitive models are implemented in which there are psychological variables that can be interpreted as risk propensity. Modeling results, based on Bayesian inferences about parameters and their correlations, show that people's risk behavior is consistent within tasks, but there is less evidence that the way people manage risk in each domain generalizes across tasks and questionnaires

eScholarship - University of California