Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Drawing inspiration from behavioral studies of human decision making, we
propose here a general parametric framework for the multi-armed bandit problem,
which extends the standard Thompson Sampling approach to incorporate reward
processing biases associated with several neurological and psychiatric
conditions, including Parkinson's and Alzheimer's diseases,
attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
We demonstrate empirically that the proposed parametric approach can often
outperform the baseline Thompson Sampling on a variety of datasets. Moreover,
from the behavioral modeling perspective, our parametric framework can be
viewed as a first step towards a unifying computational model capturing reward
processing abnormalities across multiple mental conditions. Comment: Conference on Artificial General Intelligence, AGI-1
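For orientation, the baseline that this abstract extends can be sketched as follows. This is a minimal sketch of standard Thompson Sampling for Bernoulli rewards with Beta(1, 1) priors, not the parametric framework the authors propose (the abstract does not specify its parameters). The function name `thompson_sampling` and its arguments are assumptions made for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Standard Bernoulli Thompson Sampling (baseline only, illustrative)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # Beta posterior: 1 + observed successes per arm
    beta = [1] * n_arms   # Beta posterior: 1 + observed failures per arm
    pulls = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Play the arm whose sampled rate is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate a Bernoulli reward from the (hypothetical) true rate.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward
```

A call such as `thompson_sampling([0.1, 0.9], 2000)` returns the per-arm pull counts and the total reward; over a long run the better arm should receive most of the pulls.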
Simple threshold rules solve explore/exploit trade-offs in a resource accumulation search task
How, and how well, do people switch between exploration and exploitation to search for and accumulate resources? We study the decision processes underlying such exploration/exploitation trade-offs using a novel card selection task that captures the common situation of searching among multiple resources (e.g., jobs) that can be exploited without depleting. With experience, participants learn to switch appropriately between exploration and exploitation and approach optimal performance. We model participants' behavior on this task with random, threshold, and sampling strategies, and find that a linear decreasing threshold rule best fits participants' results. Further evidence that participants use decreasing threshold-based strategies comes from reaction time differences between exploration and exploitation; however, participants themselves report non-decreasing thresholds. Decreasing threshold strategies that "front-load" exploration and switch quickly to exploitation are particularly effective in resource accumulation tasks, in contrast to optimal stopping problems like the Secretary Problem requiring longer exploration.
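A linear decreasing threshold rule of the kind mentioned above might look like the sketch below. The abstract does not give the rule's parameterization or the task's payoff scale, so `start`, `slope`, and the comparison against the best value seen so far are all assumptions made for illustration.

```python
def linear_threshold(trial, start, slope):
    """Threshold that falls linearly with the trial index (hypothetical form)."""
    return start - slope * trial


def choose_action(trial, best_value_so_far, start, slope):
    """Exploit once the best known resource meets the current threshold."""
    if best_value_so_far >= linear_threshold(trial, start, slope):
        return "exploit"
    return "explore"
```

With `start=80` and `slope=5`, a resource worth 50 is rejected on trial 0 (threshold 80) but accepted on trial 10 (threshold 30), which is how such a rule front-loads exploration.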
Satisficing in multi-armed bandit problems
Satisficing is a relaxation of maximizing and allows for less risky decision
making in the face of uncertainty. We propose two sets of satisficing
objectives for the multi-armed bandit problem, where the objective is to
achieve reward-based decision-making performance above a given threshold. We
show that these new problems are equivalent to various standard multi-armed
bandit problems with maximizing objectives and use the equivalence to find
bounds on performance. The different objectives can result in qualitatively
different behavior; for example, agents explore their options continually in
one case and only a finite number of times in another. For the case of Gaussian
rewards we show an additional equivalence between the two sets of satisficing
objectives that allows algorithms developed for one set to be applied to the
other. We then develop variants of the Upper Credible Limit (UCL) algorithm
that solve the problems with satisficing objectives and show that these
modified UCL algorithms achieve efficient satisficing performance. Comment: To appear in IEEE Transactions on Automatic Contro
Parameter estimation in softmax decision-making models with linear objective functions
With an eye towards human-centered automation, we contribute to the
development of a systematic means to infer features of human decision-making
from behavioral data. Motivated by the common use of softmax selection in
models of human decision-making, we study the maximum likelihood parameter
estimation problem for softmax decision-making models with linear objective
functions. We present conditions under which the likelihood function is convex.
These allow us to provide sufficient conditions for convergence of the
resulting maximum likelihood estimator and to construct its asymptotic
distribution. In the case of models with nonlinear objective functions, we show
how the estimator can be applied by linearizing about a nominal parameter
value. We apply the estimator to fit the stochastic UCL (Upper Credible Limit)
model of human decision-making to human subject data. We show statistically
significant differences in behavior across related, but distinct, tasks. Comment: In pres
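The estimation problem described in this abstract can be illustrated with a small sketch, assuming the common form in which option i is chosen with probability proportional to exp(theta · x_i), where x_i is a feature vector and theta the parameter to estimate. The function names and the data layout (a list of `(features, chosen_index)` pairs) are assumptions, not the authors' notation.

```python
import math


def softmax_probs(theta, features):
    """Choice probabilities for a softmax model with a linear objective."""
    scores = [sum(t * x for t, x in zip(theta, f)) for f in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(theta, trials):
    """Sum of -log P(chosen option) over (features, chosen_index) trials."""
    nll = 0.0
    for features, chosen in trials:
        nll -= math.log(softmax_probs(theta, features)[chosen])
    return nll
```

Minimizing `negative_log_likelihood` over `theta` with any standard optimizer gives the maximum likelihood estimate under these assumptions.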
A Cognitive Modeling Analysis of Risk in Sequential Choice Tasks
There exists a variety of instruments that assess risk propensity, or an individual's intrinsic tendency to be risk seeking. This thesis looks at four widely-studied cognitive tasks (the optimal stopping problem, the Balloon Analogue Risk Task, bandit problems, and a preferential choice gambling task) and three commonly used risk questionnaires (Risk Propensity Scale, Risk Taking Index, and Domain-Specific Risk-Taking Scale). Although these decision-making tasks and risk questionnaires have been studied extensively in isolation, there has been less research comparing measures of risk propensity across them. The motivation for examining the relationships between the tasks is that if an individual has a fundamental propensity to take risks, then this trait should be reflected in various questionnaires and cognitive tasks in which behavior is sensitive to risk. Within-subjects data was collected through Amazon Mechanical Turk from 56 participants. As measures of risk from the decision-making tasks, four cognitive models are implemented in which there are psychological variables that can be interpreted as risk propensity. Modeling results, based on Bayesian inferences about parameters and their correlations, show that people's risk behavior is consistent within tasks, but there is less evidence that the way people manage risk in each domain generalizes across tasks and questionnaires