Predicting human decision making in psychological tasks with recurrent neural networks
Unlike traditional time series, the action sequences of human decision making
usually involve many cognitive processes such as beliefs, desires, intentions
and theory of mind, i.e. what others are thinking. This makes it challenging to
predict human decision making while remaining agnostic to the underlying
psychological mechanisms. We propose to use a recurrent neural network
architecture based on long short-term memory networks (LSTM) to predict the
time series of the actions taken by the human subjects at each step of their
decision making, the first application of such methods in this research domain.
In this study, we collate human data from 8 published studies of the
Iterated Prisoner's Dilemma comprising 168,386 individual decisions and
postprocess them into 8,257 behavioral trajectories of 9 actions each for both
players. Similarly, we collate 617 trajectories of 95 actions from 10 different
published studies of Iowa Gambling Task experiments with healthy human
subjects. We train our prediction networks on the behavioral data from these
published psychological experiments of human decision making, and demonstrate a
clear advantage over the state-of-the-art methods in predicting human decision
making trajectories in both single-agent scenarios such as the Iowa Gambling
Task and multi-agent scenarios such as the Iterated Prisoner's Dilemma. In the
prediction, we observe that the weights of the top performers tend to have a
wider distribution, and a bigger bias in the LSTM networks, which suggests
possible interpretations for the distribution of strategies adopted by each
group.
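As a minimal illustration of the prediction task described above, the sketch below frames one Iterated Prisoner's Dilemma trajectory of 9 joint actions as next-step prediction pairs and scores a naive "repeat the last action" baseline. The action encoding, the function names and the example trajectory are all hypothetical; the abstract does not describe the authors' actual preprocessing or baselines, and this is not their method.

```python
from typing import List, Tuple

# Assumed encoding (not from the paper): 'C' = cooperate, 'D' = defect.
Round = Tuple[str, str]  # (player 1 action, player 2 action)


def next_step_pairs(trajectory: List[Round]) -> List[Tuple[List[Round], Round]]:
    """Turn one trajectory into (history so far, next joint action) pairs."""
    return [(trajectory[:t], trajectory[t]) for t in range(1, len(trajectory))]


def repeat_last_accuracy(trajectory: List[Round]) -> float:
    """Accuracy of a naive baseline: each player repeats their previous action."""
    pairs = next_step_pairs(trajectory)
    hits = sum(
        1
        for history, target in pairs
        for player in (0, 1)
        if history[-1][player] == target[player]
    )
    return hits / (2 * len(pairs))


# Made-up trajectory of 9 rounds, for illustration only.
example = [("C", "C"), ("C", "D"), ("D", "D"), ("D", "D"), ("C", "D"),
           ("C", "C"), ("C", "C"), ("D", "C"), ("D", "D")]
pairs = next_step_pairs(example)
baseline = repeat_last_accuracy(example)
```

A sequence model such as an LSTM would consume the same (history, next action) pairs; the baseline only gives a reference point against which a learned predictor could be compared.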
Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
Studies of the Prisoner's Dilemma mainly treat the choice to cooperate or defect as an
atomic action. We propose to study online learning algorithm behavior in the
Iterated Prisoner's Dilemma (IPD) game, where we explore the full spectrum of
reinforcement learning agents: multi-armed bandits, contextual bandits and
reinforcement learning. We evaluate them in a tournament of the Iterated
Prisoner's Dilemma in which multiple agents compete in a sequential fashion.
This allows us to analyze the dynamics of policies learned by multiple
self-interested independent reward-driven agents, and also allows us to study the
capacity of these algorithms to fit human behaviors. Results suggest that
considering the current situation to make decisions is the worst strategy in this
kind of social dilemma game. Multiple discoveries on online learning behaviors and
clinical validations are stated.
Comment: To the best of our knowledge, this is the first attempt to explore
the full spectrum of reinforcement learning agents (multi-armed bandits,
contextual bandits and reinforcement learning) in the sequential social
dilemma. This mental variants section supersedes and extends our work
arXiv:1706.02897 (MAB), arXiv:2005.04544 (CB) and arXiv:1906.11286 (RL) into
the multi-agent settin