9 research outputs found

    Predicting human decision making in psychological tasks with recurrent neural networks

    Unlike traditional time series, the action sequences of human decision making usually involve many cognitive processes such as beliefs, desires, intentions and theory of mind, i.e. what others are thinking. This makes it difficult to predict human decision making while remaining agnostic to the underlying psychological mechanisms. We propose a recurrent neural network architecture based on long short-term memory (LSTM) networks to predict the time series of actions taken by human subjects at each step of their decision making, the first application of such methods in this research domain. In this study, we collate human data from 8 published studies of the Iterated Prisoner's Dilemma, comprising 168,386 individual decisions, and postprocess them into 8,257 behavioral trajectories of 9 actions each for both players. Similarly, we collate 617 trajectories of 95 actions from 10 different published studies of Iowa Gambling Task experiments with healthy human subjects. We train our prediction networks on the behavioral data from these published psychological experiments, and demonstrate a clear advantage over state-of-the-art methods in predicting human decision-making trajectories in both single-agent scenarios such as the Iowa Gambling Task and multi-agent scenarios such as the Iterated Prisoner's Dilemma. In the prediction, we observe that the weights of the top performers tend to have a wider distribution and a bigger bias in the LSTM networks, which suggests possible interpretations for the distribution of strategies adopted by each group.
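The prediction task described in this abstract can be framed as next-action prediction over a trajectory. The sketch below is illustrative only and is not the authors' LSTM: it shows how a single trajectory could be split into (prefix, next action) pairs and scored against a trivial "repeat the last action" baseline. The function names, the 0/1 action encoding (1 = cooperate, 0 = defect) and the baseline are all assumptions introduced here.

```python
def next_action_pairs(trajectory):
    """Split one action sequence into (prefix, next_action) pairs.

    A trajectory of n actions yields n - 1 pairs; any sequence model
    (an LSTM, for instance) could be trained on pairs of this shape.
    """
    return [(trajectory[:t], trajectory[t]) for t in range(1, len(trajectory))]


def last_action_baseline(prefix):
    """Hypothetical baseline: predict that the previous action repeats."""
    return prefix[-1]


def accuracy(trajectories, predictor):
    """Fraction of next actions the predictor gets right across all pairs."""
    pairs = [pair for traj in trajectories for pair in next_action_pairs(traj)]
    correct = sum(predictor(prefix) == target for prefix, target in pairs)
    return correct / len(pairs)


# One made-up trajectory of 9 actions (assumed encoding: 1 = cooperate, 0 = defect).
example = [1, 1, 0, 0, 1, 1, 1, 0, 1]
print(len(next_action_pairs(example)))            # 8 pairs from 9 actions
print(accuracy([example], last_action_baseline))  # 0.5 on this trajectory
```

A learned model would replace `last_action_baseline` with a function that maps the prefix to a predicted action.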

    Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

    The Prisoner's Dilemma mainly treats the choice to cooperate or defect as an atomic action. We propose to study the behavior of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game, where we explore the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learning. We evaluate them in a tournament of the Iterated Prisoner's Dilemma in which multiple agents compete in a sequential fashion. This allows us to analyze the dynamics of policies learned by multiple self-interested, independent, reward-driven agents, and also to study the capacity of these algorithms to fit human behaviors. Results suggest that making decisions based on the current situation performs worst in this kind of social dilemma game. Multiple discoveries on online learning behaviors and clinical validations are reported. Comment: To the best of our knowledge, this is the first attempt to explore the full spectrum of reinforcement learning agents (multi-armed bandits, contextual bandits and reinforcement learning) in the sequential social dilemma. This mental variants section supersedes and extends our work arXiv:1706.02897 (MAB), arXiv:2005.04544 (CB) and arXiv:1906.11286 (RL) into the multi-agent setting
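As a rough illustration of the setup this abstract describes, the sketch below plays an Iterated Prisoner's Dilemma between a context-free epsilon-greedy bandit and a tit-for-tat opponent. It is a minimal sketch under stated assumptions, not the paper's tournament: the payoff values (3, 0, 5, 1), the class names, the `play` helper and the choice of opponent are all introduced here for illustration.

```python
import random

# Commonly used IPD payoffs (an assumption; the abstract does not give a matrix).
# Key: (my_action, their_action) -> my reward; "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


class EpsilonGreedyBandit:
    """Multi-armed bandit that ignores context and tracks mean reward per action."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {"C": 0, "D": 0}
        self.values = {"C": 0.0, "D": 0.0}

    def act(self, opponent_last):
        # opponent_last is deliberately unused: a plain bandit has no context.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(["C", "D"])
        return max(("C", "D"), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental running-mean update of the chosen action's value.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


class TitForTat:
    """Fixed strategy: cooperate first, then copy the opponent's last move."""

    def act(self, opponent_last):
        return "C" if opponent_last is None else opponent_last

    def update(self, action, reward):
        pass  # not a learner


def play(agent_a, agent_b, rounds=200):
    """Run one iterated game; return both total scores and the move history."""
    last_a = last_b = None
    total_a = total_b = 0
    history = []
    for _ in range(rounds):
        a = agent_a.act(last_b)
        b = agent_b.act(last_a)
        reward_a, reward_b = PAYOFF[(a, b)], PAYOFF[(b, a)]
        agent_a.update(a, reward_a)
        agent_b.update(b, reward_b)
        total_a += reward_a
        total_b += reward_b
        history.append((a, b))
        last_a, last_b = a, b
    return total_a, total_b, history


if __name__ == "__main__":
    score_a, score_b, moves = play(EpsilonGreedyBandit(seed=1), TitForTat())
    print(score_a, score_b, moves[:5])
```

A contextual bandit or reinforcement learning agent would differ mainly in `act` and `update`, which would condition on `opponent_last` (or a longer history) instead of ignoring it.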