Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin
Direct reciprocity, or repeated interaction, is a main mechanism for sustaining cooperation in social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors remain largely unclear. Here we provide a proximate account of this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, for prisoner's dilemma and public goods games, and for well-mixed groups and networks. Unlike in previous theory, individuals are assumed to have no access to information about what other individuals are doing, so they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity of cooperation is modulated at every discrete time step, explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This differs from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
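The aspiration-learning rule described in this abstract — reinforce a satisfying action, anti-reinforce an unsatisfying one — can be sketched as follows. This is a minimal illustration, not the paper's exact model; the function name, the learning rate, and the strict inequality for satisfaction are assumptions.

```python
def bm_update(p, action, payoff, aspiration, rate=0.5):
    """One binary-satisfaction aspiration-learning step (illustrative).

    p          : current probability of cooperating
    action     : 'C' or 'D', the action just taken
    payoff     : payoff obtained this round
    aspiration : fixed aspiration level (satisfaction threshold)
    rate       : learning rate (hypothetical value)
    """
    satisfied = payoff > aspiration
    # Reinforce the taken action if satisfied, anti-reinforce otherwise:
    # a satisfying cooperation (or an unsatisfying defection) pushes the
    # cooperation propensity up; the other two cases push it down.
    if (action == 'C') == satisfied:
        return p + rate * (1.0 - p)   # move toward cooperating
    return p * (1.0 - rate)           # move toward defecting
```

Because the update depends only on the agent's own action and payoff, the agent needs no information about what others are doing, yet in group games its cooperation propensity ends up tracking how much the group cooperated — the conditional-cooperation pattern the abstract refers to.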
Reinforcement learning accounts for moody conditional cooperation behavior: experimental results
In social dilemma games, human participants often show conditional cooperation (CC) behavior or its variant called moody conditional cooperation (MCC), with which they basically tend to cooperate when many other peers have previously cooperated. Recent computational studies showed that CC and MCC behavioral patterns could be explained by reinforcement learning. In the present study, we use a repeated multiplayer prisoner's dilemma game and a repeated public goods game played by human participants to examine whether MCC is observed across different types of game, and whether reinforcement learning explains the observed behavior. We observed MCC behavior in both games, but the MCC that we observed differed from that observed in past experiments. In the present study, whether or not a focal participant cooperated previously affected the overall level of cooperation, instead of changing the tendency of cooperation in response to other participants' cooperation in the previous time step. We found that, across different conditions, reinforcement learning models were approximately as accurate as an MCC model in describing the experimental results. Consistent with the previous computational studies, the present results suggest that reinforcement learning may be a major proximate mechanism governing MCC behavior.
The emergence of altruism as a social norm
Expectations, exerting influence through social norms, are a very strong candidate to explain how complex societies function. In the Dictator game (DG), people expect generous behavior from others even when they cannot enforce any sharing of the pie. Here we assume that people donate following their expectations, and that they update their expectations after playing a DG by reinforcement learning, to construct a model that explains the main experimental results of the DG. Full agreement with the experimental results is reached when some degree of mismatch between expectations and donations is added to the model. These results are robust against the presence of envious agents, but are affected if we introduce selfish agents that do not update their expectations. Our results point to social norms as the basis of the generous behavior observed in the DG, and also to the wide applicability of reinforcement learning for explaining many strategic interactions.
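The two ingredients of the model described here — donations that follow expectations (with some mismatch) and expectations updated by reinforcement learning — can be sketched in a few lines. All names and parameter values below are illustrative assumptions, not the paper's specification.

```python
def play_dictator_round(expectation, received, rate=0.1, mismatch=0.05):
    """One round of an expectation-updating Dictator-game model (sketch).

    expectation : fraction of the pie the agent expects to receive
    received    : fraction actually received as recipient this round
    rate, mismatch : illustrative learning rate and expectation-donation gap
    """
    # Donation tracks the agent's own expectation, minus a mismatch term
    # (the abstract notes such a mismatch is needed to fit the data).
    donation = max(0.0, expectation - mismatch)
    # Reinforcement-style update: nudge the expectation toward what was
    # actually received when playing as recipient.
    new_expectation = expectation + rate * (received - expectation)
    return donation, new_expectation
```

Iterating this rule over a population couples everyone's donations through their shared experience as recipients, which is how a sharing norm can stabilize without any enforcement.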
Reinforcement learning account of network reciprocity
Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity, and the lack thereof, in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.
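For reference, the standard Bush-Mosteller update uses a graded stimulus rather than the binary satisfied/unsatisfied signal: the payoff-aspiration difference, normalized into [-1, 1], scales how strongly the taken action is reinforced or anti-reinforced. A sketch of this textbook form (the exact parameterization used in the paper is not given in the abstract, so the arguments below are assumptions):

```python
def bush_mosteller(p, action, payoff, aspiration, max_diff):
    """One Bush-Mosteller update with a graded stimulus (textbook form).

    p          : current probability of cooperating
    action     : 'C' or 'D', the action just taken
    max_diff   : normalization so the stimulus s stays in [-1, 1]
    """
    s = (payoff - aspiration) / max_diff  # graded satisfaction signal
    if action == 'C':
        # Satisfied cooperation raises p; dissatisfied cooperation lowers it.
        return p + s * (1.0 - p) if s >= 0 else p + s * p
    # Satisfied defection lowers p; dissatisfied defection raises it.
    return p - s * p if s >= 0 else p - s * (1.0 - p)
```

The multiplicative factors (1 - p) and p keep the probability inside [0, 1] without explicit clipping, which is part of why this model is convenient for analysis on networks.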
Cooperation transitions in social games induced by aspiration-driven players
Cooperation and defection are social traits whose evolutionary origin is still unresolved. Recent behavioral experiments with humans suggested that strategy changes are driven mainly by individuals' expectations and not by imitation. This work theoretically analyzes and numerically explores aspiration-driven strategy updating in a well-mixed population playing games. The payoffs of the game matrix and the aspiration are condensed into just two parameters that allow a comprehensive description of the dynamics. We find continuous and abrupt transitions in the cooperation density, with excellent agreement between theory and Gillespie simulations. Under strong selection, the system can display several levels of steady cooperation or get trapped in absorbing states. These states are still relevant for experiments, even when irrational choices are made, due to their prolonged relaxation times. Finally, we show that for the particular case of the Prisoner's Dilemma, where defection is the dominant strategy under imitation mechanisms, the self-evaluation update instead favors cooperation nonlinearly with the level of aspiration. Thus, our work provides insights into the distinct roles of imitation and self-evaluation with no learning dynamics.
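A common way to formalize the self-evaluation (aspiration-driven) update, as opposed to imitation, is a Fermi-like switching probability in the agent's own payoff relative to its aspiration. This is a generic illustration of the mechanism, not necessarily the exact functional form used in this paper; the function name and the selection-strength parameter beta are assumptions.

```python
import math

def switch_probability(payoff, aspiration, beta=5.0):
    """Probability that an agent switches strategy under self-evaluation.

    Dissatisfied agents (payoff below aspiration) switch with high
    probability; satisfied agents rarely switch. beta is the selection
    strength: beta -> 0 gives random switching, large beta gives a
    near-deterministic threshold at the aspiration level.
    """
    return 1.0 / (1.0 + math.exp(beta * (payoff - aspiration)))
```

Note the contrast with imitation rules, where the argument of the Fermi function is the payoff *difference with another agent*; here the comparison is purely internal, which is why the update needs no information about others' payoffs.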
Intrinsic fluctuations of reinforcement learning promote cooperation
In this work, we ask, and answer, what makes classical reinforcement learning cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. Specifically, we consider the widely used temporal-difference reinforcement learning algorithm with epsilon-greedy exploration in the classic environment of an iterated Prisoner's Dilemma with one-period memory. Each of the two learning agents learns a strategy that conditions its following action choices on both agents' action choices of the last round. We find that, next to a high valuation of future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process which double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
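The setting this abstract describes — temporal-difference learning with epsilon-greedy exploration in an iterated Prisoner's Dilemma with one-period memory — can be sketched as below. The payoff matrix and hyperparameter values are conventional textbook choices, not necessarily the paper's; a Q-learning-style TD target is used for concreteness.

```python
import random

# Conventional Prisoner's Dilemma payoffs (assumed, not the paper's values),
# keyed by (own action, opponent action).
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

class QAgent:
    """Epsilon-greedy TD learner; state = both agents' last-round actions."""

    def __init__(self, alpha=0.05, gamma=0.95, epsilon=0.01):
        # Small learning rate, high discount, low exploration: the regime
        # the abstract identifies as favoring learned cooperation.
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {(s, a): 0.0 for s in PAYOFF for a in 'CD'}

    def act(self, state):
        if random.random() < self.epsilon:      # exploration
            return random.choice('CD')
        return max('CD', key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in 'CD')
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```

Two such agents played against each other for many rounds (each calling `act`, receiving its `PAYOFF` entry, then calling `update`) reproduce the kind of stochastic learning trajectory whose intrinsic fluctuations the paper analyzes; with greedy tie-breaking and no noise the dynamics would instead lock into one fixed point.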