
    Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin

    Direct reciprocity, or repeated interaction, is a main mechanism for sustaining cooperation in social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors remain largely unclear. Here we provide a proximate account of this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff exceeds a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results for both so-called moody and non-moody conditional cooperation, for prisoner's dilemma and public goods games, and for well-mixed groups and networks. In contrast to previous theory, individuals are assumed to have no access to information about what other individuals are doing, so they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity of cooperation is modulated at every discrete time step, explains conditional behavior in humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy, which is distinct from Pavlov, a reinforcement learning strategy that promotes mutual cooperation in two-player situations.
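
    As a concrete illustration of the learning rule described above, here is a minimal Python sketch of a satisfaction-driven (aspiration) learner. The payoff function, aspiration level, and learning rate are illustrative stand-ins, not the paper's calibrated model.

```python
import random

def aspiration_learner(payoff_of, aspiration=1.0, rate=0.2,
                       p_coop=0.5, rounds=1000):
    """Toy aspiration learner: reinforce the last action when the round's
    payoff beats the aspiration level, anti-reinforce it otherwise.
    `payoff_of(action)` is an environment stub supplied by the caller."""
    for _ in range(rounds):
        action = 'C' if random.random() < p_coop else 'D'
        satisfied = payoff_of(action) > aspiration
        target = 1.0 if action == 'C' else 0.0
        if satisfied:
            # move the cooperation propensity toward the action just taken
            p_coop += rate * (target - p_coop)
        else:
            # move it toward the opposite action
            p_coop += rate * ((1.0 - target) - p_coop)
    return p_coop

# toy environment: defection pays 2.0, cooperation pays 1.5; with an
# aspiration of 1.8, only defection satisfies, so the learner drifts to D
print(aspiration_learner(lambda a: 2.0 if a == 'D' else 1.5, aspiration=1.8))
```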

    Reinforcement learning accounts for moody conditional cooperation behavior: experimental results

    In social dilemma games, human participants often show conditional cooperation (CC) behavior or its variant called moody conditional cooperation (MCC), with which they basically tend to cooperate when many other peers have previously cooperated. Recent computational studies showed that CC and MCC behavioral patterns could be explained by reinforcement learning. In the present study, we use a repeated multiplayer prisoner's dilemma game and a repeated public goods game played by human participants to examine whether MCC is observed across different types of game and whether reinforcement learning explains the observed behavior. We observed MCC behavior in both games, but the MCC we observed differed from that in past experiments: whether or not a focal participant had cooperated previously affected the overall level of cooperation, instead of changing the tendency to cooperate in response to other participants' cooperation in the previous time step. We found that, across different conditions, reinforcement learning models were approximately as accurate as an MCC model in describing the experimental results. Consistent with the previous computational studies, the present results suggest that reinforcement learning may be a major proximate mechanism governing MCC behavior.
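
    A rough Python sketch of the kind of MCC decision rule being compared may help; the coefficients below are illustrative, not the fitted values from the paper. The cooperation probability rises with the fraction of peers who cooperated last round, and, per the result above, one's own previous cooperation shifts the overall level (an intercept effect) rather than the slope.

```python
def mcc_coop_prob(frac_others_coop, i_cooperated_last,
                  base=0.2, slope=0.5, own_shift=0.2):
    """Toy moody-conditional-cooperation rule. `frac_others_coop` is the
    fraction of other group members who cooperated in the previous round;
    `i_cooperated_last` raises the intercept, matching the 'overall level'
    effect reported above. All coefficients are illustrative."""
    p = base + slope * frac_others_coop + (own_shift if i_cooperated_last else 0.0)
    return min(1.0, max(0.0, p))  # clip to a valid probability

print(mcc_coop_prob(0.75, True))   # many peers cooperated, and so did I
print(mcc_coop_prob(0.75, False))  # same peers, but I defected last round
```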

    The emergence of altruism as a social norm

    Expectations, exerting influence through social norms, are a strong candidate for explaining how complex societies function. In the Dictator game (DG), people expect generous behavior from others even when they cannot enforce any sharing of the pie. Here we assume that people donate following their expectations and that they update their expectations by reinforcement learning after playing a DG, and we use these assumptions to construct a model that explains the main experimental results in the DG. Full agreement with the experimental results is reached when some degree of mismatch between expectations and donations is added to the model. These results are robust against the presence of envious agents, but they are affected if we introduce selfish agents that do not update their expectations. Our results point to social norms underlying the generous behavior observed in the DG and to the wide applicability of reinforcement learning in explaining many strategic interactions.
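
    The abstract does not spell out the exact update rule, so the following Python sketch is only one plausible reading of the model: agents donate what they expect (plus a noise term standing in for the expectation-donation mismatch) and nudge their expectation toward what they actually receive. All parameter values are illustrative.

```python
import random

def dictator_population(n=100, rounds=500, rate=0.1, mismatch=0.05):
    """Toy expectation-driven Dictator game: in each round agents are
    paired at random; the dictator donates its current expectation plus
    Gaussian 'mismatch' noise, and the recipient moves its expectation
    toward the donation it actually received."""
    expectation = [random.uniform(0.0, 0.5) for _ in range(n)]
    for _ in range(rounds):
        order = random.sample(range(n), n)
        for dictator, recipient in zip(order[::2], order[1::2]):
            donation = min(1.0, max(0.0, expectation[dictator]
                                         + random.gauss(0.0, mismatch)))
            expectation[recipient] += rate * (donation - expectation[recipient])
    return sum(expectation) / n   # mean expected (and hence donated) share

print(dictator_population())
```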

    Reinforcement learning account of network reciprocity

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity, and the lack thereof, across a parameter region spanned by the benefit-to-cost ratio and the node degree. Thus, we significantly extend previously obtained numerical results.
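
    For reference, here is a minimal sketch of the Bush-Mosteller update in one common parameterization, in which the stimulus is the payoff's signed distance from the aspiration, normalized to [-1, 1]; the constants are illustrative, not the values used in the paper.

```python
def bush_mosteller(p_coop, cooperated, payoff,
                   aspiration=1.0, learning_rate=0.4, max_gap=4.0):
    """One Bush-Mosteller update of the cooperation propensity. The
    stimulus `s` is (payoff - aspiration) normalized by the largest
    possible gap `max_gap` and clipped to [-1, 1]. Positive stimuli
    reinforce the action just taken; negative stimuli inhibit it."""
    s = max(-1.0, min(1.0, (payoff - aspiration) / max_gap))
    p_action = p_coop if cooperated else 1.0 - p_coop
    if s >= 0:
        p_action += (1.0 - p_action) * learning_rate * s
    else:
        p_action += p_action * learning_rate * s
    return p_action if cooperated else 1.0 - p_action

# a cooperator exploited by its neighbors becomes less likely to cooperate
print(bush_mosteller(p_coop=0.7, cooperated=True, payoff=0.0))
```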

    Cooperation transitions in social games induced by aspiration-driven players

    Cooperation and defection are social traits whose evolutionary origin is still unresolved. Recent behavioral experiments with humans suggested that strategy changes are driven mainly by the individuals' expectations and not by imitation. This work theoretically analyzes and numerically explores aspiration-driven strategy updating in a well-mixed population playing games. The payoffs of the game matrix and the aspiration are condensed into just two parameters that allow a comprehensive description of the dynamics. We find continuous and abrupt transitions in the cooperation density, with excellent agreement between theory and Gillespie simulations. Under strong selection, the system can display several levels of steady cooperation or get trapped in absorbing states. Owing to their prolonged relaxation times, these states remain relevant for experiments even when irrational choices are made. Finally, we show that for the particular case of the Prisoner's Dilemma, where defection is the dominant strategy under imitation mechanisms, the self-evaluation update instead favors cooperation nonlinearly with the level of aspiration. Thus, our work provides insight into the distinct roles of imitation and self-evaluation in the absence of learning dynamics.
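
    The abstract does not give the update rule explicitly, so the Python sketch below uses the Fermi-like self-evaluation rule common in the aspiration-dynamics literature; the `beta` (selection strength) and `aspiration` arguments stand in only loosely for the paper's two condensed parameters.

```python
import math
import random

def switch_prob(payoff, aspiration, beta=10.0):
    """Fermi-like self-evaluation: the further the payoff falls below
    the aspiration, the likelier the agent is to switch strategy;
    `beta` plays the role of selection strength."""
    return 1.0 / (1.0 + math.exp(-beta * (aspiration - payoff)))

def async_step(strategies, payoff_of, aspiration, beta=10.0):
    """One asynchronous update: a random agent compares its payoff to
    the aspiration and flips its strategy with the Fermi probability.
    `payoff_of(i)` is an environment stub supplied by the caller."""
    i = random.randrange(len(strategies))
    if random.random() < switch_prob(payoff_of(i), aspiration, beta):
        strategies[i] = 'D' if strategies[i] == 'C' else 'C'
```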

    Intrinsic fluctuations of reinforcement learning promote cooperation

    In this work, we ask, and answer, what makes classical reinforcement learning cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. Specifically, we consider the widely used temporal-difference reinforcement learning algorithm with epsilon-greedy exploration in the classic environment of an iterated Prisoner's Dilemma with one-period memory. Each of the two learning agents learns a strategy that conditions its next action choice on both agents' action choices in the last round. We find that, alongside a strong weighting of future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
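
    A minimal, self-contained Python sketch of this setting follows, with illustrative PD payoffs and hyperparameters (the paper's exact values differ): two tabular Q-learners whose state is the joint action of the previous round, choosing epsilon-greedily.

```python
import random

ACTS = ('C', 'D')
# illustrative PD payoffs: (my action, opponent's action) -> my payoff
PAY = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def choose(Q, state, eps):
    """Epsilon-greedy selection from a tabular Q-function."""
    if random.random() < eps:
        return random.choice(ACTS)
    return max(ACTS, key=lambda a: Q[(state, a)])

def train(steps=200_000, alpha=0.05, gamma=0.95, eps=0.05):
    """Two one-period-memory TD (Q-)learners in the iterated PD: each
    agent's state is (own last action, opponent's last action), giving
    a 4-state, 2-action Q-table per agent."""
    states = [(a, b) for a in ACTS for b in ACTS]
    Q1 = {(s, a): 0.0 for s in states for a in ACTS}
    Q2 = {(s, a): 0.0 for s in states for a in ACTS}
    s1 = s2 = ('C', 'C')                     # arbitrary initial memories
    for _ in range(steps):
        a1, a2 = choose(Q1, s1, eps), choose(Q2, s2, eps)
        n1, n2 = (a1, a2), (a2, a1)          # next state from each view
        r1, r2 = PAY[(a1, a2)], PAY[(a2, a1)]
        # one-step temporal-difference (Q-learning) updates
        Q1[(s1, a1)] += alpha * (r1 + gamma * max(Q1[(n1, a)] for a in ACTS)
                                 - Q1[(s1, a1)])
        Q2[(s2, a2)] += alpha * (r2 + gamma * max(Q2[(n2, a)] for a in ACTS)
                                 - Q2[(s2, a2)])
        s1, s2 = n1, n2
    return Q1, Q2
```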