16,321 research outputs found

    Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning

    Social dilemmas have been widely studied to explain how humans are able to cooperate in society. Considerable effort has been invested in designing artificial agents for social dilemmas that incorporate explicit agent motivations that are chosen to favor coordinated or cooperative responses. The prevalence of this general approach points towards the importance of achieving an understanding of both an agent's internal design and external environment dynamics that facilitate cooperative behavior. In this paper, we investigate how partner selection can promote cooperative behavior between agents who are trained to maximize a purely selfish objective function. Our experiments reveal that agents trained with this dynamic learn a strategy that retaliates against defectors while promoting cooperation with other agents, resulting in a prosocial society.

    Cooperation and Social Dilemmas with Reinforcement Learning

    Cooperation between humans has been foundational for the development of civilisation, and yet there are many questions about how it emerges from social interactions. As artificial agents begin to play a more significant role in our lives and are introduced into our societies, it is apparent that understanding the mechanisms of cooperation is also important for the design of next-generation multi-agent AI systems. Indeed, this is particularly important in the case of supporting cooperation between self-interested AI agents. In this thesis, we analyse how mechanisms that underpin human cooperation can be applied to the training of reinforcement learning agents. Human behaviour is a product of cultural norms, emotions and intuition, amongst other things: we argue it is possible to use similar mechanisms to deal with the complexities of multi-agent cooperation. We outline the problem of cooperation in mixed-motive games, also known as social dilemmas, and we focus on reputation dynamics and partner selection, two mechanisms that have been strongly linked to indirect reciprocity in Evolutionary Game Theory. A key point that we want to emphasise is that we assume no prior knowledge and no explicit definition of strategies, which instead are fully learnt by the agents during the games. In our experimental evaluation, we demonstrate the benefits of applying these mechanisms to the training process of the agents, and we compare our findings with results presented in a variety of other disciplines, including Economics and Evolutionary Biology.

    Neurophysiological correlates underlying social behavioural adjustment of conformity

    Conformity is the act of changing one’s behaviour to adjust to other human beings. It is a crucial social adaptation that happens when people cooperate, where one sacrifices their own perception, expectations, or beliefs to reach convergence with another person. The aim of the present study was to increase the understanding of the neurophysiological underpinnings of the social behavioural adjustment of conformity. We start by introducing cooperation and how it is ingrained in human behaviour. Then we explore the different processes that the brain requires for the social behavioural adjustment of conformity. To engage in this social adaptation, a person needs a self-referenced learning mechanism based on a predictive model that helps them track the prediction errors from unexpected events. Also, the brain uses its monitoring and control systems to encode different value functions used in action selection. The use of different learning models in neuroscience, such as reinforcement learning (RL) algorithms, has been a success story in identifying learning systems by means of the mapped activity of different regions in the brain. Importantly, the experimental paradigms that have been used to study conformity have not been based on a social interaction setting and, hence, their results cannot be used to explain an inherently social phenomenon. The main goal of the present thesis is to study the neurophysiological mechanisms underlying the social behavioural adjustment of conformity and its modulation with repeated interaction. To reach this goal, we first designed a new experimental task where conformity appears spontaneously and repeatedly between two persons. This design exposes learning acquisition processes, which require iterative loops, as well as other cognitive control mechanisms such as feedback processing, value-based decision making and attention. 
The first study shows that people who have previously cooperated increase their level of convergence and report a significantly more satisfying overall experience. In addition, participants’ learning about their counterparts’ behaviour can be explained using an RL algorithm, which is not the case when they have not previously cooperated. In the second study, we examined the event-related potentials (ERP) and oscillatory power underlying conformity. ERP results show different levels of cognitive engagement that are associated with distinct levels of conformity. Also, time-frequency analysis shows evidence in the theta, alpha and beta bands related to different functions such as cognitive control, attention and reward processing, supporting the idea that convergence between dyads acts as a social reward. Finally, in the third study, we explored the intra- and inter-brain oscillatory connectivity between electrodes related to behavioural convergence. In intra-brain oscillatory connectivity coherence, we found two different dynamics related to attention and executive functions in alpha. We also found that learning about a peer’s behaviour, as computed using an RL algorithm, is mediated by theta oscillatory connectivity. Consequently, combined evidence from Study 2 and Study 3 suggests that both the cognitive control and the learning computations involved in the social behavioural adaptation of conformity are signalled in the theta frequency band. The present work is one of the first studies describing, with credible evidence, that conformity, when it occurs willingly and spontaneously rather than being induced, engages brain activity underlying reward-guided learning, cognitive control, and attention.

    Computational Theory of Mind for Human-Agent Coordination

    In everyday life, people often depend on their theory of mind, i.e., their ability to reason about unobservable mental content of others to understand, explain, and predict their behaviour. Many agent-based models have been designed to develop computational theory of mind and analyze its effectiveness in various tasks and settings. However, most existing models are not generic (e.g., only applied in a given setting), not feasible (e.g., require too much information to be processed), or not human-inspired (e.g., do not capture the behavioral heuristics of humans). This hinders their applicability in many settings. Accordingly, we propose a new computational theory of mind, which captures the human decision heuristics of reasoning by abstracting individual beliefs about others. We specifically study computational affinity and show how it can be used in tandem with theory of mind reasoning when designing agent models for human-agent negotiation. We perform two-agent simulations to analyze the role of affinity in getting to agreements when there is a bound on the time to be spent for negotiating. Our results suggest that modeling affinity can ease the negotiation process by decreasing the number of rounds needed for an agreement as well as yield a higher benefit for agents with theory of mind reasoning.

    The EU's 'transnational power over' Central Asia: Developing and applying a structurally integrative approach to the study of the EU's power over Central Asia

    This thesis challenges the consensual scholarly expectation of low EU impact in Central Asia. In particular, it claims that by focusing predominantly on narrow, micro-level factors, the prevailing theoretical perspectives risk overlooking less obvious aspects of the EU's power, including structural aspects, and thus tend to underestimate the EU's leverage in the region. Therefore, the thesis argues that a more structurally integrative and holistic approach is needed to understand the EU's power in the region. In responding to this need, the thesis introduces a conceptual tool, which it terms 'transnational power over' (TNPO). Inspired by debates in IPE, in particular new realist and critical IPE perspectives, and combining these views with insights from neorealist, neo-institutionalist and constructivist approaches to EU external relations, the concept of TNPO is an analytically eclectic notion, which helps to assess the degree to which, in today's globalised and interdependent world, the EU's power over third countries derives from its control over a combination of material, institutional and ideational structures, making it difficult for the EU's partners to resist the EU's initiatives or to reject its offers. In order to trace and assess the mechanisms of EU impact across these three structures, the thesis constructs a toolbox, which centres on four analytical distinctions: (i) EU-driven versus domestically driven mechanisms, (ii) mechanisms based on rationalist logics of action versus mechanisms following constructivist logics of action, (iii) agent-based versus purely structural mechanisms of TNPO, and (iv) transnational and intergovernmental mechanisms of EU impact. Using qualitative research methodology, the thesis then applies the conceptual model to the case of EU-Central Asia. 
It finds that the EU's power over Central Asia effectively derives from its control over a combination of material, institutional and ideational structures, including its position as a leader in trade and investment in the region, its (geo)strategic and security-related capabilities vis-à-vis Central Asia, as well as the relatively dense level of institutionalisation of its relations with the five countries and the positive image of the EU in Central Asia as a more neutral actor.

    Resolving social dilemmas with minimal reward transfer

    Multi-agent cooperation is an important topic, and is particularly challenging in mixed-motive situations where it does not pay to be nice to others. Consequently, self-interested agents often avoid collective behaviour, resulting in suboptimal outcomes for the group. In response, in this paper we introduce a metric to quantify the disparity between what is rational for individual agents and what is rational for the group, which we call the general self-interest level. This metric represents the maximum proportion of individual rewards that all agents can retain while ensuring that achieving the social welfare optimum becomes a dominant strategy. By aligning the individual and group incentives, rational agents acting to maximise their own reward will simultaneously maximise the collective reward. As agents transfer their rewards to motivate others to consider their welfare, we diverge from traditional concepts of altruism or prosocial behaviours. The general self-interest level is a property of a game that is useful for assessing the propensity of players to cooperate and understanding how features of a game impact this. We illustrate the effectiveness of our method on several novel game representations of social dilemmas with arbitrary numbers of players. Comment: 34 pages, 13 tables, submitted to the Journal of Autonomous Agents and Multi-Agent Systems: Special Issue on Citizen-Centric AI System
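The idea of a retained reward share can be illustrated on a toy case. The sketch below is not the paper's general metric: it assumes a two-player Prisoner's Dilemma with hypothetical payoffs (3, 0, 5, 1), and it assumes each player keeps a share `s` of their own reward and passes the remainder to the other player. It then searches for the largest `s` at which cooperating is still a (weakly) dominant strategy. All names (`PAYOFFS`, `post_transfer`, `max_retained_share`) are illustrative.

```python
from fractions import Fraction

# Hypothetical Prisoner's Dilemma payoffs (an assumption, not from the paper):
# PAYOFFS[(my_action, other_action)] = (my_reward, other_reward)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def post_transfer(mine, other, s):
    """My reward after transfers: I keep share s of my own reward
    and receive share (1 - s) of the other player's reward."""
    my_reward, other_reward = PAYOFFS[(mine, other)]
    return s * my_reward + (1 - s) * other_reward


def cooperation_is_dominant(s):
    """True if cooperating is at least as good as defecting
    against every action the other player might take."""
    return all(
        post_transfer("C", other, s) >= post_transfer("D", other, s)
        for other in ("C", "D")
    )


def max_retained_share(steps=1000):
    """Largest share s on a grid of steps + 1 points in [0, 1]
    for which cooperation is still weakly dominant."""
    grid = (Fraction(k, steps) for k in range(steps + 1))
    return max(s for s in grid if cooperation_is_dominant(s))


print(max_retained_share())
```

Under these assumptions the binding constraint is the temptation to defect against a cooperator: cooperating yields 3 while defecting yields 5s, so the retained share cannot exceed 3/5.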

    Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning

    Practical uses of Artificial Intelligence (AI) in the real world have demonstrated the importance of embedding moral choices into intelligent agents. They have also highlighted that defining top-down ethical constraints on AI according to any one type of morality is extremely challenging and can pose risks. A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents. In particular, we believe that an interesting and insightful starting point is the analysis of emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in social dilemmas. In this work, we present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories. We aim to design reward structures that are simplified yet representative of a set of key ethical systems. Therefore, we first define moral reward functions that distinguish between consequence- and norm-based agents, between morality based on societal norms or internal virtues, and between single- and mixed-virtue (e.g., multi-objective) methodologies. Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner's Dilemma, Volunteer's Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. Finally, we discuss the implications of these findings for the development of moral agents in artificial and mixed human-AI societies. Comment: 7 pages, currently under review for a conference
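To make the distinction between consequence-based and norm-based rewards concrete, here is a minimal sketch. It does not reproduce the paper's reward definitions, which the abstract does not give. It assumes hypothetical Prisoner's Dilemma payoffs, a consequence-based reward equal to the collective payoff of the round, and a norm-based reward that penalises defecting against a partner who cooperated in the previous round. The function names and the `penalty` value are illustrative.

```python
# Hypothetical Prisoner's Dilemma payoffs (an assumption, not from the paper):
# PAYOFFS[(my_action, other_action)] = (my_reward, other_reward)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def selfish_reward(mine, other):
    """Baseline: the agent's own game payoff."""
    return PAYOFFS[(mine, other)][0]


def consequence_based_reward(mine, other):
    """Consequence-based sketch: reward is the collective payoff,
    so the agent is judged by the outcome for both players."""
    return sum(PAYOFFS[(mine, other)])


def norm_based_reward(mine, other_previous, penalty=5):
    """Norm-based sketch: the outcome is ignored; the agent is penalised
    only for breaking the rule 'do not defect against a partner who
    cooperated in the previous round'."""
    if mine == "D" and other_previous == "C":
        return -penalty
    return 0


for mine in ("C", "D"):
    for other in ("C", "D"):
        print(
            mine,
            other,
            selfish_reward(mine, other),
            consequence_based_reward(mine, other),
            norm_based_reward(mine, other),
        )
```

In this sketch a selfish agent prefers defecting against a cooperator (5 versus 3), whereas the consequence-based reward ranks mutual cooperation highest (6) and the norm-based reward punishes that same defection regardless of the payoff it produces.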