317 research outputs found

    Learning with Opponent-Learning Awareness

    Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but can also be extended to hierarchical RL, generative adversarial networks and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in the environment. The LOLA learning rule includes a term that accounts for the impact of one agent's policy on the anticipated parameter update of the other agents. Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners' dilemma, while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher-order gradient-based methods. Applied to repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round robin tournament we show that LOLA agents successfully shape the learning of a range of multi-agent learning algorithms from the literature, resulting in the highest average returns on the IPD. We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL. The method thus scales to large parameter and input spaces and nonlinear function approximators. We apply LOLA to a grid world task with an embedded social dilemma using recurrent policies and opponent modelling. By explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest. The code is at github.com/alshedivat/lola.
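
As a rough illustration of the update rule described above, the following sketch (not the authors' implementation; their code is linked in the abstract) computes a first-order LOLA step for one agent in a two-player differentiable game, assuming exact value functions V1 and V2 of both agents' policy parameters are available; names and step sizes are illustrative.

```python
# Minimal sketch of a first-order LOLA update for agent 1, assuming exact,
# differentiable value functions V1(theta1, theta2) and V2(theta1, theta2).
import jax
import jax.numpy as jnp

def lola_step(theta1, theta2, V1, V2, alpha=0.1, eta=0.1):
    """Follow the gradient of V1 evaluated at the opponent's *anticipated*
    parameters theta2 + eta * grad_{theta2} V2, expanded to first order."""
    # Naive gradient of agent 1's own value.
    g1 = jax.grad(V1, argnums=0)(theta1, theta2)
    # Gradient of agent 1's value w.r.t. the opponent's parameters.
    g12 = jax.grad(V1, argnums=1)(theta1, theta2)
    # How the opponent's naive update depends on agent 1's parameters.
    d2V2 = jax.jacfwd(jax.grad(V2, argnums=1), argnums=0)(theta1, theta2)
    # Second-order "opponent shaping" correction from the Taylor expansion.
    correction = eta * d2V2.T @ g12
    return theta1 + alpha * (g1 + correction)
```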

    Sustainable Cooperative Coevolution with a Multi-Armed Bandit

    This paper proposes a self-adaptation mechanism to manage the resources allocated to the different species comprising a cooperative coevolutionary algorithm. The proposed approach relies on a dynamic extension to the well-known multi-armed bandit framework. At each iteration, the dynamic multi-armed bandit makes a decision on which species to evolve for a generation, using the history of progress made by the different species to guide the decisions. We show experimentally, on a benchmark and a real-world problem, that evolving the different populations at different paces not only allows solutions to be identified more rapidly, but also improves the capacity of cooperative coevolution to solve more complex problems. Comment: Accepted at GECCO 201
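
The abstract describes the core loop only at a high level, so the sketch below is a hedged stand-in rather than the authors' algorithm: a sliding-window UCB bandit (one of several plausible dynamic-bandit choices) picks which species to evolve next, rewarded by the improvement in the best collaborative fitness; `evolve_one_generation` and `evaluate` are hypothetical helpers.

```python
# Hedged sketch: a bandit decides which species of a cooperative coevolutionary
# algorithm gets the next generation, using recent progress as the reward.
import math
from collections import deque

def select_species(history, n_species, c=1.0):
    """Sliding-window UCB: pick the species whose recent rewards have the
    highest upper confidence bound."""
    total = sum(len(history[s]) for s in range(n_species))
    best, best_score = 0, -float("inf")
    for s in range(n_species):
        pulls = history[s]
        if not pulls:
            return s  # play each arm at least once
        mean = sum(pulls) / len(pulls)
        bonus = c * math.sqrt(2 * math.log(total) / len(pulls))
        if mean + bonus > best_score:
            best, best_score = s, mean + bonus
    return best

def coevolve(species, evaluate, generations=500, window=20):
    history = [deque(maxlen=window) for _ in species]  # recent rewards per species
    best_fitness = evaluate(species)
    for _ in range(generations):
        s = select_species(history, len(species))
        species[s] = evolve_one_generation(species[s])  # hypothetical EA step
        new_fitness = evaluate(species)                 # best collaboration fitness
        history[s].append(max(0.0, new_fitness - best_fitness))
        best_fitness = max(best_fitness, new_fitness)
    return species
```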

    Balancing Selection Pressures, Multiple Objectives, and Neural Modularity to Coevolve Cooperative Agent Behavior

    Previous research using evolutionary computation in Multi-Agent Systems indicates that assigning fitness based on team vs. individual behavior has a strong impact on the ability of evolved teams of artificial agents to exhibit teamwork in challenging tasks. However, such research only made use of single-objective evolution. In contrast, when a multiobjective evolutionary algorithm is used, populations can be subject to individual-level objectives, team-level objectives, or combinations of the two. This paper explores the performance of cooperatively coevolved teams of agents controlled by artificial neural networks subject to these types of objectives. Specifically, predator agents are evolved to capture scripted prey agents in a torus-shaped grid world. Because of the tension between individual and team behaviors, multiple modes of behavior can be useful, and thus the effect of modular neural networks is also explored. Results demonstrate that fitness rewarding individual behavior is superior to fitness rewarding team behavior, despite being applied to a cooperative task. However, the use of networks with multiple modules allows predators to discover intelligent behavior, regardless of which type of objective is used.
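
The following fragment is only meant to make the individual-versus-team distinction concrete; the episode statistics and objective definitions are hypothetical placeholders, not the paper's actual fitness functions, which would feed a multiobjective EA such as NSGA-II.

```python
# Illustrative split between individual-level and team-level objectives for an
# evolved predator; field names and scoring are assumptions, not the paper's.
from dataclasses import dataclass
from typing import List

@dataclass
class EpisodeStats:
    captures_by_agent: List[int]  # prey captured by each predator individually
    team_captures: int            # prey captured by the team as a whole

def individual_objectives(stats: EpisodeStats, agent_idx: int) -> List[float]:
    # Score the agent only for what it did itself.
    return [float(stats.captures_by_agent[agent_idx])]

def team_objectives(stats: EpisodeStats) -> List[float]:
    # Every member of the team receives the same team-level score.
    return [float(stats.team_captures)]

def combined_objectives(stats: EpisodeStats, agent_idx: int) -> List[float]:
    # A multiobjective EA can optimise both kinds of objective at once.
    return individual_objectives(stats, agent_idx) + team_objectives(stats)
```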

    What to bid and when to stop

    Negotiation is an important activity in human society, and is studied by various disciplines, ranging from economics and game theory to electronic commerce, social psychology, and artificial intelligence. Traditionally, negotiation is a necessary, but also time-consuming and expensive activity. Therefore, in the last decades there has been a large interest in the automation of negotiation, for example in the setting of e-commerce. This interest is fueled by the promise of automated agents eventually being able to negotiate on behalf of human negotiators.

    Every year, automated negotiation agents are improving in various ways, and there is now a large body of negotiation strategies available, all with their unique strengths and weaknesses. For example, some agents are able to predict the opponent's preferences very well, while others focus more on having a sophisticated bidding strategy. The problem, however, is that there is little incremental improvement in agent design, as the agents are tested in varying negotiation settings, using a diverse set of performance measures. This makes it very difficult to meaningfully compare the agents, let alone their underlying techniques. As a result, we lack a reliable way to pinpoint the most effective components in a negotiating agent.

    There are two major advantages of distinguishing between the different components of a negotiating agent's strategy: first, it allows the study of the behavior and performance of the components in isolation. For example, it becomes possible to compare the preference learning component of all agents, and to identify the best among them. Second, we can proceed to mix and match different components to create new negotiation strategies, e.g. replacing the preference learning technique of an agent and then examining whether this makes a difference. Such a procedure enables us to combine the individual components to systematically explore the space of possible negotiation strategies.

    To develop a compositional approach to evaluate and combine the components, we identify structure in most agent designs by introducing the BOA architecture, in which we can develop and integrate the different components of a negotiating agent. We identify three main components of a general negotiation strategy, namely a bidding strategy (B), possibly an opponent model (O), and an acceptance strategy (A). The bidding strategy considers what concessions it deems appropriate given its own preferences, and takes the opponent into account by using an opponent model. The acceptance strategy decides whether offers proposed by the opponent should be accepted.

    The BOA architecture is integrated into a generic negotiation environment called Genius, which is a software environment for designing and evaluating negotiation strategies. To explore the negotiation strategy space of the negotiation research community, we amend the Genius repository with various existing agents and scenarios from the literature. Additionally, we organize a yearly international negotiation competition (ANAC) to harvest even more strategies and scenarios. ANAC also acts as an evaluation tool for negotiation strategies, and encourages the design of negotiation strategies and scenarios.

    We re-implement agents from the literature and ANAC and decouple them to fit into the BOA architecture without introducing any changes in their behavior. For each of the three components, we manage to find and analyze the best ones for specific cases, as described below.
    We show that the BOA framework leads to significant improvements in agent design by winning ANAC 2013, which had 19 participating teams from 8 international institutions, with an agent that is designed using the BOA framework and is informed by a preliminary analysis of the different components.

    In every negotiation, one of the negotiating parties must accept an offer to reach an agreement. Therefore, it is important that a negotiator employs a proficient mechanism to decide under which conditions to accept. When contemplating whether to accept an offer, the agent is faced with the acceptance dilemma: accepting the offer may be suboptimal, as better offers may still be presented before time runs out. On the other hand, accepting too late may prevent an agreement from being reached, resulting in a break off with no gain for either party. We classify and compare state-of-the-art generic acceptance conditions. We propose new acceptance strategies and we demonstrate that they outperform the other conditions. We also provide insight into why some conditions work better than others and investigate correlations between the properties of the negotiation scenario and the efficacy of acceptance conditions.

    Later, we adopt a more principled approach by applying optimal stopping theory to calculate the optimal decision on the acceptance of an offer. We approach the decision of whether to accept as a sequential decision problem, by modeling the bids received as a stochastic process. We determine the optimal acceptance policies for particular opponent classes and we present an approach to estimate the expected range of offers when the type of opponent is unknown. We show that the proposed approach is able to find the optimal time to accept, and improves upon all existing acceptance strategies.

    Another principal component of a negotiating agent's strategy is its ability to take the opponent's preferences into account. The quality of an opponent model can be measured in two different ways. One is to use the agent's performance as a benchmark for the model's quality. We evaluate and compare the performance of a selection of state-of-the-art opponent modeling techniques in negotiation. We provide an overview of the factors influencing the quality of a model and we analyze how the performance of opponent models depends on the negotiation setting. We identify a class of simple and surprisingly effective opponent modeling techniques that did not receive much previous attention in the literature.

    The other way to measure the quality of an opponent model is to directly evaluate its accuracy by using similarity measures. We review all methods to measure the accuracy of an opponent model and we then analyze how changes in accuracy translate into performance differences. Moreover, we pinpoint the best predictors for good performance. This leads to new insights concerning how to construct an opponent model, and what we need to measure when optimizing performance.

    Finally, we take two different approaches to gain more insight into effective bidding strategies. We present a new classification method for negotiation strategies, based on their pattern of concession making against different kinds of opponents. We apply this technique to classify some well-known negotiating strategies, and we formulate guidelines on how agents should bid in order to be successful, which gives insight into the bidding strategy space of negotiating agents.
    Furthermore, we apply optimal stopping theory again, this time to find the concessions that maximize utility for the bidder against particular opponents. We show there is an interesting connection between optimal bidding and optimal acceptance strategies, in the sense that they are mirrored versions of each other.

    Lastly, after analyzing all components separately, we put the pieces back together again. We take all BOA components accumulated so far, including the best ones, and combine them all together to explore the space of negotiation strategies. We compute the contribution of each component to the overall negotiation result, and we study the interaction between components. We find that combining the best agent components indeed makes the strongest agents. This shows that the component-based view of the BOA architecture not only provides a useful basis for developing negotiating agents but also provides a useful analytical tool. By varying the BOA components we are able to demonstrate the contribution of each component to the negotiation result, and thus analyze the significance of each. The bidding strategy is by far the most important to consider, followed by the acceptance conditions and finally followed by the opponent model. Our results validate the analytical approach of the BOA framework to first optimize the individual components, and then to recombine them into a negotiating agent.
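
A schematic rendering of the BOA decomposition described above, written in Python for brevity rather than against the Java-based Genius API; class and method names are illustrative assumptions, not the framework's actual interfaces.

```python
# Sketch of the BOA decomposition: an agent assembled from a bidding strategy
# (B), an optional opponent model (O) and an acceptance strategy (A).
from abc import ABC, abstractmethod
from typing import Optional

class OpponentModel(ABC):
    @abstractmethod
    def update(self, opponent_bid): ...             # learn from each received offer
    @abstractmethod
    def estimated_utility(self, bid) -> float: ...  # opponent's predicted utility of a bid

class BiddingStrategy(ABC):
    @abstractmethod
    def next_bid(self, time: float, model: Optional[OpponentModel]): ...

class AcceptanceStrategy(ABC):
    @abstractmethod
    def accept(self, my_next_bid, opponent_bid, time: float) -> bool: ...

class BOAAgent:
    """Mix-and-match agent: any B, O and A component can be recombined."""
    def __init__(self, bidding: BiddingStrategy, model: Optional[OpponentModel],
                 acceptance: AcceptanceStrategy):
        self.bidding, self.model, self.acceptance = bidding, model, acceptance

    def respond(self, opponent_bid, time: float):
        if self.model is not None and opponent_bid is not None:
            self.model.update(opponent_bid)
        my_bid = self.bidding.next_bid(time, self.model)
        if opponent_bid is not None and self.acceptance.accept(my_bid, opponent_bid, time):
            return ("ACCEPT", opponent_bid)
        return ("OFFER", my_bid)
```

Because the components only interact through these narrow interfaces, any bidding strategy, opponent model and acceptance strategy can be recombined, which is the mix-and-match property the thesis exploits when exploring the strategy space.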

    Addressing stability issues in mediated complex contract negotiations for constraint-based, non-monotonic utility spaces

    Negotiating contracts with multiple interdependent issues may yield non-monotonic, highly uncorrelated preference spaces for the participating agents. These scenarios are especially challenging because the complexity of the agents’ utility functions makes traditional negotiation mechanisms inapplicable. There are a number of recent research lines addressing complex negotiations in uncorrelated utility spaces. However, most of them focus on overcoming the problems imposed by the complexity of the scenario, without analyzing the potential consequences of the strategic behavior of the negotiating agents in the models they propose. Analyzing the dynamics of the negotiation process when agents with different strategies interact is necessary to apply these models to real, competitive environments. Especially problematic are high price of anarchy situations, which imply that individual rationality drives the agents towards strategies which yield low individual and social welfare. In scenarios involving highly uncorrelated utility spaces, “low social welfare” usually means that the negotiations fail, and therefore high price of anarchy situations should be avoided in the negotiation mechanisms. In our previous work, we proposed an auction-based negotiation model designed for negotiations about complex contracts when highly uncorrelated, constraint-based utility spaces are involved. This paper performs a strategy analysis of this model, revealing that the approach raises stability concerns, leading to situations with a high (or even infinite) price of anarchy. In addition, a set of techniques to solve this problem is proposed, and an experimental evaluation is performed to validate the adequacy of the proposed approaches to improve the strategic stability of the negotiation process. Finally, incentive-compatibility of the model is studied. (Spain, Ministerio de Educación y Ciencia, grant TIN2008-06739-C04-04)
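
For readers unfamiliar with the stability measure used above, the snippet below illustrates the usual welfare-ratio definition of the price of anarchy; the welfare numbers are made up, and the paper's mechanism-specific analysis is of course more involved.

```python
# Illustration of the price of anarchy: the ratio between the best achievable
# social welfare and the welfare of the worst stable (equilibrium) outcome.
import math
from typing import Iterable

def price_of_anarchy(equilibrium_welfares: Iterable[float], optimal_welfare: float) -> float:
    """PoA = optimal social welfare / worst equilibrium social welfare (>= 1)."""
    worst = min(equilibrium_welfares)
    if worst <= 0:
        return math.inf  # e.g. every negotiation breaks off in the worst equilibrium
    return optimal_welfare / worst

# Hypothetical example: an optimum of 1.8 against equilibria yielding 1.8, 0.9 and 0.0.
print(price_of_anarchy([1.8, 0.9, 0.0], optimal_welfare=1.8))  # -> inf
```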

    Novelty-driven cooperative coevolution

    Cooperative coevolutionary algorithms (CCEAs) rely on multiple coevolving populations for the evolution of solutions composed of coadapted components. CCEAs enable, for instance, the evolution of cooperative multiagent systems composed of heterogeneous agents, where each agent is modelled as a component of the solution. Previous works have, however, shown that CCEAs are biased toward stability: the evolutionary process tends to converge prematurely to stable states instead of (near-)optimal solutions. In this study, we show how novelty search can be used to avoid the counterproductive attraction to stable states in coevolution. Novelty search is an evolutionary technique that drives evolution toward behavioural novelty and diversity rather than exclusively pursuing a static objective. We evaluate three novelty-based approaches that rely on, respectively, (1) the novelty of the team as a whole, (2) the novelty of the agents’ individual behaviour, and (3) the combination of the two. We compare the proposed approaches with traditional fitness-driven cooperative coevolution in three simulated multirobot tasks. Our results show that team-level novelty scoring is the most effective approach, significantly outperforming fitness-driven coevolution at multiple levels. Novelty-driven cooperative coevolution can substantially increase the potential of CCEAs while maintaining a computational complexity that scales well with the number of populations.
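
As a rough sketch of the scoring the paper builds on, the snippet below computes the standard novelty-search score (mean distance to the k nearest behaviour characterisations in the population plus an archive); the distance metric, k, and the team-level behaviour characterisation are illustrative assumptions rather than the paper's exact choices.

```python
# Sketch of a novelty score: mean distance to the k nearest behaviour
# characterisations among the current population and an archive.
import numpy as np

def novelty_score(behaviour: np.ndarray, others: np.ndarray, k: int = 15) -> float:
    """Mean Euclidean distance to the k nearest behaviour characterisations
    (the individual itself is assumed to be excluded from `others`)."""
    dists = np.linalg.norm(others - behaviour, axis=1)
    nearest = np.sort(dists)[:k]
    return float(nearest.mean())

def team_level_novelty(team_behaviour, population_behaviours, archive):
    # One characterisation per team; agent-level novelty would instead use one
    # characterisation per agent and compare against other agents' behaviours.
    pool = np.vstack([population_behaviours, archive]) if len(archive) else population_behaviours
    return novelty_score(team_behaviour, pool)
```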