Theoretical advantages of lenient learners : an evolutionary game theoretic perspective
This paper presents the dynamics of multiple learning agents from an evolutionary game-theoretic perspective. We provide replicator dynamics models for cooperative coevolutionary algorithms and for traditional multiagent Q-learning, and we extend these differential equations to account for lenient learners: agents that forgive possibly mismatched teammate actions that resulted in low rewards. We use these extended formal models to study the convergence guarantees of these algorithms and to visualize the basins of attraction of optimal and suboptimal solutions in two benchmark coordination problems. The paper demonstrates that lenience provides learners with more accurate information about the benefits of performing their actions, resulting in a higher likelihood of convergence to the globally optimal solution. In addition, the analysis indicates that the choice of learning algorithm has an insignificant impact on the overall performance of multiagent learning algorithms; rather, performance depends primarily on the level of lenience that the agents exhibit toward one another. Finally, the research herein supports the strength and generality of evolutionary game theory as a backbone for multiagent learning.
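The lenient-learner idea can be illustrated with a small numerical sketch. It assumes that a lenient agent values each of its actions by the best of kappa rewards, each obtained with a teammate action drawn independently from the teammate's current mixed strategy, and plugs that utility into the plain replicator equation dx_i/dt = x_i (u_i - sum_k x_k u_k). It omits the exploration term of Q-learning replicator dynamics, so it sits closer to the coevolutionary model than to the Q-learning one; the function names, step size, and use of the climbing game are our choices, not the paper's.

```python
import numpy as np

# Illustrative sketch under stated assumptions; names and parameters are hypothetical.

def lenient_utility(A, y, kappa):
    """Expected reward of each row action when the learner keeps only the best of
    kappa rewards, each obtained with a partner action drawn i.i.d. from y."""
    u = np.zeros(A.shape[0])
    for i, row in enumerate(A):
        for j, value in enumerate(row):
            le = y[row <= value].sum()   # P(one draw scores <= value)
            lt = y[row < value].sum()    # P(one draw scores <  value)
            eq = y[row == value].sum()   # P(one draw scores == value)
            if eq > 0:
                # P(best of kappa draws == value), shared among tied partner actions
                u[i] += value * y[j] * (le**kappa - lt**kappa) / eq
    return u

def lenient_replicator_step(x, y, A, B, kappa, dt=0.01):
    """Euler step of dx_i/dt = x_i (u_i - x.u) for both players (no exploration term)."""
    ux = lenient_utility(A, y, kappa)      # row player's payoffs A[i, j]
    uy = lenient_utility(B.T, x, kappa)    # column player's payoffs B[i, j]
    x = x + dt * x * (ux - x @ ux)
    y = y + dt * y * (uy - y @ uy)
    return x / x.sum(), y / y.sum()

# Climbing game, a standard coordination benchmark; both agents share the payoff.
CLIMB = np.array([[11.0, -30.0, 0.0],
                  [-30.0, 7.0, 6.0],
                  [0.0, 0.0, 5.0]])

if __name__ == "__main__":
    for kappa in (1, 5):
        x = y = np.full(3, 1 / 3)
        for _ in range(2000):
            x, y = lenient_replicator_step(x, y, CLIMB, CLIMB, kappa)
        print(kappa, np.round(x, 3), np.round(y, 3))
```

With kappa = 1 the lenient utility reduces to the ordinary expected payoff, so the main block compares non-lenient and lenient dynamics from the same uniform start; which equilibrium each run reaches is left to the printout rather than asserted here.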
Self-coordination of parameter conflicts in D-SON architectures: a Markov decision process framework
A visual demonstration of convergence properties of cooperative coevolution
We introduce a model for cooperative coevolutionary algorithms (CCEAs) using partial mixing, which allows us to compute the expected long-run convergence of such algorithms when individuals' fitness is based on the maximum payoff of N evaluations with partners chosen at random from the other population. Using this model, we devise novel visualization mechanisms that attempt to qualitatively explain a pathology of CCEAs that is difficult to conceptualize: their tendency to converge to suboptimal Nash equilibria. We further demonstrate visually how increasing N, or biasing the fitness to include an ideal-collaboration factor, improves the likelihood of optimal convergence, and under which initial population configurations these measures are of little help.
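The max-of-N fitness and the ideal-collaboration bias can be written down directly for an infinite-population model. This sketch assumes fitness-proportional selection in both populations, a bias of the form (1 - delta) * E[max of N payoffs] + delta * (best possible payoff for that action), and a climbing-game payoff matrix shifted to be non-negative. These are illustrative assumptions and may differ from the paper's exact partial-mixing model.

```python
import numpy as np

# Illustrative sketch; the bias form, selection scheme, and payoff shift are assumptions.

def expected_max_payoff(row, y, n):
    """E[max of n payoffs], partners drawn i.i.d. from the other population's mix y."""
    values = np.unique(row)                                # sorted distinct payoffs
    cdf = np.array([y[row <= v].sum() for v in values])    # P(single payoff <= v)
    prev = np.concatenate(([0.0], cdf[:-1]))               # P(single payoff < v)
    return float(values @ (cdf**n - prev**n))              # sum_v v * P(max == v)

def ccea_fitness(A, y, n, delta):
    """Max-of-n fitness, optionally biased toward the ideal-collaboration payoff."""
    ideal = A.max(axis=1)                                  # payoff with the best partner
    best_of_n = np.array([expected_max_payoff(row, y, n) for row in A])
    return (1.0 - delta) * best_of_n + delta * ideal

def ccea_generation(x, y, A, B, n, delta=0.0):
    """One generation of fitness-proportional selection in both populations.
    Requires non-negative fitness; A[i, j] and B[i, j] are the two populations' payoffs."""
    fx = ccea_fitness(A, y, n, delta)
    fy = ccea_fitness(B.T, x, n, delta)
    return x * fx / (x @ fx), y * fy / (y @ fy)

# Climbing game shifted by +30 so that every payoff is non-negative (our assumption).
CLIMB_SHIFTED = np.array([[11.0, -30.0, 0.0],
                          [-30.0, 7.0, 6.0],
                          [0.0, 0.0, 5.0]]) + 30.0

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x0, y0 = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
    for n, delta in ((1, 0.0), (5, 0.0), (1, 0.5)):
        x, y = x0, y0
        for _ in range(200):
            x, y = ccea_generation(x, y, CLIMB_SHIFTED, CLIMB_SHIFTED, n, delta)
        print(n, delta, x.argmax(), y.argmax())
```

Running the same random start under different N and delta mirrors, in a very reduced form, the kind of comparison the abstract describes; no particular outcome is claimed here.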
Rational bidding using reinforcement learning: an application in automated resource allocation
The application of autonomous agents to the provisioning and usage of computational resources is an attractive research field. Methods and technologies from artificial intelligence, statistics, and economics are combined to i) achieve autonomic provisioning and usage of computational resources, ii) invent competitive bidding strategies for widely used market mechanisms, and iii) incentivize consumers and providers to use such market-based systems.
The contributions of the paper are threefold. First, we present a framework that supports consumers and providers in technical and economic preference elicitation and in the generation of bids. Second, we introduce a consumer-side reinforcement learning bidding strategy that enables rational behavior through the generation and selection of bids. Third, we evaluate and compare this bidding strategy against a truth-telling bidding strategy in two kinds of market mechanisms: one centralized and one decentralized.
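A reinforcement-learning bidder of this kind can be sketched as stateless Q-learning over a small set of bid levels. Everything about the market here is hypothetical: the consumer bids a fraction of its valuation, wins if the bid is at least a random ask price drawn from U(0, 1), and then pays its bid. Under these assumptions the expected reward of bid factor m is m(1 - m), so truth-telling (m = 1) earns nothing and the learner should tend toward factors near 0.5. This is not the paper's framework or state representation.

```python
import numpy as np

# Hypothetical discrete bid factors: the bid is factor * valuation; 1.0 is truth-telling.
FACTORS = np.linspace(0.1, 1.0, 10)

def market_round(bid, valuation, rng):
    """Hypothetical pay-as-bid market: the bid wins if it is at least a random ask
    drawn from U(0, 1); the consumer's reward is its surplus, valuation - bid."""
    ask = rng.uniform(0.0, 1.0)
    return valuation - bid if bid >= ask else 0.0

def train_q_bidder(rounds=20000, valuation=1.0, lr=0.05, eps=0.1, seed=0):
    """Stateless (single-state) Q-learning with epsilon-greedy bid selection."""
    rng = np.random.default_rng(seed)
    q = np.zeros(len(FACTORS))
    for _ in range(rounds):
        if rng.random() < eps:
            a = int(rng.integers(len(FACTORS)))      # explore a random bid level
        else:
            a = int(np.argmax(q))                    # exploit the best estimate
        r = market_round(FACTORS[a] * valuation, valuation, rng)
        q[a] += lr * (r - q[a])                      # move the estimate toward the reward
    return q

if __name__ == "__main__":
    q = train_q_bidder()
    print("learned bid factor:", FACTORS[np.argmax(q)])
    print("truth-telling value estimate:", q[-1])
```

A constant learning rate keeps the estimates responsive if the (hypothetical) ask distribution drifts, at the cost of noisier values than a sample average would give.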
Q-Strategy: A Bidding Strategy for Market-Based Allocation of Grid Services
The application of autonomous agents to the provisioning and usage of computational services is an attractive research field. Methods and technologies from artificial intelligence, statistics, and economics are combined to i) achieve autonomic provisioning and usage of Grid services, ii) invent competitive bidding strategies for widely used market mechanisms, and iii) incentivize consumers and providers to use such market-based systems.
The contributions of the paper are threefold. First, we present a bidding agent framework for implementing artificial bidding agents, supporting consumers and providers in technical and economic preference elicitation as well as in automated bid generation for requesting and provisioning Grid services. Second, we introduce a novel consumer-side bidding strategy that enables goal-oriented, strategic behavior through the generation and submission of consumer service requests and the selection of provider offers. Third, we evaluate and compare the Q-strategy, implemented within the presented framework, against the Truth-Telling bidding strategy in three mechanisms: a centralized CDA, a decentralized on-line machine scheduling mechanism, and a FIFO-scheduling mechanism.
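The centralized CDA mentioned above is a continuous double auction. The following is a minimal, single-unit order-book sketch, assuming that crossing orders trade immediately at the standing order's price and that ties are broken by arrival time; the class and method names are hypothetical, and the mechanism actually evaluated in the paper may differ.

```python
import heapq

class ContinuousDoubleAuction:
    """Single-unit continuous double auction order book (illustrative sketch).

    An incoming order trades immediately against the best standing order on the
    other side if the prices cross; otherwise it rests in the book. Trades are
    priced at the standing order's price, one common convention (an assumption
    here, not necessarily the mechanism evaluated in the paper)."""

    def __init__(self):
        self._bids = []   # max-heap via negated price: (-price, seq, trader)
        self._asks = []   # min-heap: (price, seq, trader)
        self._seq = 0     # arrival counter, gives time priority at equal prices

    def submit(self, side, price, trader):
        """Return (buyer, seller, price) if a trade occurs, else None."""
        self._seq += 1
        if side == "bid":
            if self._asks and self._asks[0][0] <= price:
                ask_price, _, seller = heapq.heappop(self._asks)
                return trader, seller, ask_price
            heapq.heappush(self._bids, (-price, self._seq, trader))
        elif side == "ask":
            if self._bids and -self._bids[0][0] >= price:
                neg_bid, _, buyer = heapq.heappop(self._bids)
                return buyer, trader, -neg_bid
            heapq.heappush(self._asks, (price, self._seq, trader))
        else:
            raise ValueError("side must be 'bid' or 'ask'")
        return None
```

For example, after a provider posts an ask at 10, a consumer bid of 11 executes immediately at 10, while a bid of 9 rests in the book if no standing ask is at or below 9.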
Continuous Strategy Replicator Dynamics for Multi-Agent Learning
The problem of multi-agent learning and adaptation has attracted a great deal of attention in recent years. It has been suggested that the dynamics of multi-agent learning can be studied using replicator equations from population biology. Most existing studies have been limited to discrete strategy spaces with a small number of available actions. In many cases, however, the choices available to agents are better characterized by continuous spectra. This paper suggests a generalization of the replicator framework that allows one to study the adaptive dynamics of Q-learning agents with continuous strategy spaces. Instead of probability vectors, agents' strategies are now characterized by probability measures over continuous variables. As a result, the ordinary differential equations of the discrete case are replaced by a system of coupled integral-differential replicator equations that describe the mutual evolution of individual agent strategies. We derive a set of functional equations describing the steady state of the replicator dynamics, examine their solutions for several two-player games, and confirm our analytical results using simulations.
Comment: 12 pages, 15 figures, accepted for publication in JAAMA
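A discretized version of continuous-strategy replicator dynamics can be sketched by placing each agent's strategy density on a grid. This sketch assumes the continuous analogue of the Boltzmann Q-learning replicator equation, dp(x)/dt = p(x) [beta (f(x) - <f>) - alpha (ln p(x) - <ln p>)], where f(x) is the expected reward of strategy x against the other agent's density and <.> is an average under p; the grid, Euler integration, game, and parameter names are ours. Under this form a steady state satisfies p(x) proportional to exp(beta f(x) / alpha), a Boltzmann-type functional equation that becomes coupled for two agents because each f depends on the other agent's density. The paper's own steady-state equations may be stated differently.

```python
import numpy as np

# Illustrative sketch; the equation form, grid, and parameters are assumptions.

def replicator_step(p, q, R1, R2, dx, alpha, beta, dt):
    """One Euler step of Boltzmann-Q-learning replicator dynamics on a strategy grid.

    p, q   : strategy densities of agents 1 and 2 (sum(p) * dx == 1)
    R1[i,j]: reward of agent 1 when agent 1 plays x_i and agent 2 plays y_j
    R2[i,j]: reward of agent 2 for the same pair of strategies
    beta   : selection strength; alpha: exploration (entropy) strength
    """
    f1 = (R1 @ q) * dx          # expected reward of each x_i against density q
    f2 = (R2.T @ p) * dx        # expected reward of each y_j against density p

    def update(d, f):
        mean_f = np.sum(d * f) * dx
        log_d = np.log(d)
        mean_log = np.sum(d * log_d) * dx
        # selection pulls mass toward better strategies, the entropy term spreads it
        d = d + dt * d * (beta * (f - mean_f) - alpha * (log_d - mean_log))
        return d / (np.sum(d) * dx)

    return update(p, f1), update(q, f2)

def uniform_grid(n):
    """Midpoint grid on [0, 1], its spacing, and the uniform density on it."""
    x = (np.arange(n) + 0.5) / n
    return x, 1.0 / n, np.ones(n)

if __name__ == "__main__":
    # Hypothetical game: agent 1 wants to match agent 2; agent 2 wants to be near 0.7.
    x, dx, p = uniform_grid(101)
    q = p.copy()
    R1 = -(x[:, None] - x[None, :]) ** 2
    R2 = -(x[None, :] - 0.7) ** 2 * np.ones_like(R1)
    for _ in range(5000):
        p, q = replicator_step(p, q, R1, R2, dx, alpha=1.0, beta=20.0, dt=0.01)
    print("agent 1 mean:", np.sum(x * p) * dx, "agent 2 mean:", np.sum(x * q) * dx)
```

The update is multiplicative, so densities stay positive as long as dt times the bracketed term stays above -1; a smaller dt is the simple remedy for larger beta.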
Dynamic Partition of Collaborative Multiagent Based on Coordination Trees
In team Markov games research, it is difficult for an individual agent to calculate the reward of collaborative agents dynamically. We present a coordination tree structure whose nodes are either single agents or agent subsets. Two kinds of tree weights are defined, which describe the cost of an agent collaborating with an agent subset. Using these coordination trees, we can calculate a collaborative agent subset and its minimal collaboration cost. Experiments on a Markov game were conducted with this algorithm. The results show that the method outperforms related multi-agent reinforcement-learning methods based on alterable collaborative teams.
Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics
A continuous-time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness of action values, the agents accumulate in their memory the rewards obtained from taking a specific action at each moment of time. The contribution of past rewards to an agent's current perception of action value is described by an integral operator with a power-law kernel. Finally, a fractional differential equation governing the system dynamics is obtained. The agents interact with one another implicitly, through the dependence of one agent's reward on the choices of the other agents; a pairwise interaction model is adopted to describe this effect. As specific examples of systems with non-transitive interactions, two-agent and three-agent systems of the rock-paper-scissors type are analyzed in detail, including stability analysis and numerical simulation. Scale-free memory is shown to cause complex dynamics in these systems. In particular, two modes of system instability can exist simultaneously, undergoing subcritical and supercritical bifurcations, with the latter exhibiting anomalous oscillations whose amplitude and period grow with time. In addition, the onset of instability via this supercritical mode may be regarded as "altruism self-organization". For the three-agent system, the instability dynamics is found to be rather irregular and can be composed of alternating fragments of oscillations with different properties.
Comment: 17 pages, 7 figures
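The power-law memory can be sketched in discrete time. This sketch assumes a Riemann-Liouville-type kernel (t - s)^(order - 1) / Gamma(order), integrated exactly over each unit step, so that order = 1 recovers ordinary reward accumulation and 0 < order < 1 gives scale-free forgetting. The rock-paper-scissors loop is a simplified mean-field illustration of ours (expected rewards against the opponent's current mixed strategy, softmax action choice); it does not reproduce the paper's fractional differential equation, pairwise interaction model, or bifurcation analysis.

```python
import numpy as np
from math import gamma

# Illustrative sketch; the kernel normalization and the RPS loop are assumptions.

def power_law_weights(length, order):
    """w[tau] = ((tau + 1)**order - tau**order) / Gamma(order + 1).

    Each weight is the exact integral of the kernel (t - s)**(order - 1) / Gamma(order)
    over one unit time step: order = 1 gives equal weights (ordinary accumulation),
    0 < order < 1 gives weights decaying like tau**(order - 1) (power-law memory)."""
    tau = np.arange(length, dtype=float)
    return ((tau + 1.0) ** order - tau ** order) / gamma(order + 1.0)

def scale_free_memory(rewards, order):
    """Perceived value after each step: Q[t] = sum_{s <= t} w[t - s] * rewards[s]."""
    r = np.asarray(rewards, dtype=float)
    w = power_law_weights(len(r), order)
    return np.array([w[: t + 1][::-1] @ r[: t + 1] for t in range(len(r))])

# Rock-paper-scissors reward of the row action against the column action.
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

def simulate_rps(steps, order, beta, seed=0):
    """Two agents; each remembers the expected reward of every action against the
    opponent's current mixed strategy and plays a Boltzmann (softmax) policy."""
    rng = np.random.default_rng(seed)
    w = power_law_weights(steps, order)
    probs = [rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))]
    history = [[], []]                       # per agent: expected-reward vectors over time
    trajectory = []
    for t in range(steps):
        for k in range(2):
            history[k].append(RPS @ probs[1 - k])
        new_probs = []
        for k in range(2):
            values = w[: t + 1][::-1] @ np.array(history[k])   # power-law memory
            z = np.exp(beta * (values - values.max()))         # numerically stable softmax
            new_probs.append(z / z.sum())
        probs = new_probs
        trajectory.append(probs[0])
    return np.array(trajectory)
```

Comparing simulate_rps runs with order = 1 and order below 1 is a cheap way to see how the memory kernel alone changes the trajectory; the quadratic cost of the explicit convolution is acceptable for a few hundred steps.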
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove, in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies, and, in tasks requiring close coordination, failed actions vastly outnumber successful ones. To confront these challenges, we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync.
Comment: Accepted to ECCV 2020 (spotlight); Project page: https://unnat.github.io/cordial-syn
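The limitation of marginal policies, and the idea behind SYNC-policies, can be illustrated with two agents and two actions. The sketch assumes each agent's policy is a mixture of K per-agent action distributions with shared mixture weights, and that both agents select the mixture component from the same random seed, so their choices are correlated without exchanging messages. The mixture size, seed handling, and function names are hypothetical simplifications, and CORDIAL is not sketched.

```python
import numpy as np

# Illustrative sketch of correlated sampling via a shared seed; names are hypothetical.

def marginal_joint(p1, p2):
    """Joint action distribution when each agent samples independently (rank one)."""
    return np.outer(p1, p2)

def sync_joint(weights, comps1, comps2):
    """Joint distribution of a K-component mixture of product policies."""
    return sum(w * np.outer(a, b) for w, a, b in zip(weights, comps1, comps2))

def shared_component(weights, shared_seed):
    """Each agent runs this locally; the same seed yields the same component index."""
    return int(np.random.default_rng(shared_seed).choice(len(weights), p=weights))

def sync_actions(weights, comps1, comps2, shared_seed, rng1, rng2):
    k1 = shared_component(weights, shared_seed)   # computed by agent 1
    k2 = shared_component(weights, shared_seed)   # computed by agent 2, no messages
    a1 = int(rng1.choice(len(comps1[k1]), p=comps1[k1]))
    a2 = int(rng2.choice(len(comps2[k2]), p=comps2[k2]))
    return a1, a2

# Target behavior: "both push left" or "both push right", each with probability 0.5.
WEIGHTS = np.array([0.5, 0.5])
COMPS = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

if __name__ == "__main__":
    print(sync_joint(WEIGHTS, COMPS, COMPS))                 # correlated, rank 2
    print(marginal_joint(np.full(2, 0.5), np.full(2, 0.5)))  # same marginals: 0.25 per cell
```

An independent (marginal) joint is an outer product and therefore has rank one, so it cannot represent the rank-two "both left or both right" distribution, whereas the shared-seed mixture can.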