Scalable Planning and Learning for Multiagent POMDPs: Extended Version
Online, sample-based planning algorithms for POMDPs have shown great promise
in scaling to problems with large state spaces, but they become intractable for
large action and observation spaces. This is particularly problematic in
multiagent POMDPs, where the action and observation spaces grow exponentially
with the number of agents. To combat this intractability, we propose a novel
scalable approach based on sample-based planning and factored value functions
that exploits structure present in many multiagent settings. This approach
applies not only in the planning case, but also in the Bayesian reinforcement
learning setting. Experimental results show that we are able to provide high
quality solutions to large multiagent planning and learning problems.
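The scaling problem and the factored-value idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the pairwise decomposition, the function names, and the three-agent chain are all assumptions chosen for illustration. It only shows that the joint action space grows exponentially with the number of agents, while a value function written as a sum of local components needs far fewer entries.

```python
from itertools import product


def joint_action_count(num_agents, actions_per_agent):
    """Number of joint actions; grows exponentially in the number of agents."""
    return actions_per_agent ** num_agents


def factored_table_size(edges, actions_per_agent):
    """Entries stored when the value is a sum of pairwise local components."""
    return len(edges) * actions_per_agent ** 2


def factored_value(joint_action, local_q, edges):
    """Approximate the joint value as a sum of local components, one per edge."""
    return sum(local_q[(i, j)][(joint_action[i], joint_action[j])] for i, j in edges)


def best_joint_action(num_agents, actions, local_q, edges):
    """Brute-force maximisation; viable only for tiny illustrative problems."""
    return max(product(actions, repeat=num_agents),
               key=lambda a: factored_value(a, local_q, edges))


# Hypothetical 3-agent chain (0-1-2) in which neighbours are rewarded for matching.
actions = (0, 1)
edges = [(0, 1), (1, 2)]
local_q = {e: {(a, b): 1.0 if a == b else 0.0 for a in actions for b in actions}
           for e in edges}

best = best_joint_action(3, actions, local_q, edges)
print(best, factored_value(best, local_q, edges))
print(joint_action_count(10, 5), factored_table_size(edges, 2))
```

In a realistic setting the brute-force maximisation would be replaced by a method that exploits the same structure, such as variable elimination over the local components.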
Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning
Social dilemmas have been widely studied to explain how humans are able to
cooperate in society. Considerable effort has been invested in designing
artificial agents for social dilemmas that incorporate explicit agent
motivations that are chosen to favor coordinated or cooperative responses. The
prevalence of this general approach points towards the importance of achieving
an understanding of both an agent's internal design and external environment
dynamics that facilitate cooperative behavior. In this paper, we investigate
how partner selection can promote cooperative behavior between agents who are
trained to maximize a purely selfish objective function. Our experiments reveal
that agents trained with this dynamic learn a strategy that retaliates against
defectors while promoting cooperation with other agents, resulting in a
prosocial society.
Sequential Decision Making with Untrustworthy Service Providers
In this paper, we deal with the sequential decision making problem of agents operating in computational economies, where there is uncertainty regarding the trustworthiness of service providers populating the environment. Specifically, we propose a generic Bayesian trust model, and formulate the optimal Bayesian solution to the exploration-exploitation problem facing the agents when repeatedly interacting with others in such environments. We then present a computationally tractable Bayesian reinforcement learning algorithm to approximate that solution by taking into account the expected value of perfect information of an agent's actions. Our algorithm is shown to dramatically outperform all previous finalists of the international Agent Reputation and Trust (ART) competition, including the winner from both years in which the competition has been run.
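The expected value of perfect information mentioned above has a standard decision-theoretic definition: the expected payoff when the true state is revealed before acting, minus the best expected payoff when acting on the prior alone. The sketch below computes that textbook quantity for a discrete prior. It is not the paper's algorithm, and the state names, action names, and payoffs are hypothetical.

```python
def expected_value_of_perfect_information(prior, utility):
    """prior: state -> probability; utility: action -> (state -> payoff)."""
    # Best expected payoff when the agent must act on the prior alone.
    value_without_info = max(
        sum(prior[s] * payoffs[s] for s in prior) for payoffs in utility.values()
    )
    # Expected payoff if the true state were revealed before acting.
    value_with_info = sum(
        prior[s] * max(payoffs[s] for payoffs in utility.values()) for s in prior
    )
    return value_with_info - value_without_info


# Hypothetical trust scenario: a provider is either reliable or unreliable.
prior = {"reliable": 0.5, "unreliable": 0.5}
utility = {
    "use_provider": {"reliable": 1.0, "unreliable": -1.0},
    "avoid_provider": {"reliable": 0.0, "unreliable": 0.0},
}
evpi = expected_value_of_perfect_information(prior, utility)
print(evpi)
```

When the prior is already certain, the quantity is zero, which matches the intuition that exploration is only worth paying for while the agent is still uncertain about a provider.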
A Study of AI Population Dynamics with Million-agent Reinforcement Learning
We conduct an empirical study on discovering the ordered collective dynamics
obtained by a population of intelligent agents, driven by million-agent
reinforcement learning. Our intention is to put intelligent agents into a
simulated natural context and verify if the principles developed in the real
world could also be used in understanding an artificially-created intelligent
population. To achieve this, we simulate a large-scale predator-prey world,
where the laws of the world are designed using only findings, or their logical
equivalents, that have been discovered in nature. We endow the agents with
intelligence based on deep reinforcement learning (DRL). In order to scale the
population size up to millions of agents, a large-scale DRL training platform with
a redesigned experience buffer is proposed. Our results show that the population
dynamics of AI agents, driven only by each agent's individual self-interest,
reveal an ordered pattern that is similar to the Lotka-Volterra model studied
in population biology. We further discover the emergent behaviors of collective
adaptations in studying how the agents' grouping behaviors will change with the
environmental resources. Both findings can be explained by the
self-organization theory in nature. Comment: Full version of the paper presented at AAMAS 2018 (International
Conference on Autonomous Agents and Multiagent Systems).
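For readers unfamiliar with the Lotka-Volterra model referenced above, the following is a minimal sketch of the classic predator-prey equations, dx/dt = αx − βxy and dy/dt = δxy − γy, integrated with a simple forward Euler step. It is the textbook model only, not the paper's simulation. The parameter values, step size, and function name are assumptions chosen for illustration.

```python
def simulate_lotka_volterra(prey0, pred0, alpha, beta, delta, gamma, dt, steps):
    """Forward-Euler integration of the classic Lotka-Volterra equations.

    Returns a list of (prey, predator) pairs, including the initial state.
    """
    prey, pred = prey0, pred0
    trajectory = [(prey, pred)]
    for _ in range(steps):
        d_prey = alpha * prey - beta * prey * pred    # prey growth minus predation
        d_pred = delta * prey * pred - gamma * pred   # predator growth minus death
        prey, pred = prey + dt * d_prey, pred + dt * d_pred
        trajectory.append((prey, pred))
    return trajectory


# Hypothetical parameters; the fixed point is (gamma/delta, alpha/beta) = (2, 2).
params = dict(alpha=1.0, beta=0.5, delta=0.5, gamma=1.0)
trajectory = simulate_lotka_volterra(1.0, 1.0, dt=0.001, steps=10000, **params)
peak_prey = max(prey for prey, _ in trajectory)
print(round(peak_prey, 2))
```

Starting away from the fixed point, the two populations cycle around it, which is the ordered oscillatory pattern the model is known for. Forward Euler drifts slowly outward over long horizons, so a higher-order integrator would be preferable for anything beyond illustration.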