3,082 research outputs found
Fact-based Agent modeling for Multi-Agent Reinforcement Learning
In multi-agent systems, agents need to interact and collaborate with other
agents in environments. Agent modeling is crucial to facilitate agent
interactions and make adaptive cooperation strategies. However, it is
challenging for agents to model the beliefs, behaviors, and intentions of other
agents in non-stationary environment where all agent policies are learned
simultaneously. In addition, the existing methods realize agent modeling
through behavior cloning which assume that the local information of other
agents can be accessed during execution or training. However, this assumption
is infeasible in unknown scenarios characterized by unknown agents, such as
competition teams, unreliable communication and federated learning due to
privacy concerns. To eliminate this assumption and achieve agent modeling in
unknown scenarios, Fact-based Agent modeling (FAM) method is proposed in which
fact-based belief inference (FBI) network models other agents in partially
observable environment only based on its local information. The reward and
observation obtained by agents after taking actions are called facts, and FAM
uses facts as reconstruction target to learn the policy representation of other
agents through a variational autoencoder. We evaluate FAM on various Multiagent
Particle Environment (MPE) and compare the results with several
state-of-the-art MARL algorithms. Experimental results show that compared with
baseline methods, FAM can effectively improve the efficiency of agent policy
learning by making adaptive cooperation strategies in multi-agent reinforcement
learning tasks, while achieving higher returns in complex
competitive-cooperative mixed scenarios
Hybrid Reinforcement Learning with Expert State Sequences
Existing imitation learning approaches often require that the complete
demonstration data, including sequences of actions and states, are available.
In this paper, we consider a more realistic and difficult scenario where a
reinforcement learning agent only has access to the state sequences of an
expert, while the expert actions are unobserved. We propose a novel
tensor-based model to infer the unobserved actions of the expert state
sequences. The policy of the agent is then optimized via a hybrid objective
combining reinforcement learning and imitation learning. We evaluated our
hybrid approach on an illustrative domain and Atari games. The empirical
results show that (1) the agents are able to leverage state expert sequences to
learn faster than pure reinforcement learning baselines, (2) our tensor-based
action inference model is advantageous compared to standard deep neural
networks in inferring expert actions, and (3) the hybrid policy optimization
objective is robust against noise in expert state sequences.Comment: AAAI 2019; https://github.com/XiaoxiaoGuo/tensor4r
Stick-Breaking Policy Learning in Dec-POMDPs
Expectation maximization (EM) has recently been shown to be an efficient
algorithm for learning finite-state controllers (FSCs) in large decentralized
POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often
converge to maxima that are far from optimal. This paper considers a
variable-size FSC to represent the local policy of each agent. These
variable-size FSCs are constructed using a stick-breaking prior, leading to a
new framework called \emph{decentralized stick-breaking policy representation}
(Dec-SBPR). This approach learns the controller parameters with a variational
Bayesian algorithm without having to assume that the Dec-POMDP model is
available. The performance of Dec-SBPR is demonstrated on several benchmark
problems, showing that the algorithm scales to large problems while
outperforming other state-of-the-art methods
Coordinated Multi-Agent Imitation Learning
We study the problem of imitation learning from demonstrations of multiple
coordinating agents. One key challenge in this setting is that learning a good
model of coordination can be difficult, since coordination is often implicit in
the demonstrations and must be inferred as a latent variable. We propose a
joint approach that simultaneously learns a latent coordination model along
with the individual policies. In particular, our method integrates unsupervised
structure learning with conventional imitation learning. We illustrate the
power of our approach on a difficult problem of learning multiple policies for
fine-grained behavior modeling in team sports, where different players occupy
different roles in the coordinated team strategy. We show that having a
coordination model to infer the roles of players yields substantially improved
imitation loss compared to conventional baselines.Comment: International Conference on Machine Learning 201
- …