Implicit imitation in multiagent reinforcement learning
Imitation is actively being studied as an effective means of learning in multi-agent environments. It allows an agent to learn how to act well (perhaps optimally) by passively observing the actions of cooperative teachers or other more experienced agents in its environment. We propose a straightforward imitation mechanism called model extraction that can be integrated easily into standard model-based reinforcement learning algorithms. Roughly, by observing a mentor with similar capabilities, an agent can extract information about its own capabilities in unvisited parts of state space. The extracted information can accelerate learning dramatically. We illustrate the benefits of model extraction by integrating it with prioritized sweeping, and demonstrate improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability, possible interactions, and common abilities, we briefly comment on extensions of the model that relax these assumptions.
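As a rough illustration of the mechanism this abstract describes, here is a minimal tabular sketch of model extraction layered on prioritized sweeping, assuming a shared discrete state space and a mentor whose actions the observer could also take. The class and method names are illustrative, not the paper's reference code.

```python
# Model extraction + prioritized sweeping, tabular sketch.
import heapq
from collections import defaultdict

class ModelExtractionLearner:
    def __init__(self, n_states, n_actions, gamma=0.95, theta=1e-3):
        self.gamma, self.theta = gamma, theta
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': n}
        self.rewards = defaultdict(float)                    # (s, a) -> mean reward
        self.V = [0.0] * n_states
        self.n_actions = n_actions
        self.queue = []                                      # max-heap via negated priority

    def observe(self, s, a, r, s2):
        """Record a transition from own experience *or* a mentor trace.
        Mentor transitions populate the model in states the agent has
        never visited -- the model-extraction step."""
        n = sum(self.counts[(s, a)].values())
        self.counts[(s, a)][s2] += 1
        self.rewards[(s, a)] += (r - self.rewards[(s, a)]) / (n + 1)
        self._push(s)

    def _q(self, s, a):
        dist = self.counts.get((s, a))
        if not dist:
            return 0.0
        n = sum(dist.values())
        exp_v = sum(c / n * self.V[s2] for s2, c in dist.items())
        return self.rewards[(s, a)] + self.gamma * exp_v

    def _push(self, s):
        pri = abs(max(self._q(s, a) for a in range(self.n_actions)) - self.V[s])
        if pri > self.theta:
            heapq.heappush(self.queue, (-pri, s))

    def sweep(self, n_backups=10):
        """Prioritized sweeping: back up states with the largest Bellman error."""
        for _ in range(n_backups):
            if not self.queue:
                break
            _, s = heapq.heappop(self.queue)
            self.V[s] = max(self._q(s, a) for a in range(self.n_actions))
            # Propagate the change to recorded predecessors of s.
            for (sp, a), dist in list(self.counts.items()):
                if s in dist:
                    self._push(sp)
```

The only coupling point is `observe`: feeding it mentor transitions seeds value backups in parts of the state space the agent has not yet reached, which is where the acceleration comes from.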
Coordinated Multi-Agent Imitation Learning
We study the problem of imitation learning from demonstrations of multiple
coordinating agents. One key challenge in this setting is that learning a good
model of coordination can be difficult, since coordination is often implicit in
the demonstrations and must be inferred as a latent variable. We propose a
joint approach that simultaneously learns a latent coordination model along
with the individual policies. In particular, our method integrates unsupervised
structure learning with conventional imitation learning. We illustrate the
power of our approach on a difficult problem of learning multiple policies for
fine-grained behavior modeling in team sports, where different players occupy
different roles in the coordinated team strategy. We show that having a
coordination model to infer the roles of players yields substantially improved
imitation loss compared to conventional baselines.
Comment: International Conference on Machine Learning 2017
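A minimal sketch of the joint idea, under strong simplifying assumptions: alternate between inferring the latent role assignment (here via per-timestep linear assignment with scipy's Hungarian solver, standing in for the paper's structured coordination model) and refitting one imitation policy per role. All names below are illustrative; `policies` can be any sklearn-style multioutput regressors.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fit_coordinated_imitation(demos, policies, n_iters=20):
    """demos: list of (states, actions) pairs per play, shaped
       (T, K, state_dim) and (T, K, act_dim) for K players.
       policies: K regressors with .fit(X, y) / .predict(X)."""
    K = len(policies)
    # Warm start: fit every role policy on all data so .predict is usable.
    all_X = np.concatenate([s.reshape(-1, s.shape[-1]) for s, _ in demos])
    all_y = np.concatenate([a.reshape(-1, a.shape[-1]) for _, a in demos])
    for pi in policies:
        pi.fit(all_X, all_y)
    for _ in range(n_iters):
        per_role_X = [[] for _ in range(K)]
        per_role_y = [[] for _ in range(K)]
        for states, actions in demos:
            for t in range(states.shape[0]):
                # Cost of assigning player j to role k = role-k policy's
                # squared imitation error on player j's observed action.
                cost = np.zeros((K, K))
                for k, pi in enumerate(policies):
                    pred = pi.predict(states[t])           # (K, act_dim)
                    cost[:, k] = ((pred - actions[t]) ** 2).sum(axis=1)
                players, roles = linear_sum_assignment(cost)
                for j, k in zip(players, roles):
                    per_role_X[k].append(states[t, j])
                    per_role_y[k].append(actions[t, j])
        # M-step: refit each role policy on the data assigned to it.
        for k, pi in enumerate(policies):
            pi.fit(np.asarray(per_role_X[k]), np.asarray(per_role_y[k]))
    return policies
```

The alternation is the key design choice: role assignments sharpen as the policies improve, and the policies improve because each one now trains on behaviorally consistent data.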
Learning Realistic Traffic Agents in Closed-loop
Realistic traffic simulation is crucial for developing self-driving software
in a safe and scalable manner prior to real-world deployment. Typically,
imitation learning (IL) is used to learn human-like traffic agents directly
from real-world observations collected offline, but without explicit
specification of traffic rules, agents trained from IL alone frequently display
unrealistic infractions like collisions and driving off the road. This problem
is exacerbated in out-of-distribution and long-tail scenarios. On the other
hand, reinforcement learning (RL) can train traffic agents to avoid
infractions, but using RL alone results in non-human-like driving behaviors. We
propose Reinforcing Traffic Rules (RTR), a holistic closed-loop learning
objective to match expert demonstrations under a traffic compliance constraint,
which naturally gives rise to a joint IL + RL approach, obtaining the best of
both worlds. Our method learns in closed-loop simulations of both nominal
scenarios from real-world datasets as well as procedurally generated long-tail
scenarios. Our experiments show that RTR learns more realistic and
generalizable traffic simulation policies, achieving significantly better
tradeoffs between human-like driving and traffic compliance in both nominal and
long-tail scenarios. Moreover, when used as a data generation tool for training
prediction models, our learned traffic policy leads to considerably improved
downstream prediction metrics compared to baseline traffic agents. For more
information, visit the project website: https://waabi.ai/rtr
Comment: CoRL 2023
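To make the "IL + RL under a compliance constraint" objective concrete, here is a minimal PyTorch sketch of one training step in that spirit. The `policy`, rollout format, and infraction costs are illustrative stand-ins, not Waabi's API, and the fixed weight `lam` is a simplification of the constraint the paper derives.

```python
import torch

def rtr_style_step(policy, optimizer, expert_batch, rollout, lam=1.0):
    obs_e, act_e = expert_batch                 # logged expert observations/actions
    # Imitation term: regress toward expert actions in closed loop.
    il_loss = ((policy(obs_e) - act_e) ** 2).mean()

    # RL term: REINFORCE on infraction costs from a closed-loop rollout.
    # rollout = (observations, sampled_actions, infraction_costs), where
    # costs flag simulator events like collisions or driving off-road.
    obs_r, act_r, cost = rollout
    # Gaussian policy with unit variance: log-prob up to constants/scale.
    log_prob = -((policy(obs_r) - act_r) ** 2).sum(dim=-1)
    rl_loss = (log_prob * cost.detach()).mean()  # push down prob of costly actions

    loss = il_loss + lam * rl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return il_loss.item(), rl_loss.item()
```

The point of the joint objective is visible in the gradient: the imitation term anchors the policy to human-like behavior, while the penalty term only perturbs it where rollouts actually produce infractions.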