MADiff: Offline Multi-agent Learning with Diffusion Models
Diffusion models (DMs), as powerful generative models, have recently achieved
great success in various scenarios, including offline reinforcement learning,
where the policy learns to plan by generating trajectories during online
evaluation. However, despite their effectiveness in single-agent learning, it
remains unclear how DMs can operate in multi-agent problems, where agents can
hardly achieve the coordination that teamwork requires if each agent's
trajectory is modeled independently. In this paper, we propose MADiff, a novel
generative multi-agent learning framework to tackle this problem. MADiff is
realized with an attention-based diffusion model to model the complex
coordination among behaviors of multiple diffusion agents. To the best of our
knowledge, MADiff is the first diffusion-based multi-agent offline RL
framework; it behaves as both a decentralized policy and a centralized
controller, includes opponent modeling, and can be used for multi-agent
trajectory prediction. MADiff takes advantage of the powerful generative
ability of diffusion models while being well suited to modeling complex
multi-agent interactions. Our experiments show the superior performance of
MADiff compared to baseline algorithms in a range of multi-agent learning tasks.
Comment: 17 pages, 7 figures, 4 tables
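The abstract describes denoising diffusion applied to joint multi-agent trajectories, with attention mixing information across agents. A minimal sketch of that idea, assuming a toy similarity-based attention layer and a dummy noise predictor in place of MADiff's learned network (all names here are hypothetical, not the paper's API):

```python
import numpy as np

def cross_agent_attention(x):
    """Toy attention: mix each agent's trajectory features with every other
    agent's via softmax similarity weights (a stand-in for the attention
    layers MADiff uses to model coordination)."""
    flat = x.reshape(x.shape[0], -1)                  # (n_agents, horizon*dim)
    scores = flat @ flat.T / np.sqrt(flat.shape[1])   # pairwise similarity
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("ij,jhd->ihd", weights, x)       # weighted mix of agents

def ddpm_reverse_step(x_t, eps_hat, alpha_t, alpha_bar_t, rng):
    """One standard DDPM reverse (denoising) step on the joint trajectories."""
    mean = (x_t - (1 - alpha_t) / np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_t)
    sigma = np.sqrt(1 - alpha_t)
    return mean + sigma * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
n_agents, horizon, dim = 3, 8, 2
x = rng.standard_normal((n_agents, horizon, dim))   # noisy joint trajectories
eps_hat = x - cross_agent_attention(x)              # dummy noise prediction
x_prev = ddpm_reverse_step(x, eps_hat, alpha_t=0.99, alpha_bar_t=0.5, rng=rng)
print(x_prev.shape)
```

Iterating such reverse steps from pure noise is what lets the model generate all agents' trajectories jointly rather than independently.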
Motion Synthesis and Control for Autonomous Agents using Generative Models and Reinforcement Learning
Imitating and predicting human motions has wide applications in both graphics and robotics, from developing realistic models of human movement and behavior in immersive virtual worlds and games to improving autonomous navigation for service agents deployed in the real world. Traditional approaches to motion imitation and prediction typically rely on pre-defined rules to model agent behaviors or use reinforcement learning with manually designed reward functions. Despite impressive results, such approaches cannot effectively capture the diversity of human motor behaviors and decision-making capabilities. Furthermore, manually designing a model or reward function to explicitly describe human motion characteristics often involves laborious fine-tuning and repeated experiments, and may suffer from generalization issues. In this thesis, we explore data-driven approaches that use generative models and reinforcement learning to study and simulate human motions. Specifically, we begin with motion synthesis and control of physically simulated agents imitating a wide range of human motor skills, and then focus on improving the local navigation decisions of autonomous agents in multi-agent interaction settings. For physics-based agent control, we introduce an imitation learning framework built upon generative adversarial networks and reinforcement learning that enables humanoid agents to learn motor skills from a few examples of human reference motion data. Our approach generates high-fidelity motions and robust controllers without the need to manually design and fine-tune a reward function, while allowing interactive switching between different controllers based on user input. Based on this framework, we further propose a multi-objective learning scheme for composite and task-driven control of humanoid agents.
Our multi-objective learning scheme balances the simultaneous learning of disparate motions from multiple reference sources and multiple goal-directed control objectives in an adaptive way, enabling the training of efficient composite motion controllers. Additionally, we present a general framework for fast and robust learning of motor control skills. Our framework exploits particle filtering to dynamically explore and discretize the high-dimensional action space involved in continuous control tasks, and provides a multi-modal policy as a substitute for the commonly used Gaussian policies. For navigation learning, we leverage human crowd data to train a human-inspired collision avoidance policy by combining knowledge distillation and reinforcement learning. Our approach enables autonomous agents to take human-like actions during goal-directed steering in fully decentralized, multi-agent environments. To inform better control in such environments, we propose SocialVAE, a variational autoencoder-based architecture that uses timewise latent variables with socially-aware conditions and a backward posterior approximation to perform agent trajectory prediction. Our approach improves on the current state-of-the-art performance on trajectory prediction tasks in daily human interaction scenarios and in more complex scenes involving interactions between NBA players. We further extend SocialVAE by exploiting semantic maps as context conditions to generate map-compliant trajectory predictions. Our approach processes the context conditions and the social conditions arising during agent-agent interactions in an integrated manner through a dual-attention mechanism. We demonstrate the real-time performance of our approach and its ability to provide high-fidelity, multi-modal predictions on various large-scale vehicle trajectory prediction tasks.
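The navigation part of the thesis combines knowledge distillation from human crowd data with reinforcement learning. One common way to realize that combination is to add a distillation term to the policy-gradient loss; the sketch below assumes a discrete action space, a softmax student policy, and an illustrative mixing weight `lam` (all hypothetical choices, not the thesis's exact formulation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy of the student policy against action distributions
    distilled from human crowd data (the 'teacher')."""
    p = softmax(student_logits)
    return -np.sum(teacher_probs * np.log(p + 1e-12))

def combined_loss(student_logits, teacher_probs, pg_loss, lam=0.5):
    """RL policy-gradient loss blended with the distillation term."""
    return pg_loss + lam * distillation_loss(student_logits, teacher_probs)

teacher = np.array([0.7, 0.2, 0.1])   # e.g. human-preferred steering actions
logits = np.array([2.0, 0.5, -1.0])   # student policy logits for one state
loss = combined_loss(logits, teacher, pg_loss=0.3)
print(round(loss, 3))
```

Tuning `lam` trades off task reward against human-likeness of the learned steering behavior.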
Adaptive Digital Twin for UAV-Assisted Integrated Sensing, Communication, and Computation Networks
In this paper, we study a digital twin (DT)-empowered integrated sensing,
communication, and computation network. Specifically, the users perform radar
sensing and computation offloading on the same spectrum, while unmanned aerial
vehicles (UAVs) are deployed to provide edge computing service. We first
formulate a multi-objective optimization problem to jointly optimize the
beampattern performance of multi-input multi-output (MIMO) radars and the
computation offloading energy consumption. Then, we explore the prediction
capability of the DT to provide intelligent offloading decisions, where the DT
estimation deviation is taken into account. To tackle this challenge, we reformulate the
original problem as a multi-agent Markov decision process and design a
multi-agent proximal policy optimization (MAPPO) framework to achieve a
flexible learning policy. Furthermore, the Beta-policy and attention mechanism
are used to improve the training performance. Numerical results show that the
proposed method is able to balance the performance tradeoff between sensing and
computation functions, while reducing the energy consumption compared with the
existing studies.
Comment: 14 pages, 11 figures
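The abstract mentions a Beta-policy, which is typically used because actions such as offloading ratios and transmit powers live in bounded intervals; a Beta distribution has bounded support, so no Gaussian-style clipping is needed. A minimal sketch under those assumptions (the variable names and ranges are illustrative, not from the paper):

```python
import numpy as np

def beta_action(alpha, beta, low, high, rng):
    """Sample an action from a Beta distribution and rescale it to the
    bounded range [low, high]; unlike a Gaussian policy, samples never
    fall outside the feasible set, so no clipping is required."""
    u = rng.beta(alpha, beta)          # u lies in (0, 1)
    return low + (high - low) * u

rng = np.random.default_rng(42)
# e.g. an offloading ratio constrained to [0, 1] and a power level in [0, 30]
ratio = beta_action(alpha=2.0, beta=5.0, low=0.0, high=1.0, rng=rng)
power = beta_action(alpha=3.0, beta=3.0, low=0.0, high=30.0, rng=rng)
print(0.0 <= ratio <= 1.0, 0.0 <= power <= 30.0)
```

In an actor-critic setup such as MAPPO, the network would output the `alpha` and `beta` concentration parameters per action dimension instead of a Gaussian mean and variance.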
A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction
Accurate and robust trajectory prediction of neighboring agents is critical
for autonomous vehicles traversing in complex scenes. Most methods proposed in
recent years are deep learning-based due to their strength in encoding complex
interactions. However, implausible predictions are often generated, since such
methods rely heavily on past observations and cannot effectively capture the
transient and contingent interactions from sparse samples. In this paper, we propose a
hierarchical hybrid framework of deep learning (DL) and reinforcement learning
(RL) for multi-agent trajectory prediction, to cope with the challenge of
predicting motions shaped by multi-scale interactions. In the DL stage, the
traffic scene is divided into multiple intermediate-scale heterogeneous graphs,
based on which Transformer-style GNNs are adopted to encode heterogeneous
interactions at intermediate and global levels. In the RL stage, we divide the
traffic scene into local sub-scenes utilizing the key future points predicted
in the DL stage. To emulate the motion planning procedure so as to produce
trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO)
incorporated with a vehicle kinematics model is devised to plan motions under
the dominant influence of microscopic interactions. A multi-objective reward is
designed to balance between agent-centric accuracy and scene-wise
compatibility. Experimental results show that our proposal matches the
state of the art on the Argoverse forecasting benchmark. The visualized
results also reveal that the hierarchical learning framework captures the
multi-scale interactions and improves the feasibility and compliance of the
predicted trajectories.
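The abstract's multi-objective reward balances agent-centric accuracy against scene-wise compatibility. A toy sketch of one plausible form, assuming a negative displacement-error term and a collision penalty with an illustrative `safe_dist` threshold (the function and weights are hypothetical, not the paper's exact reward):

```python
import numpy as np

def prediction_reward(pred, gt, others, w_acc=1.0, w_comp=0.5, safe_dist=2.0):
    """Reward = agent-centric accuracy (negative average displacement error)
    plus a scene-wise compatibility term penalizing predicted positions that
    come closer than `safe_dist` to any other agent."""
    accuracy = -np.linalg.norm(pred - gt, axis=-1).mean()       # ADE-style term
    dists = np.linalg.norm(pred[None, :, :] - others, axis=-1)  # (n_others, T)
    compat = -np.clip(safe_dist - dists, 0.0, None).mean()      # proximity penalty
    return w_acc * accuracy + w_comp * compat

T = 5
pred = np.zeros((T, 2))               # predicted trajectory, T steps in 2-D
gt = np.zeros((T, 2))                 # ground-truth trajectory
others = np.full((1, T, 2), 10.0)     # one far-away neighbor: no penalty
print(prediction_reward(pred, gt, others))
```

A perfect prediction with no nearby agents scores 0 here; moving the neighbor onto the predicted path turns the compatibility term negative, which is the trade-off the RL stage optimizes.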
TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction
Data-driven simulation has become a favorable way to train and test
autonomous driving algorithms. The idea of replacing the actual environment
with a learned simulator has also been explored in model-based reinforcement
learning in the context of world models. In this work, we show data-driven
traffic simulation can be formulated as a world model. We present TrafficBots,
a multi-agent policy built upon motion prediction and end-to-end driving, and
based on TrafficBots we obtain a world model tailored for the planning module
of autonomous vehicles. Existing data-driven traffic simulators lack
configurability and scalability. To generate configurable behaviors, for each
agent we introduce a destination as navigational information, and a
time-invariant latent personality that specifies the behavioral style. To
improve the scalability, we present a new positional encoding scheme for
angles, allowing all agents to share the same vectorized context, and use an
architecture based on dot-product attention. As a result, we can simulate
all traffic participants seen in dense urban scenarios. Experiments on the
Waymo open motion dataset show TrafficBots can simulate realistic multi-agent
behaviors and achieve good performance on the motion prediction task.
Comment: Published at ICRA 2023. The repository is available at
https://github.com/zhejz/TrafficBot
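A common way to encode angles positionally, and one plausible reading of the scheme mentioned above, is to map each angle to sin/cos pairs at several integer frequencies: the representation is then continuous across the 2*pi wrap-around, unlike feeding the raw angle. This sketch is a generic illustration of that idea, not the paper's exact formulation:

```python
import numpy as np

def angle_encoding(theta, n_freqs=4):
    """Encode an angle as sin/cos pairs at integer frequencies 1, 2, 4, 8,
    so that theta and theta + 2*pi map to identical feature vectors."""
    freqs = 2.0 ** np.arange(n_freqs)
    return np.concatenate([np.sin(freqs * theta), np.cos(freqs * theta)])

a = angle_encoding(0.3)
b = angle_encoding(0.3 + 2 * np.pi)
print(np.allclose(a, b))   # the wrap-around discontinuity disappears
```

Because headings enter the network through such features, all agents can share one vectorized scene context regardless of their individual orientations.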
Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning
Deep reinforcement learning has achieved great successes in recent years, but
there are still open challenges, such as convergence to locally optimal
policies and sample inefficiency. In this paper, we contribute a novel
self-supervised auxiliary task, i.e., Terminal Prediction (TP), estimating
temporal closeness to terminal states for episodic tasks. The intuition is to
help representation learning by letting the agent predict how close it is to a
terminal state, while learning its control policy. Although TP could be
integrated with multiple algorithms, this paper focuses on Asynchronous
Advantage Actor-Critic (A3C) and demonstrates the advantages of A3C-TP. Our
extensive evaluation includes: a set of Atari games, the BipedalWalker domain,
and a mini version of the recently proposed multi-agent Pommerman game. Our
results on Atari games and the BipedalWalker domain suggest that A3C-TP
outperforms standard A3C in most of the tested domains and performs comparably
in the others. In Pommerman, our proposed method provides significant
improvements in both learning efficiency and convergence to better policies
against different opponents.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital
Entertainment (AIIDE'19). arXiv admin note: text overlap with arXiv:1812.0004
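The Terminal Prediction target described above can be sketched very compactly: label each step of an episode with its normalized closeness to the terminal state and regress the agent's prediction onto it with an auxiliary MSE loss. The exact normalization and loss weighting are assumptions for illustration, not necessarily the paper's choices:

```python
import numpy as np

def terminal_prediction_targets(episode_len):
    """Self-supervised TP labels: temporal closeness to the terminal state,
    normalized to [0, 1] (0 at the first step, 1 at termination)."""
    t = np.arange(episode_len)
    return t / (episode_len - 1)

def tp_aux_loss(predictions, targets):
    """Auxiliary MSE between predicted and true closeness, to be added to
    the usual actor-critic loss."""
    return np.mean((predictions - targets) ** 2)

targets = terminal_prediction_targets(5)
print(targets)                            # [0.   0.25 0.5  0.75 1.  ]
preds = np.full(5, 0.5)                   # an uninformed predictor
print(round(tp_aux_loss(preds, targets), 4))
```

Since the labels come for free from episode boundaries, the auxiliary head shapes the shared representation without any extra supervision.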
Coordinated Multi-Agent Imitation Learning
We study the problem of imitation learning from demonstrations of multiple
coordinating agents. One key challenge in this setting is that learning a good
model of coordination can be difficult, since coordination is often implicit in
the demonstrations and must be inferred as a latent variable. We propose a
joint approach that simultaneously learns a latent coordination model along
with the individual policies. In particular, our method integrates unsupervised
structure learning with conventional imitation learning. We illustrate the
power of our approach on the difficult problem of learning multiple policies for
fine-grained behavior modeling in team sports, where different players occupy
different roles in the coordinated team strategy. We show that having a
coordination model to infer the roles of players yields substantially improved
imitation loss compared to conventional baselines.
Comment: International Conference on Machine Learning 201
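The latent coordination model above infers which role each player occupies. A toy version of that inference step is an assignment problem: pick the role permutation that best explains the agents' features. The brute-force search and the position-based "role prototypes" below are illustrative stand-ins for the paper's structured inference:

```python
import itertools
import numpy as np

def assign_roles(agent_feats, role_means):
    """Infer the latent role assignment: the permutation of roles that
    minimizes total distance between agent features and role prototypes
    (brute force is fine for small team sizes)."""
    n = len(role_means)
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(np.linalg.norm(agent_feats[i] - role_means[p])
                   for i, p in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost

# three players and three role prototypes (e.g. average court positions)
roles = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
players = np.array([[4.8, 0.1], [0.2, 5.1], [0.1, -0.2]])
perm, cost = assign_roles(players, roles)
print(perm)   # player 0 -> role 1, player 1 -> role 2, player 2 -> role 0
```

In the joint approach described above, such an assignment step would alternate with imitation-learning updates of the per-role policies.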