762 research outputs found
Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning
In this paper, we propose a novel model-based multi-agent reinforcement
learning approach named Value Decomposition Framework with Disentangled World
Model to address the challenge of achieving a common goal of multiple agents
interacting in the same environment with reduced sample complexity. Due to
scalability and non-stationarity problems posed by multi-agent systems,
model-free methods rely on a considerable number of samples for training. In
contrast, we use a modularized world model, composed of action-conditioned,
action-free, and static branches, to unravel the environment dynamics and
produce imagined outcomes based on past experience, without sampling directly
from the real environment. We employ variational auto-encoders and variational
graph auto-encoders to learn the latent representations for the world model,
which is merged with a value-based framework to predict the joint action-value
function and optimize the overall training objective. We present experimental
results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges
to demonstrate that our method achieves high sample efficiency and exhibits
superior performance in defeating the enemy armies compared to other baselines.
Comment: 14 page
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. Next, we explore representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potentially impact
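The count-based idea in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the class name `HashCountBonus`, the SimHash-style random projection, the `beta / sqrt(count)` bonus form, and all parameter values are assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


class HashCountBonus:
    """Illustrative count-based exploration bonus (hypothetical helper).

    States are mapped to short binary codes with a fixed random projection
    (SimHash-style), and the bonus shrinks as a code is visited more often.
    """

    def __init__(self, state_dim, n_bits=16, beta=0.1, seed=0):
        rng = random.Random(seed)
        # One Gaussian vector per hash bit; fixed for the lifetime of the object.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(n_bits)
        ]
        self.beta = beta  # assumed bonus scale
        self.counts = defaultdict(int)

    def hash_state(self, state):
        # The sign of each projection contributes one bit of the hash code.
        return tuple(
            1 if sum(w * s for w, s in zip(row, state)) >= 0 else 0
            for row in self.projection
        )

    def bonus(self, state):
        # Record the visit, then return a bonus that decays with the count.
        code = self.hash_state(state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])


counter = HashCountBonus(state_dim=3)
first_visit = counter.bonus([1.0, 2.0, 3.0])
second_visit = counter.bonus([1.0, 2.0, 3.0])
print(first_visit > second_visit)
```

In a training loop, a bonus of this kind would typically be added to the environment reward, so states whose hash codes have rarely been seen become more attractive to the agent.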