Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning
Learning control from pixels is difficult for reinforcement learning (RL)
agents because representation learning and policy learning are intertwined.
Previous approaches remedy this issue with auxiliary representation learning
tasks, but they either do not consider the temporal aspect of the problem or
only consider single-step transitions, which may cause learning inefficiencies
if important environmental changes take many steps to manifest. We propose
Hierarchical k-Step Latent (HKSL), an auxiliary task that learns multiple
representations via a hierarchy of forward models that learn to communicate and
an ensemble of n-step critics that all operate at varying magnitudes of step
skipping. We evaluate HKSL in a suite of 30 robotic control tasks with and
without distractors and a task of our creation. We find that HKSL either
converges to higher or optimal episodic returns more quickly than several
alternative representation learning approaches. Furthermore, we find that
HKSL's representations capture task-relevant details accurately across
timescales (even in the presence of distractors) and that communication
channels between hierarchy levels organize information based on both sides of
the communication process, both of which improve sample efficiency.
Comment: Published in TMLR
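The abstract describes HKSL's structure but not its mechanics. Below is a loose sketch of the core idea, hierarchy levels that predict latents at different step skips against stop-gradient encoder targets, in PyTorch. The `Encoder`/`ForwardModel` interface, the two-level (skips 1 and k) setup, and the network sizes are assumptions for illustration; the communication channels and n-step critics from the paper are omitted.

```python
# Loose sketch of a hierarchical k-step latent consistency loss.
# Not the authors' implementation: names and shapes are illustrative.
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the latent state `skip` environment steps ahead."""
    def __init__(self, latent_dim, action_dim, skip):
        super().__init__()
        self.skip = skip
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim * skip, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, actions):
        # actions: the `skip` actions taken between the two observations
        return self.net(torch.cat([z, actions.flatten(1)], dim=1))

def hierarchy_loss(encoder, models, obs_seq, act_seq):
    """Latent consistency loss across hierarchy levels with different step skips.

    obs_seq: (B, T+1, ...) observations; act_seq: (B, T, action_dim).
    Each level predicts z_{t+skip} from z_t and the intervening actions.
    """
    z0 = encoder(obs_seq[:, 0])
    loss = 0.0
    for model in models:  # e.g. skips (1, k) for a two-level hierarchy
        target = encoder(obs_seq[:, model.skip]).detach()  # stop-gradient target
        pred = model(z0, act_seq[:, :model.skip])
        loss = loss + nn.functional.mse_loss(pred, target)
    return loss
```

The coarser level sees further ahead per prediction, which is the mechanism the abstract credits for capturing environmental changes that take many steps to manifest.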
Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Exploration in multi-agent reinforcement learning is a challenging problem,
especially in environments with sparse rewards. We propose a general method for
efficient exploration by sharing experience amongst agents. Our proposed
algorithm, called Shared Experience Actor-Critic (SEAC), applies experience
sharing in an actor-critic framework. We evaluate SEAC in a collection of
sparse-reward multi-agent environments and find that it consistently
outperforms two baselines and two state-of-the-art algorithms by learning in
fewer steps and converging to higher returns. In some harder environments,
experience sharing makes the difference between learning to solve the task and
not learning at all.
Comment: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
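To make the experience-sharing idea concrete: each agent's policy loss combines its own on-policy gradient with the other agents' trajectories, reweighted by an importance-sampling ratio since that data came from a different policy. The sketch below assumes a minimal (obs, actions, advantages) batch interface and a sharing weight `lambda_`; both are illustrative, not the authors' code.

```python
# Rough sketch of experience sharing in an actor-critic policy update.
# policies[k](obs) -> torch.distributions.Distribution over actions (assumed API).
import torch

def shared_experience_policy_loss(agent_id, policies, batches, lambda_=1.0):
    """Policy loss for one agent using every agent's collected experience.

    batches[k] holds (obs, actions, advantages) collected by agent k.
    """
    own = batches[agent_id]
    dist = policies[agent_id](own["obs"])
    loss = -(dist.log_prob(own["actions"]) * own["advantages"]).mean()

    for k, batch in enumerate(batches):
        if k == agent_id:
            continue
        dist_i = policies[agent_id](batch["obs"])
        dist_k = policies[k](batch["obs"])
        # Off-policy correction: this data was generated by agent k's policy.
        ratio = torch.exp(
            dist_i.log_prob(batch["actions"]) - dist_k.log_prob(batch["actions"])
        ).detach()
        loss = loss - lambda_ * (
            ratio * dist_i.log_prob(batch["actions"]) * batch["advantages"]
        ).mean()
    return loss
```

Every agent thus learns from all agents' exploration, which is what lets SEAC find sparse rewards in fewer steps.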
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
This work focuses on equilibrium selection in no-conflict multi-agent games,
where we specifically study the problem of selecting a Pareto-optimal
equilibrium among several existing equilibria. It has been shown that many
state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone
to converging to Pareto-dominated equilibria due to the uncertainty each agent
has about the policy of the other agents during training. To address
sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC),
which is an actor-critic algorithm that utilises a simple property of
no-conflict games (a superset of cooperative games): the Pareto-optimal
equilibrium in a no-conflict game maximises the returns of all agents and
therefore is the preferred outcome for all agents. We evaluate Pareto-AC in a
diverse set of multi-agent games and show that it converges to higher episodic
returns compared to seven state-of-the-art MARL algorithms and that it
successfully converges to a Pareto-optimal equilibrium in a range of matrix
games. Finally, we propose PACDCG, a graph neural network extension of
Pareto-AC which is shown to efficiently scale in games with a large number of
agents.
Comment: 20 pages, 12 figures
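The no-conflict property Pareto-AC exploits is easy to see in a matrix game: because every agent's return is maximised by the same joint action, each agent can identify the equilibrium target independently. A toy illustration (not the Pareto-AC algorithm itself), using climbing-game-style payoffs as an assumed example:

```python
# Toy illustration of the no-conflict property, not Pareto-AC itself.
import numpy as np

# Climbing-game-style payoffs (illustrative numbers). In a no-conflict
# game, all agents prefer the same joint action.
payoff_a = np.array([[11, -30,  0],
                     [-30,  7,  6],
                     [  0,  0,  5]])
payoff_b = payoff_a.copy()  # identical preferences: a no-conflict game

# The Pareto-optimal equilibrium maximises every agent's return, so the
# joint-action argmax agrees across agents.
best_a = np.unravel_index(payoff_a.argmax(), payoff_a.shape)
best_b = np.unravel_index(payoff_b.argmax(), payoff_b.shape)
assert best_a == best_b  # both agents select the same joint action
print("Pareto-optimal joint action:", best_a)
```

Independent learners often settle on the safe Pareto-dominated action here (payoff 5) because miscoordination is costly during training; Pareto-AC's critic is built so each agent optimistically assumes the others also aim for the Pareto-optimal outcome.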