Neural Auto-Curricula in Two-Player Zero-Sum Games
When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game-theoretic principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework, Neural Auto-Curricula (NAC), that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve performance competitive with, or even better than, state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.
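To make the population loop in the abstract above concrete, here is a minimal sketch of classic fictitious play, one of the hand-designed rules the abstract names, on rock-paper-scissors. It is not the NAC method: the opponent mixture is simply the empirical play frequency rather than a learned network, and the function names (`fictitious_play`, `exploitability`, and the helpers) are hypothetical and chosen only for illustration.

```python
"""Fictitious play on rock-paper-scissors, with an exploitability check.

Illustrative sketch only; not the NAC algorithm described in the abstract.
"""

# Row player's payoff matrix; the column player receives the negative.
RPS = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def normalise(counts):
    """Turn play counts into a mixed strategy (probabilities summing to 1)."""
    total = sum(counts)
    return [c / total for c in counts]


def row_values(payoff, col_mix):
    """Row player's expected payoff for each pure row against a column mixture."""
    return [sum(a * q for a, q in zip(row, col_mix)) for row in payoff]


def col_values(payoff, row_mix):
    """Column player's expected payoff for each pure column against a row mixture."""
    n_cols = len(payoff[0])
    return [-sum(p * row[j] for p, row in zip(row_mix, payoff))
            for j in range(n_cols)]


def exploitability(payoff, row_mix, col_mix):
    """Sum of both players' best-response gains.

    In a zero-sum game the current payoffs cancel, so this is just the sum of
    the two best-response values; it is zero at a Nash equilibrium.
    """
    return max(row_values(payoff, col_mix)) + max(col_values(payoff, row_mix))


def fictitious_play(payoff, iterations):
    """Each player repeatedly best-responds to the opponent's empirical mixture."""
    row_counts = [0] * len(payoff)
    col_counts = [0] * len(payoff[0])
    row_counts[0] = col_counts[0] = 1  # arbitrary initial pure strategies
    for _ in range(iterations):
        rv = row_values(payoff, normalise(col_counts))  # "who to compete with"
        cv = col_values(payoff, normalise(row_counts))
        row_counts[rv.index(max(rv))] += 1              # "how to beat them"
        col_counts[cv.index(max(cv))] += 1
    return normalise(row_counts), normalise(col_counts)


if __name__ == "__main__":
    row_mix, col_mix = fictitious_play(RPS, 2000)
    print("row mixture:", row_mix)
    print("exploitability:", exploitability(RPS, row_mix, col_mix))
```

In this sketch both update rules are fixed by hand; the abstract's proposal is to replace the opponent-mixture rule with a neural network and tune it by meta-gradient descent against the same exploitability objective.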
Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer
An important goal of research in Deep Reinforcement Learning in mobile
robotics is to train agents capable of solving complex tasks, which require a
high level of scene understanding and reasoning from an egocentric perspective.
When trained from simulations, optimal environments should satisfy a currently
unobtainable combination of high-fidelity photographic observations, massive
amounts of different environment configurations and fast simulation speeds. In
this paper we argue that research on training agents capable of complex
reasoning can be simplified by decoupling from the requirement of high fidelity
photographic observations. We present a suite of tasks requiring complex
reasoning and exploration in continuous, partially observable 3D environments.
The objective is to provide challenging scenarios and a robust baseline agent
architecture that can be trained on mid-range consumer hardware in under 24 hours.
Our scenarios combine two key advantages: (i) they are based on a simple but
highly efficient 3D environment (ViZDoom) which allows high speed simulation
(12,000 fps); (ii) the scenarios provide the user with a range of difficulty
settings, in order to identify the limitations of current state-of-the-art
algorithms and network architectures. We aim to increase accessibility to the
field of Deep-RL by providing baselines for challenging scenarios where new
ideas can be iterated on quickly. We argue that the community should be able to
address challenging problems in reasoning of mobile agents without the need for
a large compute infrastructure.
- …