582 research outputs found
SMIX(): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning
Learning a stable and generalizable centralized value function (CVF) is a
crucial but challenging task in multi-agent reinforcement learning (MARL), as
it has to deal with the issue that the joint action space increases
exponentially with the number of agents in such scenarios. This paper proposes
an approach, named SMIX(), to address the issue using an efficient
off-policy centralized training method within a flexible learner search space.
As importance sampling for such off-policy training is both computationally
costly and numerically unstable, we proposed to use the -return as a
proxy to compute the TD error. With this new loss function objective, we adopt
a modified QMIX network structure as the base to train our model. By further
connecting it with the approach from an unified expectation
correction viewpoint, we show that the proposed SMIX() is equivalent
to and hence shares its convergence properties, while without
being suffered from the aforementioned curse of dimensionality problem inherent
in MARL. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark
demonstrate that our approach not only outperforms several state-of-the-art
MARL methods by a large margin, but also can be used as a general tool to
improve the overall performance of other CTDE-type algorithms by enhancing
their CVFs
The 1990 progress report and future plans
This document describes the progress and plans of the Artificial Intelligence Research Branch (RIA) at ARC in 1990. Activities span a range from basic scientific research to engineering development and to fielded NASA applications, particularly those applications that are enabled by basic research carried out at RIA. Work is conducted in-house and through collaborative partners in academia and industry. Our major focus is on a limited number of research themes with a dual commitment to technical excellence and proven applicability to NASA short, medium, and long-term problems. RIA acts as the Agency's lead organization for research aspects of artificial intelligence, working closely with a second research laboratory at JPL and AI applications groups at all NASA centers
Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments
The Game Theory & Multi-Agent team at DeepMind studies several aspects of
multi-agent learning ranging from computing approximations to fundamental
concepts in game theory to simulating social dilemmas in rich spatial
environments and training 3-d humanoids in difficult team coordination tasks. A
signature aim of our group is to use the resources and expertise made available
to us at DeepMind in deep reinforcement learning to explore multi-agent systems
in complex environments and use these benchmarks to advance our understanding.
Here, we summarise the recent work of our team and present a taxonomy that we
feel highlights many important open challenges in multi-agent research.Comment: Published in AI Communications 202
On Transforming Reinforcement Learning by Transformer: The Development Trajectory
Transformer, originally devised for natural language processing, has also
attested significant success in computer vision. Thanks to its super expressive
power, researchers are investigating ways to deploy transformers to
reinforcement learning (RL) and the transformer-based models have manifested
their potential in representative RL benchmarks. In this paper, we collect and
dissect recent advances on transforming RL by transformer (transformer-based RL
or TRL), in order to explore its development trajectory and future trend. We
group existing developments in two categories: architecture enhancement and
trajectory optimization, and examine the main applications of TRL in robotic
manipulation, text-based games, navigation and autonomous driving. For
architecture enhancement, these methods consider how to apply the powerful
transformer structure to RL problems under the traditional RL framework, which
model agents and environments much more precisely than deep RL methods, but
they are still limited by the inherent defects of traditional RL algorithms,
such as bootstrapping and "deadly triad". For trajectory optimization, these
methods treat RL problems as sequence modeling and train a joint state-action
model over entire trajectories under the behavior cloning framework, which are
able to extract policies from static datasets and fully use the long-sequence
modeling capability of the transformer. Given these advancements, extensions
and challenges in TRL are reviewed and proposals about future direction are
discussed. We hope that this survey can provide a detailed introduction to TRL
and motivate future research in this rapidly developing field.Comment: 26 page
Life Long Learning In Sparse Learning Environments
Life long learning is a machine learning technique that deals with learning sequential tasks over time. It seeks to transfer knowledge from previous learning tasks to new learning tasks in order to increase generalization performance and learning speed. Real-time learning environments in which many agents are participating may provide learning opportunities but they are spread out in time and space outside of the geographical scope of a single learning agent. This research seeks to provide an algorithm and framework for life long learning among a network of agents in a sparse real-time learning environment. This work will utilize the robust knowledge representation of neural networks, and make use of both functional and representational knowledge transfer to accomplish this task. A new generative life long learning algorithm utilizing cascade correlation and reverberating pseudo-rehearsal and incorporating a method for merging divergent life long learning paths will be implemented
Large network multi-level control for CAV and Smart Infrastructure: AI-based Fog-Cloud collaboration
On the Combination of Game-Theoretic Learning and Multi Model Adaptive Filters
This paper casts coordination of a team of robots within the framework of game theoretic learning algorithms. In particular a novel variant of fictitious play is proposed, by considering multi-model adaptive filters as a method to estimate other players’ strategies. The proposed algorithm can be used as a coordination mechanism between players when they should take decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players and also the uncertainty. Uncertainty can occur either in terms of noisy observations or various types of other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori. Various parameter values can be used initially as inputs to different models. Therefore, the resulting decisions will be aggregate results of all the parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.</p
- …