
    SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

    Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multi-agent reinforcement learning (MARL), as it must deal with a joint action space that grows exponentially with the number of agents. This paper proposes an approach, named SMIX(λ), to address this issue using an efficient off-policy centralized training method within a flexible learner search space. Since importance sampling for such off-policy training is both computationally costly and numerically unstable, we propose to use the λ-return as a proxy to compute the TD error. With this new loss objective, we adopt a modified QMIX network structure as the base to train our model. By further connecting it with the Q(λ) approach from a unified expectation-correction viewpoint, we show that the proposed SMIX(λ) is equivalent to Q(λ) and hence shares its convergence properties, while not suffering from the curse of dimensionality inherent in MARL. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark demonstrate that our approach not only outperforms several state-of-the-art MARL methods by a large margin, but can also be used as a general tool to improve the overall performance of other CTDE-type algorithms by enhancing their CVFs.
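
    The key computational trick here, replacing importance-weighted off-policy corrections with a λ-return target, is easy to illustrate. Below is a minimal sketch of a backward λ-return computation, not the authors' implementation; the function name, array shapes, and hyperparameter values are assumptions for illustration.

```python
import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.8):
    """Backward recursion for lambda-returns over one episode.

    rewards: shape (T,)   -- r_0 .. r_{T-1}
    values:  shape (T+1,) -- V(s_0) .. V(s_T), with V(s_T) as bootstrap
    """
    T = len(rewards)
    returns = np.empty(T)
    g = values[-1]  # bootstrap from the final state value
    for t in reversed(range(T)):
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        g = rewards[t] + gamma * ((1.0 - lam) * values[t + 1] + lam * g)
        returns[t] = g
    return returns

# A centralized value function would then regress Q_tot(s_t, a_t)
# toward returns[t] instead of an importance-weighted target.
```

    Setting lam=0 recovers one-step TD targets and lam=1 recovers Monte Carlo returns; intermediate values trade bias against variance without per-step importance ratios.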

    The 1990 progress report and future plans

    This document describes the progress and plans of the Artificial Intelligence Research Branch (RIA) at ARC in 1990. Activities span a range from basic scientific research to engineering development and to fielded NASA applications, particularly those applications that are enabled by basic research carried out at RIA. Work is conducted in-house and through collaborative partners in academia and industry. Our major focus is on a limited number of research themes with a dual commitment to technical excellence and proven applicability to NASA's short-, medium-, and long-term problems. RIA acts as the Agency's lead organization for research aspects of artificial intelligence, working closely with a second research laboratory at JPL and AI applications groups at all NASA centers.

    Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

    The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning, ranging from computing approximations to fundamental concepts in game theory, to simulating social dilemmas in rich spatial environments, to training 3D humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments, and to use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.
    Comment: Published in AI Communications 202

    On Transforming Reinforcement Learning by Transformer: The Development Trajectory

    Transformer, originally devised for natural language processing, has also achieved significant success in computer vision. Thanks to its strong expressive power, researchers are investigating ways to deploy transformers in reinforcement learning (RL), and transformer-based models have demonstrated their potential on representative RL benchmarks. In this paper, we collect and dissect recent advances on transforming RL by transformer (transformer-based RL, or TRL) in order to explore its development trajectory and future trend. We group existing developments into two categories, architecture enhancement and trajectory optimization, and examine the main applications of TRL in robotic manipulation, text-based games, navigation, and autonomous driving. For architecture enhancement, these methods consider how to apply the powerful transformer structure to RL problems under the traditional RL framework; they model agents and environments much more precisely than standard deep RL methods, but remain limited by the inherent defects of traditional RL algorithms, such as bootstrapping and the "deadly triad". For trajectory optimization, these methods treat RL as a sequence-modeling problem and train a joint state-action model over entire trajectories under the behavior-cloning framework, which allows them to extract policies from static datasets and fully exploit the transformer's long-sequence modeling capability. Given these advancements, extensions and challenges in TRL are reviewed and proposals about future directions are discussed. We hope that this survey can provide a detailed introduction to TRL and motivate future research in this rapidly developing field.
    Comment: 26 pages
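
    The trajectory-optimization category is concrete enough to sketch. Below is a minimal, Decision-Transformer-style illustration of treating RL as sequence modeling, not any specific system from the survey; the class name, token layout, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TrajectorySequenceModel(nn.Module):
    """Sketch of trajectory optimization as sequence modeling: embed
    (return-to-go, state, action) triples per timestep and predict each
    action with a causally masked transformer, trained by behavior cloning."""

    def __init__(self, state_dim, act_dim, d_model=128, n_heads=4, n_layers=3):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T, _ = states.shape
        # Interleave tokens as (R_0, s_0, a_0, R_1, s_1, a_1, ...).
        tokens = torch.stack(
            (self.embed_rtg(rtg), self.embed_state(states),
             self.embed_action(actions)), dim=2,
        ).reshape(B, 3 * T, -1)
        # Causal mask so each token only attends to earlier tokens.
        causal = torch.triu(
            torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=causal)
        # Predict action a_t from the hidden state at state token s_t.
        return self.predict_action(h[:, 1::3])
```

    Training would minimize a behavior-cloning loss (e.g. MSE between predicted and logged actions) over trajectories from a static dataset, which is what lets these methods sidestep bootstrapping entirely.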

    Life Long Learning In Sparse Learning Environments

    Lifelong learning is a machine learning technique that deals with learning sequential tasks over time. It seeks to transfer knowledge from previous learning tasks to new learning tasks in order to increase generalization performance and learning speed. Real-time learning environments in which many agents participate may provide learning opportunities, but these opportunities are spread out in time and space, beyond the geographical scope of any single learning agent. This research seeks to provide an algorithm and framework for lifelong learning among a network of agents in a sparse real-time learning environment. This work will utilize the robust knowledge representation of neural networks, and make use of both functional and representational knowledge transfer to accomplish this task. A new generative lifelong learning algorithm will be implemented, utilizing cascade correlation and reverberating pseudo-rehearsal and incorporating a method for merging divergent lifelong learning paths.
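
    Pseudo-rehearsal is the most implementation-ready idea in this abstract. The following is a minimal sketch of plain pseudo-rehearsal (the "reverberating" variant and cascade correlation are omitted); `old_model.predict` and `model.fit` are an assumed generic interface, not the author's code.

```python
import numpy as np

def make_pseudo_items(old_model, n_items, input_dim, rng=None):
    """Pseudo-rehearsal: label random inputs with the *old* network's
    outputs so a new task can be learned without storing old data."""
    if rng is None:
        rng = np.random.default_rng()
    pseudo_x = rng.uniform(-1.0, 1.0, size=(n_items, input_dim))
    pseudo_y = old_model.predict(pseudo_x)  # old network's responses
    return pseudo_x, pseudo_y

def train_on_new_task(model, new_x, new_y, pseudo_x, pseudo_y):
    # Mix pseudo-items with the new task's data so the network keeps
    # its old input-output behaviour while learning the new mapping.
    x = np.concatenate([new_x, pseudo_x])
    y = np.concatenate([new_y, pseudo_y])
    model.fit(x, y)  # assumed generic training interface
    return model
```

    The design intuition is that the pseudo-items approximate the old function's input-output behaviour, so interleaving them with new data counters catastrophic forgetting without retaining the original training sets.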

    On the Combination of Game-Theoretic Learning and Multi Model Adaptive Filters

    This paper casts the coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed in which multi-model adaptive filters are used to estimate other players' strategies. The proposed algorithm can serve as a coordination mechanism between players when they must make decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players and the uncertainty, which can arise either from noisy observations or from different types of other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori: various parameter values can be used initially as inputs to different models, so the resulting decisions aggregate the results across all parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
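
    The coordination mechanism described here, several opponent-strategy estimators run in parallel and aggregated by their fit to observations, can be sketched compactly. This is a minimal illustration under stated assumptions (exponential-forgetting estimators, likelihood re-weighting, and a `sample_opponent` environment hook), not the paper's algorithm.

```python
import numpy as np

def fictitious_play_with_model_averaging(payoff, sample_opponent,
                                         n_steps=500,
                                         rates=(0.01, 0.05, 0.2)):
    """Fictitious play for one player: estimate the opponent's mixed
    strategy with several forgetting-rate models and average them by
    predictive likelihood (the multi-model adaptive-filtering step)."""
    n_actions = payoff.shape[1]
    n_models = len(rates)
    estimates = np.full((n_models, n_actions), 1.0 / n_actions)
    weights = np.full(n_models, 1.0 / n_models)
    for _ in range(n_steps):
        belief = weights @ estimates                 # aggregated opponent model
        my_action = int(np.argmax(payoff @ belief))  # best response to belief
        opp_action = sample_opponent(my_action)      # assumed environment hook
        # Re-weight each model by how well it predicted the observation.
        weights *= estimates[:, opp_action]
        weights /= weights.sum()
        # Update every model's estimate with its own forgetting rate.
        obs = np.eye(n_actions)[opp_action]
        for m, rate in enumerate(rates):
            estimates[m] += rate * (obs - estimates[m])
    return weights @ estimates
```

    Because the final belief is a likelihood-weighted average over all the forgetting rates, no single rate has to be tuned a priori, which mirrors the paper's claim that optimal parameters need not be found in advance.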