Policy Transfer Methods in RoboCup Keep-Away
This study investigates multi-agent policy transfer coupled with behavior adaptation by objective and non-objective search variants of HyperNEAT in RoboCup keep-away. For comparison, the evolved behaviors were evaluated against those adapted by two RL methods, SARSA and Q-Learning, each coupled with policy transfer. Keep-away was selected because it is an established multi-agent experimental platform. Similarly, SARSA and Q-Learning were selected because both have been shown to boost behavior quality when combined with policy transfer. Keep-away behaviors were gauged in terms of effectiveness and efficiency. Effectiveness was the average task performance given policy transfer, where task performance was the average ball control time of the keeper team. Efficiency was the average number of evaluations taken to reach a minimum task performance threshold given policy transfer.
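The two measures defined above can be sketched as follows. This is a minimal illustration, not the study's evaluation code: the function names, the nested-list data layout, and the choice to skip runs that never reach the threshold are all assumptions made for the example.

```python
from statistics import mean


def effectiveness(control_times_per_run):
    """Average task performance over runs.

    Each run is a list of ball control times (hypothetical layout);
    a run's task performance is the mean of its control times.
    """
    return mean(mean(run) for run in control_times_per_run)


def efficiency(performance_curves, threshold):
    """Average number of evaluations needed to first reach `threshold`.

    Each curve lists task performance after evaluation 1, 2, 3, ...
    Runs that never reach the threshold are skipped (an assumption);
    returns None if no run reaches it.
    """
    counts = []
    for curve in performance_curves:
        for evaluations, performance in enumerate(curve, start=1):
            if performance >= threshold:
                counts.append(evaluations)
                break
    return mean(counts) if counts else None
```

For instance, `effectiveness([[4.0, 6.0], [8.0, 10.0]])` averages the per-run means 5.0 and 9.0, and `efficiency([[1, 2, 5], [3, 6, 7]], 5)` averages the evaluation counts 3 and 2.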
A Deep Hierarchical Approach to Lifelong Learning in Minecraft
We propose a lifelong learning system that has the ability to reuse and
transfer knowledge from one task to another while efficiently retaining the
previously learned knowledge-base. Knowledge is transferred by learning
reusable skills to solve tasks in Minecraft, a popular video game which is an
unsolved and high-dimensional lifelong learning problem. These reusable skills,
which we refer to as Deep Skill Networks, are then incorporated into our novel
Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using
two techniques: (1) a deep skill array and (2) skill distillation, our novel
variation of policy distillation (Rusu et al. 2015) for learning skills. Skill
distillation enables the H-DRLN to efficiently retain knowledge and therefore
scale in lifelong learning, by accumulating knowledge and encapsulating
multiple reusable skills into a single distilled network. The H-DRLN exhibits
superior performance and lower learning sample complexity compared to the
regular Deep Q Network (Mnih et al. 2015) in sub-domains of Minecraft.
Quicker Q-Learning in Multi-Agent Systems
Multi-agent learning in Markov Decision Problems is challenging because of the presence of two credit assignment problems: 1) how to credit an action taken at time step t for rewards received at t' greater than t; and 2) how to credit an action taken by agent i, considering that the system reward is a function of the actions of all the agents. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning or TD(lambda). The second credit assignment problem is typically addressed either by hand-crafting reward functions that assign proper credit to an agent, or by making certain independence assumptions about an agent's state-space and reward function. To address both credit assignment problems simultaneously, we propose Q Updates with Immediate Counterfactual Rewards-learning (QUICR-learning), designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. Instead of assuming that an agent's value function can be made independent of other agents, this method suppresses the impact of other agents using counterfactual rewards. Results on multi-agent grid-world problems over multiple topologies show that QUICR-learning can achieve up to thirty-fold improvements in performance over both conventional and local Q-learning in the largest tested systems.
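The idea of combining a temporal difference update with a counterfactual reward can be sketched as below. This is a generic illustration under stated assumptions, not the paper's exact update rule: the abstract does not define the counterfactual, so the sketch assumes it is formed by replacing one agent's state with an absorbing "removed" state, and the toy reward `cells_covered`, the dictionary-based joint state, and all function names are hypothetical.

```python
from collections import defaultdict


def counterfactual_reward(global_reward, joint_state, agent, absorbing_state=None):
    """System reward minus the system reward with `agent` removed.

    Removal is modelled (an assumption) by moving the agent to an
    absorbing state that contributes nothing to the system reward.
    """
    without_agent = dict(joint_state)
    without_agent[agent] = absorbing_state
    return global_reward(joint_state) - global_reward(without_agent)


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied here to a per-agent reward."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def cells_covered(joint_state):
    """Toy system reward (hypothetical): number of distinct occupied cells."""
    return len({cell for cell in joint_state.values() if cell is not None})


# Agents "a" and "b" share cell 1; agent "c" alone occupies cell 2.
joint_state = {"a": 1, "b": 1, "c": 2}
reward_a = counterfactual_reward(cells_covered, joint_state, "a")  # redundant agent
reward_c = counterfactual_reward(cells_covered, joint_state, "c")  # sole occupant

q_c = defaultdict(float)  # agent c's own Q-table
q_update(q_c, state=2, action="stay", reward=reward_c,
         next_state=2, actions=["stay", "move"], alpha=0.5)
```

In this toy setting the redundant agent receives zero credit while the sole occupant of a cell receives the full unit, which is the sense in which a counterfactual reward filters out the contribution of the other agents.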
Batch-iFDD for representation expansion in large MDPs
Matching pursuit (MP) methods are a promising class of feature construction algorithms for value function approximation. Yet existing MP methods require creating a pool of potential features, mandating expert knowledge or enumeration of a large feature pool, both of which hinder scalability. This paper introduces batch incremental feature dependency discovery (Batch-iFDD) as an MP method that inherits a provable convergence property. Additionally, Batch-iFDD does not require a large pool of features, leading to lower computational complexity. Empirical policy evaluation results across three domains with up to one million states highlight the scalability of Batch-iFDD over the previous state-of-the-art MP algorithm. Funding: United States Office of Naval Research (Grant N00014-07-1-0749); United States Office of Naval Research (Grant N00014-11-1-0688).
Improving the Performance of Complex Agent Plans Through Reinforcement Learning
Agent programming in complex, partially observable and stochastic domains usually requires a great deal of understanding of both the domain and the task, in order to provide the agent with the knowledge necessary to act effectively. While symbolic methods allow the designer to specify declarative knowledge about the domain, the resulting plan can be brittle, since it is difficult to supply a symbolic model that is accurate enough to foresee all possible events in complex environments, especially in the case of partial observability. Reinforcement Learning (RL) techniques, on the other hand, can learn a policy and make use of a learned model, but it is difficult to reduce and shape the scope of the learning algorithm by exploiting a priori information. We propose a methodology for writing complex agent programs that can be effectively improved through experience. We show how to derive a stochastic process from a partial specification of the plan, so that the latter's performance can be improved by solving an RL problem much smaller than classical RL formulations. Finally, we demonstrate our approach in the context of Keepaway Soccer, a common RL benchmark based on a RoboCup Soccer 2D simulator. Copyright © 2010, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.