50 research outputs found

Policy Transfer Methods in RoboCup Keep-Away

Author: Ammar H.
Didi S.
Doncleux S.
Moshaiov A.
Stone P.
Taylor M.
Taylor M.
Taylor M.
Verbancsics P.
Whiteson S.
Publication date: 01/01/2018

This study investigates multi-agent policy transfer coupled with behavior adaptation by objective and non-objective search variants of HyperNEAT in RoboCup keep-away. Evolved behaviors were compared to those adapted by two RL methods, SARSA and Q-Learning, also coupled with policy transfer. Keep-away was selected as it is an established multi-agent experimental platform. Similarly, SARSA and Q-Learning were selected as both have been demonstrated to boost behavior quality with policy transfer. Keep-away behaviors were gauged in terms of effectiveness and efficiency. Effectiveness was the average task performance given policy transfer, where task performance was the average ball control time of the keeper team. Efficiency was the average number of evaluations taken to reach a minimum task performance threshold given policy transfer.
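
The two measures defined in this abstract can be illustrated with a minimal sketch. The data layout (one list of per-evaluation ball control times per run), the function names, and the choice to use each run's final evaluation as its task performance are all assumptions made for illustration; the abstract does not specify them.

```python
from statistics import mean


def effectiveness(runs):
    """Average task performance over runs.

    Assumption: a run's task performance is its last recorded
    average ball control time (in seconds).
    """
    return mean(run[-1] for run in runs)


def evaluations_to_threshold(run, threshold):
    """1-based count of evaluations until performance first reaches threshold."""
    for count, performance in enumerate(run, start=1):
        if performance >= threshold:
            return count
    return None  # this run never reached the threshold


def efficiency(runs, threshold):
    """Average number of evaluations needed to reach the threshold.

    Assumption: runs that never reach the threshold are left out of the average.
    """
    counts = [evaluations_to_threshold(run, threshold) for run in runs]
    reached = [c for c in counts if c is not None]
    return mean(reached) if reached else None


# Hypothetical ball control times (seconds) for two runs, one value per evaluation.
example_runs = [[4.0, 6.0, 9.0], [5.0, 8.0, 10.0]]
print(effectiveness(example_runs))
print(efficiency(example_runs, threshold=8.0))
```

Under these assumptions a higher effectiveness and a lower efficiency value both indicate a better method.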

Crossref

UCT Computer Science Research Document Archive

Behavior Acquisition in RoboCup Middle Size League Domain

Author: Minoru Asada
Yasutake Takahashi
Publication venue: 'IntechOpen'
Publication date: 01/12/2007

IntechOpen

Crossref

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

Author: Givony Shahar
Mankowitz Daniel J.
Mannor Shie
Tessler Chen
Zahavy Tom
Publication date: 30/11/2016

We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge-base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game which is an unsolved and high-dimensional lifelong learning problem. These reusable skills, which we refer to as Deep Skill Networks, are then incorporated into our novel Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using two techniques: (1) a deep skill array and (2) skill distillation, our novel variation of policy distillation (Rusu et al. 2015) for learning skills. Skill distillation enables the H-DRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network. The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q Network (Mnih et al. 2015) in sub-domains of Minecraft.
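
The abstract describes skill distillation only as a variation of policy distillation and gives no formula. As a rough illustration of the generic idea it builds on, the sketch below computes a distillation loss for one state: the KL divergence between a temperature-sharpened softmax over a teacher's Q-values and a softmax over a student's outputs. The function names, the temperature value, and the Q-values are hypothetical, and this is not the paper's skill distillation variant.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax of values divided by temperature."""
    scaled = [v / temperature for v in values]
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_q, student_q, temperature=0.1):
    """KL(teacher || student) over action distributions for one state.

    Assumption: a low temperature sharpens the teacher's distribution so
    the student is pushed toward the teacher's greedy action.
    """
    teacher_probs = softmax(teacher_q, temperature)
    student_probs = softmax(student_q)
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0  # terms with zero teacher probability contribute nothing
    )


# Hypothetical Q-values over three actions.
teacher = [1.0, 2.0, 3.0]
print(distillation_loss(teacher, [1.0, 2.0, 3.0]))  # student prefers the same action
print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # student prefers a different action
```

Distilling several skills into one network would amount to minimizing a loss of this kind over states drawn from each skill, which is one way to read the abstract's "single distilled network".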

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Quicker Q-Learning in Multi-Agent Systems

Author: Agogino Adrian K.
Tumer Kagan

Multi-agent learning in Markov Decision Problems is challenging because of the presence of two credit assignment problems: 1) How to credit an action taken at time step t for rewards received at t' greater than t; and 2) How to credit an action taken by agent i considering the system reward is a function of the actions of all the agents. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning or TD(lambda). The second credit assignment problem is typically addressed either by hand-crafting reward functions that assign proper credit to an agent, or by making certain independence assumptions about an agent's state-space and reward function. To address both credit assignment problems simultaneously, we propose Q Updates with Immediate Counterfactual Rewards-learning (QUICR-learning), designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. Instead of assuming that an agent's value function can be made independent of other agents, this method suppresses the impact of other agents using counterfactual rewards. Results on multi-agent grid-world problems over multiple topologies show that QUICR-learning can achieve up to thirtyfold improvements in performance over both conventional and local Q-learning in the largest tested systems.
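
The two credit assignment problems can be made concrete with a small sketch. It pairs a standard tabular Q-learning update (temporal credit) with a generic counterfactual "difference" reward (per-agent credit): the system reward minus the system reward obtained when one agent's action is replaced by a null action. The abstract does not give the exact counterfactual used by QUICR-learning, so the null-action construction, the toy reward function, and all names below are assumptions for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; returns the updated Q-value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


def counterfactual_reward(system_reward, joint_action, agent, null_action=None):
    """System reward minus the reward had `agent` taken a null action instead."""
    without_agent = list(joint_action)
    without_agent[agent] = null_action
    return system_reward(list(joint_action)) - system_reward(without_agent)


def distinct_targets(joint_action):
    """Toy system reward (hypothetical): number of distinct targets covered."""
    return len({a for a in joint_action if a is not None})


joint = ["A", "A", "B"]
# Agent 0 duplicates agent 1's target, so removing it changes nothing.
print(counterfactual_reward(distinct_targets, joint, agent=0))
# Agent 2 is the only one covering "B", so it earns the full credit.
print(counterfactual_reward(distinct_targets, joint, agent=2))

q = defaultdict(float)
reward = counterfactual_reward(distinct_targets, joint, agent=2)
print(q_update(q, "s0", "B", reward, "s1", actions=["A", "B"]))
```

Feeding each agent its own counterfactual reward, rather than the shared system reward, is what removes most of the noise contributed by the other agents' actions in this toy setting.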

NASA Technical Reports Server

Proposal and Evaluation of the Improved Penalty Avoiding Rational Policy Making Algorithm

Author: Hiroaki Kobayashi
Kazuteru Miyazaki
Takuji Namatame
Publication venue: 'IntechOpen'
Publication date: 01/01/2009

IntechOpen

Batch-iFDD for representation expansion in large MDPs

Author: Geramifard Alborz
How Jonathan P.
Roy Nicholas
Walsh Thomas J.
Publication venue: Association for Uncertainty in Artificial Intelligence (AUAI)
Publication date: 01/07/2013

Matching pursuit (MP) methods are a promising class of feature construction algorithms for value function approximation. Yet existing MP methods require creating a pool of potential features, mandating expert knowledge or enumeration of a large feature pool, both of which hinder scalability. This paper introduces batch incremental feature dependency discovery (Batch-iFDD) as an MP method that inherits a provable convergence property. Additionally, Batch-iFDD does not require a large pool of features, leading to lower computational complexity. Empirical policy evaluation results across three domains with up to one million states highlight the scalability of Batch-iFDD over the previous state-of-the-art MP algorithm. United States. Office of Naval Research (Grant N00014-07-1-0749); United States. Office of Naval Research (Grant N00014-11-1-0688)

DSpace@MIT

Improving the Performance of Complex Agent Plans Through Reinforcement Learning

Author: Iocchi Luca
Leonetti Matteo
Publication venue: Dagstuhl Seminar Proceedings. 10081 - Cognitive Robotics
Publication date: 01/01/2010

Agent programming in complex, partially observable and stochastic domains usually requires a great deal of understanding of both the domain and the task, in order to provide the agent with the knowledge necessary to act effectively. While symbolic methods allow the designer to specify declarative knowledge about the domain, the resulting plan can be brittle since it is difficult to supply a symbolic model that is accurate enough to foresee all possible events in complex environments, especially in the case of partial observability. Reinforcement Learning (RL) techniques, on the other hand, can learn a policy and make use of a learned model, but it is difficult to reduce and shape the scope of the learning algorithm by exploiting a priori information. We propose a methodology for writing complex agent programs that can be effectively improved through experience. We show how to derive a stochastic process from a partial specification of the plan, so that the latter's performance can be improved by solving an RL problem much smaller than classical RL formulations. Finally, we demonstrate our approach in the context of Keepaway Soccer, a common RL benchmark based on a RoboCup Soccer 2D simulator. Copyright © 2010, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Dagstuhl Research Online Publication Server

Archivio della ricerca- Università di Roma La Sapienza