2 research outputs found

    Off-Policy General Value Functions to Represent Dynamic Role Assignments in RoboCup 3D Soccer Simulation

    Full text link
    Collecting and maintaining accurate world knowledge in a dynamic, complex, adversarial, and stochastic environment such as the RoboCup 3D Soccer Simulation is a challenging task. Knowledge should be learned in real-time with time constraints. We use recently introduced Off-Policy Gradient Descent algorithms within Reinforcement Learning that illustrate learnable knowledge representations for dynamic role assignments. The results show that the agents have learned competitive policies against the top teams from the RoboCup 2012 competitions for three vs three, five vs five, and seven vs seven agents. We have explicitly used subsets of agents to identify the dynamics and the semantics for which the agents learn to maximize their performance measures, and to gather knowledge about different objectives, so that all agents participate effectively and efficiently within the group.Comment: 18 pages, 8 figure

    Affordance as general value function: A computational model

    Full text link
    General value functions (GVFs) in the reinforcement learning (RL) literature are long-term predictive summaries of the outcomes of agents following specific policies in the environment. Affordances as perceived action possibilities with specific valence may be cast into predicted policy-relative goodness and modelled as GVFs. A systematic explication of this connection shows that GVFs and especially their deep learning embodiments (1) realize affordance prediction as a form of direct perception, (2) illuminate the fundamental connection between action and perception in affordance, and (3) offer a scalable way to learn affordances using RL methods. Through an extensive review of existing literature on GVF applications and representative affordance research in robotics, we demonstrate that GVFs provide the right framework for learning affordances in real-world applications. In addition, we highlight a few new avenues of research opened up by the perspective of "affordance as GVF", including using GVFs for orchestrating complex behaviors
    corecore