Off-Policy General Value Functions to Represent Dynamic Role Assignments in RoboCup 3D Soccer Simulation
Collecting and maintaining accurate world knowledge in a dynamic, complex,
adversarial, and stochastic environment such as the RoboCup 3D Soccer
Simulation is a challenging task. Knowledge must be learned in real time, under
tight time constraints. We use recently introduced off-policy gradient-descent
algorithms from reinforcement learning to illustrate learnable knowledge
representations for dynamic role assignments. The results show that the agents
have learned policies that are competitive against the top teams from the
RoboCup 2012 competitions in three vs. three, five vs. five, and seven vs. seven
settings. We have explicitly used subsets of agents to identify the dynamics and
the semantics for which the agents learn to maximize their performance
measures, and to gather knowledge about different objectives, so that all
agents participate effectively and efficiently within the group.
Comment: 18 pages, 8 figures
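The abstract names off-policy gradient-descent TD methods without giving their update rules. As a minimal sketch, assuming linear function approximation, the code below uses TDC (also called GTD(0)), one member of that family, to learn a GVF prediction for a target policy pi from data generated by a different behavior policy b, reweighting each transition by the importance-sampling ratio rho = pi(a|s) / b(a|s). The abstract does not specify which variant the authors used; the class name `TDCPredictor`, the single-state toy problem and all step sizes are hypothetical illustrations, not the authors' implementation.

```python
import random


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


class TDCPredictor:
    """Hypothetical linear GVF predictor trained off-policy with TDC (GTD(0)).

    theta: primary weights, prediction v(s) = theta . phi(s)
    w:     secondary weights for the gradient-correction term
    """

    def __init__(self, n_features, alpha=0.05, beta=0.25):
        self.theta = [0.0] * n_features
        self.w = [0.0] * n_features
        self.alpha = alpha  # step size for theta
        self.beta = beta    # step size for w (usually larger than alpha)

    def predict(self, phi):
        return dot(self.theta, phi)

    def update(self, phi, cumulant, gamma_next, phi_next, rho):
        """One TDC step on a transition generated by the behavior policy.

        rho = pi(a|s) / b(a|s) is the importance-sampling ratio.
        """
        # TD error for the GVF question defined by (cumulant, gamma)
        delta = (cumulant + gamma_next * dot(self.theta, phi_next)
                 - dot(self.theta, phi))
        w_phi = dot(self.w, phi)  # uses w from before this step's update
        # theta: importance-weighted TD step minus the gradient correction
        self.theta = [
            t + self.alpha * rho * (delta * f - gamma_next * w_phi * fn)
            for t, f, fn in zip(self.theta, phi, phi_next)
        ]
        # w: tracks the least-squares fit of the weighted TD error on phi
        self.w = [
            wi + self.beta * (rho * delta - w_phi) * f
            for wi, f in zip(self.w, phi)
        ]
        return delta


def run_demo(steps=20000, seed=0):
    """Hypothetical single-state toy problem.

    Action 0 yields cumulant 1, action 1 yields cumulant 0. The target
    policy always takes action 0; the behavior policy is uniform. With
    gamma = 0.9 the true value under the target policy is 1 / (1 - 0.9) = 10.
    """
    rng = random.Random(seed)
    phi = [1.0]  # tabular feature of the only state
    gamma = 0.9
    learner = TDCPredictor(n_features=1)
    for _ in range(steps):
        action = rng.randrange(2)              # behavior policy b: uniform
        pi_prob = 1.0 if action == 0 else 0.0  # target policy pi: always 0
        rho = pi_prob / 0.5                    # pi(a|s) / b(a|s)
        cumulant = 1.0 if action == 0 else 0.0
        learner.update(phi, cumulant, gamma, phi, rho)
    return learner.predict(phi)


if __name__ == "__main__":
    print(run_demo())
```

Running the demo should print an estimate near the true value of 10, even though half of the behavior data comes from an action the target policy never takes. The secondary weights `w` are what distinguish gradient-TD methods from plain off-policy TD(0): they keep the linear update stable under off-policy sampling, at the cost of a second step size.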
Affordance as general value function: A computational model
General value functions (GVFs) in the reinforcement learning (RL) literature
are long-term predictive summaries of the outcomes of agents following specific
policies in the environment. Affordances, as perceived action possibilities with
specific valence, may be cast as predicted policy-relative goodness and
modelled as GVFs. A systematic explication of this connection shows that GVFs
and especially their deep learning embodiments (1) realize affordance
prediction as a form of direct perception, (2) illuminate the fundamental
connection between action and perception in affordance, and (3) offer a
scalable way to learn affordances using RL methods. Through an extensive review
of existing literature on GVF applications and representative affordance
research in robotics, we demonstrate that GVFs provide the right framework for
learning affordances in real-world applications. In addition, we highlight a
few new avenues of research opened up by the perspective of "affordance as
GVF", including using GVFs for orchestrating complex behaviors.
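The first sentence of this abstract defines a GVF as a long-term prediction of an outcome under a specific policy. As a minimal sketch, assuming the common formulation in which a GVF "question" is specified by a policy, a cumulant (the signal being predicted) and a continuation function gamma, the code below computes the return G_t = C_{t+1} + gamma_{t+1} * G_{t+1} along a rollout. The corridor world, the "reach the goal" cumulant and both policies are hypothetical illustrations, not taken from the paper; they only show what "policy-relative" means: the same cumulant gives different answers under different policies.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

GOAL = 4  # hypothetical corridor with positions 0..4, goal at the right end


def step(state: int, action: int) -> int:
    """Deterministic corridor dynamics; action is -1 (left) or +1 (right)."""
    return min(max(state + action, 0), GOAL)


@dataclass
class GVFQuestion:
    policy: Callable[[int], int]          # state -> action
    cumulant: Callable[[int], float]      # signal received on entering s'
    continuation: Callable[[int], float]  # gamma(s'); 0 ends the question


def gvf_return(cumulants: Sequence[float], continuations: Sequence[float]) -> float:
    """G_t = C_{t+1} + gamma_{t+1} * G_{t+1}, accumulated backward."""
    g = 0.0
    for c, gamma in zip(reversed(cumulants), reversed(continuations)):
        g = c + gamma * g
    return g


def rollout_prediction(q: GVFQuestion, start: int, max_steps: int = 100) -> float:
    """Answer q from `start`; the world is deterministic, so one rollout suffices."""
    cumulants, continuations = [], []
    s = start
    for _ in range(max_steps):
        s_next = step(s, q.policy(s))
        cumulants.append(q.cumulant(s_next))
        gamma = q.continuation(s_next)
        continuations.append(gamma)
        if gamma == 0.0:
            break
        s = s_next
    return gvf_return(cumulants, continuations)


def reached_goal(s: int) -> float:
    return 1.0 if s == GOAL else 0.0


def stop_at_goal(s: int) -> float:
    return 0.0 if s == GOAL else 0.9


go_right = GVFQuestion(policy=lambda s: +1, cumulant=reached_goal, continuation=stop_at_goal)
go_left = GVFQuestion(policy=lambda s: -1, cumulant=reached_goal, continuation=stop_at_goal)

if __name__ == "__main__":
    print(round(rollout_prediction(go_right, start=0), 3))  # 0.729 = 0.9 ** 3
    print(rollout_prediction(go_left, start=0))             # 0.0
```

Read as an affordance, the prediction answers "how soon, and how surely, can the goal be reached if I keep moving this way?": 0.729 for moving right and 0.0 for moving left from the same state. In practice a GVF is learned (for example with a TD method) rather than computed from full rollouts; the rollout here just makes the definition concrete.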