Representation Learning on Graphs: A Reinforcement Learning Application
In this work, we study value function approximation in reinforcement learning
(RL) problems with high-dimensional state or action spaces via a generalized
version of representation policy iteration (RPI). We consider the limitations
of proto-value functions (PVFs) in accurately approximating the value function
in low dimensions, and we highlight the importance of feature learning for
improved low-dimensional value function approximation. We then adopt different
representation learning algorithms on graphs to learn the basis functions that
best represent the value function. We empirically show that node2vec, an
algorithm for scalable feature learning in networks, and the Variational Graph
Auto-Encoder consistently outperform the commonly used smooth proto-value
functions in low-dimensional feature space.
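The comparison above turns on how well a k-dimensional basis can represent the value function. The sketch below measures this directly. It computes the least-squares residual of a value function projected onto the first k eigenvectors of a graph Laplacian, taken as PVFs. It makes three assumptions that do not come from the abstract: a 20-node chain graph, the unweighted combinatorial Laplacian L = D - A, and a hypothetical step-shaped value function. The helper names `chain_laplacian` and `projection_error` are also illustrative.

```python
import numpy as np

def chain_laplacian(n: int) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an unweighted chain graph 0-1-...-(n-1)."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def projection_error(phi: np.ndarray, v: np.ndarray) -> float:
    """Residual norm of the least-squares fit of v in the span of phi's columns."""
    w, *_ = np.linalg.lstsq(phi, v, rcond=None)
    return float(np.linalg.norm(phi @ w - v))

n = 20
# eigh returns eigenvalues in ascending order, so the leading columns are the
# smoothest eigenvectors (the PVFs, under the assumptions stated above).
_, pvfs = np.linalg.eigh(chain_laplacian(n))

# Hypothetical value function with a sharp jump halfway along the chain,
# standing in for something like a wall or goal boundary.
v = np.where(np.arange(n) < n // 2, 0.0, 1.0)

# Residual of the best fit using only the first k PVFs, for k = 1..n.
errors = [projection_error(pvfs[:, :k], v) for k in range(1, n + 1)]
for k in (1, 3, 5, 10):
    print(f"k={k:2d}  residual={errors[k - 1]:.3f}")
```

The subspaces spanned by the first k eigenvectors are nested, so the residual can only shrink as k grows. With k = n it drops to zero. Printing the intermediate values shows how much of the jump a small smooth basis leaves unexplained. To try learned features such as node2vec or VGAE embeddings, replace `pvfs[:, :k]` with an n × k embedding matrix. That step is not shown here.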
Representation Policy Iteration
This paper addresses a fundamental issue central to approximation methods for
solving large Markov decision processes (MDPs): how to automatically learn the
underlying representation for value function approximation? A novel,
theoretically rigorous framework is proposed that automatically generates
geometrically customized orthonormal sets of basis functions, which can be used
with any approximate MDP solver like least squares policy iteration (LSPI). The
key innovation is a coordinate-free representation of value functions, using
the theory of smooth functions on a Riemannian manifold. Hodge theory yields a
constructive method for generating basis functions for approximating value
functions based on the eigenfunctions of the self-adjoint (Laplace-Beltrami)
operator on manifolds. In effect, this approach performs a global Fourier
analysis on the state space graph to approximate value functions, where the
basis functions reflect the large-scale topology of the underlying state space.
A new class of algorithms called Representation Policy Iteration (RPI) is
presented that automatically learn both basis functions and approximately
optimal policies. Illustrative experiments compare the performance of RPI with
that of LSPI using two hand-coded basis functions (RBF and polynomial state
encodings).
Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty
in Artificial Intelligence (UAI2005)
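The abstract names three ingredients: a state graph, Laplacian eigenvectors as basis functions, and LSPI as the control learner. The sketch below wires them together. It is a minimal illustration, not the paper's implementation. The following choices are all assumptions made for the example:

- the 10-state chain MDP with a reward at its right end;
- uniformly random exploration samples;
- the unweighted combinatorial Laplacian L = D - A as the discrete stand-in for the Laplace-Beltrami operator;
- the small ridge term in LSTD-Q;
- every function name (`collect_samples`, `proto_value_functions`, `sa_features`, `lstdq`, `rpi`).

```python
import numpy as np

# Hypothetical chain MDP (illustration only): states 0..n-1, action 0 = left,
# action 1 = right, reward 1 whenever the next state is the right-most state.
def step(s: int, a: int, n: int) -> tuple[int, float]:
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, n - 1)
    return s_next, 1.0 if s_next == n - 1 else 0.0

def collect_samples(n: int, m: int, rng: np.random.Generator):
    """Draw m (s, a, r, s') tuples with uniformly random states and actions."""
    samples = []
    for _ in range(m):
        s, a = int(rng.integers(n)), int(rng.integers(2))
        s_next, r = step(s, a, n)
        samples.append((s, a, r, s_next))
    return samples

def proto_value_functions(samples, n: int, k: int) -> np.ndarray:
    """Build an undirected graph from observed transitions and return the
    k smoothest eigenvectors of its combinatorial Laplacian L = D - A."""
    A = np.zeros((n, n))
    for s, _, _, s_next in samples:
        if s != s_next:  # ignore self-loops at the chain ends
            A[s, s_next] = A[s_next, s] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, :k]

def sa_features(phi: np.ndarray, s: int, a: int, n_actions: int = 2) -> np.ndarray:
    """Copy phi(s) into the block reserved for action a; zeros elsewhere."""
    k = phi.shape[1]
    f = np.zeros(k * n_actions)
    f[a * k:(a + 1) * k] = phi[s]
    return f

def lstdq(samples, phi, policy, gamma: float, reg: float = 1e-6) -> np.ndarray:
    """LSTD-Q: solve A w = b for the weights of Q^policy in feature space."""
    d = 2 * phi.shape[1]
    A, b = reg * np.eye(d), np.zeros(d)
    for s, a, r, s_next in samples:
        f = sa_features(phi, s, a)
        f_next = sa_features(phi, s_next, int(policy[s_next]))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def rpi(n: int = 10, k: int = 4, gamma: float = 0.9, m: int = 2000,
        max_iter: int = 50, seed: int = 0):
    rng = np.random.default_rng(seed)
    samples = collect_samples(n, m, rng)        # 1. sample collection
    phi = proto_value_functions(samples, n, k)  # 2. representation learning
    policy = np.zeros(n, dtype=int)             # start from "always left"
    for _ in range(max_iter):                   # 3. control learning (LSPI)
        w = lstdq(samples, phi, policy, gamma)
        q = np.array([[sa_features(phi, s, a) @ w for a in range(2)]
                      for s in range(n)])
        new_policy = q.argmax(axis=1)           # greedy policy improvement
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, phi

policy, phi = rpi(k=4)
print("greedy policy (0=left, 1=right):", policy)
```

Suppose `k` equals the number of states. The basis then spans every function on the graph, so on this deterministic chain LSTD-Q is essentially exact, and the greedy policy should settle on moving right everywhere. With smaller `k`, the outcome depends on how well the smooth basis captures Q.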