Many real-world problems are inherently hierarchically
structured. The use of this structure
in an agent’s policy may well be the
key to improved scalability and higher performance.
However, such hierarchical structures
cannot be exploited by current policy
search algorithms. We will concentrate on
a basic, but highly relevant hierarchy — the
‘mixed option’ policy. Here, a gating network
first decides which of the options to execute
and, subsequently, the option-policy determines
the action.
In this paper, we reformulate learning a hierarchical
policy as a latent variable estimation
problem and subsequently extend the
Relative Entropy Policy Search (REPS) to
the latent variable case. We show that our
Hierarchical REPS can learn versatile solutions
while also showing an increased performance
in terms of learning speed and quality
of the found policy in comparison to the nonhierarchical
approach