Meta Reinforcement Learning with Latent Variable Gaussian Processes
Learning from small data sets is critical in many practical applications
where data collection is time-consuming or expensive, e.g., robotics, animal
experiments, or drug design. Meta-learning is one way to increase the data
efficiency of learning algorithms by generalizing learned concepts from a set
of training tasks to unseen but related tasks. Often, this relationship
between tasks is hard-coded or relies in some other way on human expertise. In
this paper, we frame meta-learning as a hierarchical latent variable model and
infer the relationship between tasks automatically from data. We apply our
framework in a model-based reinforcement learning setting and show that our
meta-learning model effectively generalizes to novel tasks by identifying how
new tasks relate to prior ones from minimal data. This results in up to a 60%
reduction in the average interaction time needed to solve tasks compared to
strong baselines. Comment: 11 pages, 7 figures
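As a rough illustration of the hierarchical latent-variable idea, the sketch below fits a single Gaussian process over joint inputs (x, h_t), where each task t carries a latent variable h_t. Every name, the two-task toy dynamics, and the fixed latent values are illustrative assumptions, not the paper's actual model (which infers the latent task variables from data).

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    # Squared-exponential kernel over all input dimensions (state + latent task)
    d = ((X1[:, None, :] - X2[None, :, :]) / ls) ** 2
    return np.exp(-0.5 * d.sum(-1))

rng = np.random.default_rng(0)

# Assumed latent task variables (in the paper these would be inferred, not fixed)
H = {0: np.array([-1.0]), 1: np.array([1.0])}

def make_inputs(x, t):
    # Augment state inputs with the task's latent variable
    return np.hstack([x, np.tile(H[t], (len(x), 1))])

# Toy training data from two related tasks: y = h_t * sin(x)
X, Y = [], []
for t in (0, 1):
    x = rng.uniform(-3, 3, (10, 1))
    X.append(make_inputs(x, t))
    Y.append(H[t][0] * np.sin(x))
X, Y = np.vstack(X), np.vstack(Y)

# One shared GP: data from both tasks informs predictions through the latent dimension
K = rbf(X, X) + 1e-6 * np.eye(len(X))

def predict(xs, t):
    Ks = rbf(make_inputs(xs, t), X)
    return Ks @ np.linalg.solve(K, Y)
```

Because both tasks share one GP through the latent dimension, predictions for either task borrow statistical strength from the other's data; inferring h_t for a novel task from minimal data, rather than fixing it by hand, is what the paper automates.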
Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes
Monte Carlo methods have become increasingly relevant for control of
non-differentiable systems, approximate dynamics models and learning from data.
These methods scale to high-dimensional spaces and are effective at the
non-convex optimizations often seen in robot learning. We look at sample-based
methods from the perspective of inference-based control, specifically posterior
policy iteration. From this perspective, we highlight how Gaussian noise priors
produce rough control actions that are unsuitable for physical robot
deployment. Considering smoother Gaussian process priors, as used in episodic
reinforcement learning and motion planning, we demonstrate how smoother model
predictive control can be achieved using online sequential inference. This
inference is realized through an efficient factorization of the action
distribution and a novel means of optimizing the likelihood temperature to
improve importance sampling accuracy. We evaluate this approach on several
high-dimensional robot control tasks, matching the sample efficiency of prior
heuristic methods while also ensuring smoothness. Simulation results can be
seen at https://monte-carlo-ppi.github.io/. Comment: 43 pages, 37 figures. Conference on Robot Learning 202
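The contrast between white-noise and Gaussian process action priors can be seen in a few lines (the horizon and lengthscale below are arbitrary choices for illustration, not values from the paper): an action sequence sampled from a temporally correlated GP prior is visibly smoother than iid Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
t = np.arange(T, dtype=float)[:, None]

# RBF kernel over time: nearby actions are correlated, giving smooth samples
K = np.exp(-0.5 * ((t - t.T) / 5.0) ** 2) + 1e-6 * np.eye(T)
L = np.linalg.cholesky(K)

white = rng.standard_normal(T)       # iid Gaussian noise prior (rough)
smooth = L @ rng.standard_normal(T)  # sample from the GP prior (smooth)

# Mean absolute successive difference as a crude roughness measure
rough_white = np.mean(np.abs(np.diff(white)))
rough_smooth = np.mean(np.abs(np.diff(smooth)))
```

The GP sample's successive differences are far smaller than the white-noise sample's, which is the property that makes such priors better suited to physical robot deployment.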
Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation
The control of nonlinear dynamical systems remains a major challenge for
autonomous agents. Current trends in reinforcement learning (RL) focus on
complex representations of dynamics and policies, which have yielded impressive
results in solving a variety of hard control tasks. However, this added
sophistication and these heavily over-parameterized models come at the cost
of an overall reduction in our ability to interpret the resulting policies. In
this paper, we take inspiration from the control community and apply the
principles of hybrid switching systems in order to break down complex dynamics
into simpler components. We exploit the rich representational power of
probabilistic graphical models and derive an expectation-maximization (EM)
algorithm for learning a sequence model to capture the temporal structure of
the data and automatically decompose nonlinear dynamics into stochastic
switching linear dynamical systems. Moreover, we show how this framework of
switching models enables extracting hierarchies of Markovian and
auto-regressive locally linear controllers from nonlinear experts in an
imitation learning scenario. Comment: 2nd Annual Conference on Learning for Dynamics and Control
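A switching linear dynamical system of the kind decomposed here can be simulated in a few lines (the two mode matrices and switching probabilities below are illustrative assumptions, not learned parameters): a discrete Markov mode z_t selects which local linear model propagates the continuous state.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two assumed local linear modes: slow decay and a damped oscillation
A = [np.array([[0.99, 0.0], [0.0, 0.99]]),
     np.array([[0.9, -0.3], [0.3, 0.9]])]
P = np.array([[0.95, 0.05],   # Markov transition matrix over modes:
              [0.05, 0.95]])  # each mode is sticky, switching rarely

z, x = 0, np.array([1.0, 0.0])
traj, modes = [], []
for _ in range(100):
    z = rng.choice(2, p=P[z])                       # sample next discrete mode
    x = A[z] @ x + 0.01 * rng.standard_normal(2)    # propagate with that mode's dynamics
    traj.append(x.copy())
    modes.append(z)
traj = np.array(traj)
```

The EM algorithm in the paper runs this generative story in reverse: given observed trajectories, it infers the mode sequence and the local linear models, from which the hierarchies of locally linear controllers are then extracted.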