ED2: Environment Dynamics Decomposition World Models for Continuous Control
Model-based reinforcement learning (MBRL) achieves significantly better sample
efficiency than model-free RL in practice, but its performance is often limited
by model prediction error. To reduce this error, standard MBRL approaches train
a single well-designed network to fit the entire environment dynamics; this,
however, overlooks the rich structure of the multiple sub-dynamics, which can be
modeled separately to construct a more accurate world model. In this paper, we
propose Environment Dynamics
Decomposition (ED2), a novel world model construction framework that models the
environment in a decomposed manner. ED2 contains two key components:
sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2
automatically discovers the sub-dynamics of an environment, and D2P then
constructs the decomposed world model following these sub-dynamics. ED2 can be
easily combined with existing MBRL algorithms and empirical results show that
ED2 significantly reduces the model error, increases the sample efficiency, and
achieves higher asymptotic performance when combined with the state-of-the-art
MBRL algorithms on various continuous control tasks. Our code is open source
and available at https://github.com/ED2-source-code/ED2.
Comment: 10 pages, 13 figures
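As a rough sketch of the decomposition idea (a minimal illustration under our own assumptions, not the paper's implementation: the fixed per-sub-dynamics masks and the additive combination of predictions are choices made here, and SubDynamicsNet and D2PWorldModel are hypothetical names):

import torch
import torch.nn as nn


class SubDynamicsNet(nn.Module):
    """Predicts the state delta for one sub-dynamics group."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class D2PWorldModel(nn.Module):
    """Combines per-sub-dynamics predictions into one next-state estimate."""

    def __init__(self, state_dim, action_dim, masks):
        super().__init__()
        # masks: list of 0/1 tensors of shape (state_dim,), one per
        # sub-dynamics, e.g. produced by a discovery step such as SD2.
        self.heads = nn.ModuleList(
            SubDynamicsNet(state_dim, action_dim) for _ in masks
        )
        self.register_buffer("masks", torch.stack(masks))  # (K, state_dim)

    def forward(self, state, action):
        # Each head contributes only to the state dims its mask selects.
        delta = sum(
            m * head(state, action) for m, head in zip(self.masks, self.heads)
        )
        return state + delta


# Usage: two hand-picked sub-dynamics over a 4-dim state.
masks = [torch.tensor([1.0, 1.0, 0.0, 0.0]), torch.tensor([0.0, 0.0, 1.0, 1.0])]
model = D2PWorldModel(state_dim=4, action_dim=2, masks=masks)
next_state = model(torch.randn(8, 4), torch.randn(8, 2))
print(next_state.shape)  # torch.Size([8, 4])

Because each head is responsible for only a subset of state dimensions, its prediction task is simpler than fitting the full dynamics, which is one plausible reading of why decomposition can reduce model error.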
Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes
Lying at the heart of intelligent decision-making systems, how a policy is
represented and optimized is a fundamental problem. The root challenge is the
large scale and high complexity of the policy space, which exacerbates the
difficulty of policy learning, especially in real-world scenarios. Towards a
desirable surrogate policy space, policy representation in a low-dimensional
latent space has recently shown its potential for improving both the evaluation
and optimization of policies. The key question
involved in these studies is by what criterion we should abstract the policy
space for the desired compression and generalization. However, both the theory
of policy abstraction and the methodology of policy representation learning
remain understudied in the literature. In this work, we make a first effort to
fill this gap. First, we propose a unified policy abstraction theory containing
three types of policy abstraction, associated with policy features at different
levels. Then, we generalize them to three policy metrics that quantify the
distance (i.e., dissimilarity) between policies, for more convenient use in
learning policy representations. Further, we propose a policy representation
learning approach based on deep metric learning. For the empirical study, we
investigate the efficacy of the proposed policy metrics and representations in
characterizing policy differences and conveying policy generalization,
respectively. Our experiments cover both policy optimization and policy
evaluation problems, including trust-region policy optimization (TRPO),
diversity-guided evolution strategy (DGES), and off-policy evaluation (OPE).
Somewhat naturally, the experimental results indicate that there is no
universally optimal abstraction for all downstream learning problems, while the
influence-irrelevance policy abstraction can be a generally preferred choice.
Comment: Preprint version
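As a rough sketch of what metric-based policy representation learning can look like (our own minimal framing, not the paper's method: summarizing a policy by its actions on a fixed batch of probe states is an assumption, and PolicyEncoder and metric_loss are hypothetical names):

import torch
import torch.nn as nn


class PolicyEncoder(nn.Module):
    """Maps a policy's probe-state action matrix to a latent embedding."""

    def __init__(self, n_probe, action_dim, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_probe * action_dim, 64),
            nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, actions):  # actions: (batch, n_probe, action_dim)
        return self.net(actions)


def metric_loss(embeddings, target_dist):
    """Train embedding distances to match a given policy metric."""
    pred_dist = torch.cdist(embeddings, embeddings)  # pairwise, (B, B)
    return ((pred_dist - target_dist) ** 2).mean()


# Usage: 32 policies, each described by its actions on 10 probe states
# (2-dim actions); target_dist is any precomputed policy metric, here a
# simple mean-action discrepancy stand-in.
actions = torch.randn(32, 10, 2)
target_dist = torch.cdist(actions.flatten(1), actions.flatten(1))
encoder = PolicyEncoder(n_probe=10, action_dim=2)
loss = metric_loss(encoder(actions), target_dist)
loss.backward()

In this framing, any of the three proposed policy metrics could be plugged in as target_dist, so the same encoder architecture can realize different policy abstractions.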