Skill-based Model-based Reinforcement Learning
Model-based reinforcement learning (RL) is a sample-efficient way of learning
complex behaviors by leveraging a learned single-step dynamics model to plan
actions in imagination. However, planning every action for long-horizon tasks
is not practical, akin to a human planning out every muscle movement. Instead,
humans efficiently plan with high-level skills to solve complex tasks. From
this intuition, we propose a Skill-based Model-based RL framework (SkiMo) that
enables planning in the skill space with a skill dynamics model that
directly predicts skill outcomes, rather than predicting every low-level
detail of the intermediate states step by step. For accurate and efficient long-term
planning, we jointly learn the skill dynamics model and a skill repertoire from
prior experience. We then harness the learned skill dynamics model to
accurately simulate and plan over long horizons in the skill space, which
enables efficient downstream learning of long-horizon, sparse reward tasks.
Experimental results in navigation and manipulation domains show that SkiMo
extends the temporal horizon of model-based approaches and improves the sample
efficiency for both model-based RL and skill-based RL. Code and videos are
available at \url{https://clvrai.com/skimo}.
Comment: Website: \url{https://clvrai.com/skimo}
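To make the idea concrete, here is a minimal sketch of what planning with a skill dynamics model could look like. It is an illustration under assumed interfaces, not SkiMo's actual implementation: skill_dynamics, reward_model, and all shapes and hyperparameters are hypothetical.

    # Illustrative sketch only: plan over whole skills instead of single actions.
    import torch

    def rollout_skill_plan(z0, skill_dynamics, reward_model,
                           horizon=5, n_cand=256, skill_dim=10):
        # z0: latent state, shape (latent_dim,)
        # skill_dynamics(z, k) -> latent state after executing skill k to completion
        # reward_model(z, k)   -> predicted cumulative reward of executing skill k
        skills = torch.randn(n_cand, horizon, skill_dim)  # candidate skill sequences
        z = z0.unsqueeze(0).expand(n_cand, -1)
        total_reward = torch.zeros(n_cand)
        for t in range(horizon):
            total_reward = total_reward + reward_model(z, skills[:, t])
            z = skill_dynamics(z, skills[:, t])  # one model step = one whole skill
        return skills[total_reward.argmax()]     # best candidate plan

Each model step jumps over an entire skill execution, which is why a skill dynamics model can cover a much longer temporal horizon than a single-step model with the same number of predictions.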
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
In open-ended environments, autonomous learning agents must set their own
goals and build their own curriculum through intrinsically motivated
exploration. They may consider a large diversity of goals, aiming to discover
what is controllable in their environments, and what is not. Because some goals
might prove easy and some impossible, agents must actively select which goal to
practice at any moment, to maximize their overall mastery on the set of
learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a
modular Universal Value Function Approximator with hindsight learning to
achieve a diversity of goals of different kinds within a single policy and 2)
an automated curriculum learning mechanism that biases the attention of the
agent towards goals maximizing the absolute learning progress. Agents focus
sequentially on goals of increasing complexity, and focus back on goals that
are being forgotten. Experiments conducted in a new modular-goal robotic
environment show the resulting developmental self-organization of a learning
curriculum, and demonstrate properties of robustness to distracting goals,
forgetting, and changes in body properties.
Comment: Accepted at ICML 2019
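As a rough sketch of the curriculum mechanism, the snippet below samples goal modules in proportion to absolute learning progress, mixed with uniform exploration. The class name, window size, and exploration rate are illustrative assumptions, not the paper's implementation.

    # Illustrative sketch: goal-module selection by absolute learning progress.
    import random
    from collections import deque

    class LPCurriculum:
        def __init__(self, n_modules, window=100, eps=0.2):
            self.results = [deque(maxlen=window) for _ in range(n_modules)]
            self.eps = eps

        def update(self, module, success):
            self.results[module].append(float(success))

        def learning_progress(self, module):
            r = list(self.results[module])
            if len(r) < 2:
                return 0.0
            half = len(r) // 2
            # absolute LP: |recent success rate - earlier success rate|
            return abs(sum(r[half:]) / (len(r) - half) - sum(r[:half]) / half)

        def sample_module(self):
            lps = [self.learning_progress(m) for m in range(len(self.results))]
            if random.random() < self.eps or sum(lps) == 0:
                return random.randrange(len(self.results))  # uniform exploration
            return random.choices(range(len(self.results)), weights=lps)[0]

Because the progress measure is absolute, a module whose success rate is dropping (forgetting) regains sampling probability, which is what drives the agent to revisit forgotten goals.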
Risk Sensitive Model-Based Reinforcement Learning using Uncertainty Guided Planning
Identifying uncertainty and taking mitigating actions is crucial for safe and
trustworthy reinforcement learning agents, especially when deployed in
high-risk environments. In this paper, risk sensitivity is promoted in a
model-based reinforcement learning algorithm by exploiting the ability of a
bootstrap ensemble of dynamics models to estimate environment epistemic
uncertainty. We propose uncertainty-guided cross-entropy method planning, which
penalises action sequences that result in high variance state predictions
during model rollouts, guiding the agent to known areas of the state space with
low uncertainty. Experiments demonstrate the agent's ability to identify
uncertain regions of the state space during planning and to take actions that
keep it within high-confidence areas, without requiring explicit constraints.
The result is a reduction in
attained reward, illustrating the trade-off between risk and return.
Comment: Safe RL Workshop, NeurIPS 2021
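A minimal sketch of such a planner is given below, assuming a hypothetical ensemble_step(s, a) that returns one next-state prediction per ensemble member and a known reward function; the variance penalty is what steers the cross-entropy method away from regions of high epistemic uncertainty.

    # Illustrative sketch: CEM planning with an ensemble-variance penalty.
    import numpy as np

    def uncertainty_guided_cem(s0, ensemble_step, reward_fn, horizon=10, pop=200,
                               n_elite=20, iters=5, act_dim=2, risk_coef=1.0):
        # ensemble_step(s, a) -> (n_models, state_dim) next-state predictions
        mu = np.zeros((horizon, act_dim))
        std = np.ones((horizon, act_dim))
        for _ in range(iters):
            acts = mu + std * np.random.randn(pop, horizon, act_dim)
            scores = np.empty(pop)
            for i in range(pop):
                s, score = s0, 0.0
                for t in range(horizon):
                    preds = ensemble_step(s, acts[i, t])          # bootstrap ensemble
                    score += reward_fn(s, acts[i, t])
                    score -= risk_coef * preds.var(axis=0).sum()  # penalise disagreement
                    s = preds.mean(axis=0)
                scores[i] = score
            elite = acts[np.argsort(scores)[-n_elite:]]
            mu, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        return mu[0]  # first action of the refined plan

Raising risk_coef makes the planner more conservative at the cost of reward, mirroring the risk-return trade-off reported above.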
Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
The learned policy of model-free offline reinforcement learning (RL) methods
is often constrained to stay within the support of the dataset to avoid
potentially dangerous out-of-distribution actions or states, making it
challenging to handle out-of-support regions. Model-based RL methods enrich the
dataset and improve generalization by generating imagined trajectories with a
trained forward or reverse dynamics model. However, the imagined transitions may be
inaccurate, thus downgrading the performance of the underlying offline RL
method. In this paper, we propose to augment the offline dataset by using
trained bidirectional dynamics models and rollout policies with a double-check
mechanism.
We introduce conservatism by trusting samples that the forward model and
backward model agree on. Our method, confidence-aware bidirectional offline
model-based imagination, generates reliable samples and can be combined with
any model-free offline RL method. Experimental results on the D4RL benchmarks
demonstrate that our method significantly boosts the performance of existing
model-free offline RL algorithms and achieves competitive or better scores
against baseline methods.
Comment: NeurIPS 2022
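One simplified reading of the double check is a cycle-consistency filter: a forward-imagined transition is trusted only if the backward model maps the imagined next state back close to where the rollout started. The sketch below illustrates that reading under assumed interfaces (forward_model, backward_model, and the tolerance are hypothetical), not the paper's exact confidence measure.

    # Illustrative sketch: keep imagined transitions only when both models agree.
    import numpy as np

    def double_check_filter(states, actions, forward_model, backward_model, tol=0.1):
        # forward_model(s, a)       -> imagined next state s'
        # backward_model(s_next, a) -> reconstructed previous state
        kept = []
        for s, a in zip(states, actions):
            s_next = forward_model(s, a)         # imagine one step forward
            s_hat = backward_model(s_next, a)    # reconstruct the starting state
            if np.linalg.norm(s_hat - s) < tol:  # models agree -> trust the sample
                kept.append((s, a, s_next))
        return kept

The filtered transitions can then be appended to the offline dataset consumed by any model-free offline RL algorithm, as the abstract describes.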