5 research outputs found
Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement
Reinforcement learning (RL) often faces uninformed search problems, in which
the agent must explore without access to domain knowledge such as the
characteristics of the environment or external rewards. To tackle these
challenges, this work proposes a new approach for curriculum RL called
Diversify for Disagreement & Conquer (D2C). Unlike previous curriculum learning
methods, D2C requires only a few examples of desired outcomes and works in any
environment, regardless of its geometry or the distribution of the desired
outcome examples. The proposed method diversifies goal-conditional
classifiers to identify similarities between visited and desired outcome
states and ensures that the classifiers disagree on out-of-distribution
states, which enables quantifying the unexplored region and designing an
arbitrary goal-conditioned intrinsic reward signal in a simple and intuitive
way. D2C then employs bipartite matching to define a
curriculum learning objective that produces a sequence of well-adjusted
intermediate goals, which enable the agent to automatically explore and conquer
the unexplored region. We present experimental results demonstrating that D2C
outperforms prior curriculum RL methods in both quantitative and qualitative
aspects, even with arbitrarily distributed desired outcome examples.
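A minimal sketch (not the authors' code) of two ingredients the abstract names: an ensemble of outcome classifiers fit to visited and desired states while pushed to disagree on out-of-distribution samples, and a disagreement-based intrinsic reward. The class name, architecture, and diversification term are illustrative assumptions, and goal conditioning is omitted for brevity.

```python
import torch
import torch.nn as nn

class OutcomeClassifier(nn.Module):
    """Scores how similar a state is to the desired outcome examples."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, s):
        return self.net(s)

def classifier_loss(classifiers, visited, desired, ood):
    """Fit every classifier to the labeled states, while encouraging the
    ensemble to disagree on out-of-distribution samples (diversification)."""
    bce = nn.BCELoss()
    loss = sum(bce(c(visited), torch.zeros(len(visited), 1)) +
               bce(c(desired), torch.ones(len(desired), 1))
               for c in classifiers)
    ood_preds = torch.stack([c(ood) for c in classifiers])  # (K, B, 1)
    return loss - ood_preds.std(dim=0).mean()  # reward disagreement on OOD

def intrinsic_reward(classifiers, states):
    """Ensemble disagreement: high on unexplored (OOD) states."""
    with torch.no_grad():
        preds = torch.stack([c(states) for c in classifiers])
    return preds.std(dim=0).squeeze(-1)

# Toy usage: an ensemble of K classifiers over a 2-D state space.
K, state_dim = 5, 2
ensemble = [OutcomeClassifier(state_dim) for _ in range(K)]
print(intrinsic_reward(ensemble, torch.randn(8, state_dim)))
```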
S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning
Offline reinforcement learning (offline RL) suffers from an innate
distributional shift, as it cannot interact with the physical environment
during training. To alleviate this limitation, state-based offline RL
leverages a
learned dynamics model from the logged experience and augments the dataset
with predicted state transitions to extend the data distribution. To bring
this benefit to image-based RL as well, we first propose a generative model,
S2P (State2Pixel), which synthesizes raw pixel images of the agent from its
corresponding state. S2P bridges the gap between the state and image domains
in RL algorithms and enables virtual exploration of unseen image
distributions via model-based transitions in the state space. Through
experiments, we confirm that our S2P-based image synthesis not only improves
image-based offline RL performance but also shows strong generalization
capability on unseen tasks.
Comment: NeurIPS 2022; first two authors contributed equally
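A hedged sketch of the augmentation loop described above: predict a transition in state space with a learned dynamics model, then render the predicted state to pixels with the state-to-pixel generator. The interfaces of `dynamics_model` and `s2p`, and the toy stand-ins below, are assumptions rather than the paper's API.

```python
import torch

def augment_offline_batch(states, actions, dynamics_model, s2p):
    """Predict next states in state space, then synthesize matching pixels."""
    with torch.no_grad():
        next_states = dynamics_model(states, actions)  # model-based transition
        next_images = s2p(next_states)                 # state -> pixel synthesis
    return next_states, next_images

# Toy stand-ins (assumptions) so the sketch runs end to end.
W = torch.randn(8, 64 * 64)
dynamics_model = lambda s, a: s + 0.1 * a              # (B, 8) -> (B, 8)
s2p = lambda s: torch.sigmoid(s @ W).view(-1, 64, 64)  # (B, 8) -> (B, 64, 64)

states, actions = torch.randn(4, 8), torch.randn(4, 8)
ns, imgs = augment_offline_batch(states, actions, dynamics_model, s2p)
print(ns.shape, imgs.shape)  # torch.Size([4, 8]) torch.Size([4, 64, 64])
```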
CQM: Curriculum Reinforcement Learning with a Quantized World Model
Recent curriculum reinforcement learning (RL) has shown notable progress in
solving complex tasks by proposing sequences of surrogate tasks. However,
previous approaches often face challenges when they generate curriculum goals
in a high-dimensional space. Thus, they usually rely on manually specified goal
spaces. To alleviate this limitation and improve the scalability of the
curriculum, we propose a novel curriculum method that automatically defines
a semantic goal space containing vital information for the curriculum
process and suggests curriculum goals over it. To define the semantic goal
space, our method discretizes continuous observations via vector quantized
variational autoencoders (VQ-VAE) and restores the temporal relations
between the discretized observations with a graph. Concurrently, our method
suggests uncertainty- and temporal-distance-aware curriculum goals that
converge to the final goals over the automatically composed goal space. We
demonstrate that the proposed method allows efficient exploration in an
uninformed environment using raw goal examples only. It also outperforms
state-of-the-art curriculum RL methods in data efficiency and performance on
various goal-reaching tasks,
even with ego-centric visual inputs.
Comment: Accepted to NeurIPS 2023
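A minimal sketch of the two mechanisms named in the abstract, assuming a pre-trained codebook: nearest-neighbor vector quantization of observations, and a graph of temporal relations between consecutive discrete codes. VQ-VAE training, the uncertainty measure, and goal selection are omitted; all names are illustrative.

```python
import numpy as np

def quantize(obs, codebook):
    """Map each observation to the index of its nearest codebook vector."""
    d = np.linalg.norm(obs[:, None, :] - codebook[None, :, :], axis=-1)
    return d.argmin(axis=1)

def temporal_graph(code_sequence):
    """Connect codes that occur on consecutive timesteps."""
    edges = set()
    for a, b in zip(code_sequence[:-1], code_sequence[1:]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges

# Toy usage: 3 codebook vectors in a 2-D observation space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
trajectory = np.array([[0.1, 0.0], [0.9, 1.1], [1.9, 0.1], [1.0, 0.9]])
codes = quantize(trajectory, codebook)
print(codes, temporal_graph(codes.tolist()))
```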
Automating Reinforcement Learning with Example-based Resets
Deep reinforcement learning has enabled robots to learn motor skills from
environmental interactions with minimal to no prior knowledge. However,
existing reinforcement learning algorithms assume an episodic setting, in which
the agent resets to a fixed initial state distribution at the end of each
episode, in order to train agents successfully from repeated trials. Such a
reset mechanism, while trivial for simulated tasks, can be challenging to
provide for
real-world robotics tasks. Resets in robotic systems often require extensive
human supervision and task-specific workarounds, which contradicts the goal of
autonomous robot learning. In this paper, we propose an extension to
conventional reinforcement learning towards greater autonomy by introducing an
additional agent that learns to reset in a self-supervised manner. The reset
agent preemptively triggers a reset to prevent manual resets and implicitly
imposes a curriculum for the forward agent. We apply our method to learn from
scratch on a suite of simulated and real-world continuous control tasks and
demonstrate that the reset agent successfully learns to reduce manual resets
while also allowing the forward policy to improve gradually over time.
Comment: 8 pages, 6 figures; accepted for publication in the IEEE Robotics and
Automation Letters (RA-L); source code available at
https://github.com/jigangkim/autoreset_rl ; supplementary video available at
https://youtu.be/himd0Z5b64
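A hedged sketch of the reset-agent idea: alongside the forward policy, a reset agent preemptively triggers an automatic reset whenever continuing looks unrecoverable, so manual resets become rare. The gym-style interface, `reset_q`, the threshold, and the toy environment are all illustrative assumptions, not the paper's method.

```python
import random

def run_nonepisodic(env, forward_policy, reset_q, threshold=0.2, steps=1000):
    """Non-episodic training loop that counts avoided manual resets."""
    obs = env.reset()  # one initial manual reset
    manual_resets = 0
    for _ in range(steps):
        # Reset agent: if its value for recovering from `obs` is too low,
        # trigger a preemptive (automatic) reset instead of risking a manual one.
        if reset_q(obs) < threshold:
            obs = env.reset()
            continue
        obs, reward, done, info = env.step(forward_policy(obs))
        if done and info.get("needs_manual_reset", False):
            manual_resets += 1
            obs = env.reset()
    return manual_resets

class ToyEnv:
    """Stand-in environment with a gym-like interface."""
    def reset(self):
        self.x = 0.0
        return self.x
    def step(self, a):
        self.x += a
        done = abs(self.x) > 1.0  # leaving [-1, 1] requires a manual reset
        return self.x, -abs(self.x), done, {"needs_manual_reset": done}

n = run_nonepisodic(ToyEnv(),
                    forward_policy=lambda o: random.uniform(-0.2, 0.2),
                    reset_q=lambda o: 1.0 - abs(o))
print("manual resets:", n)
```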
Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum
While reinforcement learning (RL) has achieved great success in acquiring
complex skills solely from environmental interactions, it assumes that resets
to the initial state are readily available at the end of each episode. Such an
assumption hinders the autonomous learning of embodied agents due to the
time-consuming and cumbersome workarounds for resetting in the physical world.
Hence, there has been a growing interest in autonomous RL (ARL) methods that
are capable of learning from non-episodic interactions. However, existing works
on ARL are limited by their reliance on prior data and are unable to learn in
environments where task-relevant interactions are sparse. In contrast, we
propose a demonstration-free ARL algorithm via an Implicit and Bidirectional
Curriculum (IBC). With an auxiliary agent that is conditionally activated
upon learning progress and a bidirectional goal curriculum based on optimal
transport, our method outperforms previous methods, even those that leverage
demonstrations.
Comment: Accepted to ICML 2023 (poster)
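For equal numbers of samples, an optimal-transport goal curriculum reduces to a balanced bipartite matching. The sketch below pairs currently reachable states with final goal examples by minimum-cost assignment and proposes midpoints as intermediate goals; the midpoint interpolation and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def curriculum_goals(reachable_states, final_goals):
    """Min-cost matching between the agent's frontier and the final goals."""
    cost = np.linalg.norm(reachable_states[:, None] - final_goals[None, :],
                          axis=-1)                      # pairwise distances
    rows, cols = linear_sum_assignment(cost)            # balanced OT / matching
    # Each matched pair suggests an intermediate goal between frontier and target.
    return [(reachable_states[r] + final_goals[c]) / 2
            for r, c in zip(rows, cols)]

reachable = np.random.randn(5, 2)        # states the forward agent can reach
goals = np.random.randn(5, 2) + 3.0      # desired outcome examples
print(curriculum_goals(reachable, goals))
```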