Meta-Reinforcement Learning via Language Instructions
Although deep reinforcement learning has recently been very successful at
learning complex behaviors, it requires a tremendous amount of data to learn a
task. One fundamental reason for this limitation lies in the nature
of the trial-and-error learning paradigm of reinforcement learning, where the
agent communicates with the environment and progresses in learning by
relying only on the reward signal. This signal is implicit and rather insufficient to learn
a task well. In contrast, humans are usually taught new skills via natural
language instructions. Utilizing language instructions for robotic motion
control to improve adaptability is a recently emerged and challenging
topic. In this paper, we present a meta-RL algorithm that addresses the
challenge of learning skills with language instructions in multiple
manipulation tasks. On the one hand, our algorithm utilizes the language
instructions to shape its interpretation of the task; on the other hand, it
still learns to solve the task in a trial-and-error process. We evaluate our
algorithm on the robotic manipulation benchmark (Meta-World) and it
significantly outperforms state-of-the-art methods in terms of training and
testing task success rates. Code is available at
https://tumi6robot.wixsite.com/million
Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data
Research on language-conditioned robot manipulation, an area of growing
interest, aims to develop robots capable of understanding and executing complex
tasks by interpreting language commands and manipulating
objects accordingly. While language-conditioned approaches demonstrate
impressive capabilities for addressing tasks in familiar environments, they
encounter limitations in adapting to unfamiliar environment settings. In this
study, we propose a general-purpose, language-conditioned approach that
combines base skill priors and imitation learning under unstructured data to
enhance the algorithm's generalization in adapting to unfamiliar environments.
We assess our model's performance in both simulated and real-world environments
using a zero-shot setting. In the simulated environment, the proposed approach
surpasses previously reported scores for the CALVIN benchmark, especially in the
challenging Zero-Shot Multi-Environment setting. The average completed task
length, indicating the average number of tasks the agent can continuously
complete, improves by more than 2.5 times compared to the state-of-the-art method
HULC. In addition, we conduct a zero-shot evaluation of our policy in a
real-world setting, following training exclusively in simulated environments
without additional specific adaptations. In this evaluation, we set up ten
tasks, and our approach achieved an average 30% improvement over the
current state-of-the-art approach, demonstrating a high generalization
capability in both simulated environments and the real world. For further
details, including access to our code and videos, please refer to our
supplementary materials.
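As a rough illustration of the metric described above, the average completed task length can be computed by counting, for each evaluation chain, how many tasks succeed in a row before the first failure, and then averaging over chains. This is a minimal sketch: the function names, the boolean-list input format, and the example outcomes are assumptions made here for illustration, not the benchmark's own API or reported data.

```python
from typing import Sequence


def completed_length(chain: Sequence[bool]) -> int:
    """Count tasks completed in a row before the first failure."""
    count = 0
    for success in chain:
        if not success:
            break
        count += 1
    return count


def average_completed_length(chains: Sequence[Sequence[bool]]) -> float:
    """Average the consecutive-success count over all evaluation chains."""
    if not chains:
        raise ValueError("at least one chain is required")
    return sum(completed_length(c) for c in chains) / len(chains)


# Hypothetical outcomes for three instruction chains of five tasks each.
rollouts = [
    [True, True, False, True, True],   # 2 tasks in a row
    [True, True, True, True, True],    # 5 tasks in a row
    [False, True, True, True, True],   # 0 tasks in a row
]
print(average_completed_length(rollouts))
```

Successes after the first failure do not count, which is why the first and third chains score lower than their raw success totals would suggest.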
Learning from Symmetry: Meta-Reinforcement Learning with Symmetric Data and Language Instructions
Meta-reinforcement learning (meta-RL) is a promising approach that enables
the agent to learn new tasks quickly. However, most meta-RL algorithms show
poor generalization in multiple-task scenarios due to the insufficient task
information provided only by rewards. Language-conditioned meta-RL improves the
generalization by matching language instructions and the agent's behaviors.
Learning from symmetry is an important form of human learning; therefore,
combining symmetry and language instructions into meta-RL can help improve the
algorithm's generalization and learning efficiency. We thus propose a dual-MDP
meta-reinforcement learning method that enables learning new tasks efficiently
with symmetric data and language instructions. We evaluate our method in
multiple challenging manipulation tasks, and experimental results show our
method can greatly improve the generalization and efficiency of
meta-reinforcement learning.
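One generic way to obtain symmetric data is to mirror a recorded transition across a plane and swap the direction words in its instruction. The sketch below illustrates only that idea. The `Transition` layout, the choice of the y-axis as the mirror axis, and the left/right word swap are all assumptions made here; they are not taken from the paper's dual-MDP formulation.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Transition:
    """Hypothetical container for one language-annotated transition."""
    state: Vec3
    action: Vec3
    next_state: Vec3
    instruction: str


def mirror_vec(v: Vec3) -> Vec3:
    """Reflect a vector across the x-z plane by negating its y component."""
    return (v[0], -v[1], v[2])


def mirror_instruction(text: str) -> str:
    """Swap the words 'left' and 'right' (assumed direction vocabulary)."""
    swap = {"left": "right", "right": "left"}
    return " ".join(swap.get(word, word) for word in text.split())


def mirror(t: Transition) -> Transition:
    """Build the mirrored counterpart of a transition."""
    return Transition(
        state=mirror_vec(t.state),
        action=mirror_vec(t.action),
        next_state=mirror_vec(t.next_state),
        instruction=mirror_instruction(t.instruction),
    )


original = Transition(
    state=(0.1, 0.2, 0.3),
    action=(0.0, 0.05, 0.0),
    next_state=(0.1, 0.25, 0.3),
    instruction="push the block to the left",
)
print(mirror(original))
```

Mirroring twice returns the original transition, which is a quick sanity check that the reflection and the word swap are consistent with each other.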
Safety Guaranteed Manipulation Based on Reinforcement Learning Planner and Model Predictive Control Actor
Deep reinforcement learning (RL) has been endowed with high expectations in
tackling challenging manipulation tasks in an autonomous and self-directed
fashion. Despite the significant strides made in the development of
reinforcement learning, the practical deployment of this paradigm is hindered
by at least two barriers, namely, the engineering of a reward function and
ensuring the safety of learning-based controllers. In this paper, we
address these challenging limitations by proposing a framework that merges a
reinforcement learning planner that is trained using
sparse rewards with a model predictive controller (MPC)
actor, thereby offering a safe policy. On the one
hand, the RL planner learns from sparse rewards by
selecting intermediate goals that are easy to achieve in the short term and
promising to lead to target goals in the long term. On the other hand, the MPC
actor takes the suggested intermediate goals from
the RL planner as the input and predicts how the
robot's action will enable it to reach that goal while avoiding any obstacles
over a short period of time. We evaluated our method on four challenging
manipulation tasks with dynamic obstacles and the results demonstrate that, by
leveraging the complementary strengths of these two components, the agent can
solve manipulation tasks in complex, dynamic environments safely with a
success rate. Videos are available at
https://videoviewsite.wixsite.com/mpc-hgg
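The division of labor between planner and actor can be sketched on a toy 2D point robot. This is a simplified illustration under several assumptions, not the paper's implementation: `plan_subgoal` is a hand-written placeholder standing in for the learned RL planner, the obstacle is static rather than dynamic, and the MPC is a brute-force search over nine constant actions. All names and constants are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]

STEP = 0.2    # distance moved per control step (assumed)
HORIZON = 3   # MPC lookahead in steps (assumed)


def dist(a: Vec, b: Vec) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def plan_subgoal(pos: Vec, goal: Vec, reach: float = 1.0) -> Vec:
    """Placeholder for the learned planner: a point at most `reach` toward the goal."""
    d = dist(pos, goal)
    if d <= reach:
        return goal
    t = reach / d
    return (pos[0] + t * (goal[0] - pos[0]), pos[1] + t * (goal[1] - pos[1]))


def candidate_actions() -> List[Vec]:
    """Stay in place, or move STEP in one of eight directions."""
    moves = [(0.0, 0.0)]
    for k in range(8):
        angle = k * math.pi / 4
        moves.append((STEP * math.cos(angle), STEP * math.sin(angle)))
    return moves


def mpc_action(pos: Vec, subgoal: Vec, obstacle: Vec, radius: float) -> Vec:
    """Return the action whose collision-free rollout gets closest to the subgoal."""
    best, best_cost = (0.0, 0.0), float("inf")
    for a in candidate_actions():
        p, safe, closest = pos, True, float("inf")
        for _ in range(HORIZON):
            p = (p[0] + a[0], p[1] + a[1])
            if dist(p, obstacle) <= radius:
                safe = False  # reject any rollout that touches the obstacle
                break
            closest = min(closest, dist(p, subgoal))
        if safe and closest < best_cost:
            best, best_cost = a, closest
    return best


def run(start: Vec, goal: Vec, obstacle: Vec, radius: float, max_steps: int = 100) -> List[Vec]:
    """Alternate planning and control until near the goal or out of steps."""
    pos, path = start, [start]
    for _ in range(max_steps):
        if dist(pos, goal) < STEP / 2:
            break
        a = mpc_action(pos, plan_subgoal(pos, goal), obstacle, radius)
        pos = (pos[0] + a[0], pos[1] + a[1])
        path.append(pos)
    return path


path = run(start=(0.0, 0.0), goal=(4.0, 0.0), obstacle=(2.0, 0.3), radius=0.5)
print(len(path), path[-1])
```

Because only the first action of a collision-free rollout is ever applied, every visited position stays outside the obstacle as long as the start position does. A greedy search like this can still stall in front of an obstacle, which is one reason a planner that proposes useful intermediate goals matters.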
Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning
Meta-reinforcement learning enables artificial agents to learn from related training tasks and adapt to new tasks efficiently with minimal interaction data. However, most existing research is still limited to narrow task distributions that are parametric and stationary, and does not consider out-of-distribution tasks during evaluation, thus restricting its application. In this paper, we propose MoSS, a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning to address this challenge. We extend meta-RL to broad non-parametric task distributions which have never been explored before, and also achieve state-of-the-art results in non-stationary and out-of-distribution tasks. Specifically, MoSS consists of a task inference module and a policy module. We utilize the Gaussian mixture model for task representation to imitate the parametric and non-parametric task variations. Additionally, our online adaptation strategy enables the agent to react at the first sight of a task change, thus being applicable in non-stationary tasks. MoSS also exhibits strong generalization robustness in out-of-distribution tasks, which benefits from the reliable and robust task representation. The policy is built on top of an off-policy RL algorithm and the entire network is trained completely off-policy to ensure high sample efficiency. On MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of asymptotic performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization robustness on broad and diverse task distributions.
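To see why a Gaussian mixture suits both kinds of task variation, consider a latent task vector z: the mixture component can stand for the task type (a non-parametric variation), while the position within a component can stand for a continuous parameter such as a goal location (a parametric variation). The sketch below computes the posterior probability of each component for a given z. It is a generic mixture-model illustration under assumed names and toy numbers, not the MoSS task inference module.

```python
import math
from typing import List, Sequence


def log_gaussian(z: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Log density of a diagonal Gaussian evaluated at z."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((x - m) / s) ** 2
        for x, m, s in zip(z, mean, std)
    )


def responsibilities(
    z: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    stds: Sequence[Sequence[float]],
) -> List[float]:
    """Posterior probability of each mixture component given the latent z."""
    logs = [
        math.log(w) + log_gaussian(z, m, s)
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-component prior: component 0 and 1 stand for two task types.
weights = [0.5, 0.5]
means = [(-2.0, 0.0), (2.0, 0.0)]
stds = [(1.0, 1.0), (1.0, 1.0)]

z = (1.8, 0.1)  # latent inferred from recent experience (made-up value)
probs = responsibilities(z, weights, means, stds)
print(probs)
```

Here z lies close to the second mean, so nearly all the posterior mass falls on component 1, while the small offset of z from that mean is what would encode the within-type variation.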