
    Privileged Knowledge Distillation for Sim-to-Real Policy Generalization

    Reinforcement Learning (RL) has recently achieved remarkable success in robotic control. However, most RL methods operate in simulated environments where privileged knowledge (e.g., dynamics, surroundings, terrains) is readily available. Conversely, in real-world scenarios, robot agents usually rely solely on local states (e.g., proprioceptive feedback of robot joints) to select actions, leading to a significant sim-to-real gap. Existing methods address this gap either by gradually reducing the reliance on privileged knowledge or by performing two-stage policy imitation. However, we argue that these methods are limited in their ability to fully leverage the privileged knowledge, resulting in suboptimal performance. In this paper, we propose a novel single-stage privileged knowledge distillation method, the Historical Information Bottleneck (HIB), to narrow the sim-to-real gap. In particular, HIB learns a privileged knowledge representation from historical trajectories by capturing the underlying time-varying dynamics information. Theoretical analysis shows that the learned privileged knowledge representation helps reduce the value discrepancy between the oracle and learned policies. Empirical experiments on both simulated and real-world tasks demonstrate that HIB yields improved generalizability compared to previous methods. Comment: 22 pages.
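
    The abstract describes a single-stage recipe: encode a window of past local states into a compact latent, constrain that latent with an information bottleneck, and align it with the simulator's privileged state so the deployed policy can act from local observations alone. The sketch below illustrates that combination of losses in PyTorch; the module names, sizes, the Gaussian bottleneck, and the random projection used as the privileged target are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class HistoryEncoder(nn.Module):
        """Compresses a window of past local states into a stochastic latent z."""
        def __init__(self, obs_dim, latent_dim, history_len):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim * history_len, 256), nn.ReLU())
            self.mu = nn.Linear(256, latent_dim)         # posterior mean
            self.log_std = nn.Linear(256, latent_dim)    # posterior log-std

        def forward(self, history):                       # history: (B, H, obs_dim)
            h = self.net(history.flatten(1))
            mu, log_std = self.mu(h), self.log_std(h).clamp(-5, 2)
            z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterized sample
            return z, mu, log_std

    obs_dim, priv_dim, latent_dim, H, B = 12, 8, 16, 10, 32
    encoder = HistoryEncoder(obs_dim, latent_dim, H)
    priv_proj = nn.Linear(priv_dim, latent_dim)            # stand-in target built from the privileged state
    policy = nn.Sequential(nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(), nn.Linear(64, 4))

    history = torch.randn(B, H, obs_dim)                   # past local states (placeholder data)
    obs = torch.randn(B, obs_dim)                          # current local state
    priv = torch.randn(B, priv_dim)                        # simulator-only privileged state

    z, mu, log_std = encoder(history)
    kl = (-log_std + 0.5 * (log_std.exp() ** 2 + mu ** 2) - 0.5).sum(-1).mean()  # bottleneck: KL to N(0, I)
    distill = (z - priv_proj(priv).detach()).pow(2).sum(-1).mean()               # align latent with privileged target
    action = policy(torch.cat([obs, z], dim=-1))           # policy sees only local state + learned latent
    loss = distill + 1e-3 * kl                             # the RL/imitation loss on `action` would be added here
    loss.backward()

    At deployment only the history encoder and the policy are needed, since the privileged state never enters the action path.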

    Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning

    Diffusion models have demonstrated highly expressive generative capabilities in vision and NLP. Recent studies in reinforcement learning (RL) have shown that diffusion models are also powerful in modeling complex policies or trajectories in offline datasets. However, these works have been limited to single-task settings, where a generalist agent capable of addressing multiple tasks is absent. In this paper, we aim to investigate the effectiveness of a single diffusion model in modeling large-scale multi-task offline data, which can be challenging due to diverse and multimodal data distributions. Specifically, we propose the Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings. MTDiff leverages the vast amount of knowledge available in multi-task data and performs implicit knowledge sharing among tasks. For generative planning, we find that MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D. For data synthesis, MTDiff generates high-quality data for testing tasks given a single demonstration as a prompt, which enhances low-quality datasets even for unseen tasks. Comment: 21 pages.
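
    Concretely, "Transformer backbone plus prompt learning" can be read as a denoiser that attends over a noised trajectory segment together with a learned per-task prompt token, trained with the usual noise-prediction objective. The sketch below is a minimal paraphrase of that setup; the layer sizes, the discrete task-ID prompt, and the noise schedule are assumptions made for illustration rather than details from the paper.

    import torch
    import torch.nn as nn

    class PromptDenoiser(nn.Module):
        """Predicts the noise added to a trajectory segment, conditioned on a task prompt."""
        def __init__(self, traj_dim, d_model=128, n_tasks=50, n_steps=1000):
            super().__init__()
            self.in_proj = nn.Linear(traj_dim, d_model)
            self.prompt = nn.Embedding(n_tasks, d_model)      # learned per-task prompt token
            self.time_emb = nn.Embedding(n_steps, d_model)    # diffusion timestep embedding
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=2)
            self.out_proj = nn.Linear(d_model, traj_dim)

        def forward(self, x_t, t, task_id):                   # x_t: (B, T, traj_dim)
            tokens = self.in_proj(x_t) + self.time_emb(t)[:, None]
            tokens = torch.cat([self.prompt(task_id)[:, None], tokens], dim=1)
            return self.out_proj(self.backbone(tokens))[:, 1:]  # drop the prompt token

    B, T, traj_dim, n_steps = 16, 20, 10, 1000
    model = PromptDenoiser(traj_dim, n_steps=n_steps)
    alpha_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, n_steps), dim=0)

    x0 = torch.randn(B, T, traj_dim)                          # offline trajectory segments (placeholder data)
    task_id = torch.randint(0, 50, (B,))
    t = torch.randint(0, n_steps, (B,))
    noise = torch.randn_like(x0)
    a = alpha_bar[t].view(B, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise              # forward diffusion of clean trajectories
    loss = (model(x_t, t, task_id) - noise).pow(2).mean()     # standard epsilon-prediction loss
    loss.backward()

    Planning and data synthesis then both amount to iteratively denoising from random noise under the chosen task prompt.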

    Cross-Domain Policy Adaptation via Value-Guided Data Filtering

    Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning. For example, a robot may learn a policy in a simulator, but when it is deployed in the real world, the dynamics of the environment can be different. Given a source and a target domain with dynamics mismatch, we consider the online dynamics adaptation problem, in which the agent can access sufficient source-domain data while online interactions with the target domain are limited. Existing research has attempted to solve the problem from the perspective of the dynamics discrepancy. In this work, we reveal the limitations of these methods and explore the problem from the perspective of the value difference, via a novel insight on value consistency across domains. Specifically, we present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets across the two domains. Empirical results on various environments with kinematic and morphology shifts demonstrate that our method achieves superior performance compared to prior approaches. Comment: 27 pages, 15 figures.
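
    The core mechanism is a per-transition filter: a source transition is shared only if the value target it induces is close to the value target computed from what the target domain's learned dynamics would have produced for the same state-action pair. The snippet below sketches that test with placeholder networks; the function names, the fixed keep ratio, and the single dynamics model (rather than, say, an ensemble) are simplifications for illustration, not the paper's exact procedure.

    import torch

    def filter_source_batch(batch, value_fn, target_dynamics, gamma=0.99, keep_ratio=0.25):
        """batch: dict of tensors (s, a, r, s_next) gathered in the source domain.
        target_dynamics(s, a) -> predicted next state under the target domain.
        Returns a boolean mask selecting the most value-consistent transitions."""
        s, a, r, s_next = batch["s"], batch["a"], batch["r"], batch["s_next"]
        with torch.no_grad():
            y_source = r + gamma * value_fn(s_next)            # value target using the source next state
            s_next_hat = target_dynamics(s, a)                 # imagined next state in the target domain
            y_target = r + gamma * value_fn(s_next_hat)        # value target using the target-domain prediction
            gap = (y_source - y_target).abs()
            threshold = torch.quantile(gap, keep_ratio)        # keep the smallest-gap fraction
        return gap <= threshold

    # Tiny usage example with placeholder networks and data.
    value_fn = lambda s: s.sum(-1, keepdim=True)               # stand-in value function
    target_dynamics = lambda s, a: s + 0.1 * a                 # stand-in learned target-domain dynamics
    batch = {"s": torch.randn(256, 6), "a": torch.randn(256, 6),
             "r": torch.randn(256, 1), "s_next": torch.randn(256, 6)}
    mask = filter_source_batch(batch, value_fn, target_dynamics)
    print("shared transitions:", int(mask.sum()))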

    Robust Quadrupedal Locomotion via Risk-Averse Policy Learning

    The robustness of legged locomotion is crucial for quadrupedal robots in challenging terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged locomotion, and various methods try to integrate privileged distillation, scene modeling, and external sensors to improve the generalization and robustness of locomotion policies. However, these methods struggle to handle uncertain scenarios such as abrupt terrain changes or unexpected external forces. In this paper, we take a novel risk-sensitive perspective to enhance the robustness of legged locomotion. Specifically, we employ a distributional value function learned by quantile regression to model the aleatoric uncertainty of the environment, and perform risk-averse policy learning by optimizing worst-case scenarios via a risk distortion measure. Extensive experiments in both simulation environments and on a real Aliengo robot demonstrate that our method is efficient in handling various external disturbances, and the resulting policy exhibits improved robustness in harsh and uncertain situations in legged locomotion. Videos are available at https://risk-averse-locomotion.github.io/. Comment: 8 pages, 5 figures.
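
    Two ingredients are named: a distributional critic trained with quantile regression, and a risk distortion that emphasizes the worst outcomes (a CVaR-style measure) during policy learning. The sketch below shows both pieces in isolation; the network shape, the number of quantiles, and the simple "average the lowest quantiles" distortion are illustrative assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn

    N_QUANTILES = 32
    taus = (torch.arange(N_QUANTILES, dtype=torch.float32) + 0.5) / N_QUANTILES

    critic = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, N_QUANTILES))

    def quantile_huber_loss(pred, target, kappa=1.0):
        """pred: (B, N) predicted quantiles, target: (B, N) target quantiles."""
        td = target[:, None, :] - pred[:, :, None]             # pairwise TD errors (B, N, N)
        huber = torch.where(td.abs() <= kappa, 0.5 * td ** 2, kappa * (td.abs() - 0.5 * kappa))
        weight = (taus[None, :, None] - (td < 0).float()).abs()  # quantile-regression weighting
        return (weight * huber).mean()

    def cvar(quantiles, alpha=0.25):
        """Average of the worst alpha fraction of quantiles (risk-averse value)."""
        k = max(1, int(alpha * quantiles.shape[-1]))
        return quantiles.sort(dim=-1).values[..., :k].mean(-1)

    # Placeholder batch: 8-dim state and 2-dim action concatenated, plus bootstrapped target quantiles.
    sa = torch.randn(128, 10)
    target_quantiles = torch.randn(128, N_QUANTILES)
    pred = critic(sa)
    critic_loss = quantile_huber_loss(pred, target_quantiles)
    risk_averse_value = cvar(pred)                              # what a risk-averse actor would maximize
    critic_loss.backward()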

    Functional analysis of the structural domain of ARF proteins in rice (Oryza sativa L.)

    Auxin response factors (ARFs) are key regulators of plant growth and development. Through interaction with auxin/indole acetic acid (Aux/IAA) proteins, they influence the expression of auxin response genes. An ARF gene family has been predicted in rice, but the functions of the individual structural domains of the OsARFs remain obscure. Bioinformatics was used to analyse the positions of the DNA-binding domain (DBD), middle region (MR), and C-terminal dimerization domain (CTD) of the OsARFs, and the presence of a classical monopartite nuclear localization signal (NLS) in the DBD was experimentally confirmed. The DBD was shown to contribute to nuclear localization of OsARF proteins in addition to its known DNA-binding function. Interactions between 14 intact OsARFs and 15 OsIAA proteins were tested using yeast two-hybrid assays. It was found that eight OsARF activators interacted with the 15 OsIAA proteins, while six OsARF repressors did not. The interactions between the MR+CTD or CTD of 10 OsARFs and 15 OsIAA proteins were also tested, and the results were consistent with those of each intact OsARF, although some slight differences in interaction intensity were observed in α-galactosidase quantitative assays. The truncated CTD of OsARF11 did not interact with any OsIAA, implying that the CTD is required for ARF–IAA dimerization and that the MR influences interaction intensity in yeast. A subset of the interactions observed in yeast were also observed in tobacco plants using firefly luciferase complementation imaging assays, indicating that these interactions are specific in plants and might have a special role in the auxin signalling response. This study provides new insight into the structure of OsARF proteins and ARF–Aux/IAA interactions.

    RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

    Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data in complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism impairs the robustness of learned policies, leading to significant changes even under small perturbations on observations. To trade off robustness against conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, and additional conservative value estimation on out-of-distribution (OOD) states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL achieves state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations. Comment: 23 pages, 10 figures.
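
    In code terms, "conservative smoothing" combines two extra penalties on top of an ordinary TD loss: keep the Q-function flat in a small neighbourhood of dataset states, and keep its values on those perturbed (out-of-distribution) states pessimistic. The schematic below illustrates that idea; the Gaussian perturbation, the penalty weights, and treating the perturbed states as the OOD set are assumptions for illustration rather than RORL's actual loss terms.

    import torch
    import torch.nn as nn

    q_net = nn.Sequential(nn.Linear(6 + 2, 64), nn.ReLU(), nn.Linear(64, 1))

    def conservative_smoothing_loss(s, a, eps=0.01, ood_penalty=1.0, smooth_weight=1.0):
        s_perturbed = s + eps * torch.randn_like(s)            # states near the dataset
        q = q_net(torch.cat([s, a], dim=-1))
        q_perturbed = q_net(torch.cat([s_perturbed, a], dim=-1))
        smooth = (q - q_perturbed).pow(2).mean()               # robustness: Q changes little nearby
        pessimism = q_perturbed.mean()                         # conservatism: keep perturbed/OOD values low
        return smooth_weight * smooth + ood_penalty * pessimism

    # Placeholder offline batch; this term would be added to a standard TD loss.
    s, a = torch.randn(256, 6), torch.randn(256, 2)
    loss = conservative_smoothing_loss(s, a)
    loss.backward()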