
    Privileged Knowledge Distillation for Sim-to-Real Policy Generalization

    Reinforcement Learning (RL) has recently achieved remarkable success in robotic control. However, most RL methods operate in simulated environments where privileged knowledge (e.g., dynamics, surroundings, terrains) is readily available. Conversely, in real-world scenarios, robot agents usually rely solely on local states (e.g., proprioceptive feedback of robot joints) to select actions, leading to a significant sim-to-real gap. Existing methods address this gap either by gradually reducing the reliance on privileged knowledge or by performing two-stage policy imitation. However, we argue that these methods are limited in their ability to fully leverage the privileged knowledge, resulting in suboptimal performance. In this paper, we propose a novel single-stage privileged knowledge distillation method, the Historical Information Bottleneck (HIB), to narrow the sim-to-real gap. In particular, HIB learns a privileged knowledge representation from historical trajectories by capturing the underlying time-varying dynamics information. Theoretical analysis shows that the learned privileged knowledge representation helps reduce the value discrepancy between the oracle and learned policies. Empirical experiments on both simulated and real-world tasks demonstrate that HIB yields improved generalizability compared to previous methods. Comment: 22 pages.
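
    The abstract describes a single-stage recipe: encode a window of past local states into a compact latent, constrain that latent with an information bottleneck, and align it with the simulator's privileged state so the deployed policy can act from local observations alone. The sketch below illustrates that combination of losses in PyTorch; the module names, sizes, the Gaussian bottleneck, and the random projection used as the privileged target are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class HistoryEncoder(nn.Module):
        """Compresses a window of past local states into a stochastic latent z."""
        def __init__(self, obs_dim, latent_dim, history_len):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim * history_len, 256), nn.ReLU())
            self.mu = nn.Linear(256, latent_dim)         # posterior mean
            self.log_std = nn.Linear(256, latent_dim)    # posterior log-std

        def forward(self, history):                       # history: (B, H, obs_dim)
            h = self.net(history.flatten(1))
            mu, log_std = self.mu(h), self.log_std(h).clamp(-5, 2)
            z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterized sample
            return z, mu, log_std

    obs_dim, priv_dim, latent_dim, H, B = 12, 8, 16, 10, 32
    encoder = HistoryEncoder(obs_dim, latent_dim, H)
    priv_proj = nn.Linear(priv_dim, latent_dim)            # stand-in target built from the privileged state
    policy = nn.Sequential(nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(), nn.Linear(64, 4))

    history = torch.randn(B, H, obs_dim)                   # past local states (placeholder data)
    obs = torch.randn(B, obs_dim)                          # current local state
    priv = torch.randn(B, priv_dim)                        # simulator-only privileged state

    z, mu, log_std = encoder(history)
    kl = (-log_std + 0.5 * (log_std.exp() ** 2 + mu ** 2) - 0.5).sum(-1).mean()  # bottleneck: KL to N(0, I)
    distill = (z - priv_proj(priv).detach()).pow(2).sum(-1).mean()               # align latent with privileged target
    action = policy(torch.cat([obs, z], dim=-1))           # policy sees only local state + learned latent
    loss = distill + 1e-3 * kl                             # the RL/imitation loss on `action` would be added here
    loss.backward()

    At deployment only the history encoder and the policy are needed, since the privileged state never enters the action path.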

    Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning

    Diffusion models have demonstrated highly expressive generative capabilities in vision and NLP. Recent studies in reinforcement learning (RL) have shown that diffusion models are also powerful in modeling complex policies or trajectories in offline datasets. However, these works have been limited to single-task settings, where a generalist agent capable of addressing multiple tasks is absent. In this paper, we aim to investigate the effectiveness of a single diffusion model in modeling large-scale multi-task offline data, which can be challenging due to diverse and multimodal data distributions. Specifically, we propose the Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings. MTDiff leverages the vast amount of knowledge available in multi-task data and performs implicit knowledge sharing among tasks. For generative planning, we find that MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D. For data synthesis, MTDiff generates high-quality data for testing tasks given a single demonstration as a prompt, which enhances low-quality datasets even for unseen tasks. Comment: 21 pages.
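
    Concretely, "Transformer backbone plus prompt learning" can be read as a denoiser that attends over a noised trajectory segment together with a learned per-task prompt token, trained with the usual noise-prediction objective. The sketch below is a minimal paraphrase of that setup; the layer sizes, the discrete task-ID prompt, and the noise schedule are assumptions made for illustration rather than details from the paper.

    import torch
    import torch.nn as nn

    class PromptDenoiser(nn.Module):
        """Predicts the noise added to a trajectory segment, conditioned on a task prompt."""
        def __init__(self, traj_dim, d_model=128, n_tasks=50, n_steps=1000):
            super().__init__()
            self.in_proj = nn.Linear(traj_dim, d_model)
            self.prompt = nn.Embedding(n_tasks, d_model)      # learned per-task prompt token
            self.time_emb = nn.Embedding(n_steps, d_model)    # diffusion timestep embedding
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=2)
            self.out_proj = nn.Linear(d_model, traj_dim)

        def forward(self, x_t, t, task_id):                   # x_t: (B, T, traj_dim)
            tokens = self.in_proj(x_t) + self.time_emb(t)[:, None]
            tokens = torch.cat([self.prompt(task_id)[:, None], tokens], dim=1)
            return self.out_proj(self.backbone(tokens))[:, 1:]  # drop the prompt token

    B, T, traj_dim, n_steps = 16, 20, 10, 1000
    model = PromptDenoiser(traj_dim, n_steps=n_steps)
    alpha_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, n_steps), dim=0)

    x0 = torch.randn(B, T, traj_dim)                          # offline trajectory segments (placeholder data)
    task_id = torch.randint(0, 50, (B,))
    t = torch.randint(0, n_steps, (B,))
    noise = torch.randn_like(x0)
    a = alpha_bar[t].view(B, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise              # forward diffusion of clean trajectories
    loss = (model(x_t, t, task_id) - noise).pow(2).mean()     # standard epsilon-prediction loss
    loss.backward()

    Planning and data synthesis then both amount to iteratively denoising from random noise under the chosen task prompt.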

    Cross-Domain Policy Adaptation via Value-Guided Data Filtering

    Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning. For example, a robot may learn a policy in a simulator, but when it is deployed in the real world, the dynamics of the environment can be different. Given a source and a target domain with dynamics mismatch, we consider the online dynamics adaptation problem, in which the agent can access sufficient source-domain data while online interactions with the target domain are limited. Existing research has attempted to solve the problem from the perspective of the dynamics discrepancy. In this work, we reveal the limitations of these methods and explore the problem from the perspective of the value difference, via a novel insight on value consistency across domains. Specifically, we present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets across the two domains. Empirical results on various environments with kinematic and morphology shifts demonstrate that our method achieves superior performance compared to prior approaches. Comment: 27 pages, 15 figures.
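
    The core mechanism is a per-transition filter: a source transition is shared only if the value target it induces is close to the value target computed from what the target domain's learned dynamics would have produced for the same state-action pair. The snippet below sketches that test with placeholder networks; the function names, the fixed keep ratio, and the single dynamics model (rather than, say, an ensemble) are simplifications for illustration, not the paper's exact procedure.

    import torch

    def filter_source_batch(batch, value_fn, target_dynamics, gamma=0.99, keep_ratio=0.25):
        """batch: dict of tensors (s, a, r, s_next) gathered in the source domain.
        target_dynamics(s, a) -> predicted next state under the target domain.
        Returns a boolean mask selecting the most value-consistent transitions."""
        s, a, r, s_next = batch["s"], batch["a"], batch["r"], batch["s_next"]
        with torch.no_grad():
            y_source = r + gamma * value_fn(s_next)            # value target using the source next state
            s_next_hat = target_dynamics(s, a)                 # imagined next state in the target domain
            y_target = r + gamma * value_fn(s_next_hat)        # value target using the target-domain prediction
            gap = (y_source - y_target).abs()
            threshold = torch.quantile(gap, keep_ratio)        # keep the smallest-gap fraction
        return gap <= threshold

    # Tiny usage example with placeholder networks and data.
    value_fn = lambda s: s.sum(-1, keepdim=True)               # stand-in value function
    target_dynamics = lambda s, a: s + 0.1 * a                 # stand-in learned target-domain dynamics
    batch = {"s": torch.randn(256, 6), "a": torch.randn(256, 6),
             "r": torch.randn(256, 1), "s_next": torch.randn(256, 6)}
    mask = filter_source_batch(batch, value_fn, target_dynamics)
    print("shared transitions:", int(mask.sum()))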

    Robust Quadrupedal Locomotion via Risk-Averse Policy Learning

    The robustness of legged locomotion is crucial for quadrupedal robots in challenging terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged locomotion, and various methods try to integrate privileged distillation, scene modeling, and external sensors to improve the generalization and robustness of locomotion policies. However, these methods struggle to handle uncertain scenarios such as abrupt terrain changes or unexpected external forces. In this paper, we take a novel risk-sensitive perspective to enhance the robustness of legged locomotion. Specifically, we employ a distributional value function learned by quantile regression to model the aleatoric uncertainty of the environment, and perform risk-averse policy learning by optimizing worst-case scenarios via a risk distortion measure. Extensive experiments in both simulation environments and on a real Aliengo robot demonstrate that our method is efficient in handling various external disturbances, and the resulting policy exhibits improved robustness in harsh and uncertain situations in legged locomotion. Videos are available at https://risk-averse-locomotion.github.io/. Comment: 8 pages, 5 figures.
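
    Two ingredients are named: a distributional critic trained with quantile regression, and a risk distortion that emphasizes the worst outcomes (a CVaR-style measure) during policy learning. The sketch below shows both pieces in isolation; the network shape, the number of quantiles, and the simple "average the lowest quantiles" distortion are illustrative assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn

    N_QUANTILES = 32
    taus = (torch.arange(N_QUANTILES, dtype=torch.float32) + 0.5) / N_QUANTILES

    critic = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, N_QUANTILES))

    def quantile_huber_loss(pred, target, kappa=1.0):
        """pred: (B, N) predicted quantiles, target: (B, N) target quantiles."""
        td = target[:, None, :] - pred[:, :, None]             # pairwise TD errors (B, N, N)
        huber = torch.where(td.abs() <= kappa, 0.5 * td ** 2, kappa * (td.abs() - 0.5 * kappa))
        weight = (taus[None, :, None] - (td < 0).float()).abs()  # quantile-regression weighting
        return (weight * huber).mean()

    def cvar(quantiles, alpha=0.25):
        """Average of the worst alpha fraction of quantiles (risk-averse value)."""
        k = max(1, int(alpha * quantiles.shape[-1]))
        return quantiles.sort(dim=-1).values[..., :k].mean(-1)

    # Placeholder batch: 8-dim state and 2-dim action concatenated, plus bootstrapped target quantiles.
    sa = torch.randn(128, 10)
    target_quantiles = torch.randn(128, N_QUANTILES)
    pred = critic(sa)
    critic_loss = quantile_huber_loss(pred, target_quantiles)
    risk_averse_value = cvar(pred)                              # what a risk-averse actor would maximize
    critic_loss.backward()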

    Functional analysis of the structural domain of ARF proteins in rice (Oryza sativa L.)

    Auxin response factors (ARFs) are key regulators of plant growth and development. Through interaction with auxin/indole acetic acid (Aux/IAA) proteins, they influence the expression of auxin response genes. An ARF gene family has been predicted in rice, but the functions of the individual structural domains of the OsARFs remain obscure. Bioinformatics was used to analyse the positions of the DNA-binding domain (DBD), middle region (MR), and C-terminal dimerization domain (CTD) of the OsARFs, and the presence of a classical monopartite nuclear localization signal (NLS) in the DBD was experimentally confirmed. The DBD was shown to contribute to nuclear localization of OsARF proteins in addition to its known DNA-binding function. Interactions between 14 intact OsARFs and 15 OsIAA proteins were tested using yeast two-hybrid assays. It was found that eight OsARF activators interacted with the 15 OsIAA proteins, while six OsARF repressors did not. The interactions between the MR+CTD or CTD of 10 OsARFs and 15 OsIAA proteins were also tested, and the results were consistent with those of each intact OsARF, although some slight differences in interaction intensity were observed in α-galactosidase quantitative assays. The truncated CTD of OsARF11 did not interact with any OsIAA, implying that the CTD is required for ARF–IAA dimerization and that the MR influences interaction intensity in yeast. A subset of the interactions observed in yeast were also observed in tobacco plants using firefly luciferase complementation imaging assays, indicating that these interactions are specific in plants and might have a special role in the auxin signalling response. This study provides new insight into the structure of OsARF proteins and ARF–Aux/IAA interactions.

    RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

    Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data in complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism impairs the robustness of learned policies, leading to significant changes even under small perturbations on observations. To trade off robustness against conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, and additional conservative value estimation on out-of-distribution (OOD) states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL achieves state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations. Comment: 23 pages, 10 figures.
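
    In code terms, "conservative smoothing" combines two extra penalties on top of an ordinary TD loss: keep the Q-function flat in a small neighbourhood of dataset states, and keep its values on those perturbed (out-of-distribution) states pessimistic. The schematic below illustrates that idea; the Gaussian perturbation, the penalty weights, and treating the perturbed states as the OOD set are assumptions for illustration rather than RORL's actual loss terms.

    import torch
    import torch.nn as nn

    q_net = nn.Sequential(nn.Linear(6 + 2, 64), nn.ReLU(), nn.Linear(64, 1))

    def conservative_smoothing_loss(s, a, eps=0.01, ood_penalty=1.0, smooth_weight=1.0):
        s_perturbed = s + eps * torch.randn_like(s)            # states near the dataset
        q = q_net(torch.cat([s, a], dim=-1))
        q_perturbed = q_net(torch.cat([s_perturbed, a], dim=-1))
        smooth = (q - q_perturbed).pow(2).mean()               # robustness: Q changes little nearby
        pessimism = q_perturbed.mean()                         # conservatism: keep perturbed/OOD values low
        return smooth_weight * smooth + ood_penalty * pessimism

    # Placeholder offline batch; this term would be added to a standard TD loss.
    s, a = torch.randn(256, 6), torch.randn(256, 2)
    loss = conservative_smoothing_loss(s, a)
    loss.backward()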