Privileged Knowledge Distillation for Sim-to-Real Policy Generalization
Reinforcement Learning (RL) has recently achieved remarkable success in
robotic control. However, most RL methods operate in simulated environments
where privileged knowledge (e.g., dynamics, surroundings, terrains) is readily
available. In contrast, in real-world scenarios, robot agents usually rely
solely on local states (e.g., proprioceptive feedback of robot joints) to
select actions, leading to a significant sim-to-real gap. Existing methods
address this gap by either gradually reducing the reliance on privileged
knowledge or performing a two-stage policy imitation. However, we argue that
these methods are limited in their ability to fully leverage the privileged
knowledge, resulting in suboptimal performance. In this paper, we propose a
novel single-stage privileged knowledge distillation method called the
Historical Information Bottleneck (HIB) to narrow the sim-to-real gap. In
particular, HIB learns a privileged knowledge representation from historical
trajectories by capturing the underlying time-varying dynamics information.
Theoretical analysis shows that the learned privileged knowledge representation
helps reduce the value discrepancy between the oracle and learned policies.
Empirical experiments on both simulated and real-world tasks demonstrate that
HIB yields improved generalizability compared to previous methods.
Comment: 22 pages
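The abstract does not spell out the objective in closed form. As a rough, hypothetical sketch (the Gaussian-posterior form, the privileged-feature prediction head, and all names here are assumptions, not the paper's actual formulation), an information-bottleneck loss over a history encoding might combine a term that keeps task-relevant privileged information with a KL term that compresses the latent:

```python
import numpy as np

def hib_objective(z_mu, z_logvar, pred_priv, true_priv, beta=0.1):
    """Sketch of an information-bottleneck objective over a history encoding.

    z_mu, z_logvar : Gaussian posterior over the latent inferred from the
                     history of local states (the learned representation)
    pred_priv, true_priv : predicted vs. simulator privileged features,
                           available only at training time in simulation
    beta : bottleneck strength (KL toward a standard normal prior)
    """
    recon = np.mean((pred_priv - true_priv) ** 2)  # keep task-relevant info
    kl = 0.5 * np.mean(np.exp(z_logvar) + z_mu ** 2 - 1.0 - z_logvar)  # compress
    return recon + beta * kl

# toy check: a perfect prediction with a standard-normal posterior gives zero loss
print(hib_objective(np.zeros(4), np.zeros(4), np.ones(3), np.ones(3)))  # -> 0.0
```

At deployment time only the history encoder would be needed, since the privileged target is used purely as a training signal in simulation.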
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
Diffusion models have demonstrated highly expressive generative capabilities
in vision and NLP. Recent studies in reinforcement learning (RL) have shown
that diffusion models are also powerful in modeling complex policies or
trajectories in offline datasets. However, these works have been limited to
single-task settings, and a generalist agent capable of addressing multiple
tasks has been absent. In this paper, we investigate the effectiveness of a
single diffusion model in modeling large-scale multi-task offline data, which
is challenging due to the diverse and multimodal data distributions.
Specifically, we propose Multi-Task Diffusion Model (\textsc{MTDiff}), a
diffusion-based method that incorporates Transformer backbones and prompt
learning for generative planning and data synthesis in multi-task offline
settings. \textsc{MTDiff} leverages vast amounts of knowledge available in
multi-task data and performs implicit knowledge sharing among tasks. For
generative planning, we find \textsc{MTDiff} outperforms state-of-the-art
algorithms across 50 tasks on Meta-World and 8 maps on Maze2D. For data
synthesis, \textsc{MTDiff} generates high-quality data for testing tasks given
a single demonstration as a prompt, augmenting low-quality datasets even for
unseen tasks.
Comment: 21 pages
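For readers unfamiliar with diffusion training, the forward-noising step such a model is trained against can be sketched as follows. This is a generic DDPM-style training pair; the noise schedule, the shapes, and the prompt conditioning mentioned in the comment are illustrative assumptions, not \textsc{MTDiff}'s exact setup:

```python
import numpy as np

def diffusion_training_pair(x0, t, alphas_cumprod, rng):
    """One forward-diffusion training pair: noise a clean trajectory segment x0
    at timestep t. A (Transformer) denoiser would then be trained to predict
    eps from x_t, conditioned on a task prompt in the multi-task setting."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps  # train denoiser(x_t, t, prompt) to regress eps

rng = np.random.default_rng(0)
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 100))
x0 = np.zeros((8, 4))  # toy trajectory segment: 8 steps, 4-dim states
x_t, eps = diffusion_training_pair(x0, t=50, alphas_cumprod=alphas_cumprod, rng=rng)
print(x_t.shape)  # (8, 4)
```

Planning then amounts to running the learned reverse process from noise, while data synthesis samples whole segments conditioned on a prompt from the target task.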
Cross-Domain Policy Adaptation via Value-Guided Data Filtering
Generalizing policies across different domains with dynamics mismatch poses a
significant challenge in reinforcement learning. For example, a robot learns
the policy in a simulator, but when it is deployed in the real world, the
dynamics of the environment may be different. Given source and target domains
with a dynamics mismatch, we consider the online dynamics adaptation problem,
in which the agent can access ample source-domain data while online
interactions with the target domain are limited. Existing research has
attempted to solve the problem from the dynamics discrepancy perspective. In
this work, we reveal the limitations of these methods and explore the problem
from the value difference perspective via a novel insight on the value
consistency across domains. Specifically, we present the Value-Guided Data
Filtering (VGDF) algorithm, which selectively shares transitions from the
source domain based on the proximity of paired value targets across the two
domains. Empirical results on various environments with kinematic and
morphology shifts demonstrate that our method achieves superior performance
compared to prior approaches.
Comment: 27 pages, 15 figures
Robust Quadrupedal Locomotion via Risk-Averse Policy Learning
The robustness of legged locomotion is crucial for quadrupedal robots in
challenging terrains. Recently, Reinforcement Learning (RL) has shown promising
results in legged locomotion and various methods try to integrate privileged
distillation, scene modeling, and external sensors to improve the
generalization and robustness of locomotion policies. However, these methods
struggle to handle uncertain scenarios such as abrupt terrain changes or
unexpected external forces. In this paper, we take a novel risk-sensitive
perspective to enhance the robustness of legged locomotion. Specifically, we
employ a distributional value function learned by quantile regression to model
the aleatoric uncertainty of environments, and perform risk-averse policy
learning by optimizing the worst-case scenarios via a risk distortion measure.
Extensive experiments in both simulation environments and a real Aliengo robot
demonstrate that our method is efficient in handling various external
disturbances, and the resulting policy exhibits improved robustness in harsh
and uncertain situations in legged locomotion. Videos are available at
https://risk-averse-locomotion.github.io/.
Comment: 8 pages, 5 figures
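A minimal sketch of the two ingredients, quantile-regression value learning and a risk distortion, assuming CVaR as the distortion measure (the paper's exact measure, losses, and all names below are illustrative, not the authors' implementation):

```python
import numpy as np

def quantile_huber_loss(quantiles, target, taus, kappa=1.0):
    """Quantile-regression (Huber) loss for a distributional value function."""
    td = target - quantiles  # TD error per quantile
    huber = np.where(np.abs(td) <= kappa,
                     0.5 * td ** 2,
                     kappa * (np.abs(td) - 0.5 * kappa))
    # asymmetric weighting pushes each quantile toward its fraction tau
    return np.mean(np.abs(taus - (td < 0)) * huber)

def cvar(quantiles, alpha=0.25):
    """Risk-averse (CVaR) value: average over the worst alpha-fraction of
    the learned return quantiles."""
    q = np.sort(quantiles)
    k = max(1, int(len(q) * alpha))
    return q[:k].mean()

quantiles = np.array([-2.0, 0.0, 1.0, 3.0])  # learned return quantiles
print(cvar(quantiles, alpha=0.5))            # mean of the two worst -> -1.0
```

Optimizing the policy against the CVaR value rather than the mean makes it prefer actions whose worst-case outcomes are acceptable, which is the risk-averse behavior the abstract describes.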
Functional analysis of the structural domain of ARF proteins in rice (Oryza sativa L.)
Auxin response factors (ARFs) are key regulators of plant growth and development. Through interaction with auxin/indole acetic acid (Aux/IAA) proteins, they influence the expression of auxin response genes. An ARF gene family has been predicted in rice, but the functions of the individual structural domains of the OsARFs remain obscure. Bioinformatics was used to analyse the position of the DNA-binding domain (DBD), middle region (MR), and C-terminal dimerization domain (CTD) of OsARFs, and experimentally confirmed the presence of a classical monopartite nuclear localization signal (NLS) in the DBD. The DBD was shown to contribute to nuclear localization of OsARF proteins in addition to its known DNA-binding function. Interactions between 14 integrated OsARFs and 15 OsIAA proteins were tested using yeast two-hybrid assays. It was found that eight OsARF activators interacted with the 15 OsIAA proteins, while six OsARF repressors did not. The interactions between the MR+CTD or CTD of 10 OsARFs and 15 OsIAA proteins were also tested and the results were consistent with those of each intact OsARF, although some slight differences in interaction intensity were observed by α-galactosidase quantitative assays. The truncated CTD of OsARF11 did not interact with any OsIAA, implying that the CTD is required for ARF–IAA dimerization, and that the MR influences the interaction intensity in yeast. A subset of the interactions in yeast were also observed in tobacco plants using firefly luciferase complementation imaging assays, indicating that these interactions are specific in plants, and might have a special role in the auxin signalling response. This study provides new insight into the structure of OsARF proteins and ARF–Aux/IAA interactions.
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
Offline reinforcement learning (RL) provides a promising direction to exploit
the massive amount of offline data for complex decision-making tasks. Due to
the distribution shift issue, current offline RL algorithms are generally
designed to be conservative in value estimation and action selection. However,
such conservatism impairs the robustness of learned policies, causing
significant changes under even small perturbations of the observations. To trade off
robustness and conservatism, we propose Robust Offline Reinforcement Learning
(RORL) with a novel conservative smoothing technique. In RORL, we explicitly
introduce regularization on the policy and the value function for states near
the dataset, together with additional conservative value estimation on
out-of-distribution (OOD) states. Theoretically, we show that RORL enjoys a
tighter suboptimality bound than recent theoretical results in linear MDPs. We
demonstrate that RORL achieves state-of-the-art performance on the general
offline RL benchmark and is considerably robust to adversarial observation
perturbations.
Comment: 23 pages, 10 figures
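As a loose illustration only (RORL's actual regularizers act on the policy and value networks during training; the spread penalty and all names below are assumptions, not the paper's method), a pessimistic value estimate for states near the dataset might be computed by penalizing disagreement across small observation perturbations:

```python
import numpy as np

def conservative_smoothed_value(q_fn, state, eps=0.05, n_samples=8,
                                penalty=1.0, seed=0):
    """Sketch of conservative smoothing: evaluate Q on small perturbations of
    a dataset state and subtract a spread penalty, so nearby
    out-of-distribution states receive pessimistic value estimates."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-eps, eps, size=(n_samples,) + state.shape)
    q_vals = np.array([q_fn(state + d) for d in noise])
    return q_vals.mean() - penalty * q_vals.std()

# with a constant Q there is no spread across perturbations, hence no penalty
q_const = lambda s: 1.0
print(conservative_smoothed_value(q_const, np.zeros(3)))  # -> 1.0
```

A Q-function that varies sharply around a state would receive a large spread penalty, which is the smoothing pressure that trades a little conservatism for robustness to observation perturbations.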