Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Offline reinforcement learning (RL) aims to find a near-optimal policy using
pre-collected datasets. In real-world scenarios, data collection could be
costly and risky; therefore, offline RL becomes particularly challenging when
the in-domain data is limited. Given recent advances in Large Language Models
(LLMs) and their few-shot learning prowess, this paper introduces
Language Models for Motion Control (LaMo), a
general framework based on Decision Transformers to effectively use pre-trained
Language Models (LMs) for offline RL. Our framework highlights four crucial
components: (1) Initializing Decision Transformers with sequentially
pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to
full-weight fine-tuning, to combine the pre-trained knowledge from LMs and
in-domain knowledge effectively, (3) using the non-linear MLP transformation
instead of linear projections, to generate embeddings, and (4) integrating an
auxiliary language prediction loss during fine-tuning to stabilize the LMs and
retain their original language abilities. Empirical results indicate that LaMo
achieves state-of-the-art performance in sparse-reward tasks
and closes the gap between value-based offline RL methods and decision
transformers in dense-reward tasks. In particular, our method demonstrates
superior performance in scenarios with limited data samples.
Comment: 24 pages, 16 tables
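The LoRA idea in component (2) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the class name `LoRALinear`, the rank, the scaling `alpha / rank`, and the initialization are assumptions that follow the commonly described LoRA recipe, in which a frozen weight matrix is augmented with a trainable low-rank update.

```python
import numpy as np


class LoRALinear:
    """Illustrative linear layer with a frozen weight and a low-rank update."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # pre-trained weight, kept frozen
        # Trainable low-rank factors (assumed init: small random A, zero B).
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = x W^T + scale * x A^T B^T; only A and B would be updated.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
pretrained = rng.normal(size=(16, 32))
layer = LoRALinear(pretrained, rank=4)
x = rng.normal(size=(2, 32))
y = layer.forward(x)
```

Because `B` starts at zero, the layer initially reproduces the pre-trained mapping exactly, and the number of trainable values (here 192) is smaller than the 512 entries of the full weight matrix.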
H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation
Human hands possess remarkable dexterity and have long served as a source of
inspiration for robotic manipulation. In this work, we propose a human
Hand-Informed visual representation learning framework to
solve difficult Dexterous manipulation tasks (H-InDex)
with reinforcement learning. Our framework consists of three stages: (i)
pre-training representations with 3D human hand pose estimation, (ii) offline
adapting representations with self-supervised keypoint detection, and (iii)
reinforcement learning with exponential moving average BatchNorm. The last two
stages only modify parameters of the pre-trained representation in
total, ensuring the knowledge from pre-training is maintained to the full
extent. We empirically study 12 challenging dexterous manipulation tasks and
find that H-InDex largely surpasses strong baseline methods and the recent
visual foundation models for motor control. Code is available at
https://yanjieze.com/H-InDex .
Comment: NeurIPS 2023. Code and videos: https://yanjieze.com/H-InDe
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields
It is a long-standing problem in robotics to develop agents capable of
executing diverse manipulation tasks from visual observations in unstructured
real-world environments. To achieve this goal, the robot needs to have a
comprehensive understanding of the 3D structure and semantics of the scene. In
this work, we present GNFactor, a visual behavior cloning agent for
multi-task robotic manipulation with Generalizable Neural
feature Fields. GNFactor jointly optimizes a generalizable neural
field (GNF) as a reconstruction module and a Perceiver Transformer as a
decision-making module, leveraging a shared deep 3D voxel representation. To
incorporate semantics in 3D, the reconstruction module utilizes a
vision-language foundation model (e.g., Stable Diffusion) to distill
rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3
real robot tasks and perform detailed ablations on 10 RLBench tasks with a
limited number of demonstrations. We observe a substantial improvement of
GNFactor over current state-of-the-art methods in seen and unseen tasks,
demonstrating the strong generalization ability of GNFactor. Our project
website is https://yanjieze.com/GNFactor/ .
Comment: CoRL 2023 Oral. Website: https://yanjieze.com/GNFactor
DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization
Visual reinforcement learning (RL) has shown promise in continuous control
tasks. Despite its progress, current algorithms are still unsatisfactory in
virtually every aspect of the performance such as sample efficiency, asymptotic
performance, and their robustness to the choice of random seeds. In this paper,
we identify a major shortcoming in existing visual RL methods: the
agents often exhibit sustained inactivity during early training, thereby
limiting their ability to explore effectively. Expanding upon this crucial
observation, we additionally unveil a significant correlation between the
agents' inclination towards motorically inactive exploration and the absence of
neuronal activity within their policy networks. To quantify this inactivity, we
adopt dormant ratio as a metric to measure inactivity in the RL agent's
network. Empirically, we also recognize that the dormant ratio can act as a
standalone indicator of an agent's activity level, regardless of the received
reward signals. Leveraging the aforementioned insights, we introduce DrM, a
method that uses three core mechanisms to guide agents'
exploration-exploitation trade-offs by actively minimizing the dormant ratio.
Experiments demonstrate that DrM achieves significant improvements in sample
efficiency and asymptotic performance with no broken seeds (76 seeds in total)
across three continuous control benchmark environments, including DeepMind
Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first
model-free algorithm that consistently solves tasks in both the Dog and
Manipulator domains from the DeepMind Control Suite as well as three dexterous
hand manipulation tasks without demonstrations in Adroit, all based on pixel
observations.
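The dormant ratio mentioned above can be sketched as follows. The abstract does not give the formula, so this assumes the commonly used definition: a neuron's score is its mean absolute activation divided by the layer-wide average of that quantity, and a neuron counts as dormant when its score is at most a threshold `tau`. The function name and the handling of an all-zero layer are illustrative choices.

```python
import numpy as np


def dormant_ratio(activations, tau=0.0):
    """Fraction of neurons in one layer whose normalized activity is <= tau.

    activations: array of shape (batch, neurons) holding post-activation outputs.
    """
    mean_abs = np.abs(activations).mean(axis=0)  # per-neuron mean |activation|
    layer_mean = mean_abs.mean()
    if layer_mean == 0.0:
        return 1.0  # assumed convention: a fully silent layer is fully dormant
    scores = mean_abs / layer_mean
    return float((scores <= tau).mean())


acts = np.array([[0.0, 1.0, 2.0, 0.0],
                 [0.0, 3.0, 2.0, 0.0]])
ratio = dormant_ratio(acts, tau=0.0)
```

In this toy batch two of the four neurons never fire, so `ratio` is 0.5. A network-level value could be obtained by pooling the dormant counts over all layers.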
Prediction Model of Pumpkin Rootstock Seedlings Based on Temperature and Light Responses
Temperature and light are the key factors that affect the growth quality of pumpkin rootstock seedlings, and seedling responses to temperature and light are an important basis for optimizing the greenhouse environment. To determine the quantitative effects of temperature and light on the growth and development of pumpkin (Cucurbita moschata cv. RTWM6018) rootstock seedlings, relationships between temperature, light, and seedling growth were established using regression analysis. The results indicated that the daily average temperature had a significant negative correlation with the development time of the seedlings, and that shoot dry weight increased within a certain range of the daily light integral (DLI). We established prediction models for the seedling quality indicators (hypocotyl length, stem diameter, shoot dry weight, root dry weight, root shoot ratio, and seedling quality index) based on thermal effectiveness and photosynthetic photon flux density (TEP). The coefficients of determination (R²) of the hypocotyl length and seedling quality index prediction models, based on accumulated TEP, were 0.707 and 0.834, respectively. The two models were y1 = 0.001x² − 0.180x + 13.057 and y2 = 0.008x^0.722, respectively, and could be used to predict the growth of pumpkin rootstock seedlings grown under different temperature and light conditions.
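The two fitted models can be evaluated directly, as in the short sketch below. It assumes that x is the accumulated TEP and that the exponents lost in extraction are a square in the first model and a power of 0.722 in the second; the function names are illustrative, and no units are implied.

```python
def hypocotyl_length(tep):
    """Quadratic model y1 = 0.001 x^2 - 0.180 x + 13.057 (x: accumulated TEP, assumed)."""
    return 0.001 * tep ** 2 - 0.180 * tep + 13.057


def seedling_quality_index(tep):
    """Power model y2 = 0.008 x^0.722 (x: accumulated TEP, assumed)."""
    return 0.008 * tep ** 0.722


# Evaluate both models over a few example TEP values.
for tep in (50, 100, 150):
    print(tep, round(hypocotyl_length(tep), 3), round(seedling_quality_index(tep), 3))
```

Under this reading, the quadratic model reaches its minimum at x = 90, while the power model increases monotonically with accumulated TEP.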
Transcription Factors Pmr1 and Pmr2 Cooperatively Regulate Melanin Biosynthesis, Conidia Development and Secondary Metabolism in Pestalotiopsis microspora
Melanins are common fungal pigments that contribute to stress resistance and pathogenesis. However, few studies have explored the mechanisms that regulate their synthesis in filamentous fungi. In this study, we identified two transcription factors, Pmr1 and Pmr2, in the filamentous fungus Pestalotiopsis microspora. Computational and phylogenetic analyses revealed that Pmr1 and Pmr2 were located in the gene cluster for melanin biosynthesis. The targeted deletion mutant strain Δpmr1 displayed defects in conidial pigment biosynthesis and morphological integrity. Deletion of pmr2 resulted in reduced conidial pigment, but the mycelial morphology changed little. Moreover, Δpmr2 produced fewer conidia. RT-qPCR data revealed that the expression levels of genes in the melanin biosynthesis gene cluster were downregulated upon loss of Pmr1 and Pmr2. Interestingly, the yield of secondary metabolites in the mutant strains Δpmr1 and Δpmr2 increased compared with the wild type, and Pmr1 played the larger regulatory role in secondary metabolism. Taken together, our results reveal the crucial roles of the transcription factors Pmr1 and Pmr2 in melanin synthesis, asexual development, and secondary metabolism in the filamentous fungus P. microspora.
Zoning of Ecological Restoration in the Qilian Mountain Area, China
Ecosystem restoration has attracted wide attention as ecosystems worldwide suffer damage and degradation. Scientifically sound ecological restoration zoning is the basis for formulating an ecological restoration plan. In this study, a restoration zoning index system was proposed to comprehensively consider the ecological problems of ecosystems. The linear weighted function method was used to construct the ecological restoration index (ERI) as an important index for zoning. The research showed that: (1) the ecological restoration zones of the Qilian Mountains can be divided into eight basins, namely the headwaters of the Datong River Basin, the Danghe-Dahaerteng River Basin, the northern confluence area of the Qinghai Lake, the upper Shule River to middle Heihe River, the Oasis Agricultural Area in the northern foothills of the Qilian Mountain, the Huangshui Basin Valley, Aksay (corridor region of the western Hexi Basin), and the northeastern Tsaidam Basin; (2) the restoration index of the eight ecological restoration zones of the Qilian Mountains ranged from 0.34 to 0.8, with an average of 0.61 (the smaller the index, the more prominent the comprehensive ecological problems of the region's mountains, rivers, forests, cultivated lands, lakes, and grasslands, and thus the greater the need to implement comprehensive ecological protection and restoration projects); and (3) the ecological zones often face numerous ecological problems, with multiple problems frequently overlapping in the same zone.
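A linear weighted index of the kind used for the ERI can be sketched as a weighted sum of indicator scores. The abstract does not list the indicators or their weights, so the names and values below are hypothetical placeholders, and the sketch assumes each indicator has already been normalized to the range 0 to 1.

```python
def ecological_restoration_index(indicators, weights):
    """Weighted sum of normalized indicator scores; weights are rescaled to sum to 1."""
    if set(indicators) != set(weights):
        raise ValueError("indicators and weights must use the same keys")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] / total * indicators[name] for name in indicators)


# Hypothetical indicator scores and weights, for illustration only.
scores = {"vegetation_cover": 0.5, "soil_retention": 1.0}
weights = {"vegetation_cover": 1.0, "soil_retention": 1.0}
eri = ecological_restoration_index(scores, weights)
```

With equal weights, the index is simply the mean of the two scores (0.75 here). Under the abstract's convention, a smaller value would indicate more prominent ecological problems.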