13 research outputs found

    Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

    Full text link
    Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs and in-domain knowledge effectively, (3) using a non-linear MLP transformation instead of linear projections to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original language abilities. Empirical results indicate that LaMo achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples. Comment: 24 pages, 16 table
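
Component (2) above relies on LoRA, which keeps a pre-trained weight matrix frozen and learns only a low-rank update. The sketch below illustrates that general idea in plain Python; it is not the paper's implementation. The function names, the `alpha / rank` scaling convention, and the toy matrices are assumptions made for illustration, and the abstract does not state the rank or which layers are adapted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w_frozen, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * B @ A.

    w_frozen: pre-trained weight, shape (d_out, d_in), never updated.
    lora_b:   trainable factor, shape (d_out, rank).
    lora_a:   trainable factor, shape (rank, d_in).
    alpha, rank: hypothetical hyperparameters for this sketch.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, shape (d_out, d_in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_frozen, delta)
    ]
```

Only `lora_b` and `lora_a` would receive gradients during fine-tuning, so the number of trained parameters scales with `rank` rather than with the full size of the weight matrix.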

    H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation

    Full text link
    Human hands possess remarkable dexterity and have long served as a source of inspiration for robotic manipulation. In this work, we propose a human Hand-Informed visual representation learning framework to solve difficult Dexterous manipulation tasks (H-InDex) with reinforcement learning. Our framework consists of three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) offline adapting representations with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm. The last two stages only modify 0.36% of the parameters of the pre-trained representation in total, ensuring the knowledge from pre-training is maintained to the full extent. We empirically study 12 challenging dexterous manipulation tasks and find that H-InDex largely surpasses strong baseline methods and the recent visual foundation models for motor control. Code is available at https://yanjieze.com/H-InDex . Comment: NeurIPS 2023. Code and videos: https://yanjieze.com/H-InDe

    GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

    Full text link
    It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs to have a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present GNFactor, a visual behavior cloning agent for multi-task robotic manipulation with Generalizable Neural feature Fields. GNFactor jointly optimizes a generalizable neural field (GNF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3 real robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods in seen and unseen tasks, demonstrating the strong generalization ability of GNFactor. Our project website is https://yanjieze.com/GNFactor/ . Comment: CoRL 2023 Oral. Website: https://yanjieze.com/GNFactor

    DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization

    Full text link
    Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite its progress, current algorithms are still unsatisfactory in virtually every aspect of performance, such as sample efficiency, asymptotic performance, and robustness to the choice of random seeds. In this paper, we identify a major shortcoming in existing visual RL methods: agents often exhibit sustained inactivity during early training, which limits their ability to explore effectively. Expanding upon this observation, we additionally unveil a significant correlation between the agents' inclination towards motorically inactive exploration and the absence of neuronal activity within their policy networks. To quantify this inactivity, we adopt the dormant ratio as a metric of inactivity in the RL agent's network. Empirically, we also recognize that the dormant ratio can act as a standalone indicator of an agent's activity level, regardless of the received reward signals. Leveraging these insights, we introduce DrM, a method that uses three core mechanisms to guide agents' exploration-exploitation trade-offs by actively minimizing the dormant ratio. Experiments demonstrate that DrM achieves significant improvements in sample efficiency and asymptotic performance with no broken seeds (76 seeds in total) across three continuous control benchmark environments: DeepMind Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains from the DeepMind Control Suite as well as three dexterous hand manipulation tasks without demonstrations in Adroit, all based on pixel observations.
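
The abstract does not define the dormant ratio, so the following is a minimal sketch under an assumed, commonly used definition: a neuron counts as dormant when its mean absolute activation, normalized by the average over its layer, is at or below a threshold `tau`. The function name, the input layout, and the default `tau` are hypothetical choices for this sketch and may differ from the paper's.

```python
def dormant_ratio(layer_activations, tau=0.025):
    """Fraction of neurons whose normalized activation score is <= tau.

    layer_activations: one entry per layer; each entry is a batch (list of
    samples), and each sample is a list of per-neuron activations.
    tau: assumed dormancy threshold (hypothetical default).
    """
    dormant = 0
    total = 0
    for batch in layer_activations:
        n_neurons = len(batch[0])
        # Mean absolute activation of each neuron over the batch.
        mean_abs = [
            sum(abs(sample[j]) for sample in batch) / len(batch)
            for j in range(n_neurons)
        ]
        layer_mean = sum(mean_abs) / n_neurons
        total += n_neurons
        if layer_mean == 0:
            # A completely silent layer: every neuron is dormant.
            dormant += n_neurons
            continue
        # Score each neuron relative to the layer-wide average.
        dormant += sum(1 for m in mean_abs if m / layer_mean <= tau)
    return dormant / total
```

Because the score depends only on activations, it can be tracked during training without reference to rewards, which matches the abstract's description of the metric as a standalone activity indicator.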

    Prediction Model of Pumpkin Rootstock Seedlings Based on Temperature and Light Responses

    No full text
    Temperature and light are the key factors that affect the growth quality of pumpkin rootstock seedlings. Responses to temperature and light are an important basis for optimizing the greenhouse environment. In order to determine the quantitative effects of temperature and light on the growth and development of pumpkin (Cucurbita moschata cv. RTWM6018) rootstock seedlings, relationships between temperature, light, and pumpkin rootstock seedling growth were established using regression analysis. The results indicated that the daily average temperature had a significant negative correlation with the development time of pumpkin rootstock seedlings, and the shoot dry weight of pumpkin rootstock seedlings increased within a certain range of the daily light integral (DLI). We established a prediction model of pumpkin rootstock seedling quality indicators (hypocotyl length, stem diameter, shoot dry weight, root dry weight, root shoot ratio, and seedling quality index) based on thermal effectiveness and photosynthetic photon flux density (TEP). The coefficients of determination (R^2) of the hypocotyl length and seedling quality index prediction models of pumpkin rootstock seedlings, based on accumulated TEP, were 0.707 and 0.834, respectively. The hypocotyl length and seedling quality index prediction models of pumpkin rootstock seedlings, based on accumulated TEP, were y1 = 0.001x^2 − 0.180x + 13.057 and y2 = 0.008x^0.722, respectively, which could be used for predicting the growth of pumpkin rootstock seedlings grown under different temperature and light conditions.
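
The two fitted models can be evaluated directly. The sketch below reads them as a quadratic and a power function of accumulated TEP, using the coefficients quoted in the abstract. The function names are hypothetical, and the abstract does not give the units of accumulated TEP or the range over which the fits are valid, so results outside the experimental range should be treated with caution.

```python
def hypocotyl_length(accumulated_tep):
    """Quadratic fit for hypocotyl length: 0.001*x^2 - 0.180*x + 13.057."""
    x = accumulated_tep
    return 0.001 * x ** 2 - 0.180 * x + 13.057


def seedling_quality_index(accumulated_tep):
    """Power-law fit for the seedling quality index: 0.008 * x^0.722."""
    if accumulated_tep < 0:
        # A fractional power of a negative number is not a real value.
        raise ValueError("accumulated TEP must be non-negative")
    return 0.008 * accumulated_tep ** 0.722


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    for tep in (50.0, 100.0, 150.0):
        print(tep, hypocotyl_length(tep), seedling_quality_index(tep))
```

Note that the quadratic has its minimum at x = 0.180 / (2 × 0.001) = 90, so the predicted hypocotyl length decreases below that value of accumulated TEP and increases above it.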

    Transcription Factors Pmr1 and Pmr2 Cooperatively Regulate Melanin Biosynthesis, Conidia Development and Secondary Metabolism in Pestalotiopsis microspora

    No full text
    Melanins are common fungal pigments that contribute to stress resistance and pathogenesis. However, few studies have explored the regulatory mechanism of their synthesis in filamentous fungi. In this study, we identified two transcription factors, Pmr1 and Pmr2, in the filamentous fungus Pestalotiopsis microspora. Computational and phylogenetic analyses revealed that Pmr1 and Pmr2 were located in the gene cluster for melanin biosynthesis. The targeted deletion mutant strain Δpmr1 displayed defects in the biosynthesis of conidial pigment and in morphological integrity. The deletion of pmr2 resulted in reduced conidial pigment, but the mycelial morphology changed little. Moreover, Δpmr2 produced fewer conidia. RT-qPCR data revealed that expression levels of genes in the melanin biosynthesis gene cluster were downregulated upon the loss of Pmr1 and Pmr2. Interestingly, the yield of secondary metabolites in the mutant strains Δpmr1 and Δpmr2 increased compared with the wild type, and Pmr1 played a larger regulatory role in secondary metabolism. Taken together, our results revealed the crucial roles of the transcription factors Pmr1 and Pmr2 in melanin synthesis, asexual development and secondary metabolism in the filamentous fungus P. microspora.

    Zoning of Ecological Restoration in the Qilian Mountain Area, China

    No full text
    Ecosystem restoration has attracted wide attention as ecosystems are damaged and degraded worldwide. Scientifically sound ecological restoration zoning is the basis for formulating an ecological restoration plan. In this study, a restoration zoning index system was proposed to comprehensively consider the ecological problems of ecosystems. The linear weighted function method was used to construct the ecological restoration index (ERI) as an important index of zoning. The research showed that: (1) the ecological restoration zones of the Qilian Mountains can be divided into eight basins, namely the headwaters of the Datong River Basin, the Danghe-Dahaerteng River Basin, the northern confluence area of the Qinghai Lake, the upper Shule River to middle Heihe River, the Oasis Agricultural Area in the northern foothills of the Qilian Mountain, the Huangshui Basin Valley, Aksay (corridor region of the western Hexi Basin), and the northeastern Tsaidam Basin; (2) the restoration index of the eight ecological restoration zones of the Qilian Mountains was between 0.34 and 0.8, with an average of 0.61 (the smaller the index, the more prominent the comprehensive ecological problems of the regional mountains, rivers, forests, cultivated lands, lakes, and grasslands, and thus the greater the need to implement comprehensive ecological protection and restoration projects); and (3) different ecological zones frequently face numerous ecological problems, and multiple problems often overlap within the same zone.
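
A linear weighted function combines normalized indicator scores into a single index as a weighted sum. The sketch below shows that general form only. The indicator names, scores, and weights are invented for illustration, since the abstract does not list the study's actual indicator system or weights, and the function name is hypothetical.

```python
def ecological_restoration_index(indicators, weights):
    """Weighted sum of normalized indicator scores.

    indicators: dict mapping indicator name to a score in [0, 1].
    weights:    dict mapping the same names to weights that sum to 1.
    """
    if set(indicators) != set(weights):
        raise ValueError("indicators and weights must cover the same names")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * indicators[name] for name in indicators)


if __name__ == "__main__":
    # Hypothetical zone with made-up scores and weights.
    scores = {"forest": 0.5, "grassland": 0.8, "water": 0.6}
    w = {"forest": 0.4, "grassland": 0.4, "water": 0.2}
    print(ecological_restoration_index(scores, w))
```

With scores normalized so that lower values indicate more severe problems, a lower index marks a zone with a greater need for restoration, consistent with the interpretation given in finding (2).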