46 research outputs found

    Meta-Reinforcement Learning via Language Instructions

    Although deep reinforcement learning has recently been very successful at learning complex behaviors, it requires a tremendous amount of data to learn a task. One fundamental reason for this limitation lies in the trial-and-error learning paradigm of reinforcement learning, in which the agent interacts with the environment and progresses in learning by relying only on the reward signal. This signal is implicit and, on its own, insufficient for learning a task well. In contrast, humans are usually taught new skills via natural language instructions. Using language instructions for robotic motion control to improve adaptability is a recently emerged and challenging topic. In this paper, we present a meta-RL algorithm that addresses the challenge of learning skills with language instructions in multiple manipulation tasks. On the one hand, our algorithm uses the language instructions to shape its interpretation of the task; on the other hand, it still learns to solve tasks through trial and error. We evaluate our algorithm on the robotic manipulation benchmark Meta-World, where it significantly outperforms state-of-the-art methods in terms of training and testing task success rates. Code is available at https://tumi6robot.wixsite.com/million
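
    The abstract describes a policy that draws on both a language instruction and trial-and-error experience. The sketch below shows one common way to wire that up; it is an illustration under stated assumptions, not the authors' architecture. A hashed bag-of-words vector (a hypothetical stand-in for a real language encoder) and a simplified, mean-pooled context encoder over recent (s, a, r, s') transitions, in the spirit of context-based meta-RL methods such as PEARL, are concatenated with the state and fed to the policy. All function names are hypothetical, the weights are random placeholders, and training is not shown.

```python
import zlib

import numpy as np


def embed_instruction(text: str, dim: int = 16) -> np.ndarray:
    """Hypothetical hashed bag-of-words embedding (stand-in for a language encoder)."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's salted str hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def infer_task_latent(transitions: np.ndarray, w_enc: np.ndarray) -> np.ndarray:
    """Encode each (s, a, r, s') row and mean-pool, so the latent reflects recent experience."""
    return np.tanh(transitions @ w_enc).mean(axis=0)


def act(state, instruction, transitions, w_enc, w_pi):
    """Policy conditioned on state, instruction embedding and inferred task latent."""
    x = np.concatenate(
        [state, embed_instruction(instruction), infer_task_latent(transitions, w_enc)]
    )
    return np.tanh(x @ w_pi)  # actions bounded in [-1, 1]


rng = np.random.default_rng(0)
state_dim, act_dim, latent_dim = 4, 2, 8
trans_dim = state_dim + act_dim + 1 + state_dim  # one (s, a, r, s') row
w_enc = rng.normal(size=(trans_dim, latent_dim))
w_pi = rng.normal(size=(state_dim + 16 + latent_dim, act_dim))
context = rng.normal(size=(5, trans_dim))  # five recent transitions
print(act(rng.normal(size=state_dim), "open the drawer", context, w_enc, w_pi))
```

    Mean pooling keeps the latent independent of the order of the collected transitions, which is why context encoders of this kind often use a permutation-invariant aggregation.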

    Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data

    Research on language-conditioned robot manipulation aims to develop robots capable of understanding and executing complex tasks, enabling them to interpret language commands and manipulate objects accordingly. While language-conditioned approaches demonstrate impressive capabilities on tasks in familiar environments, they are limited in adapting to unfamiliar environment settings. In this study, we propose a general-purpose, language-conditioned approach that combines base skill priors and imitation learning under unstructured data to improve the algorithm's generalization to unfamiliar environments. We assess our model's performance in both simulated and real-world environments in a zero-shot setting. In simulation, the proposed approach surpasses previously reported scores on the CALVIN benchmark, especially in the challenging Zero-Shot Multi-Environment setting. The average completed task length, i.e., the average number of tasks the agent can complete consecutively, improves more than 2.5-fold over the state-of-the-art method HULC. In addition, we conduct a zero-shot evaluation of our policy in a real-world setting after training exclusively in simulated environments, without additional adaptation. In this evaluation, we set up ten tasks, on which our approach achieved an average improvement of 30% over the current state-of-the-art approach, demonstrating strong generalization in both simulation and the real world. For further details, including access to our code and videos, please refer to our supplementary materials.
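
    To make the headline metric concrete: in CALVIN's long-horizon evaluation each rollout is, as we understand the benchmark, a chain of five language instructions, and a chain stops counting at the first failed subtask. The sketch below computes the average completed length and the fraction of chains in which at least k subtasks were completed from per-chain success flags. The function names and the example chains are hypothetical.

```python
from typing import List, Sequence


def completed_length(chain: Sequence[bool]) -> int:
    """Number of subtasks completed in a row before the first failure."""
    n = 0
    for success in chain:
        if not success:
            break
        n += 1
    return n


def average_length(chains: List[Sequence[bool]]) -> float:
    """Mean number of consecutively completed subtasks per instruction chain."""
    return sum(completed_length(c) for c in chains) / len(chains)


def success_at_k(chains: List[Sequence[bool]], k: int) -> float:
    """Fraction of chains whose first k subtasks were all completed."""
    return sum(completed_length(c) >= k for c in chains) / len(chains)


# Hypothetical results for three five-instruction chains.
chains = [
    [True, True, False, True, True],  # counts as 2: the chain stops at the first failure
    [True, True, True, True, True],   # counts as 5
    [False, True, True, True, True],  # counts as 0
]
print(average_length(chains), success_at_k(chains, 1))
```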

    Learning from Symmetry: Meta-Reinforcement Learning with Symmetric Data and Language Instructions

    Meta-reinforcement learning (meta-RL) is a promising approach that enables agents to learn new tasks quickly. However, most meta-RL algorithms generalize poorly in multi-task scenarios because rewards alone provide insufficient task information. Language-conditioned meta-RL improves generalization by matching language instructions to the agent's behaviors. Learning from symmetry is an important form of human learning; therefore, combining symmetry and language instructions in meta-RL can help improve the algorithm's generalization and learning efficiency. We thus propose a dual-MDP meta-reinforcement learning method that enables learning new tasks efficiently with symmetric data and language instructions. We evaluate our method on multiple challenging manipulation tasks, and the experimental results show that our method can greatly improve the generalization and efficiency of meta-reinforcement learning.
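
    One way to obtain symmetric data is to reflect each transition across a plane of symmetry of the workspace and swap direction words in the instruction. The sketch assumes a hypothetical six-dimensional observation (gripper and object positions), a three-dimensional displacement action, a mirror plane at y = 0, and a task whose reward is invariant under the reflection. It illustrates generic symmetry-based data augmentation; it is not the paper's dual-MDP construction, and all names are hypothetical.

```python
import numpy as np

# Hypothetical observation layout: [gripper x, y, z, object x, y, z]; action: [dx, dy, dz].
# Assumed mirror plane: y = 0, so every y-coordinate changes sign.
STATE_MIRROR = np.diag([1.0, -1.0, 1.0, 1.0, -1.0, 1.0])
ACTION_MIRROR = np.diag([1.0, -1.0, 1.0])
WORD_MIRROR = {"left": "right", "right": "left"}  # assumed direction vocabulary


def mirror_instruction(text: str) -> str:
    """Swap direction words so the instruction describes the mirrored behavior."""
    return " ".join(WORD_MIRROR.get(word, word) for word in text.split())


def mirror_transition(s, a, r, s_next, instruction):
    """Map a transition to its mirror image; the reward is kept, assuming a symmetric task."""
    return (
        STATE_MIRROR @ s,
        ACTION_MIRROR @ a,
        r,
        STATE_MIRROR @ s_next,
        mirror_instruction(instruction),
    )


s = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
a = np.array([0.0, 0.1, 0.0])
print(mirror_transition(s, a, 1.0, s + 0.01, "push the block left"))
```

    Because a reflection is its own inverse, applying `mirror_transition` twice recovers the original transition, which is a cheap sanity check when adding such data to a replay buffer.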

    Language-conditioned Learning for Robotic Manipulation: A Survey

    Language-conditioned robotic manipulation is a cutting-edge area of research that enables seamless communication and cooperation between humans and robotic agents. This field focuses on teaching robotic systems to comprehend and execute instructions conveyed in natural language. To achieve this, robust language understanding models capable of extracting actionable information from textual input are essential. In this comprehensive survey, we systematically explore recent advances in language-conditioned approaches to robotic manipulation. We analyze these approaches by learning paradigm, covering reinforcement learning, imitation learning, and the integration of foundation models such as large language models and vision-language models. Furthermore, we conduct an in-depth comparative analysis considering aspects such as semantic information extraction, environment and evaluation, auxiliary tasks, and task representation. Finally, we outline potential future research directions in language-conditioned learning for robotic manipulation, focusing on generalization capabilities and safety issues. The GitHub repository of this paper can be found at https://github.com/hk-zh/language-conditioned-robot-manipulation-model

    Safety Guaranteed Manipulation Based on Reinforcement Learning Planner and Model Predictive Control Actor

    High expectations have been placed on deep reinforcement learning (RL) for tackling challenging manipulation tasks autonomously. Despite significant progress in reinforcement learning, the practical deployment of this paradigm is hindered by at least two barriers: engineering a reward function and guaranteeing the safety of learning-based controllers. In this paper, we address these limitations by proposing a framework that combines a reinforcement learning planner, trained with sparse rewards, with a model predictive controller (MPC) actor, thereby yielding a safe policy. On the one hand, the RL planner learns from sparse rewards by selecting intermediate goals that are easy to achieve in the short term and promising for reaching the target goals in the long term. On the other hand, the MPC actor takes the intermediate goals suggested by the RL planner as input and predicts the robot actions that reach each goal while avoiding obstacles over a short time horizon. We evaluated our method on four challenging manipulation tasks with dynamic obstacles, and the results demonstrate that, by leveraging the complementary strengths of these two components, the agent can solve manipulation tasks in complex, dynamic environments safely with a 100% success rate. Videos are available at https://videoviewsite.wixsite.com/mpc-hgg
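
    The division of labour between planner and actor can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in, not the authors' implementation: a 2-D point robot with single-integrator dynamics, one static disc obstacle, an exhaustive-search MPC over a hypothetical nine-action discrete set with a three-step horizon, and a hand-written list of intermediate goals in place of the trained RL planner. The MPC discards every action sequence that violates the clearance constraint, so each executed step stays in the safe set, while the intermediate goals steer the robot around the obstacle.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]
Obstacle = Tuple[Vec, float]  # (center, radius)

DT = 0.1  # assumed control period in seconds
# Assumed discrete action set: "stay" plus unit-speed motion in 8 directions.
ACTIONS: List[Vec] = [(0.0, 0.0)] + [
    (math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)
]


def step(p: Vec, u: Vec) -> Vec:
    """Single-integrator model: position advances by DT * velocity command."""
    return (p[0] + DT * u[0], p[1] + DT * u[1])


def is_safe(p: Vec, obstacles: Sequence[Obstacle], margin: float = 0.05) -> bool:
    """True if p keeps at least `margin` clearance from every disc obstacle."""
    return all(math.dist(p, c) >= r + margin for c, r in obstacles)


def mpc_action(p: Vec, goal: Vec, obstacles: Sequence[Obstacle], horizon: int = 3) -> Vec:
    """Exhaustive-search MPC: score every action sequence by its summed distance to `goal`,
    reject sequences that leave the safe set, and return the best first action.
    If p is safe, the all-"stay" sequence is safe too, so a feasible sequence always exists."""
    best_cost, best_first = math.inf, ACTIONS[0]
    for seq in itertools.product(ACTIONS, repeat=horizon):
        q, cost = p, 0.0
        for u in seq:
            q = step(q, u)
            if not is_safe(q, obstacles):
                cost = math.inf  # constraint violated: discard this sequence
                break
            cost += math.dist(q, goal)
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def run(start: Vec, subgoals: List[Vec], obstacles: Sequence[Obstacle],
        steps: int = 60, switch_radius: float = 0.15) -> List[Vec]:
    """Receding-horizon loop; `subgoals` is a hand-written stand-in for the RL planner."""
    path, p, i = [start], start, 0
    for _ in range(steps):
        if i < len(subgoals) - 1 and math.dist(p, subgoals[i]) < switch_radius:
            i += 1  # intermediate goal reached: move on to the next one
        p = step(p, mpc_action(p, subgoals[i], obstacles))
        path.append(p)
    return path


obstacles = [((0.5, 0.0), 0.2)]
path = run((0.0, 0.0), [(0.5, 0.4), (1.0, 0.0)], obstacles)
print(all(is_safe(q, obstacles) for q in path), round(math.dist(path[-1], (1.0, 0.0)), 3))
```

    Enumerating all 9^3 sequences is only feasible for tiny problems; a practical MPC actor would instead solve a continuous optimization with the obstacle constraints, and dynamic obstacles would require predicting their motion over the horizon.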

    Onset of sedimentation near the Carnian/Norian boundary in the northwestern Sichuan Basin: New evidence from ammonoid biostratigraphy and zircon U–Pb geochronology

    Upper Triassic deposits formed at the onset of subsidence in the Sichuan foreland basin of South China and may record a crisis of carbonate deposition related to the Carnian Pluvial Episode. However, there is no consensus yet on the precise age of these deposits in northwestern Sichuan. In this work, the ammonoid biostratigraphy of the Upper Triassic of northwestern Sichuan has been improved, and a U–Pb age has been obtained from its detrital zircons. New ammonoid taxa, Sinotropites sichuanensis n. gen., n. sp. and Hadrothisbites hanwangensis n. sp., are described from the upper part of the Ma’antang Formation in the Hanwang and Jushui areas and are assigned to the uppermost Tuvalian (Anatropites spinosus Zone, Gonionotites italicus Subzone). Ammonoid and conodont biostratigraphy, combined with concordant U–Pb ages of 227.2 ± 1.1 Ma obtained from the youngest detrital zircons, provide a robust constraint on the initial sedimentation phases of the foreland basin in northwestern Sichuan and suggest that the terrigenous turnover was not related to the Carnian Pluvial Episode.
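
    U–Pb ages such as the 227.2 ± 1.1 Ma value above rest on the decay equations t = ln(1 + 206Pb*/238U) / λ238 and t = ln(1 + 207Pb*/235U) / λ235, where Pb* is radiogenic lead; a grain is called concordant when the two ages agree. The sketch below implements both equations with the Jaffey et al. (1971) decay constants, a simple percent-discordance measure (one of several conventions in use), and an inverse-variance weighted mean, which is a common way to summarize a cluster of youngest grains. The abstract does not state how its age was calculated, so the weighted mean is illustrative only; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

LAMBDA_238 = 1.55125e-10  # 238U decay constant, per year (Jaffey et al., 1971)
LAMBDA_235 = 9.8485e-10   # 235U decay constant, per year (Jaffey et al., 1971)


def age_206_238(ratio: float) -> float:
    """Age in years from a radiogenic 206Pb/238U ratio."""
    return math.log1p(ratio) / LAMBDA_238


def age_207_235(ratio: float) -> float:
    """Age in years from a radiogenic 207Pb/235U ratio."""
    return math.log1p(ratio) / LAMBDA_235


def ratio_206_238(age_years: float) -> float:
    """Inverse of age_206_238: the ratio a closed system accumulates after age_years."""
    return math.expm1(LAMBDA_238 * age_years)


def ratio_207_235(age_years: float) -> float:
    return math.expm1(LAMBDA_235 * age_years)


def discordance_pct(r206_238: float, r207_235: float) -> float:
    """Relative difference between the two U-Pb ages, in percent (0 for a concordant grain)."""
    return 100.0 * (1.0 - age_206_238(r206_238) / age_207_235(r207_235))


def weighted_mean_age(ages: Sequence[float], sigmas: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    mean = sum(w * a for w, a in zip(weights, ages)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))


print(ratio_206_238(227.2e6))  # 206Pb*/238U expected for a 227.2 Ma closed system
```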

    Dust deposition in the Aral Sea: implications for changes in atmospheric circulation in central Asia during the past 2000 years

    We investigated mineral aerosol (dust) deposition in the Aral Sea with the aim of understanding the variability of dust in central Asia and its implications for changes in atmospheric circulation in the late Holocene. Using an 11.12-m sediment core from the lake, we calculated bulk sediment fluxes at high time resolution and analyzed the grain-size distributions of detrital sediments. A refined age-depth model was established by combining radiocarbon dating with archeological evidence. In addition, a principal component analysis (PCA) of grain-size fractions and elements (Fe, Ti, K, Ca, Sr) was used to assess the potential processes controlling detrital inputs. The results suggest that two processes are mainly relevant for the clastic input: the medium silt fractions and Ti, Fe and K are positively correlated with Component 1 (C1), and the fine size fractions (<6 µm) are positively correlated with Component 2 (C2). Taking into account the PCA results, the geological background and the clastic input processes, we propose that the medium silt fractions, and in particular the grain-size fraction ratio (6–32 µm/2–6 µm), can serve as indicators of the variability of airborne dust in the Aral Sea region. In contrast, the fine size fractions appear to be contributed mainly by sheetwash processes. Bulk sediment deposition fluxes were extremely high during the Little Ice Age (LIA; AD 1400–1780), which may be related to increased dust deposition. As indicated by the variations in the grain-size ratio and Ti, the history of dust deposition in central Asia can be divided into five distinct periods: remarkably low deposition during AD 1–350, moderately high values from AD 350 to 720, a return to relatively low levels between AD 720 and AD 1400 (including the Medieval Warm Period, MWP, AD 755–1070), exceptionally high deposition from AD 1400 to the 1940s, and abnormally low values since the 1940s.
The temporal variations in dust deposition are consistent with changes in the Siberian High (SH) and the mean atmospheric temperature of the northern hemisphere during the past 2000 years, with low/high annual temperature anomalies corresponding to high/low dust supply in the Aral Sea sediments, respectively. The variations in the fine size fraction also show a broad similarity to a lacustrine δ¹⁸O record from Turkey (Jones et al., 2006), implying that less moisture entered western central Asia from the Mediterranean during the LIA than during the MWP.
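
    The two quantitative tools in this abstract, the grain-size fraction ratio (6–32 µm/2–6 µm) and a PCA of grain-size fractions and elements, can be sketched as follows. Assumptions: grain-size data arrive as volume percentages per bin, with bin edges that coincide with 2, 6 and 32 µm (bins straddling a limit are ignored), and the PCA is run on standardized variables via SVD. Neither the binning nor the standardization is specified in the abstract, and the function names are hypothetical.

```python
import numpy as np


def fraction_between(edges_um, pct, lo, hi):
    """Sum the volume percentage of bins lying entirely within [lo, hi] µm."""
    edges = np.asarray(edges_um, dtype=float)
    pct = np.asarray(pct, dtype=float)
    inside = (edges[:-1] >= lo) & (edges[1:] <= hi)
    return pct[inside].sum()


def dust_ratio(edges_um, pct):
    """Grain-size fraction ratio (6-32 µm)/(2-6 µm), proposed in the abstract as a dust proxy."""
    return fraction_between(edges_um, pct, 6, 32) / fraction_between(edges_um, pct, 2, 6)


def pca(X):
    """PCA of standardized columns via SVD.

    Returns loadings (one row per component), explained-variance ratios (descending)
    and sample scores on each component."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = Z @ vt.T
    return vt, explained, scores


# Hypothetical sample: bin edges in µm and volume % per bin.
print(dust_ratio([0, 2, 6, 16, 32, 63], [10, 20, 30, 25, 15]))
```

    In a study like this one, each row of `X` would be a core sample and each column a grain-size fraction or element concentration; the loadings then show which variables align with C1 and C2.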

    Proterozoic to Phanerozoic case studies of laser ablation microanalysis for microbial carbonate U–Pb geochronology

    Microbial carbonates represent some of the earliest bio-sedimentary records of life on Earth and are also critical geochemical archives of ancient seawater chemistry and of the environmental conditions in which they precipitated. Reconstructing paleo-microbial environments on Earth, and potentially on other planets, requires precise determination of the depositional ages of these materials. Developments in in-situ laser ablation U-Pb dating by inductively coupled plasma mass spectrometry (LA-ICP-MS) are now available to the (abiogenic) carbonate geochemistry community. However, owing to the effects of impurity mixing and diagenesis, microbial carbonates have received little geochronological study, despite their broad relevance for understanding the environmental conditions and geochemical composition of ancient seawater. This study demonstrates the use of time-of-flight mass spectrometry (TOF-MS) to perform rapid, quantitative elemental mapping before U-Pb spot dating in order to improve experimental success rates and data reliability, and presents four practical application examples.
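
    The map-first, date-second workflow can be illustrated with a simple spot-selection rule. The sketch assumes two co-registered element maps from the TOF-MS survey, a U map and a hypothetical impurity-proxy map, and picks pixels that have enough U and little impurity, highest U first. The proxy, both thresholds and the ranking rule are assumptions for illustration, not the study's criteria.

```python
import numpy as np


def select_spots(u_map, impurity_map, u_min, impurity_max, n_spots):
    """Return up to n_spots (row, col) pixels with U >= u_min and impurity <= impurity_max,
    ordered by decreasing U."""
    u_map = np.asarray(u_map, dtype=float)
    impurity_map = np.asarray(impurity_map, dtype=float)
    mask = (u_map >= u_min) & (impurity_map <= impurity_max)
    rows, cols = np.nonzero(mask)
    order = np.argsort(-u_map[rows, cols], kind="stable")  # highest U first
    rows, cols = rows[order][:n_spots], cols[order][:n_spots]
    return list(zip(rows.tolist(), cols.tolist()))


# Hypothetical 3 x 3 maps (arbitrary units).
u = [[1, 5, 2], [8, 3, 9], [0, 7, 4]]
impurity = [[0, 0, 5], [1, 9, 0], [0, 0, 0]]
print(select_spots(u, impurity, u_min=3, impurity_max=1, n_spots=3))
```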

    A Review of Research Progress on the Analytical Method of Large-n Detrital Zircon U-Pb Geochronology

    BACKGROUND: Detrital zircon U-Pb geochronology is an important tool for identifying sedimentary provenance and determining maximum depositional ages. The number of grains analyzed in detrital zircon provenance studies using laser-ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) typically ranges from 60 to 120. With sample sizes in this range, some age components in the sample aliquot commonly go unidentified. To improve the reliability of provenance investigations, analysis of more grains (n ≥ 300), or even of large-n aliquots with more than 1000 grains (n > 1000), is required. The emergence of large-n detrital zircon U-Pb geochronology challenges existing methods of data measurement, reduction and evaluation.
    OBJECTIVES: To summarize progress in the measurement, data reduction and data evaluation of large-n detrital zircon U-Pb geochronology.
    METHODS: Methodological innovations reported in the domestic and international literature were summarized.
    RESULTS: First, each measurement requires rapid acquisition of U and Pb isotope signals, which can be achieved by improving the transmission efficiency of the aerosol. The acquisition time of the "flat" signal can be shortened, or the measurement can be switched to a "peak" signal mode for rapid measurement. Second, large-n data require an efficient data reduction protocol or powerful software (e.g. iolite) to improve visualization and reduce inter-laboratory variability. For the U-Pb data processing flow, several optimized methods are introduced for fractionation correction and uncertainty propagation. In addition, total integrated counts and linear regression correction are introduced specifically to process "peak" signals. Third, new methods for calculating U-Pb and Pb-Pb age discordance, such as the Aitchison concordia distance, make data filtering better justified. Based on recent research progress, the future automation and standardization of large-n detrital zircon U-Pb geochronology are discussed, and advice on the selection of instruments and data reduction software is provided.
    CONCLUSIONS: Large-n detrital zircon U-Pb geochronology has great prospects and will play a greater role in provenance tracing and stratigraphic dating.
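
    Why larger n helps can be shown with a basic sampling argument. If grains are drawn at random, the probability that at least one of n grains comes from an age component making up a fraction f of the zircon population is 1 - (1 - f)^n, and solving for n gives the grain count needed to reach a chosen confidence. This single-component binomial estimate is our illustration, not a method from the review; it ignores analytical and sampling biases, and the function names are hypothetical.

```python
import math


def detection_probability(f: float, n: int) -> float:
    """Probability that at least one of n random grains comes from a component of proportion f."""
    return 1.0 - (1.0 - f) ** n


def min_grains(f: float, confidence: float) -> int:
    """Smallest n for which detection_probability(f, n) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - f))


for n in (60, 120, 300, 1000):
    print(n, round(detection_probability(0.01, n), 3))
print(min_grains(0.01, 0.95))
```

    For example, with 60 and 120 grains a component making up 1% of the population is sampled at least once with a probability of only about 0.45 and 0.70, respectively, while reaching 95% confidence for that component takes 299 grains.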