
    Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models

    In recent years, much progress has been made in learning robotic manipulation policies that follow natural language instructions. Such methods typically learn from corpora of robot-language data that were either collected with specific tasks in mind or expensively re-labelled by humans with rich language descriptions in hindsight. Recently, large-scale pretrained vision-language models (VLMs) like CLIP or ViLD have been applied to robotics for learning representations and scene descriptors. Can these pretrained models serve as automatic labelers for robot data, effectively importing Internet-scale knowledge into existing datasets to make them useful even for tasks that are not reflected in their ground truth annotations? To accomplish this, we introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL): we utilize semi-supervised language labels leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data and then train language-conditioned policies on the augmented datasets. This method enables cheaper acquisition of useful language descriptions compared to expensive human labels, allowing for more efficient label coverage of large-scale datasets. We apply DIAL to a challenging real-world robotic manipulation domain where 96.5% of the 80,000 demonstrations do not contain crowd-sourced language annotations. DIAL enables imitation learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.
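The labeling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional vectors, and the `top_k` and `min_score` parameters are all assumptions, and a real pipeline would obtain the embeddings from a vision-language model such as CLIP rather than from hand-written lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def augment_labels(episode_embeddings, instruction_embeddings, top_k=1, min_score=0.0):
    """Assign each unlabelled episode its most similar candidate instructions.

    Hypothetical helper: both arguments are dicts mapping a name to an
    embedding vector that is assumed to come from a shared embedding space.
    """
    labels = {}
    for episode_id, episode_vec in episode_embeddings.items():
        # Score every candidate instruction against this episode.
        scored = sorted(
            ((cosine(episode_vec, vec), text) for text, vec in instruction_embeddings.items()),
            reverse=True,
        )
        # Keep the best-scoring candidates that clear the similarity threshold.
        labels[episode_id] = [text for score, text in scored[:top_k] if score >= min_score]
    return labels


# Toy stand-ins for model embeddings (assumed values, for illustration only).
episodes = {"ep0": [1.0, 0.0], "ep1": [0.0, 1.0]}
instructions = {"pick up the can": [0.9, 0.1], "open the drawer": [0.1, 0.9]}
augmented = augment_labels(episodes, instructions, top_k=1, min_score=0.5)
```

A language-conditioned policy would then be trained on the demonstrations paired with the `augmented` labels; that training step is outside the scope of this sketch.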

    Inference on counterfactual distributions

    August 8, 2008. Revised: April 4, 200

    Inner Monologue: Embodied Reasoning through Planning with Language Models

    Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them - answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion on three domains, including simulated and real tabletop rearrangement tasks and long-horizon mobile manipulation tasks in a kitchen environment in the real world. Comment: Project website: https://innermonologue.github.i
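The closed-loop idea in this abstract, where textual feedback is appended to the planner's context after every action, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's system: `run_closed_loop`, `make_scripted_planner`, and `make_flaky_executor` are hypothetical names, the planner is a scripted stand-in for an LLM, and the only feedback source modelled is success detection.

```python
def run_closed_loop(instruction, planner, execute, max_steps=10):
    """Alternate planning and acting, feeding each outcome back as text."""
    prompt = [f"Human: {instruction}"]
    for _ in range(max_steps):
        action = planner(prompt)  # the planner sees the whole dialogue so far
        if action == "done":
            break
        prompt.append(f"Robot: {action}")
        success = execute(action)
        prompt.append(f"Success: {success}")  # success-detection feedback
    return prompt


def make_scripted_planner(plan):
    """Stand-in for an LLM: advance through `plan` only after a success."""
    def planner(prompt):
        completed = sum(1 for line in prompt if line == "Success: True")
        return plan[completed] if completed < len(plan) else "done"
    return planner


def make_flaky_executor(fail_first):
    """Stand-in for a robot: each action in `fail_first` fails once, then succeeds."""
    already_failed = set()

    def execute(action):
        if action in fail_first and action not in already_failed:
            already_failed.add(action)
            return False
        return True
    return execute


transcript = run_closed_loop(
    "put the apple in the bowl",
    make_scripted_planner(["pick up the apple", "place the apple in the bowl"]),
    make_flaky_executor({"pick up the apple"}),
)
```

Because the failed first grasp is written into the prompt, the planner retries it before moving on; an open-loop planner with no feedback lines would have continued to the placing step regardless.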

    Benchmarking implementations of functional languages with ‘Pseudoknot’, a float-intensive benchmark

    Over 25 implementations of different functional languages are benchmarked using the same program, a floating-point intensive application taken from molecular biology. The principal aspects studied are compile time and execution time for the various implementations that were benchmarked. An important consideration is how the program can be modified and tuned to obtain maximal performance on each language implementation. With few exceptions, the compilers take a significant amount of time to compile this program, though most compilers were faster than the then-current GNU C compiler (GCC version 2.5.8). Compilers that generate C or Lisp are often slower than those that generate native code directly: the cost of compiling the intermediate form is normally a large fraction of the total compilation time. There is no clear distinction between the runtime performance of eager and lazy implementations when appropriate annotations are used: lazy implementations have clearly come of age when it comes to implementing largely strict applications, such as the Pseudoknot program. The speed of C can be approached by some implementations, but to achieve this performance, special measures such as strictness annotations are required by non-strict implementations. The benchmark results have to be interpreted with care. Firstly, a benchmark based on a single program cannot cover a wide spectrum of ‘typical’ applications. Secondly, the compilers vary in the kind and level of optimisations offered, so the effort required to obtain an optimal version of the program is similarly varied.

    Potato (Solanum tuberosum L.) tuber ageing induces changes in the proteome and antioxidants associated with the sprouting pattern

    During post-harvest storage, potato tubers age as they undergo an evolution of their physiological state influencing their sprouting pattern. In the present study, physiological and biochemical approaches were combined to provide new insights into potato (Solanum tuberosum L. cv. Désirée) tuber ageing. An increase in the physiological age index (PAI) value from 0.14 to 0.83 occurred during storage at 4 °C over 270 d. Using this reference frame, a proteomic approach was followed based on two-dimensional electrophoresis. In the experimental conditions of this study, a marked proteolysis of patatin occurred after the PAI reached a value of 0.6. In parallel, several glycolytic enzymes were up-regulated and cellular components influencing protein conformation and the response to stress were altered. The equilibrium between the 20S and 26S forms of the proteasome was modified, the 20S form that recycles oxidized proteins being up-regulated. Two proteins belonging to the cytoskeleton were also differentially expressed during ageing. As most of these changes are also observed in an oxidative stress context, an approach focused on antioxidant compounds and enzymes as well as oxidative damage to polyunsaturated fatty acids and proteins was conducted. All the changes observed during ageing seemed to allow the potato tubers to maintain their radical scavenging activity until the end of the storage period as no accumulation of oxidative damage was observed. These data are interpreted considering the impact of reactive oxygen species on the development and the behaviour of other plant systems undergoing ageing or senescence processes.

    Spousal Influences on Parents' Non-Market Time Choices

    This paper considers the effect of a spouse's characteristics on three aggregated non-paid time uses (active leisure time, child caregiving time, and home production time) using the American Time Use Survey (ATUS). The time diary of each married individual with children under the age of 13 (mothers and fathers) is analyzed, both in terms of the level of non-paid time and the wife's share of the total level of the daily activity for the couple. Three spousal variables are considered: the relative wage of the wife compared to her husband; the spouses' weekly hours of employment; and, in the level equations only, the spouses' time in the same activity. Each of these spousal variables needs to be estimated in order to address issues of both endogeneity and missing data. Three alternative strategies to address these problems are explored: predictions within the sample, predictions from outside the sample, and propensity matching, which marries mothers with time diaries to fathers with time diaries who have propensity scores similar to those of the women's husbands. The results show very little effect of one spouse on the level of the other spouse's unpaid time use. This absence of spousal effects is similar to the reduction of spousal effects in employment time described in Blau and Kahn (2005). In terms of the wife's share of time in the activity, we find that higher relative wages of the mother compared to her husband lead to a greater share of child care done by the mother on both weekdays and weekends. No consistent effect of relative wages is found on the mother's share of leisure or home production.

    Open X-Embodiment: Robotic learning datasets and RT-X models

    Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a "generalist" X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. The project website is robotics-transformer-x.github.io.