    VirtualHome: Simulating Household Activities via Programs

    In this paper, we are interested in modeling complex activities that occur in a typical household. We propose to use programs, i.e., sequences of atomic actions and interactions, as a high level representation of complex tasks. Programs are interesting because they provide a non-ambiguous representation of a task, and allow agents to execute them. However, nowadays, there is no database providing this type of information. Towards this goal, we first crowd-source programs for a variety of activities that happen in people's homes, via a game-like interface used for teaching kids how to code. Using the collected dataset, we show how we can learn to extract programs directly from natural language descriptions or from videos. We then implement the most common atomic (inter)actions in the Unity3D game engine, and use our programs to "drive" an artificial agent to execute tasks in a simulated household environment. Our VirtualHome simulator allows us to create a large activity video dataset with rich ground-truth, enabling training and testing of video understanding models. We further showcase examples of our agent performing tasks in our VirtualHome based on language descriptions. Comment: CVPR 2018 (Oral)
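
    As a rough illustration of what "a program as a sequence of atomic actions and interactions" could look like, here is a minimal sketch. The `Step` schema, the action names, and the rendered text format are assumptions made for this example; they are not the representation used by the VirtualHome authors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One atomic action, optionally applied to an object (hypothetical schema)."""
    action: str
    target: Optional[str] = None


def render(program: List[Step]) -> List[str]:
    """Turn a program into numbered, unambiguous one-line instructions."""
    lines = []
    for i, step in enumerate(program, start=1):
        obj = f" <{step.target}>" if step.target else ""
        lines.append(f"{i}. [{step.action}]{obj}")
    return lines


# Hypothetical program for the activity "watch TV".
watch_tv = [
    Step("walk", "living_room"),
    Step("switch_on", "tv"),
    Step("sit", "sofa"),
    Step("watch", "tv"),
]

rendered = render(watch_tv)
for line in rendered:
    print(line)
```

    Because each step names exactly one action and at most one object, an agent could in principle execute the list in order without having to interpret free-form language.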

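    Design of a robotic architecture for mapping an action language to low-level motion commands for skillful manipulation (Diseño de una arquitectura robótica para mapear un lenguaje de acción a comandos de movimiento de bajo nivel para manipulación hábil)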

    This paper gives an overview of a robotic architecture meant for skillful manipulation. The design is meant to close the gap between the high-level layer (reasoning and planning layer) and the object model system (physical control layer). The architecture proposes an interface layer that connects atomic tasks with controller inputs in a meaningful way. In this paper, we discuss how specific complex tasks can be resolved by this system, analyze the affordance unit design, and give an overview of the future challenges in the implementation of the whole system. Universidad de Costa Rica/[322-B6-279]/UCR/Costa Rica

    A review and comparison of ontology-based approaches to robot autonomy

    Within the next decades, robots will need to be able to execute a large variety of tasks autonomously in a large variety of environments. To reduce the resulting programming effort, a knowledge-enabled approach to robot programming can be adopted to organize information in re-usable knowledge pieces. However, for ease of reuse, there needs to be an agreement on the meaning of terms. A common approach is to represent these terms using ontology languages that conceptualize the respective domain. In this work, we review projects that use ontologies to support robot autonomy. We systematically search for projects that fulfill a set of inclusion criteria and compare them with each other with respect to the scope of their ontology, the types of cognitive capabilities supported by the use of ontologies, and their application domain. Peer Reviewed. Postprint (author's final draft)

    Interpretation of Natural-language Robot Instructions: Probabilistic Knowledge Representation, Learning, and Reasoning

    A robot that can be simply told in natural language what to do -- this has been one of the ultimate long-standing goals in both Artificial Intelligence and Robotics research. In near-future applications, robotic assistants and companions will have to understand and perform commands such as "set the table for dinner", "make pancakes for breakfast", or "cut the pizza into 8 pieces." Although such instructions are only vaguely formulated, complex sequences of sophisticated and accurate manipulation activities need to be carried out in order to accomplish the respective tasks. The acquisition of knowledge about how to perform these activities from huge collections of natural-language instructions from the Internet has garnered a lot of attention within the last decade. However, natural language is typically massively unspecific, incomplete, ambiguous and vague and thus requires powerful means for interpretation. This work presents PRAC -- Probabilistic Action Cores -- an interpreter for natural-language instructions which is able to resolve vagueness and ambiguity in natural language and infer missing information pieces that are required to render an instruction executable by a robot. To this end, PRAC formulates the problem of instruction interpretation as a reasoning problem in first-order probabilistic knowledge bases. In particular, the system uses Markov logic networks as a carrier formalism for encoding uncertain knowledge. A novel framework for reasoning about unmodeled symbolic concepts is introduced, which incorporates ontological knowledge from taxonomies and exploits semantically similar relational structures in a domain of discourse. The resulting reasoning framework thus enables more compact representations of knowledge and exhibits strong generalization performance when being learnt from very sparse data.
    Furthermore, a novel approach for completing directives is presented, which applies semantic analogical reasoning to transfer knowledge collected from thousands of natural-language instruction sheets to new situations. In addition, a cohesive processing pipeline is described that transforms vague and incomplete task formulations into sequences of formally specified robot plans. The system is connected to a plan executive that is able to execute the computed plans in a simulator. Experiments conducted in a publicly accessible, browser-based web interface showcase that PRAC is capable of closing the loop from natural-language instructions to their execution by a robot.

    Envisioning the qualitative effects of robot manipulation actions using simulation-based projections

    Autonomous robots that are to perform complex everyday tasks such as making pancakes have to understand how the effects of an action depend on the way the action is executed. Within Artificial Intelligence, classical planning reasons about whether actions are executable, but makes the assumption that the actions will succeed (with some probability). In this work, we have designed, implemented, and analyzed a framework that allows us to envision the physical effects of robot manipulation actions. We consider envisioning to be a qualitative reasoning method that reasons about actions and their effects based on simulation-based projections. Thereby it allows a robot to infer what could happen when it performs a task in a certain way. This is achieved by translating a qualitative physics problem into a parameterized simulation problem; performing a detailed physics-based simulation of a robot plan; logging the state evolution into appropriate data structures; and then translating these sub-symbolic data structures into interval-based first-order symbolic, qualitative representations, called timelines. The result of the envisioning is a set of detailed narratives represented by timelines which are then used to infer answers to qualitative reasoning problems. By envisioning the outcome of actions before committing to them, a robot is able to reason about physical phenomena and can therefore prevent itself from ending up in unwanted situations. Using this approach, robots can perform manipulation tasks more efficiently, robustly, and flexibly, and they can even successfully accomplish previously unknown variations of tasks.
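
    The last step of that pipeline, turning logged sub-symbolic states into interval-based timelines, might be sketched as follows. This is only an illustrative guess at the idea: the function name, the sampled boolean log, and the predicate `on(pancake_mix, pan)` are hypothetical and are not taken from the paper's implementation.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def to_intervals(times: List[float], holds: List[bool]) -> List[Interval]:
    """Collapse a sampled boolean signal into maximal (start, end) intervals
    during which the condition held; end is the last sample where it held."""
    intervals: List[Interval] = []
    start = None   # start time of the interval currently open, if any
    prev_t = None  # timestamp of the previous sample
    for t, h in zip(times, holds):
        if h and start is None:
            start = t                          # condition becomes true
        elif not h and start is not None:
            intervals.append((start, prev_t))  # condition became false
            start = None
        prev_t = t
    if start is not None:                      # still true at the end of the log
        intervals.append((start, prev_t))
    return intervals


# Hypothetical simulation log: was the pancake mix in contact with the pan?
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
on_pan = [False, True, True, False, True, True]

timeline: Dict[str, List[Interval]] = {
    "on(pancake_mix, pan)": to_intervals(times, on_pan),
}
print(timeline)
```

    A qualitative query such as "did the mix ever leave the pan after first touching it?" then reduces to checking whether the predicate has more than one interval.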

    Natural language programming of industrial robots

    In this paper, we introduce a method to use written natural language instructions to program assembly tasks for industrial robots. In our application, we used a state-of-the-art semantic and syntactic parser together with semantically rich world and skill descriptions to create high-level symbolic task sequences. From these sequences, we generated executable code for both virtual and physical robot systems. Our focus lies on the applicability of these methods in an industrial setting with real-time constraints.

    Language Models as Zero-Shot Trajectory Generators

    Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we address this assumption thoroughly, and investigate if an LLM (GPT-4) can directly predict a dense sequence of end-effector poses for manipulation skills, when given access to only object detection and segmentation vision models. We study how well a single task-agnostic prompt, without any in-context examples, motion primitives, or external trajectory optimisers, can perform across 26 real-world language-based tasks, such as "open the bottle cap" and "wipe the plate with the sponge", and we investigate which design choices in this prompt are the most effective. Our conclusions raise the assumed limit of LLMs for robotics, and we reveal for the first time that LLMs do indeed possess an understanding of low-level robot control sufficient for a range of common tasks, and that they can additionally detect failures and then re-plan trajectories accordingly. Videos, code, and prompts are available at: https://www.robot-learning.uk/language-models-trajectory-generators. Comment: 19 pages, 21 figures