
    Learning to understand spatial language for robotic navigation and mobile manipulation

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 103-108).

    This thesis focuses on understanding task-constrained natural language commands, where a person gives a natural language command to the robot and the robot infers and executes the corresponding plan. Understanding natural language is difficult because a system must infer the location of landmarks such as "the computer cluster," and actions corresponding to spatial relations such as "to" or "around" and verbs such as "put" or "take," each of which may be composed in complex ways. In addition, different people may give very different commands to perform the same action.

    The first chapter of this thesis focuses on simple natural language commands such as "Find the computer," where a person commands the robot to find an object or place and the robot must infer a corresponding plan. This problem would be easy if we constrained the set of words that the robot might need to reason about. However, if a person says "find the computer" and the robot has not previously detected a "computer," then it is not clear where the robot should look. We present a method that uses previously detected objects and places to bias the search process toward areas of the environment where a previously unseen object is likely to be found. The system uses a semantic map of the environment together with a model of contextual relationships between objects to infer a plan that finds the query object with minimal travel time. The contextual relationships are learned from the captions of a large dataset of photos downloaded from Flickr. Simulated and real-world experiments show that a small subset of detectable objects and scenes is able to predict the locations of previously unseen objects and places.

    In the second chapter, we take steps toward building a robust spatial language understanding system for three different domains: route directions, visual inspection, and indoor mobility. We take as input a natural language command such as "Go through the double doors and down the hallway," extract a semantic structure called a Spatial Description Clause (SDC) from the language, and ground each SDC in a partial or complete semantic map of the environment. By extracting a flat sequence of SDCs, we are able to ground the language using a probabilistic graphical model that is factored into three key components. First, a landmark component grounds novel noun phrases such as "the computers" in the perceptual frame of the robot by exploiting object co-occurrence statistics between unknown noun phrases and known perceptual features. These statistics are learned from a large database of tagged images such as Flickr, and build on the model developed in the first chapter of the thesis. Second, a spatial reasoning component judges how well spatial relations such as "past the computers" describe the path of the robot relative to a landmark. Third, a verb understanding component judges how well spatial verb phrases such as "follow," "meet," "avoid," and "turn right" describe how an agent moves on its own or in relation to another agent. Once trained, our model requires only a metric map of the environment together with the locations of detected objects in order to follow directions through it. This map can be given a priori or created on the fly as the robot explores the environment.
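    As a purely illustrative sketch of the co-occurrence idea used in the first two chapters, the snippet below ranks places to search for a novel object by combining caption-derived co-occurrence probabilities with travel time. The toy probability table, place names, and the noisy-OR and greedy heuristics are our assumptions, not the thesis implementation:

```python
# Rank places to search for a query object ("the computer") using
# co-occurrence statistics between the query noun and objects already
# present in the robot's semantic map. Numbers below are toy values.

# P(query | context object), e.g. estimated from Flickr caption co-occurrence counts.
cooccurrence = {
    ("computer", "monitor"): 0.62,
    ("computer", "desk"): 0.48,
    ("computer", "whiteboard"): 0.21,
    ("computer", "sink"): 0.03,
}

# Semantic map: place -> objects previously detected there, plus travel time (s).
semantic_map = {
    "lab":     {"objects": ["monitor", "desk"],  "travel_time": 20.0},
    "lounge":  {"objects": ["whiteboard"],       "travel_time": 8.0},
    "kitchen": {"objects": ["sink"],             "travel_time": 5.0},
}

def place_score(query, objects):
    """Noisy-OR combination: probability at least one context object predicts the query."""
    miss = 1.0
    for obj in objects:
        miss *= 1.0 - cooccurrence.get((query, obj), 0.0)
    return 1.0 - miss

def search_order(query):
    """Order places by expected payoff per unit travel time (a greedy heuristic)."""
    return sorted(
        semantic_map,
        key=lambda p: place_score(query, semantic_map[p]["objects"]) / semantic_map[p]["travel_time"],
        reverse=True,
    )

print(search_order("computer"))  # ['lab', 'lounge', 'kitchen']
```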
    In the final chapter of the thesis, we focus on understanding mobile manipulation commands such as "Put the tire pallet on the truck." The first contribution of this chapter is the Generalized Grounding Graph (G3), which maps language onto grounded aspects of the environment. In this chapter, we relax the assumption that the language has a fixed and flat structure and provide a method for constructing a hierarchical probabilistic graphical model that connects each element in a natural language command to an object, place, path, or event in the environment. The structure of the G3 model is dynamically instantiated according to the compositional and hierarchical structure of the command, enabling efficient learning and inference. The second contribution of this chapter is to formulate the problem as a discriminative learning problem that maps from language directly onto a robot plan. This probabilistic model is represented as a conditional random field (CRF) that learns the correspondence between robot plans and language, and is able to learn the meanings of complex verbs such as "put" and "take," as well as spatial relations such as "on" and "to."

    by Thomas Kollar. Ph.D.
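    A minimal sketch, under toy assumptions, of how a G3-style model might be instantiated: one log-linear factor per constituent/grounding pair, with the factor graph's shape mirroring the command's parse. The feature, weight, and parse representation below are illustrative only; the thesis learns CRF weights from data:

```python
# Score a candidate grounding assignment for a parsed command with a
# log-linear model whose factors follow the command's hierarchical structure.
import math

weights = {"word_match": 2.0}  # toy weight; a CRF would learn this

def factor_score(node, grounding):
    """Log-potential for one constituent/grounding correspondence."""
    words = node["phrase"].split()
    match = float(any(w in grounding["labels"] for w in words))
    return weights["word_match"] * match

def g3_score(node, assignment):
    """Recursively sum factor scores over the parse tree: the graph's
    structure is instantiated from the command's compositional structure."""
    total = factor_score(node, assignment[node["phrase"]])
    for child in node.get("children", []):
        total += g3_score(child, assignment)
    return total

# "Put the tire pallet on the truck": an event phrase with two arguments.
command = {
    "phrase": "put",
    "children": [
        {"phrase": "tire pallet"},
        {"phrase": "on the truck", "children": [{"phrase": "truck"}]},
    ],
}
assignment = {
    "put":          {"labels": ["put", "place-on-event"]},
    "tire pallet":  {"labels": ["pallet", "tire pallet"]},
    "on the truck": {"labels": ["on", "truck-surface-path"]},
    "truck":        {"labels": ["truck"]},
}
print(math.exp(g3_score(command, assignment)))  # unnormalized correspondence score
```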

    Robotics 2010

    Without a doubt, robotics has made incredible progress over the last decades. The vision of developing, designing, and creating technical systems that help humans achieve hard and complex tasks has led to an incredible variety of solutions. Few technical fields exhibit more interdisciplinary interconnections than robotics. This stems from the highly complex challenges posed by robotic systems, especially the requirement for intelligent and autonomous operation. This book gives an insight into the evolutionary process taking place in robotics. It provides articles covering a wide range of this exciting area. The progress of technical challenges and concepts may illuminate the relationship between developments that seem completely different at first sight. Robotics remains an exciting scientific and engineering field. The community looks ahead optimistically and looks forward to future challenges and new developments.

    Pragmatic Frames for Teaching and Learning in Human-Robot Interaction: Review and Challenges

    Vollmer A-L, Wrede B, Rohlfing KJ, Oudeyer P-Y. Pragmatic Frames for Teaching and Learning in Human-Robot Interaction: Review and Challenges. Frontiers in Neurorobotics. 2016;10:10.

    One of the big challenges in robotics today is to learn from human users who are inexperienced in interacting with robots yet are used to teaching skills flexibly to other humans, and to children in particular. A potential route toward natural and efficient learning and teaching in Human-Robot Interaction (HRI) is to leverage the social competences of humans and the underlying interactional mechanisms. In this perspective, this article discusses the importance of pragmatic frames as flexible interaction protocols that provide important contextual cues, enabling learners to infer new action or language skills and teachers to convey these cues. After defining and discussing the concept of pragmatic frames, grounded in decades of research in developmental psychology, we study a selection of HRI work in the literature that has focused on learning-teaching interaction, and we analyze the interactional and learning mechanisms that were used in the light of pragmatic frames. This allows us to show that many of these works have already used basic elements of the pragmatic-frames machinery in practice, though not always explicitly. However, we also show that pragmatic frames have so far been used in a very restricted way compared to how they are used in human-human interaction, and we argue that this has been an obstacle preventing robust natural multi-task learning and teaching in HRI. In particular, we explain that two central features of human pragmatic frames, mostly absent from existing HRI studies, are that (1) social peers use rich repertoires of frames, potentially combined together, to convey and infer multiple kinds of cues; and (2) new frames can be learned continually, building on existing ones and guiding the interaction toward higher levels of complexity and expressivity. To conclude, we give an outlook on future research directions, describing the key challenges that need to be solved to leverage pragmatic frames for robot learning and teaching.
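    To make the notion concrete, here is a toy sketch (our illustration, not from the paper) of a pragmatic frame encoded as an explicit interaction protocol; the frame name, slot names, and turn types are assumptions:

```python
# A pragmatic frame as a named, ordered sequence of teacher/learner turns,
# with slots carrying the cues the learner should attend to.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Turn:
    role: str                   # "teacher" or "learner"
    act: str                    # e.g. "point", "name", "imitate", "acknowledge"
    slot: Optional[str] = None  # which frame slot this turn fills, if any

@dataclass
class PragmaticFrame:
    name: str
    turns: list                 # ordered protocol of Turn objects

    def next_expected(self, n_turns_so_far: int) -> Turn:
        """The turn the protocol expects next in the exchange."""
        return self.turns[n_turns_so_far % len(self.turns)]

# A classic naming frame: teacher points and names, learner repeats, teacher confirms.
naming_frame = PragmaticFrame(
    "object-naming",
    [
        Turn("teacher", "point", slot="referent"),
        Turn("teacher", "name", slot="label"),
        Turn("learner", "imitate", slot="label"),
        Turn("teacher", "acknowledge"),
    ],
)

print(naming_frame.next_expected(2))  # Turn(role='learner', act='imitate', slot='label')
```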

    MOG 2007: Workshop on Multimodal Output Generation: CTIT Proceedings

    This volume brings together a wide variety of work offering different perspectives on multimodal generation. Two strands of work can be distinguished: half of the gathered papers present current work on embodied conversational agents (ECAs), while the other half present current work on multimedia applications. Two general research questions are shared by all: which output modalities are most suitable in which situation, and how should different output modalities be combined?

    Contemporary Robotics

    This book is a collection of 18 chapters written by internationally recognized experts and well-known professionals of the field. The chapters contribute to diverse facets of contemporary robotics and autonomous systems. The volume is organized in four thematic parts according to the main subjects, covering recent advances in contemporary robotics. The first thematic part of the book is devoted to theoretical issues. This includes the development of algorithms for automatic trajectory generation using a redundancy resolution scheme, intelligent algorithms for robotic grasping, a modelling approach for reactive mode handling in flexible manufacturing, and the design of an advanced controller for robot manipulators. The second part of the book deals with different aspects of robot calibration and sensing. This includes geometric and threshold calibration of a multiple robotic line-vision system, robot-based inline 2D/3D quality monitoring using imaging and laser triangulation, and a study on prospective polymer composite materials for flexible tactile sensors. The third part addresses issues of mobile robots and multi-agent systems, including SLAM for mobile robots based on fusion of odometry and visual data, configuration of a localization system by a team of mobile robots, development of a generic real-time motion controller for differential mobile robots, control of fuel cells in mobile robots, modelling of omni-directional wheeled robots, building a hunter-hybrid tracking environment, as well as design of cooperative control in a distributed population-based multi-agent approach. The fourth part presents recent approaches and results in humanoid and bio-inspired robotics. It deals with the design of adaptive control of anthropomorphic biped gait, dynamics-based simulation of humanoid robot walking, a controller for the perceptual motor control dynamics of humans, and a biomimetic approach to controlling mechatronic structures using smart materials.

    VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

    Large language models (LLMs) have been shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most approaches still rely on pre-defined motion primitives to carry out physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open set of instructions and an open set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a vision-language model (VLM) to compose 3D value maps that ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. Videos and code at https://voxposer.github.io
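    A minimal sketch, assuming NumPy and toy affordance/constraint terms, of the value-map idea: compose a 3D value volume from an attraction term and a repulsion term, then extract waypoints by greedy hill-climbing. The grid size, term shapes, and greedy planner are our assumptions; the actual system composes these maps via LLM-written code querying a VLM, which is not reproduced here:

```python
import numpy as np

N = 32                                  # voxel grid resolution per axis
grid = np.stack(np.meshgrid(*[np.linspace(0, 1, N)] * 3, indexing="ij"), axis=-1)

target = np.array([0.8, 0.5, 0.2])      # e.g. "the drawer handle" (assumed VLM output)
obstacle = np.array([0.5, 0.5, 0.2])    # e.g. "the vase" to avoid

# Value = attraction toward the target minus a repulsive penalty near the obstacle.
value = -np.linalg.norm(grid - target, axis=-1)
value -= 2.0 * np.exp(-np.linalg.norm(grid - obstacle, axis=-1) ** 2 / 0.02)

def greedy_waypoints(start_idx, steps=60):
    """Walk uphill on the composed value map, one voxel neighborhood at a time."""
    path, idx = [start_idx], np.array(start_idx)
    for _ in range(steps):
        best, best_val = idx, value[tuple(idx)]
        for d in np.ndindex(3, 3, 3):   # 26-neighborhood plus the current voxel
            nb = np.clip(idx + np.array(d) - 1, 0, N - 1)
            if value[tuple(nb)] > best_val:
                best, best_val = nb, value[tuple(nb)]
        if np.array_equal(best, idx):
            break                       # local optimum reached
        idx = best
        path.append(tuple(idx))
    return path

waypoints = greedy_waypoints((0, 0, 0))
print(len(waypoints), waypoints[-1])    # ends near the target voxel
```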

    Proceedings of the NASA Conference on Space Telerobotics, volume 2

    These proceedings contain papers presented at the NASA Conference on Space Telerobotics held in Pasadena, January 31 to February 2, 1989. The theme of the Conference was man-machine collaboration in space. The Conference provided a forum for researchers and engineers to exchange ideas on the research and development required for application of telerobotics technology to the space systems planned for the 1990s and beyond. The Conference: (1) provided a view of current NASA telerobotic research and development; (2) stimulated technical exchange on man-machine systems, manipulator control, machine sensing, machine intelligence, concurrent computation, and system architectures; and (3) identified important unsolved problems of current interest which can be dealt with by future research.

    Articulated Object Tracking from Visual Sensory Data for Robotic Manipulation

    In order for a robot to manipulate an articulated object, it needs to know the object's state (i.e., its pose); that is to say, where and in which configuration it is. The result of the object's state estimation is provided as feedback to the controller to compute appropriate robot motion and achieve the desired manipulation outcome. This is the main topic of this thesis, where articulated object state estimation is solved using visual feedback. Vision-based servoing is implemented in a Quadratic Programming task-space control framework to enable a humanoid robot to perform articulated object manipulation. We thoroughly develop our methodology for vision-based articulated object state estimation on these bases. We demonstrate its efficiency by assessing it in several real experiments involving the HRP-4 humanoid robot. We also propose to combine machine learning and edge extraction techniques to achieve markerless, real-time, and robust visual feedback for articulated object manipulation.
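    A simplified sketch of one visual-servoing control step in the spirit of a QP task-space controller: track a desired task-space velocity derived from the estimated object state, subject to joint-velocity limits. The damped least-squares solve plus clipping below stands in for a full QP solver, and the Jacobian and gains are illustrative assumptions:

```python
import numpy as np

def servo_step(J, v_star, qdot_max, damping=1e-3):
    """One control step: qdot ~= argmin ||J qdot - v*||^2 + damping * ||qdot||^2,
    then clipped to the joint-velocity box (a QP would enforce the box exactly)."""
    H = J.T @ J + damping * np.eye(J.shape[1])
    qdot = np.linalg.solve(H, J.T @ v_star)
    return np.clip(qdot, -qdot_max, qdot_max)

rng = np.random.default_rng(0)
J = rng.standard_normal((6, 7))                    # 6-DoF task error, 7-DoF arm (toy numbers)
object_pose_error = rng.standard_normal(6) * 0.05  # from the visual state estimator
v_star = -1.5 * object_pose_error                  # proportional servoing gain (assumed)
qdot = servo_step(J, v_star, qdot_max=0.5)
print(qdot.round(3))
```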

    Aerospace medicine and biology: A continuing bibliography with indexes (supplement 324)

    This bibliography lists 200 reports, articles, and other documents introduced into the NASA Scientific and Technical Information System during May 1989. Subject coverage includes: aerospace medicine and psychology, life support systems and controlled environments, safety equipment, exobiology and extraterrestrial life, and flight crew behavior and performance.