
2 research outputs found

Learning to understand spatial language for robotic navigation and mobile manipulation

Author: Kollar Thomas (Thomas Fleming)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 103-108). This thesis focuses on understanding task-constrained natural language commands, where a person gives a natural language command to the robot and the robot infers and executes the corresponding plan. Understanding natural language is difficult because a system must infer the location of landmarks such as "the computer cluster," and actions corresponding to spatial relations such as "to" or "around" and verbs such as "put" or "take," each of which may be composed in complex ways. In addition, different people may give very different types of commands to perform the same action. The first chapter of this thesis focuses on simple natural language commands such as "Find the computer," where a person commands the robot to find an object or place and the robot must infer a corresponding plan. This problem would be easy if we constrained the set of words that the robot might need to reason about. However, if a person says, "find the computer," and the robot has not previously detected a "computer," then it is not clear where the robot should look. We present a method that uses previously detected objects and places in order to bias the search process toward areas of the environment where a previously unseen object is likely to be found. The system uses a semantic map of the environment together with a model of contextual relationships between objects to infer this plan, which finds the query object with minimal travel time. The contextual relationships are learned from the captions of a large dataset of photos downloaded from Flickr. Simulated and real-world experiments show that a small subset of detectable objects and scenes is able to predict the location of previously unseen objects and places.
In the second chapter, we take steps toward building a robust spatial language understanding system for three different domains: route directions, visual inspection, and indoor mobility. We take as input a natural language command such as "Go through the double doors and down the hallway," extract a semantic structure called a Spatial Description Clause (SDC) from the language, and ground each SDC in a partial or complete semantic map of the environment. By extracting a flat sequence of SDCs, we are able to ground the language by using a probabilistic graphical model that is factored into three key components. First, a landmark component grounds novel noun phrases such as "the computers" in the perceptual frame of the robot by exploiting object co-occurrence statistics between unknown noun phrases and known perceptual features. These statistics are learned from a large database of tagged images such as Flickr, and build on the model developed in the first component of the thesis. Second, a spatial reasoning component judges how well spatial relations such as "past the computers" describe the path of the robot relative to a landmark. Third, a verb understanding component judges how well spatial verb phrases such as "follow," "meet," "avoid," and "turn right" describe how an agent moves on its own or in relation to another agent. Once trained, our model requires only a metric map of the environment together with the locations of detected objects in order to follow directions through it. This map can be given a priori or created on the fly as the robot explores the environment. In the final chapter of the thesis, we focus on understanding mobile manipulation commands such as "Put the tire pallet on the truck." The first contribution of this chapter is the Generalized Grounding Graph (G3), which connects language to grounded aspects of the environment.
In this chapter, we relax the assumption that the language has fixed and flat structure and provide a method for constructing a hierarchical probabilistic graphical model that connects each element in a natural language command to an object, place, path, or event in the environment. The structure of the G3 model is dynamically instantiated according to the compositional and hierarchical structure of the command, enabling efficient learning and inference. The second contribution of this chapter is to formulate the problem as a discriminative learning problem that maps from language directly onto a robot plan. This probabilistic model is represented as a conditional random field (CRF) that learns the correspondence between robot plans and the language and is able to learn the meanings of complex verbs such as "put" and "take," as well as spatial relations such as "on" and "to." by Thomas Kollar. Ph.D.
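The idea in the first chapter of the abstract, using caption co-occurrence to guess where an unseen object is likely to be, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the function names, the add-alpha smoothing, the toy captions, and the choice to score a place by its best-matching detected object. The sketch also ignores travel time, which the thesis's planner takes into account.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(captions):
    """Count how often each word, and each unordered word pair, appears in a caption."""
    word_counts = Counter()
    pair_counts = Counter()
    for caption in captions:
        words = set(caption.lower().split())
        word_counts.update(words)
        for a, b in combinations(sorted(words), 2):
            pair_counts[(a, b)] += 1
    return word_counts, pair_counts


def p_query_given_seen(query, seen, word_counts, pair_counts, alpha=1.0):
    """Add-alpha smoothed estimate of P(query in caption | seen in caption)."""
    pair = tuple(sorted((query, seen)))
    vocab_size = len(word_counts)
    return (pair_counts[pair] + alpha) / (word_counts[seen] + alpha * vocab_size)


def rank_places(query, detections, word_counts, pair_counts):
    """Order places by the best co-occurrence score of any object detected there."""
    scores = {
        place: max(
            p_query_given_seen(query, obj, word_counts, pair_counts)
            for obj in objects
        )
        for place, objects in detections.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data standing in for photo captions and a semantic map.
captions = [
    "computer desk monitor",
    "computer keyboard desk",
    "sink kitchen fridge",
    "fridge kitchen",
]
detections = {"office": ["desk", "monitor"], "kitchen": ["fridge", "sink"]}
word_counts, pair_counts = cooccurrence_counts(captions)
ranking = rank_places("computer", detections, word_counts, pair_counts)
```

With this toy data the office ranks ahead of the kitchen, because "computer" shares captions with "desk" and "monitor" but never with "fridge" or "sink".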

DSpace@MIT

Robot Navigation in Human Environments

Author: Oßwald Stefan
Publication venue: Universitäts- und Landesbibliothek Bonn

For the near future, we envision service robots that will help us with everyday chores in home, office, and urban environments. These robots need to work in environments that were designed for humans and they have to collaborate with humans to fulfill their tasks. In this thesis, we propose new methods for communicating, transferring knowledge, and collaborating between humans and robots in four different navigation tasks. In the first application, we investigate how automated services for giving wayfinding directions can be improved to better address the needs of the human recipients. We propose a novel method based on inverse reinforcement learning that learns from a corpus of human-written route descriptions what amount and type of information a route description should contain. By imitating the human teachers' description style, our algorithm produces new route descriptions that sound similarly natural and convey similar information content, as we show in a user study. In the second application, we investigate how robots can leverage background information provided by humans for exploring an unknown environment more efficiently. We propose an algorithm for exploiting user-provided information such as sketches or floor plans by combining a global exploration strategy based on the solution of a traveling salesman problem with a local nearest-frontier-first exploration scheme. Our experiments show that the exploration tours are significantly shorter and that our system allows the user to effectively select the areas that the robot should explore. In the second part of this thesis, we focus on humanoid robots in home and office environments. The human-like body plan allows humanoid robots to navigate in environments and operate tools that were designed for humans, making humanoid robots suitable for a wide range of applications. 
As localization and mapping are prerequisites for all navigation tasks, we first introduce a novel feature descriptor for RGB-D sensor data and integrate this building block into an appearance-based simultaneous localization and mapping system that we adapt and optimize for use on humanoid robots. Our optimized system is able to track a real Nao humanoid robot more accurately and more robustly than existing approaches. As the third application, we investigate how humanoid robots can cover known environments efficiently with their camera, for example for inspection or search tasks. We extend an existing next-best-view approach by integrating inverse reachability maps, allowing us to efficiently sample and check collision-free full-body poses. Our approach enables the robot to inspect as much of the environment as possible. In our fourth application, we extend the coverage scenario to environments that also include articulated objects that the robot has to actively manipulate to uncover obstructed regions. We introduce algorithms for navigation subtasks that run highly parallelized on graphics processing units for embedded devices. Together with a novel heuristic for estimating utility maps, our system can find high-utility camera poses for efficiently covering environments with articulated objects. All techniques presented in this thesis were implemented in software and thoroughly evaluated in user studies, simulations, and experiments in both artificial and real-world environments. Our approaches advance the state of the art towards universally usable robots in everyday environments.
Roboternavigation in menschlichen Umgebungen (Robot Navigation in Human Environments). In the near future, we expect service robots that take over everyday tasks for us in the home, in the office, and in the city. These robots must cope with environments built for humans, and they must work together with humans to complete their tasks. In this thesis, we propose new methods for communication, knowledge transfer, and collaboration between humans and robots for navigation tasks in four applications. In the first application, we investigate how automated services for generating route descriptions can be improved so that the descriptions better match the needs of their recipients. We propose a new method that uses inverse reinforcement learning to learn, from a corpus of human-written route descriptions, how much and what kind of information a route description should contain. By imitating the style of the human teachers' route descriptions, the algorithm can generate new route descriptions that sound similarly natural and convey similar information content, as we show in a user study. In the second application, we investigate how robots can use background information provided by humans to explore a previously unknown environment faster. We propose an algorithm that uses background information such as building floor plans or sketches by combining a global exploration strategy, based on the solution of a traveling salesman problem, with a local exploration strategy. Our experiments show that the exploration tours become significantly shorter and that the user can effectively specify the regions to be explored with our system. The second part of this thesis concentrates on humanoid robots in home and office environments. The human-like physique enables humanoid robots to navigate environments and use tools that were built for humans, which makes humanoid robots usable for a wide variety of tasks.
Since localization and mapping are basic prerequisites for all navigation tasks, we first introduce a new feature descriptor for RGB-D sensor data and integrate this building block into an appearance-based simultaneous localization and mapping method that we adapt to and optimize for the particularities of humanoid robots. Our system can track the position of a real humanoid robot more accurately and more robustly than is possible with existing approaches. As the third application, we investigate how humanoid robots can efficiently cover known environments with their camera, for example for inspection purposes or to search for an object. We extend an existing method that computes the next-best observation position with inverse reachability maps, which lets us efficiently generate and check collision-free full-body poses. Our approach enables the robot to examine as much of the environment as possible. In our fourth application, we extend this scenario to environments that also contain movable objects that the robot must actively move in order to see occluded regions. We introduce algorithms for subproblems that run highly parallelized on the graphics processors of embedded systems. Together with a new heuristic for estimating utility maps, this enables our system to find high-utility observation points for efficiently inspecting environments with movable objects. All presented techniques were implemented in software and carefully evaluated in user studies, simulations, and experiments in artificial and real environments. Our methods advance the state of research towards universally deployable robots in everyday environments.
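The local "nearest-frontier-first" scheme mentioned in the second application can be sketched on a small occupancy grid. This is an illustrative sketch only: the grid encoding, the function names, and the 4-connected breadth-first search are assumptions, not details from the thesis, and the global traveling-salesman component that the thesis combines with this local scheme is not shown.

```python
from collections import deque

# Hypothetical cell encoding for a 2D occupancy grid.
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def neighbors(grid, r, c):
    """Yield the 4-connected neighbors of (r, c) that lie inside the grid."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]):
            yield nr, nc


def is_frontier(grid, r, c):
    """A frontier cell is a free cell that borders at least one unknown cell."""
    return grid[r][c] == FREE and any(
        grid[nr][nc] == UNKNOWN for nr, nc in neighbors(grid, r, c)
    )


def nearest_frontier(grid, start):
    """Breadth-first search over free cells; return the closest frontier or None."""
    queue = deque([start])
    visited = {start}
    while queue:
        r, c = queue.popleft()
        if is_frontier(grid, r, c):
            return (r, c)
        for nxt in neighbors(grid, r, c):
            if nxt not in visited and grid[nxt[0]][nxt[1]] == FREE:
                visited.add(nxt)
                queue.append(nxt)
    return None


# Toy map: the right-hand column is unexplored and a wall blocks the direct route.
grid = [
    [FREE, FREE, OCCUPIED, UNKNOWN],
    [FREE, OCCUPIED, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE, UNKNOWN],
]
goal = nearest_frontier(grid, (0, 0))
```

Because the search expands only through free cells, the returned frontier is the one with the shortest traversable path in grid steps, not the one closest in a straight line.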

bonndoc – Der Publikationsserver der Universität Bonn