
    A discriminative model for understanding natural language route directions

    To be useful teammates to human partners, robots must be able to follow spoken instructions given in natural language. However, determining the correct sequence of actions in response to a set of spoken instructions is a complex decision-making problem. There is a "semantic gap" between the high-level symbolic models of the world that people use, and the low-level models of geometry, state dynamics, and perceptions that robots use. In this paper, we show how this gap can be bridged by inferring the best sequence of actions from a linguistic description and environmental features. This work improves upon previous work in three ways. First, by using a conditional random field (CRF), we learn the relative weight of environmental and linguistic features, enabling the system to learn the meanings of words and reducing the modeling effort in learning how to follow commands. Second, a number of long-range features are added, which help the system to use additional structure in the problem. Finally, given a natural language command, we infer both the referred path and landmark directly, thereby requiring the algorithm to pick a landmark by which it should navigate. The CRF is demonstrated to have 15% error on a held-out dataset, when compared with 39% error for a Markov random field (MRF). Finally, by analyzing the additional annotations necessary for this work, we find that natural language route directions map sequentially onto the corresponding path and landmarks 99.6% of the time. In addition, the size of the referred landmark varies from 0 m² to 1964 m² and the length of the referred path varies from 0 m to 40.83 m. United States. Office of Naval Research (MURI N00014-07-1-0749)
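As a rough illustration of the discriminative inference this abstract describes, the sketch below scores candidate (path, landmark) pairs with a weighted sum of linguistic and environmental features and returns the argmax, which is the basic shape of inference in a log-linear/CRF model. The feature names, data layout, and weights are hypothetical, not the paper's actual feature set.

```python
# Minimal sketch of log-linear (CRF-style) scoring over candidate
# (path, landmark) pairs. Feature names and data layout are illustrative only.
from itertools import product

def features(command, path, landmark):
    """Hypothetical linguistic/environmental features for one candidate pair."""
    return {
        "landmark_mentioned": 1.0 if landmark["name"] in command else 0.0,
        "path_length_m": path["length_m"],
        "ends_near_landmark": 1.0 if path["dist_to"][landmark["name"]] < 2.0 else 0.0,
    }

def score(weights, command, path, landmark):
    """Weighted sum of features; the CRF learns these weights from data."""
    return sum(weights.get(k, 0.0) * v
               for k, v in features(command, path, landmark).items())

def infer(weights, command, paths, landmarks):
    """Jointly infer the referred path and landmark (argmax over candidates)."""
    return max(product(paths, landmarks),
               key=lambda pl: score(weights, command, pl[0], pl[1]))
```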

    Learning to understand spatial language for robotic navigation and mobile manipulation

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 103-108). This thesis focuses on understanding task-constrained natural language commands, where a person gives a natural language command to the robot and the robot infers and executes the corresponding plan. Understanding natural language is difficult because a system must infer the location of landmarks such as "the computer cluster," and actions corresponding to spatial relations such as "to" or "around" and verbs such as "put" or "take," each of which may be composed in complex ways. In addition, different people may give very different types of commands to perform the same action. The first chapter of this thesis focuses on simple natural language commands such as "Find the computer," where a person commands the robot to find an object or place and the robot must infer a corresponding plan. This problem would be easy if we constrained the set of words that the robot might need to reason about. However, if a person says, "find the computer," and the robot has not previously detected a "computer," then it is not clear where the robot should look. We present a method that uses previously detected objects and places in order to bias the search process toward areas of the environment where a previously unseen object is likely to be found. The system uses a semantic map of the environment together with a model of contextual relationships between objects to infer this plan, which finds the query object with minimal travel time. The contextual relationships are learned from the captions of a large dataset of photos downloaded from Flickr. Simulated and real-world experiments show that a small subset of detectable objects and scenes is able to predict the location of previously unseen objects and places. In the second chapter, we take steps toward building a robust spatial language understanding system for three different domains: route directions, visual inspection, and indoor mobility. We take as input a natural language command such as "Go through the double doors and down the hallway," extract a semantic structure called a Spatial Description Clause (SDC) from the language, and ground each SDC in a partial or complete semantic map of the environment. By extracting a flat sequence of SDCs, we are able to ground the language by using a probabilistic graphical model that is factored into three key components. First, a landmark component grounds novel noun phrases such as "the computers" in the perceptual frame of the robot by exploiting object co-occurrence statistics between unknown noun phrases and known perceptual features. These statistics are learned from a large database of tagged images such as Flickr, and build off of the model developed in the first component of the thesis. Second, a spatial reasoning component judges how well spatial relations such as "past the computers" describe the path of the robot relative to a landmark. Third, a verb understanding component judges how well spatial verb phrases such as "follow," "meet," "avoid," and "turn right" describe how an agent moves on its own or in relation to another agent. Once trained, our model requires only a metric map of the environment together with the locations of detected objects in order to follow directions through it. This map can be given a priori or created on the fly as the robot explores the environment.
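The first chapter's context-biased search can be pictured with the toy ranking below: viewpoints whose already-detected objects co-occur strongly with the query object (counts taken from a caption corpus such as Flickr) are visited first, discounted by travel time. The data structures and the max-over-cues estimate are assumptions made for illustration, not the thesis's exact model.

```python
# Rough sketch of biasing object search with caption co-occurrence statistics.
# `pair_counts` maps frozenset({obj_a, obj_b}) -> co-occurrence count and
# `object_counts` maps obj -> count; both would be estimated from captions.

def cooccurrence_prob(query, detected_objects, pair_counts, object_counts):
    """Rough P(query object is nearby | objects detected here)."""
    if not detected_objects:
        return 0.0
    return max(
        pair_counts.get(frozenset((query, obj)), 0) / max(object_counts.get(obj, 0), 1)
        for obj in detected_objects
    )

def rank_viewpoints(query, viewpoints, pair_counts, object_counts):
    """Order candidate viewpoints by expected payoff per unit of travel time."""
    def utility(v):
        p = cooccurrence_prob(query, v["detected_objects"], pair_counts, object_counts)
        return p / max(v["travel_time_s"], 1e-6)
    return sorted(viewpoints, key=utility, reverse=True)
```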
In the final chapter of the thesis, we focus on understanding mobile manipulation commands such as "Put the tire pallet on the truck." The first contribution of this chapter is the Generalized Grounding Graph (G3), which connects language to grounded aspects of the environment. In this chapter, we relax the assumption that the language has a fixed and flat structure and provide a method for constructing a hierarchical probabilistic graphical model that connects each element in a natural language command to an object, place, path, or event in the environment. The structure of the G3 model is dynamically instantiated according to the compositional and hierarchical structure of the command, enabling efficient learning and inference. The second contribution of this chapter is to formulate the problem as a discriminative learning problem that maps from language directly onto a robot plan. This probabilistic model is represented as a conditional random field (CRF) that learns the correspondence between robot plans and language and is able to learn the meanings of complex verbs such as "put" and "take," as well as spatial relations such as "on" and "to." by Thomas Kollar. Ph.D.
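A much-reduced sketch of the G3 idea: instantiate one grounding variable per linguistic constituent of the parsed command, attach a factor scoring how well a candidate object, place, path, or event matches that constituent, and search for the best joint assignment. The fixed "parse," the word-overlap factor, and the exhaustive search below are stand-ins for the learned, dynamically instantiated model.

```python
# Toy sketch of instantiating a grounding graph from the structure of a
# command and searching for the best joint grounding. The factor is a
# stand-in; the real model uses learned CRF factors.
from itertools import product

def parse(command):
    """Pretend parse: each constituent will ground to one object/place/path/event."""
    # e.g. "put the tire pallet on the truck" ->
    return [
        {"phrase": "the tire pallet", "type": "object"},
        {"phrase": "the truck", "type": "object"},
        {"phrase": "put ... on ...", "type": "event"},
    ]

def factor(constituent, grounding):
    """Hypothetical correspondence score: word overlap with the grounding's tags."""
    words = set(constituent["phrase"].lower().split())
    return len(words & set(grounding["tags"]))

def best_grounding(command, candidates):
    """Assign every constituent a grounding, maximizing the sum of factor scores."""
    constituents = parse(command)
    best, best_score = None, float("-inf")
    for assignment in product(candidates, repeat=len(constituents)):
        s = sum(factor(c, g) for c, g in zip(constituents, assignment))
        if s > best_score:
            best, best_score = assignment, s
    return dict(zip((c["phrase"] for c in constituents), best))
```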

    Optimizing robot trajectories using reinforcement learning

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (leaves 93-96). The mapping problem has received considerable attention in robotics recently. Mature techniques now allow practitioners to reliably and consistently generate 2-D and 3-D maps of objects, office buildings, city blocks and metropolitan areas with a comparatively small number of errors. Nevertheless, the ease of construction and quality of the map are strongly dependent on the exploration strategy used to acquire sensor data. Most exploration strategies concentrate on selecting the next best measurement to take, trading off information gathering for regular relocalization. What has not been studied so far is the effect the robot controller has on the map quality. Certain kinds of robot motion (e.g., sharp turns) are hard to estimate correctly, and increase the likelihood of errors in the mapping process. We show how reinforcement learning can be used to generate better motion control. The learned policy will be shown to reduce the overall map uncertainty and squared error, while jointly reducing data-association errors. by Thomas Kollar. S.M.
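One way to read the approach: define a reward that penalizes growth in map uncertainty, map error, and hard-to-estimate motion such as sharp turns, then learn a motion policy with a standard RL update. The reward terms, coefficients, and the tabular Q-learning update below are illustrative, not the thesis's exact formulation.

```python
# Illustrative reward and tabular Q-learning update for learning motion
# control that keeps the map well estimated. Terms and coefficients are
# placeholders for the thesis's actual objective.
import random
from collections import defaultdict

def reward(delta_map_uncertainty, squared_error, turn_rate):
    """Penalize uncertainty growth, map error, and sharp turns."""
    return -(1.0 * delta_map_uncertainty + 1.0 * squared_error + 0.1 * abs(turn_rate))

def q_learning_step(Q, state, action, r, next_state, actions,
                    alpha=0.1, gamma=0.95):
    """Standard one-step Q-learning update on a tabular value function."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])

def epsilon_greedy(Q, state, actions, eps=0.1):
    """Pick the next motion command (e.g., a turn-rate bin) to execute."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

Q = defaultdict(float)  # maps (state, action) -> estimated value
```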

    Grounding Verbs of Motion in Natural Language Commands to Robots

    To be useful teammates to human partners, robots must be able to follow spoken instructions given in natural language. An important class of instructions involves interacting with people, such as “Follow the person to the kitchen” or “Meet the person at the elevators.” These instructions require that the robot fluidly react to changes in the environment, not simply follow a pre-computed plan. We present an algorithm for understanding natural language commands with three components. First, we create a cost function that scores the language according to how well it matches a candidate plan in the environment, defined as the log-likelihood of the plan given the command. Components of the cost function include novel models for the meanings of motion verbs such as “follow,” “meet,” and “avoid,” as well as spatial relations such as “to” and landmark phrases such as “the kitchen.” Second, an inference method uses this cost function to perform forward search, finding a plan that matches the natural language command. Third, a high-level controller repeatedly calls the inference method at each timestep to compute a new plan in response to changes in the environment such as the movement of the human partner or other people in the scene. When a command consists of more than a single task, the controller switches to the next task when an earlier one is satisfied. We evaluate our approach on a set of example tasks that require the ability to follow both simple and complex natural language commands. Keywords: Cost Function; Spatial Relation; State Sequence; Edit Distance; Statistical Machine Translation. United States. Office of Naval Research (Grant MURI N00014-07-1-0749)
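The high-level controller described here can be reduced to a replanning loop like the skeleton below: observe the world, re-run inference for the current task, execute one step of the resulting plan, and advance to the next task once the current one is satisfied. `infer_plan`, `is_satisfied`, `execute_step`, and `observe` are placeholders for the paper's inference method, goal test, actuation, and perception.

```python
# Skeleton of a replanning controller for multi-task natural language commands.
# All callables passed in are placeholders for the system's real components.

def follow_command(tasks, infer_plan, is_satisfied, execute_step, observe,
                   max_steps=1000):
    """Execute a sequence of natural-language tasks with per-timestep replanning."""
    task_idx = 0
    for _ in range(max_steps):
        if task_idx >= len(tasks):
            return True                    # every task satisfied
        world = observe()                  # current robot, people, objects
        if is_satisfied(tasks[task_idx], world):
            task_idx += 1                  # switch to the next task
            continue
        plan = infer_plan(tasks[task_idx], world)  # forward search over plans
        if plan:
            execute_step(plan[0], world)   # act, then replan on the next timestep
    return False
```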

    Toward understanding natural language directions

    Speaking using unconstrained natural language is an intuitive and flexible way for humans to interact with robots. Understanding this kind of linguistic input is challenging because diverse words and phrases must be mapped into structures that the robot can understand, and elements in those structures must be grounded in an uncertain environment. We present a system that follows natural language directions by extracting a sequence of spatial description clauses from the linguistic input and then infers the most probable path through the environment given only information about the environmental geometry and detected visible objects. We use a probabilistic graphical model that factors into three key components. The first component grounds landmark phrases such as "the computers" in the perceptual frame of the robot by exploiting co-occurrence statistics from a database of tagged images such as Flickr. Second, a spatial reasoning component judges how well spatial relations such as "past the computers" describe a path. Finally, verb phrases such as "turn right" are modeled according to the amount of change in orientation in the path. Our system follows 60% of the directions in our corpus to within 15 meters of the true destination, significantly outperforming other approaches. United States. Office of Naval Research (MURI N00014-07-1-0749)
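The verb component can be made concrete with a small sketch: score "turn right" by the net change in heading accumulated along a candidate path, preferring paths whose total signed orientation change is near -90 degrees (counterclockwise-positive angles). The Gaussian-shaped score below is an illustrative choice, not necessarily the distribution the system fits.

```python
# Illustrative verb model: score "turn right" by the net heading change along
# a path of (x, y) waypoints. Target angle and penalty shape are assumptions.
import math

def heading_changes(waypoints):
    """Signed turn angles (radians) between consecutive path segments."""
    headings = [math.atan2(y2 - y1, x2 - x1)
                for (x1, y1), (x2, y2) in zip(waypoints, waypoints[1:])]
    return [math.atan2(math.sin(b - a), math.cos(b - a))  # wrap to (-pi, pi]
            for a, b in zip(headings, headings[1:])]

def turn_right_score(waypoints, target=-math.pi / 2, sigma=math.pi / 6):
    """Higher when the path's total orientation change is close to -90 degrees."""
    total = sum(heading_changes(waypoints))
    return math.exp(-((total - target) ** 2) / (2 * sigma ** 2))
```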

    Toward a Probabilistic Approach to Acquiring Information from Human Partners Using Language

    Our goal is to build robots that can robustly interact with humans using natural language. This problem is extremely challenging because human language is filled with ambiguity, and furthermore, the robot's model of the environment might be much more limited than the human partner's. When humans encounter ambiguity in dialog with each other, a key strategy to resolve it is to ask clarifying questions about what they do not understand. This paper describes an approach for enabling robots to take the same approach: asking the human partner clarifying questions about ambiguous commands in order to infer better actions. The robot fuses information from the command, the question, and the answer by creating a joint probabilistic graphical model in the Generalized Grounding Graph framework. We demonstrate that by performing inference using information from the command, question, and answer, the robot is able to infer object groundings and follow commands with higher accuracy than by using the command alone.
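The fusion step can be pictured as two factors over the same grounding variable, one from the command and one from the answer, with the robot choosing the grounding that maximizes their product. The word-overlap factor below is a stand-in for the learned grounding factors of the Generalized Grounding Graph framework, and the example objects are invented.

```python
# Toy sketch of fusing a command and the answer to a clarifying question over
# a shared grounding variable. `phrase_match` stands in for learned factors.

def phrase_match(phrase, grounding):
    """Hypothetical factor: word overlap between a phrase and an object's tags."""
    return 1.0 + len(set(phrase.lower().split()) & set(grounding["tags"]))

def ground_with_dialog(command_phrase, answer_phrase, candidates):
    """Pick the object grounding that best explains both the command and the answer."""
    def joint(g):
        # Factors multiply because command and answer are modeled jointly.
        return phrase_match(command_phrase, g) * phrase_match(answer_phrase, g)
    return max(candidates, key=joint)

# Example: ambiguous "the pallet" resolved by the answer "the one near the truck".
candidates = [
    {"name": "pallet_1", "tags": {"pallet", "tire", "near", "truck"}},
    {"name": "pallet_2", "tags": {"pallet", "box"}},
]
print(ground_with_dialog("the pallet", "the one near the truck", candidates)["name"])
```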

    Towards Understanding Hierarchical Natural Language Commands for Robotic Navigation and Manipulation

    We describe a new model for understanding hierarchical natural language commands for robot navigation and manipulation. The model has three components: a semantic structure that captures the hierarchical structure of language; a cost function that maps the command's semantic structure to the robot's sensorimotor capabilities; and an efficient search method for finding the lowest-cost plan. We present a proof-of-concept system that carries out navigation commands in a simulated setting.
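The "efficient search method for finding the lowest-cost plan" could, in its simplest form, be a uniform-cost (best-first) search over partial action sequences, with the step cost supplied by the model's cost function. The `successors` and `step_cost` callables below are assumptions about the interface, not the paper's implementation.

```python
# Sketch of uniform-cost search for the lowest-cost plan. `successors(state)`
# yields (action, next_state) pairs; `step_cost` would come from the model.
import heapq
from itertools import count

def lowest_cost_plan(start, is_goal, successors, step_cost, max_expansions=10000):
    """Return the cheapest action sequence from start to a goal state, or None."""
    tie = count()  # tie-breaker so heapq never has to compare states
    frontier = [(0.0, next(tie), start, [])]
    seen = set()
    for _ in range(max_expansions):
        if not frontier:
            break
        cost, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        if state in seen:
            continue
        seen.add(state)
        for action, nxt in successors(state):
            heapq.heappush(frontier,
                           (cost + step_cost(state, action, nxt),
                            next(tie), nxt, plan + [action]))
    return None
```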

    Understanding natural language commands for robotic navigation and mobile manipulation

    This paper describes a new model for understanding natural language commands given to autonomous systems that perform navigation and mobile manipulation in semi-structured environments. Previous approaches have used models with fixed structure to infer the likelihood of a sequence of actions given the environment and the command. In contrast, our framework, called Generalized Grounding Graphs, dynamically instantiates a probabilistic graphical model for a particular natural language command according to the command's hierarchical and compositional semantic structure. Our system performs inference in the model to successfully find and execute plans corresponding to natural language commands such as "Put the tire pallet on the truck." The model is trained using a corpus of commands collected using crowdsourcing. We pair each command with robot actions and use the corpus to learn the parameters of the model. We evaluate the robot's performance by inferring plans from natural language commands, executing each plan in a realistic robot simulator, and asking users to evaluate the system's performance. We demonstrate that our system can successfully follow many natural language commands from the corpus.
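Training from the crowdsourced corpus can be sketched with a structured-perceptron-style update: for each (command, annotated plan) pair, nudge feature weights toward the annotated plan and away from the model's current best guess. This substitutes a simpler update for whatever CRF training the paper actually uses; `features` and `candidate_plans` are placeholders.

```python
# Structured-perceptron-style training sketch for a log-linear grounding model.
# `features(command, plan)` and `candidate_plans` are placeholders for the
# real feature function and search space.
from collections import defaultdict

def predict(weights, command, candidate_plans, features):
    """Current best plan under the model (argmax of the linear score)."""
    return max(candidate_plans,
               key=lambda p: sum(weights.get(k, 0.0) * v
                                 for k, v in features(command, p).items()))

def train(corpus, candidate_plans, features, epochs=10):
    """corpus: list of (command, correct_plan) pairs collected via crowdsourcing."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for command, gold_plan in corpus:
            guess = predict(weights, command, candidate_plans, features)
            if guess != gold_plan:
                # Move weights toward the annotated plan, away from the guess.
                for k, v in features(command, gold_plan).items():
                    weights[k] += v
                for k, v in features(command, guess).items():
                    weights[k] -= v
    return weights
```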