    Optimizing robot trajectories using reinforcement learning

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (leaves 93-96). The mapping problem has received considerable attention in robotics recently. Mature techniques now allow practitioners to reliably and consistently generate 2-D and 3-D maps of objects, office buildings, city blocks, and metropolitan areas with comparatively few errors. Nevertheless, the ease of construction and the quality of the map depend strongly on the exploration strategy used to acquire sensor data. Most exploration strategies concentrate on selecting the next best measurement to take, trading off information gathering against regular relocalization. What has not been studied so far is the effect the robot controller has on map quality. Certain kinds of robot motion (e.g., sharp turns) are hard to estimate correctly and increase the likelihood of errors in the mapping process. We show how reinforcement learning can be used to generate better motion control. The learned policy is shown to reduce overall map uncertainty and squared error while jointly reducing data-association errors. By Thomas Kollar. S.M.
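
    The abstract describes learning a motion controller that avoids hard-to-estimate maneuvers such as sharp turns. As a hedged illustration only (the state space, reward shaping, and hyperparameters below are assumptions, not the thesis's formulation), a minimal tabular Q-learning sketch that learns to prefer gentle turns:

        import random

        HEADINGS = range(8)               # discretized robot heading
        ACTIONS = [-2, -1, 0, 1, 2]       # turn increments; |2| is a sharp turn

        def reward(turn):
            progress = 1.0                         # nominal reward for moving on
            sharpness_penalty = 0.5 * turn * turn  # sharp turns degrade the map
            return progress - sharpness_penalty

        Q = {(h, a): 0.0 for h in HEADINGS for a in ACTIONS}
        alpha, gamma, eps = 0.1, 0.9, 0.1

        h = 0
        for _ in range(20000):
            if random.random() < eps:                       # explore
                a = random.choice(ACTIONS)
            else:                                           # exploit
                a = max(ACTIONS, key=lambda x: Q[(h, x)])
            h2 = (h + a) % 8
            td_target = reward(a) + gamma * max(Q[(h2, x)] for x in ACTIONS)
            Q[(h, a)] += alpha * (td_target - Q[(h, a)])
            h = h2

        # The learned policy should prefer gentle turns (0 or +/-1) everywhere.
        print({h: max(ACTIONS, key=lambda a: Q[(h, a)]) for h in HEADINGS})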

    Learning to understand spatial language for robotic navigation and mobile manipulation

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 103-108). This thesis focuses on understanding task-constrained natural language commands, where a person gives a natural language command to the robot and the robot infers and executes the corresponding plan. Understanding natural language is difficult because a system must infer the location of landmarks such as "the computer cluster," as well as actions corresponding to spatial relations such as "to" or "around" and verbs such as "put" or "take," each of which may be composed in complex ways. In addition, different people may give very different types of commands to perform the same action.

    The first chapter of this thesis focuses on simple natural language commands such as "Find the computer," where a person commands the robot to find an object or place and the robot must infer a corresponding plan. This problem would be easy if we constrained the set of words the robot might need to reason about. However, if a person says "find the computer" and the robot has not previously detected a "computer," it is not clear where the robot should look. We present a method that uses previously detected objects and places to bias the search toward areas of the environment where a previously unseen object is likely to be found. The system uses a semantic map of the environment together with a model of contextual relationships between objects to infer a plan that finds the query object with minimal travel time. The contextual relationships are learned from the captions of a large dataset of photos downloaded from Flickr. Simulated and real-world experiments show that a small subset of detectable objects and scenes can predict the locations of previously unseen objects and places.

    In the second chapter, we take steps toward building a robust spatial language understanding system for three domains: route directions, visual inspection, and indoor mobility. We take as input a natural language command such as "Go through the double doors and down the hallway," extract a semantic structure called a Spatial Description Clause (SDC) from the language, and ground each SDC in a partial or complete semantic map of the environment. By extracting a flat sequence of SDCs, we are able to ground the language using a probabilistic graphical model that factors into three key components. First, a landmark component grounds novel noun phrases such as "the computers" in the perceptual frame of the robot by exploiting object co-occurrence statistics between unknown noun phrases and known perceptual features. These statistics are learned from a large database of tagged images such as Flickr and build on the model developed in the first part of the thesis. Second, a spatial reasoning component judges how well spatial relations such as "past the computers" describe the path of the robot relative to a landmark. Third, a verb understanding component judges how well spatial verb phrases such as "follow," "meet," "avoid," and "turn right" describe how an agent moves on its own or in relation to another agent. Once trained, our model requires only a metric map of the environment together with the locations of detected objects in order to follow directions through it. This map can be given a priori or created on the fly as the robot explores the environment.
    In the final chapter of the thesis, we focus on understanding mobile manipulation commands such as "Put the tire pallet on the truck." The first contribution of this chapter is the Generalized Grounding Graph (G3), which connects language to grounded aspects of the environment. In this chapter, we relax the assumption that the language has a fixed and flat structure and provide a method for constructing a hierarchical probabilistic graphical model that connects each element in a natural language command to an object, place, path, or event in the environment. The structure of the G3 model is dynamically instantiated according to the compositional and hierarchical structure of the command, enabling efficient learning and inference. The second contribution of this chapter is to formulate the problem as a discriminative learning problem that maps language directly onto a robot plan. This probabilistic model is represented as a conditional random field (CRF) that learns the correspondence between robot plans and language and is able to learn the meanings of complex verbs such as "put" and "take," as well as spatial relations such as "on" and "to." By Thomas Kollar. Ph.D.
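
    As a toy illustration of how a grounding graph can be instantiated from a command's structure (the phrases, candidate groundings, and factor scores below are invented placeholders, not the thesis's learned model), the sketch builds one factor per linguistic constituent of "Put the tire pallet on the truck" and searches for the jointly best grounding:

        from itertools import product

        # Two argument phrases from the example command.
        arguments = ["the tire pallet", "on the truck"]

        candidates = {                    # hypothetical groundings in a world model
            "the tire pallet": ["pallet1", "pallet2"],
            "on the truck":    ["place_on_truck1", "place_on_truck2"],
        }

        def factor(phrase, grounding):
            # Stand-in for a learned correspondence factor (higher = better).
            table = {("the tire pallet", "pallet1"): 0.9,
                     ("on the truck", "place_on_truck1"): 0.8}
            return table.get((phrase, grounding), 0.1)

        # Jointly choose a grounding for every phrase, maximizing the factor sum,
        # mirroring one-factor-per-constituent instantiation.
        best = max(product(*(candidates[a] for a in arguments)),
                   key=lambda g: sum(factor(a, gi) for a, gi in zip(arguments, g)))
        print(dict(zip(arguments, best)))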

    A discriminative model for understanding natural language route directions

    To be useful teammates to human partners, robots must be able to follow spoken instructions given in natural language. However, determining the correct sequence of actions in response to a set of spoken instructions is a complex decision-making problem. There is a "semantic gap" between the high-level symbolic models of the world that people use and the low-level models of geometry, state dynamics, and perception that robots use. In this paper, we show how this gap can be bridged by inferring the best sequence of actions from a linguistic description and environmental features. This work improves upon previous work in three ways. First, by using a conditional random field (CRF), we learn the relative weight of environmental and linguistic features, enabling the system to learn the meanings of words and reducing the modeling effort required to learn how to follow commands. Second, a number of long-range features are added, which help the system exploit additional structure in the problem. Finally, given a natural language command, we infer both the referred path and landmark directly, thereby requiring the algorithm to pick a landmark by which it should navigate. The CRF is demonstrated to have 15% error on a held-out dataset, compared with 39% error for a Markov random field (MRF). Finally, by analyzing the additional annotations necessary for this work, we find that natural language route directions map sequentially onto the corresponding path and landmarks 99.6% of the time. In addition, the size of the referred landmark varies from 0 m² to 1964 m², and the length of the referred path varies from 0 m to 40.83 m.

    United States. Office of Naval Research (MURI N00014-07-1-0749)
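
    A minimal sketch of the log-linear scoring idea, assuming invented features and hand-set weights (the paper learns the weights from data): each candidate (path, landmark) pair is scored by a weighted sum of linguistic and environmental features, and the highest-scoring pair is chosen.

        def features(command, path, landmark):
            # Mixed linguistic and environmental features for one candidate.
            return {
                "landmark_mentioned": 1.0 if landmark in command else 0.0,
                "path_near_landmark": 1.0 / (1.0 + path["dist_to_landmark"]),
                "path_length":        -path["length"] / 10.0,
            }

        weights = {                  # learned in the real system; fixed here
            "landmark_mentioned": 2.0,
            "path_near_landmark": 1.5,
            "path_length":        0.5,
        }

        def score(command, path, landmark):
            f = features(command, path, landmark)
            return sum(weights[k] * v for k, v in f.items())

        command = "go past the kitchen and stop"
        candidates = [
            ({"dist_to_landmark": 1.0, "length": 5.0}, "the kitchen"),
            ({"dist_to_landmark": 9.0, "length": 3.0}, "the elevators"),
        ]
        path, landmark = max(candidates, key=lambda c: score(command, *c))
        print(landmark)   # -> the kitchen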

    Grounding Verbs of Motion in Natural Language Commands to Robots

    To be useful teammates to human partners, robots must be able to follow spoken instructions given in natural language. An important class of instructions involve interacting with people, such as “Follow the person to the kitchen” or “Meet the person at the elevators.” These instructions require that the robot fluidly react to changes in the environment, not simply follow a pre-computed plan. We present an algorithm for understanding natural language commands with three components. First, we create a cost function that scores the language according to how well it matches a candidate plan in the environment, defined as the log-likelihood of the plan given the command. Components of the cost function include novel models for the meanings of motion verbs such as “follow,” “meet,” and “avoid,” as well as spatial relations such as “to” and landmark phrases such as “the kitchen.” Second, an inference method uses this cost function to perform forward search, finding a plan that matches the natural language command. Third, a high-level controller repeatedly calls the inference method at each timestep to compute a new plan in response to changes in the environment such as the movement of the human partner or other people in the scene. When a command consists of more than a single task, the controller switches to the next task when an earlier one is satisfied. We evaluate our approach on a set of example tasks that require the ability to follow both simple and complex natural language commands.

    Keywords: Cost Function; Spatial Relation; State Sequence; Edit Distance; Statistical Machine Translation

    United States. Office of Naval Research (Grant MURI N00014-07-1-0749)
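
    A toy version of the three components, with all dynamics and costs as placeholder assumptions (a 1-D corridor, a cost that just tracks distance to the person): a cost function over plans, forward search for the lowest-cost plan, and a controller that replans at every timestep as the person moves.

        def plan_cost(plan, person_pos):
            # Stand-in for the learned log-likelihood: a "follow" plan is
            # good if it ends near the person.
            return abs(plan[-1] - person_pos)

        def forward_search(robot_pos, person_pos, horizon=3):
            frontier = [[robot_pos]]
            for _ in range(horizon):            # expand: step left, stay, right
                frontier = [p + [p[-1] + d] for p in frontier for d in (-1, 0, 1)]
            return min(frontier, key=lambda p: plan_cost(p, person_pos))

        # High-level controller: re-run inference each timestep on a corridor.
        robot, person = 0, 5
        for t in range(6):
            plan = forward_search(robot, person)
            robot = plan[1]                     # execute only the first step
            person += 1                         # the person keeps moving
            print(t, robot, person)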

    Toward understanding natural language directions

    Speaking using unconstrained natural language is an intuitive and flexible way for humans to interact with robots. Understanding this kind of linguistic input is challenging because diverse words and phrases must be mapped into structures that the robot can understand, and elements in those structures must be grounded in an uncertain environment. We present a system that follows natural language directions by extracting a sequence of spatial description clauses from the linguistic input and then infers the most probable path through the environment given only information about the environmental geometry and detected visible objects. We use a probabilistic graphical model that factors into three key components. The first component grounds landmark phrases such as "the computers" in the perceptual frame of the robot by exploiting co-occurrence statistics from a database of tagged images such as Flickr. Second, a spatial reasoning component judges how well spatial relations such as "past the computers" describe a path. Finally, verb phrases such as "turn right" are modeled according to the amount of change in orientation in the path. Our system follows 60% of the directions in our corpus to within 15 meters of the true destination, significantly outperforming other approaches.

    United States. Office of Naval Research (MURI N00014-07-1-0749)
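
    A compressed sketch of the three-component factorization, with one-line placeholder models standing in for the learned distributions (the probabilities, features, and example paths below are invented):

        import math

        def p_landmark(phrase, detected_objects):
            # Co-occurrence grounding: "the computers" matches a detected monitor.
            cooccur = {("the computers", "monitor"): 0.8}
            return max((cooccur.get((phrase, o), 0.05) for o in detected_objects),
                       default=0.05)

        def p_spatial(relation, path):
            # "past": does the path skirt the landmark?
            return 0.7 if relation == "past" and path["skirts_landmark"] else 0.2

        def p_verb(verb, path):
            # "turn right": reward heading changes near -90 degrees.
            if verb == "turn right":
                return math.exp(-abs(path["delta_heading_deg"] + 90) / 45)
            return 0.5

        def path_score(path):
            return (p_landmark("the computers", path["objects"]) *
                    p_spatial("past", path) *
                    p_verb("turn right", path))

        paths = [{"objects": ["monitor"], "skirts_landmark": True,
                  "delta_heading_deg": -85},
                 {"objects": ["sofa"], "skirts_landmark": False,
                  "delta_heading_deg": 10}]
        print(max(paths, key=path_score))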

    Understanding natural language commands for robotic navigation and mobile manipulation

    This paper describes a new model for understanding natural language commands given to autonomous systems that perform navigation and mobile manipulation in semi-structured environments. Previous approaches have used models with fixed structure to infer the likelihood of a sequence of actions given the environment and the command. In contrast, our framework, called Generalized Grounding Graphs, dynamically instantiates a probabilistic graphical model for a particular natural language command according to the command's hierarchical and compositional semantic structure. Our system performs inference in the model to successfully find and execute plans corresponding to natural language commands such as "Put the tire pallet on the truck." The model is trained using a corpus of commands collected using crowdsourcing. We pair each command with robot actions and use the corpus to learn the parameters of the model. We evaluate the robot's performance by inferring plans from natural language commands, executing each plan in a realistic robot simulator, and asking users to evaluate the system's performance. We demonstrate that our system can successfully follow many natural language commands from the corpus.
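
    The corpus-based training step can be illustrated with a structured-perceptron update, a deliberately simplified stand-in for the paper's actual parameter estimation (the features, commands, and plan symbols below are invented):

        def feats(command, plan):
            return {("word_plan", w, plan): 1.0 for w in command.split()}

        def predict(command, plans, w):
            return max(plans, key=lambda p: sum(w.get(k, 0.0) * v
                                                for k, v in feats(command, p).items()))

        corpus = [("put pallet on truck", "PLACE(pallet, truck)"),
                  ("pick up the pallet", "GRASP(pallet)")]
        plans = [p for _, p in corpus]

        w = {}
        for _ in range(5):                      # a few passes over the corpus
            for cmd, gold in corpus:
                pred = predict(cmd, plans, w)
                if pred != gold:                # promote gold, demote prediction
                    for k, v in feats(cmd, gold).items():
                        w[k] = w.get(k, 0.0) + v
                    for k, v in feats(cmd, pred).items():
                        w[k] = w.get(k, 0.0) - v

        print(predict("put pallet on truck", plans, w))  # -> PLACE(pallet, truck)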

    Approaching the Symbol Grounding Problem with Probabilistic Graphical Models

    In order for robots to engage in dialog with human teammates, they must have the ability to map between words in the language and aspects of the external world. A solution to this symbol grounding problem (Harnad, 1990) would enable a robot to interpret commands such as “Drive over to receiving and pick up the tire pallet.” In this article we describe several of our results that use probabilistic inference to address the symbol grounding problem. Our specific approach is to develop models that factor according to the linguistic structure of a command. We first describe an early result, a generative model that factors according to the sequential structure of language, and then discuss our new framework, generalized grounding graphs (G3). The G3 framework dynamically instantiates a probabilistic graphical model for a natural language input, enabling a mapping between words in language and concrete objects, places, paths and events in the external world. We report on corpus-based experiments where the robot is able to learn and use word meanings in three real-world tasks: indoor navigation, spatial language video retrieval, and mobile manipulation.

    U.S. Army Research Laboratory. Collaborative Technology Alliance Program (Cooperative Agreement W911NF-10-2-0016)
    United States. Office of Naval Research (MURI N00014-07-1-0749)
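
    In rough notation (assumed here for illustration, not quoted from the article), the G3 factorization can be written with one factor per linguistic constituent, where correspondence variables φ_i link each constituent λ_i to its candidate groundings γ:

        p(\Phi \mid \Gamma, \Lambda)
            = \frac{1}{Z} \prod_{i=1}^{N}
              \psi\bigl(\phi_i,\; \gamma_{i1}, \ldots, \gamma_{ik},\; \lambda_i\bigr)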

    Elective cancer surgery in COVID-19-free surgical pathways during the SARS-CoV-2 pandemic: An international, multicenter, comparative cohort study

    PURPOSE As cancer surgery restarts after the first COVID-19 wave, health care providers urgently require data to determine where elective surgery is best performed. This study aimed to determine whether COVID-19–free surgical pathways were associated with lower postoperative pulmonary complication rates compared with hospitals with no defined pathway. PATIENTS AND METHODS This international, multicenter cohort study included patients who underwent elective surgery for 10 solid cancer types without preoperative suspicion of SARS-CoV-2. Participating hospitals included patients from local emergence of SARS-CoV-2 until April 19, 2020. At the time of surgery, hospitals were defined as having a COVID-19–free surgical pathway (complete segregation of the operating theater, critical care, and inpatient ward areas) or no defined pathway (incomplete or no segregation, areas shared with patients with COVID-19). The primary outcome was 30-day postoperative pulmonary complications (pneumonia, acute respiratory distress syndrome, unexpected ventilation). RESULTS Of 9,171 patients from 447 hospitals in 55 countries, 2,481 were operated on in COVID-19–free surgical pathways. Patients who underwent surgery within COVID-19–free surgical pathways were younger with fewer comorbidities than those in hospitals with no defined pathway but with similar proportions of major surgery. After adjustment, pulmonary complication rates were lower with COVID-19–free surgical pathways (2.2% v 4.9%; adjusted odds ratio [aOR], 0.62; 95% CI, 0.44 to 0.86). This was consistent in sensitivity analyses for low-risk patients (American Society of Anesthesiologists grade 1/2), propensity score–matched models, and patients with negative SARS-CoV-2 preoperative tests. The postoperative SARS-CoV-2 infection rate was also lower in COVID-19–free surgical pathways (2.1% v 3.6%; aOR, 0.53; 95% CI, 0.36 to 0.76). CONCLUSION Within available resources, dedicated COVID-19–free surgical pathways should be established to provide safe elective cancer surgery during current and before future SARS-CoV-2 outbreaks.
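
    The headline numbers are adjusted odds ratios from multivariable logistic regression. A hedged sketch of how such an aOR and its 95% CI are typically computed, on synthetic data with an invented, much smaller adjustment set than the study's:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n = 5000
        pathway = rng.integers(0, 2, n)     # 1 = COVID-19-free surgical pathway
        age = rng.normal(65, 10, n)
        asa = rng.integers(1, 4, n)         # ASA physical status grade 1-3

        # Simulated outcome: complications rarer inside COVID-19-free pathways.
        logit = -3.5 - 0.6 * pathway + 0.02 * (age - 65) + 0.3 * (asa - 2)
        y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

        X = sm.add_constant(np.column_stack([pathway, age, asa]))
        fit = sm.Logit(y, X).fit(disp=0)
        aor = np.exp(fit.params[1])         # adjusted OR for the pathway term
        lo, hi = np.exp(fit.conf_int()[1])  # 95% CI bounds
        print(f"aOR {aor:.2f} (95% CI {lo:.2f} to {hi:.2f})")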

    Utilizing object-object and object-scene context when planning to find things

    In this paper, our goal is to search for a novel object, where we have a prior map of the environment and knowledge of some of the objects in it, but no information about the location of the specific novel object. We develop a probabilistic model over possible object locations that utilizes object-object and object-scene context. This model can be queried for any of over 25,000 naturally occurring objects in the world and is trained from labeled data acquired from the captions of photos on the Flickr Website. We show that these simple models based on object co-occurrences perform surprisingly well at localizing arbitrary objects in an office setting. In addition, we show how to compute paths that minimize the expected distance to the query object and show that this approach performs better than a greedy approach. Finally, we give preliminary results for grounding our approach in object classifiers.

    United States. Office of Naval Research (MURI N00014-07-1-0749)
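
    The expected-distance planning idea can be shown in a few lines, with made-up probabilities and a crude additive travel model standing in for the paper's learned co-occurrence model and real map:

        from itertools import permutations

        # Query object: a mug. Probabilities and travel costs are invented.
        p_room = {"kitchen": 0.6, "office": 0.3, "lounge": 0.1}
        travel = {"kitchen": 20.0, "office": 5.0, "lounge": 10.0}

        def expected_distance(order):
            # Expected travel before the object is found when rooms are
            # visited in the given order.
            total = expected = 0.0
            for room in order:
                total += travel[room]
                expected += p_room[room] * total
            return expected

        best = min(permutations(p_room), key=expected_distance)
        greedy = tuple(sorted(p_room, key=p_room.get, reverse=True))
        print("min-expected-distance order:", best)
        print("greedy highest-probability order:", greedy)

    On these numbers the expected-distance order visits the nearby office before the more probable kitchen, consistent with the abstract's claim that minimizing expected distance outperforms a greedy strategy.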