473 research outputs found

    Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs

    Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, voxels) or as a collection of objects. This paper attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatio-temporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes state-of-the-art techniques for visual-inertial SLAM, metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera in real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution shows how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera are open-source. Comment: 34 pages, 25 figures, 9 tables.
arXiv admin note: text overlap with arXiv:2002.0628
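
The layered-graph idea in the abstract above can be illustrated with a minimal sketch. This is not Kimera's data structure or API: the layer names, the `SceneGraph` class, and the `in` relation are hypothetical, chosen only to mirror the abstract's example of a person being in a room at a given time.

```python
from dataclasses import dataclass, field

# Hypothetical abstraction levels, loosely following the abstract's examples.
LAYERS = ("agent", "object", "room", "building")


@dataclass(frozen=True)
class Node:
    node_id: str
    layer: str   # abstraction level the node belongs to
    label: str   # semantic label, e.g. "human" or "kitchen"


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    # Each edge is (source id, relation, destination id, timestamp or None).
    edges: list = field(default_factory=list)

    def add_node(self, node_id, layer, label):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.nodes[node_id] = Node(node_id, layer, label)

    def add_edge(self, src, relation, dst, time=None):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((src, relation, dst, time))

    def related(self, dst, relation, time=None):
        """Nodes linked to `dst` by `relation`, optionally at one timestamp."""
        return [
            self.nodes[s]
            for s, r, d, t in self.edges
            if d == dst and r == relation and (time is None or t == time)
        ]


def build_demo():
    """Build a tiny graph: a person and a chair in a kitchen inside a building."""
    g = SceneGraph()
    g.add_node("building_1", "building", "office building")
    g.add_node("kitchen", "room", "kitchen")
    g.add_node("chair_1", "object", "chair")
    g.add_node("person_1", "agent", "human")
    g.add_edge("kitchen", "in", "building_1")          # static relation
    g.add_edge("chair_1", "in", "kitchen")             # static relation
    g.add_edge("person_1", "in", "kitchen", time=3.0)  # time-stamped relation
    return g
```

Calling `build_demo().related("kitchen", "in", time=3.0)` returns only the person node, while omitting `time` returns both the chair and the person. Storing the timestamp on the edge is one simple way to separate static from dynamic relations; the paper may represent them differently.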

    3D Active Metric-Semantic SLAM

    In this letter, we address the problem of exploration and metric-semantic mapping of multi-floor GPS-denied indoor environments using Size, Weight, and Power (SWaP) constrained aerial robots. Most previous work in exploration assumes that robot localization is solved. However, neglecting the state uncertainty of the agent can ultimately lead to cascading errors both in the resulting map and in the state of the agent itself. Furthermore, actions that reduce localization errors may be at direct odds with the exploration task. We propose a framework that balances the efficiency of exploration with actions that reduce the state uncertainty of the agent. In particular, our algorithmic approach for active metric-semantic SLAM is built upon sparse information abstracted from raw problem data, to make it suitable for SWaP-constrained robots. Furthermore, we integrate this framework within a fully autonomous aerial robotic system that achieves autonomous exploration in cluttered 3D environments. From extensive real-world experiments, we show that by including Semantic Loop Closure (SLC), we can reduce the robot pose estimation errors by over 90% in translation and approximately 75% in yaw, and the uncertainties in pose estimates and semantic maps by over 70% and 65%, respectively. Although discussed in the context of indoor multi-floor exploration, our system can be used for various other applications, such as infrastructure inspection and precision agriculture, where reliable GPS data may not be available. Comment: Submitted to RA-L for revie

    Semantic Mechanical Search with Large Vision and Language Models

    Moving objects to find a fully-occluded target object, known as mechanical search, is a challenging problem in robotics. As objects are often organized semantically, we conjecture that semantic information about object relationships can facilitate mechanical search and reduce search time. Large pretrained vision and language models (VLMs and LLMs) have shown promise in generalizing to uncommon objects and previously unseen real-world environments. In this work, we propose a novel framework called Semantic Mechanical Search (SMS). SMS conducts scene understanding and generates a semantic occupancy distribution explicitly using LLMs. Compared to methods that rely on visual similarities offered by CLIP embeddings, SMS leverages the deep reasoning capabilities of LLMs. Unlike prior work that uses VLMs and LLMs as end-to-end planners, which may not integrate well with specialized geometric planners, SMS can serve as a plug-in semantic module for downstream manipulation or navigation policies. For mechanical search in closed-world settings such as shelves, we compare with a geometric-based planner and show that SMS improves mechanical search performance by 24% across the pharmacy, kitchen, and office domains in simulation and 47.1% in physical experiments. For open-world real environments, SMS can produce better semantic distributions compared to CLIP-based methods, with the potential to be integrated with downstream navigation policies to improve object navigation tasks. Code, data, videos, and the appendix are available: https://sites.google.com/view/semantic-mechanical-searc

    Enhanced Place Name Search Using Semantic Gazetteers

    With the increased availability of geospatial data and efficient geo-referencing services, people are now more likely to engage in geospatial searches for information on the Web. Searching by address is supported by geocoding, which converts an address to a geographic coordinate. Addresses are one form of geospatial referencing that is relatively well understood and easy for people to use, but place names are generally the most intuitive natural language expressions that people use for locations. This thesis presents an approach for enhancing place name searches with a geo-ontology and a semantically enabled gazetteer. This approach investigates the extension of general spatial relationships to domain-specific, semantically rich concepts and spatial relationships. Hydrography is selected as the domain, and the thesis investigates the specification of semantic relationships between hydrographic features as functions of spatial relationships between their footprints. A Gazetteer Ontology (GazOntology) based on ISO Standards is developed to associate a feature with a Spatial Reference. The Spatial Reference can be a GeoIdentifier, which is a text-based representation of a feature (usually a place name or zip code), or a Geometry representation, which is a spatial footprint of the feature. A Hydrological Features Ontology (HydroOntology) is developed to model canonical forms of hydrological features and their hydrological relationships. The classes modelled are endurant classes modelled in foundational ontologies such as DOLCE. Semantics of these relationships in a hydrological context are specified in the HydroOntology. The HydroOntology and GazOntology can be viewed as the semantic schema for the HydroGazetteer. The HydroGazetteer was developed as an RDF triplestore and populated with instances of named hydrographic features from the National Hydrography Dataset (NHD) for several watersheds in the state of Maine.
To determine which instances of surface hydrology features participate in the specified semantic relationships, information was obtained through spatial analysis of the National Hydrography Dataset (NHD), the NHDPlus data set, and the Geographic Names Information System (GNIS). The 9-intersection model between point, line, directed line, and region geometries, which identifies sets of relationships between geometries independent of what these geometries represent in the world, provided the basis for identifying semantic relationships between the canonical hydrographic feature types. The developed ontologies enable the HydroGazetteer to answer different categories of queries, namely place name queries involving the taxonomy of feature types, queries on relations between named places, and place name queries with reasoning. A simple user interface to select a hydrological relationship and a hydrological feature name was developed, and the results are displayed on a USGS topographic base map. The approach demonstrates that spatial semantics can provide effective query disambiguation and more targeted spatial queries between named places based on relationships such as upstream, downstream, or flows through.
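
As a rough illustration of how upstream and downstream queries could be answered over subject-predicate-object triples, here is a minimal sketch. It does not use the thesis's ontologies or a real triplestore: the `flowsInto` predicate and all feature names are invented for the example, and a plain breadth-first search stands in for whatever reasoning the HydroGazetteer actually performs.

```python
from collections import deque

# Hypothetical triples; the feature names and the "flowsInto" predicate
# are illustrative assumptions, not data from the NHD or GNIS.
TRIPLES = [
    ("Brook A", "flowsInto", "Stream B"),
    ("Pond D", "flowsInto", "Stream B"),
    ("Stream B", "flowsInto", "River C"),
]


def downstream_of(name, triples):
    """Features reachable from `name` by following flowsInto, in BFS order."""
    successors = {}
    for subj, pred, obj in triples:
        if pred == "flowsInto":
            successors.setdefault(subj, []).append(obj)

    seen = {name}
    order = []
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for nxt in successors.get(current, []):
            if nxt not in seen:  # guard against cycles in the data
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def upstream_of(name, triples):
    """Features that eventually flow into `name` (search on reversed edges)."""
    reversed_triples = [(obj, pred, subj) for subj, pred, obj in triples]
    return downstream_of(name, reversed_triples)
```

With the sample triples, `downstream_of("Brook A", TRIPLES)` follows the chain through Stream B to River C, and `upstream_of("River C", TRIPLES)` collects Stream B, Brook A, and Pond D. Deriving upstream as the inverse of a single stored relation keeps the data small; a real gazetteer might instead store both directions or infer them with an ontology reasoner.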

    GEOINTERPRET: AN ONTOLOGICAL ENGINEERING METHODOLOGY FOR AUTOMATED INTERPRETATION OF GEOSPATIAL QUERIES

    Despite advances in GIS technology, solving geospatial problems using current GIS platforms involves complex tasks requiring specialized skills and knowledge that are attainable through formal training and experience in implementing GIS projects. These requisite skills and knowledge include: understanding domain-specific geospatial problems; understanding GIS representation of real-world objects, concepts, and activities; knowing how to identify, locate, retrieve, and integrate geospatial data sets into GIS projects; knowing specific geoprocessing capabilities available on specific GIS platforms; and skills in utilizing geoprocessing tools in GIS with appropriate data sets to solve problems effectively and efficiently. Users interested in solving application-domain problems often lack such skills and knowledge and resort to GIS experts (this is especially true for applications dealing with diverse geospatial data sets and complex problems). Therefore, there is a gap between users' knowledge about geoprocessing and GIS tools and the GIS knowledge and skills needed to solve geospatial problems. To fill this gap, a new approach that automates the tasks involved in geospatial problem solving is needed. Of these tasks, the most important is interpreting geospatial queries (usually expressed in application-specific concepts and terminology) and mapping them to geoprocessing operations implementable by a GIS. The goal of this research is to develop an ontological engineering methodology, called GeoInterpret, to automate the task of geospatial query interpretation and mapping. This methodology encompasses: a conceptualization of geospatial queries; a multiple-ontology approach for representing knowledge needed to solve geospatial queries; a set of techniques for mapping elements between different ontologies; and a set of algorithms for geospatial query interpretation, mapping, and geoprocessing workflow composition.
A proof of concept was developed to demonstrate how GeoInterpret works.

    Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

    Building general-purpose robots that can operate seamlessly in any environment, with any object, and using various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. However, most existing robotic systems have been constrained: they are designed for specific tasks, trained on specific datasets, and deployed within specific environments. These systems usually require extensively labeled data, rely on task-specific models, have numerous generalization issues when deployed in real-world scenarios, and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of robotics, and (ii) what a robotics-specific foundation model would look like. We begin by providing an overview of what constitutes a conventional robotic system and the fundamental barriers to making it universally applicable. Next, we establish a taxonomy to discuss current work exploring ways to leverage existing foundation models for robotics and develop ones catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models for enabling general-purpose robotic systems. We encourage readers to view our living GitHub repository of resources, including papers reviewed in this survey as well as related projects and repositories for developing foundation models for robotics.