473 research outputs found
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Humans are able to form a complex mental model of the environment they move
in. This mental model captures geometric and semantic aspects of the scene,
describes the environment at multiple levels of abstraction (e.g., objects,
rooms, buildings), and includes static and dynamic entities and their relations
(e.g., a person is in a room at a given time). In contrast, current robots'
internal representations still provide a partial and fragmented understanding
of the environment, either in the form of a sparse or dense set of geometric
primitives (e.g., points, lines, planes, voxels) or as a collection of objects.
This paper attempts to reduce the gap between robot and human perception by
introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that
seamlessly captures metric and semantic aspects of a dynamic environment. A DSG
is a layered graph where nodes represent spatial concepts at different levels
of abstraction, and edges represent spatio-temporal relations among nodes. Our
second contribution is Kimera, the first fully automatic method to build a DSG
from visual-inertial data. Kimera includes state-of-the-art techniques for
visual-inertial SLAM, metric-semantic 3D reconstruction, object localization,
human pose and shape estimation, and scene parsing. Our third contribution is a
comprehensive evaluation of Kimera in real-life datasets and photo-realistic
simulations, including a newly released dataset, uHumans2, which simulates a
collection of crowded indoor and outdoor scenes. Our evaluation shows that
Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates
an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a
complex indoor environment with tens of objects and humans in minutes. Our
final contribution shows how to use a DSG for real-time hierarchical semantic
path-planning. The core modules in Kimera are open-source.
Comment: 34 pages, 25 figures, 9 tables. arXiv admin note: text overlap with arXiv:2002.0628
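The layered structure described in the abstract (nodes as spatial concepts on layers such as objects and rooms, edges as labeled spatio-temporal relations) can be sketched as a small data structure. This is a minimal, hypothetical illustration in Python, not Kimera's actual implementation; all names here (`SceneGraph`, `Node`, the `contains` relation) are assumptions for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    layer: str                                  # e.g. "object", "room", "building"
    attributes: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    """Minimal layered scene graph: nodes live on named layers,
    edges carry a relation label (e.g. "contains", "adjacent_to")."""
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (src, relation, dst) triples

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id, relation=None):
        """Nodes reachable from node_id, optionally filtered by relation."""
        return [d for s, r, d in self.edges
                if s == node_id and (relation is None or r == relation)]

# Build a toy two-layer graph: one room node containing two object nodes.
g = SceneGraph()
g.add_node(Node("room_1", "room"))
g.add_node(Node("chair_1", "object"))
g.add_node(Node("table_1", "object"))
g.add_edge("room_1", "contains", "chair_1")
g.add_edge("room_1", "contains", "table_1")
print(g.neighbors("room_1", "contains"))  # ['chair_1', 'table_1']
```

A real DSG additionally attaches geometry (meshes, poses) and time stamps to nodes and edges; the point here is only the layered-graph shape.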
3D Active Metric-Semantic SLAM
In this letter, we address the problem of exploration and metric-semantic
mapping of multi-floor, GPS-denied indoor environments using Size, Weight, and
Power (SWaP)-constrained aerial robots. Most previous work in exploration
assumes that robot localization is solved. However, neglecting the state
uncertainty of the agent can ultimately lead to cascading errors both in the
resulting map and in the state of the agent itself. Furthermore, actions that
reduce localization errors may be at direct odds with the exploration task. We
propose a framework that balances the efficiency of exploration with actions
that reduce the state uncertainty of the agent. In particular, our algorithmic
approach for active metric-semantic SLAM is built upon sparse information
abstracted from raw problem data, to make it suitable for SWaP-constrained
robots. Furthermore, we integrate this framework within a fully autonomous
aerial robotic system that achieves autonomous exploration in cluttered, 3D
environments. Extensive real-world experiments show that by
including Semantic Loop Closure (SLC), we can reduce the robot pose estimation
errors by over 90% in translation and approximately 75% in yaw, and the
uncertainties in pose estimates and semantic maps by over 70% and 65%,
respectively. Although discussed in the context of indoor multi-floor
exploration, our system can be used for various other applications, such as
infrastructure inspection and precision agriculture where reliable GPS data may
not be available.
Comment: Submitted to RA-L for review
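The trade-off the abstract describes, balancing exploration efficiency against actions that reduce state uncertainty, can be illustrated with a generic weighted scoring rule over candidate actions. This is only a sketch of the idea, not the paper's actual objective; the candidate actions, gains, and weights below are invented for illustration.

```python
def best_action(candidates, weight=0.5):
    """Pick the action maximizing a convex combination of
    exploration gain and expected uncertainty reduction.
    candidates: list of (name, exploration_gain, uncertainty_reduction)."""
    return max(candidates,
               key=lambda c: (1 - weight) * c[1] + weight * c[2])[0]

# Toy candidates: a far frontier explores more but accumulates drift;
# revisiting a known semantic landmark (a loop closure) reduces drift.
candidates = [
    ("frontier_far",     0.9, 0.1),
    ("revisit_landmark", 0.2, 0.8),
]

best_action(candidates, weight=0.0)  # pure exploration -> 'frontier_far'
best_action(candidates, weight=0.7)  # uncertainty-averse -> 'revisit_landmark'
```

Raising `weight` makes the planner prefer uncertainty-reducing actions such as loop closures, which is the qualitative behavior the abstract attributes to its framework.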
Semantic Mechanical Search with Large Vision and Language Models
Moving objects to find a fully-occluded target object, known as mechanical
search, is a challenging problem in robotics. As objects are often organized
semantically, we conjecture that semantic information about object
relationships can facilitate mechanical search and reduce search time. Large
pretrained vision and language models (VLMs and LLMs) have shown promise in
generalizing to uncommon objects and previously unseen real-world environments.
In this work, we propose a novel framework called Semantic Mechanical Search
(SMS). SMS conducts scene understanding and generates a semantic occupancy
distribution explicitly using LLMs. Compared to methods that rely on visual
similarities offered by CLIP embeddings, SMS leverages the deep reasoning
capabilities of LLMs. Unlike prior work that uses VLMs and LLMs as end-to-end
planners, which may not integrate well with specialized geometric planners, SMS
can serve as a plug-in semantic module for downstream manipulation or
navigation policies. For mechanical search in closed-world settings such as
shelves, we compare with a geometric-based planner and show that SMS improves
mechanical search performance by 24% across the pharmacy, kitchen, and office
domains in simulation and 47.1% in physical experiments. For open-world real
environments, SMS can produce better semantic distributions compared to
CLIP-based methods, with the potential to be integrated with downstream
navigation policies to improve object navigation tasks. Code, data, videos, and
the appendix are available:
https://sites.google.com/view/semantic-mechanical-searc
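The notion of a semantic occupancy distribution, a semantic prior over candidate locations combined with geometric feasibility and normalized into a probability distribution, can be sketched generically. This is a hypothetical illustration, not the SMS implementation: the location names and scores are made up, and a real system would obtain the semantic plausibility scores from an LLM rather than hard-coding them.

```python
def semantic_occupancy(semantic_scores, geometric_free):
    """Combine per-location semantic plausibility with the probability
    that the location can geometrically contain the hidden target,
    then normalize into a distribution over locations."""
    joint = {loc: semantic_scores.get(loc, 0.0) * geometric_free.get(loc, 0.0)
             for loc in semantic_scores}
    total = sum(joint.values())
    if total == 0:                       # no evidence: fall back to uniform
        n = len(joint)
        return {loc: 1.0 / n for loc in joint}
    return {loc: v / total for loc, v in joint.items()}

# Toy query: "where is the ibuprofen?" -- semantic scores favor the
# pharmacy shelf even though it is the most cluttered one.
scores = {"shelf_pharmacy": 0.9, "shelf_snacks": 0.1, "shelf_office": 0.2}
free   = {"shelf_pharmacy": 0.5, "shelf_snacks": 1.0, "shelf_office": 1.0}
dist = semantic_occupancy(scores, free)   # shelf_pharmacy gets mass 0.6
```

A downstream geometric planner can then spend its object-moving budget on the high-probability locations first, which is the plug-in role the abstract describes.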
Enhanced Place Name Search Using Semantic Gazetteers
With the increased availability of geospatial data and efficient geo-referencing services, people are now more likely to engage in geospatial searches for information on the Web. Searching by address is supported by geocoding, which converts an address to a geographic coordinate. Addresses are one form of geospatial referencing that is relatively well understood and easy for people to use, but place names are generally the most intuitive natural language expressions that people use for locations. This thesis presents an approach for enhancing place name searches with a geo-ontology and a semantically enabled gazetteer. The approach investigates the extension of general spatial relationships to domain-specific, semantically rich concepts and spatial relationships. Hydrography is selected as the domain, and the thesis investigates the specification of semantic relationships between hydrographic features as functions of spatial relationships between their footprints.
A Gazetteer Ontology (GazOntology) based on ISO Standards is developed to associate a feature with a Spatial Reference. The Spatial Reference can be a GeoIdentifier, a text-based representation of a feature such as a place name or zip code, or a Geometry representation, which is a spatial footprint of the feature. A Hydrological Features Ontology (HydroOntology) is developed to model canonical forms of hydrological features and their hydrological relationships. The classes are modelled as endurant classes, as in foundational ontologies such as DOLCE. The semantics of these relationships in a hydrological context are specified in the HydroOntology.
The HydroOntology and GazOntology can be viewed as the semantic schema for the HydroGazetteer. The HydroGazetteer was developed as an RDF triplestore and populated with instances of named hydrographic features from the National Hydrography Dataset (NHD) for several watersheds in the state of Maine. To determine which instances of surface hydrology features participate in the specified semantic relationships, information was obtained through spatial analysis of the NHD, the NHDPlus data set, and the Geographic Names Information System (GNIS). The 9-intersection model between point, line, directed line, and region geometries, which identifies sets of relationships between geometries independently of what those geometries represent in the world, provided the basis for identifying semantic relationships between the canonical hydrographic feature types.
The developed ontologies enable the HydroGazetteer to answer different categories of queries, namely place name queries involving the taxonomy of feature types, queries on relations between named places, and place name queries with reasoning. A simple user interface for selecting a hydrological relationship and a hydrological feature name was developed, with the results displayed on a USGS topographic base map. The approach demonstrates that spatial semantics can provide effective query disambiguation and more targeted spatial queries between named places based on relationships such as upstream, downstream, or flows through.
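Relationships such as upstream and downstream reduce to reachability queries over a directed flow graph, which is one plausible way a gazetteer could answer them. The sketch below is illustrative only: the feature names and the `flows_into` adjacency map are invented, not actual NHD data, and a real HydroGazetteer would answer such queries over its RDF triplestore.

```python
from collections import deque

def reachable(flow_graph, start):
    """All features strictly downstream of `start` (BFS over
    directed edges; `start` itself is not included)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in flow_graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Flip edge direction so downstream reachability becomes upstream."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev

# Toy flow network: edges point in the direction of water flow.
flows_into = {
    "Brook_A":  ["Stream_B"],
    "Stream_B": ["River_C"],
    "River_C":  ["Lake_D"],
}

downstream = reachable(flows_into, "Brook_A")           # Stream_B, River_C, Lake_D
upstream   = reachable(reverse(flows_into), "Lake_D")   # River_C, Stream_B, Brook_A
```

The "flows through" relation would additionally need the footprint geometries (the 9-intersection tests mentioned above), which this connectivity-only sketch omits.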
GEOINTERPRET: AN ONTOLOGICAL ENGINEERING METHODOLOGY FOR AUTOMATED INTERPRETATION OF GEOSPATIAL QUERIES
Despite advances in GIS technology, solving geospatial problems using current GIS platforms involves complex tasks requiring specialized skills and knowledge that are attainable through formal training and experience in implementing GIS projects. These requisite skills and knowledge include: understanding domain-specific geospatial problems; understanding GIS representation of real-world objects, concepts, and activities; knowing how to identify, locate, retrieve, and integrate geospatial data sets into GIS projects; knowing specific geoprocessing capabilities available on specific GIS platforms; and skills in utilizing geoprocessing tools in GIS with appropriate data sets to solve problems effectively and efficiently. Users interested in solving application-domain problems often lack such skills and knowledge and resort to GIS experts (this is especially true for applications dealing with diverse geospatial data sets and complex problems). Therefore, there is a gap between users' knowledge about geoprocessing and GIS tools and the GIS knowledge and skills needed to solve geospatial problems. To fill this gap, a new approach that automates the tasks involved in geospatial problem solving is needed. Of these tasks, the most important is geospatial query (usually expressed in application-specific concepts and terminologies) interpretation and mapping to geoprocessing operations implementable by GIS. The goal of this research is to develop an ontological engineering methodology, called GeoInterpret, to automate the task of geospatial query interpretation and mapping. This methodology encompasses: a conceptualization of geospatial queries; a multiple-ontology approach for representing knowledge needed to solve geospatial queries; a set of techniques for mapping elements between different ontologies; and a set of algorithms for geospatial query interpretation, mapping, and geoprocessing workflow composition. 
A proof of concept was developed to demonstrate the working of GeoInterpret.
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Building general-purpose robots that can operate seamlessly, in any
environment, with any object, and utilizing various skills to complete diverse
tasks has been a long-standing goal in Artificial Intelligence. Unfortunately,
most existing robotic systems remain constrained: designed for specific tasks,
trained on specific datasets, and deployed within specific environments. These
systems usually require extensively labeled data,
rely on task-specific models, have numerous generalization issues when deployed
in real-world scenarios, and struggle to remain robust to distribution shifts.
Motivated by the impressive open-set performance and content generation
capabilities of web-scale, large-capacity pre-trained models (i.e., foundation
models) in research fields such as Natural Language Processing (NLP) and
Computer Vision (CV), we devote this survey to exploring (i) how these existing
foundation models from NLP and CV can be applied to the field of robotics, and
also exploring (ii) what a robotics-specific foundation model would look like.
We begin by providing an overview of what constitutes a conventional robotic
system and the fundamental barriers to making it universally applicable. Next,
we establish a taxonomy to discuss current work exploring ways to leverage
existing foundation models for robotics and develop ones catered to robotics.
Finally, we discuss key challenges and promising future directions in using
foundation models for enabling general-purpose robotic systems. We encourage
readers to view our living GitHub repository of resources, including papers
reviewed in this survey as well as related projects and repositories for
developing foundation models for robotics.