Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for
grounding a variety of entities, such as object instances, agents, and regions,
with free-form text-based queries. Unlike conventional semantic-based object
localization approaches, our system facilitates context-aware entity
localization, allowing for queries such as "pick up a cup on a kitchen table"
or "navigate to a sofa on which someone is sitting". In contrast to existing
research on 3D scene graphs, OVSG supports free-form text input and
open-vocabulary querying. Through a series of comparative experiments using the
ScanNet dataset and a self-collected dataset, we demonstrate that our proposed
approach significantly surpasses the performance of previous semantic-based
localization techniques. Moreover, we highlight the practical application of
OVSG in real-world robot navigation and manipulation experiments.
Comment: The code and dataset used for evaluation can be found at
https://github.com/changhaonan/OVSG.
This paper has been accepted by CoRL202
Dataset for Open Vocabulary Entity Grounding (DOVE-G)
To accommodate the richness of open-vocabulary queries, we introduce a custom dataset, DOVE-G (Dataset for Open-Vocabulary Entity Grounding). The dataset has 8 scenes: kitchen, kitchenette, room1, room2, room3, bathroom, computer lab, and hallway. It is designed to let users query for objects within a scene using natural language. For each scene in DOVE-G, we manually labeled the ground truth and created 50 natural language queries (L_q). To augment this query set, we used LLMs to generate four additional sets of natural language queries. This approach yielded a total of 250 queries for each scene, and cumulatively, we have 4000 queries to evaluate OVSG's performance. This setup lets us assess how the OVSG framework handles open-vocabulary queries, one of our key research questions, and serves as a testbed for its effectiveness on diverse natural language expressions.