Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

Authors: Adeeb Abbas, Kostas Bekris, Abdeslam Boularias, Kowndinya Boyalakuntla, Siwei Cai, Haonan Chang, Shijie Geng, Eric Jing, Shreesh Keskar, Shiyang Lu, Lifeng Zhou
Publication date: 27/09/2023

We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.

Comment: The code and dataset used for evaluation can be found at https://github.com/changhaonan/OVSG. This paper has been accepted by CoRL202

arXiv.org e-Print Archive

Dataset for Open Vocabulary Entity Grounding (DOVE-G)

Authors: Abdeslam Boularias, Adeeb Abbas, Eric Jing, Haonan Chang, Kostas Bekris, Kowndinya Boyalakuntla, Lifeng Zhou, Shijie Geng, Shiyang Lu, Shreesh Keskar, Siwei Cai
Publication date: 13/10/2023

To accommodate the richness of open-vocabulary queries, we introduced a custom dataset, DOVE-G (Dataset for Open-Vocabulary Entity Grounding). The dataset has 8 scenes: kitchen, kitchenette, room1, room2, room3, bathroom, computer lab, and hallway. It was created to let users query for objects within a scene using natural language. For each scene within DOVE-G, we manually labeled the ground truth and created 50 natural language queries (Lq). To augment this query set, we used LLMs to generate four additional sets of natural language queries. This approach yielded a total of 250 queries for each scene and, cumulatively, 4000 queries to evaluate OVSG's performance. With this setup, we assess how our OVSG framework performs with open-vocabulary queries, one of our key research questions, providing a critical testbed for its effectiveness in handling diverse natural language expressions.

The Francis Crick Institute