2 research outputs found

    Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

    We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.
    Comment: The code and dataset used for evaluation can be found at https://github.com/changhaonan/OVSG. This paper has been accepted by CoRL202

    Dataset for Open Vocabulary Entity Grounding (DOVE-G)

    To accommodate the richness of open-vocabulary queries, we introduced a custom dataset, DOVE-G (Dataset for Open-Vocabulary Entity Grounding). The dataset has 8 scenes: kitchen, kitchenette, room1, room2, room3, bathroom, computer lab, and hallway. It is designed to let users query for objects within a scene using natural language. For each scene within DOVE-G, we manually labeled the ground truth and created 50 natural language queries (L_q). To augment this query set, we used LLMs to generate four additional sets of natural language queries. This approach yielded a total of 250 queries for each scene, and cumulatively, we have 4000 queries to evaluate OVSG’s performance. With this setup, we assess how our OVSG framework performs with open-vocabulary queries, one of our key research questions, providing a critical testbed for its effectiveness in handling diverse natural language expressions.