Scene Graph Generation (SGG) offers a structured representation critical in
many computer vision applications. Traditional SGG approaches, however, are
limited by a closed-set assumption, restricting them to recognizing only
predefined object and relation categories. To overcome this, we categorize SGG
scenarios into four distinct settings according to the openness of node and
edge categories: Closed-set
SGG, Open Vocabulary (object) Detection-based SGG (OvD-SGG), Open Vocabulary
Relation-based SGG (OvR-SGG), and Open Vocabulary Detection + Relation-based
SGG (OvD+R-SGG). While object-centric open vocabulary SGG has been studied
recently, the more challenging problem of relation-involved open vocabulary SGG
remains relatively unexplored. To fill this gap, we propose a unified framework
named OvSGTR towards fully open vocabulary SGG from a holistic view. The
proposed framework is an end-to-end transformer architecture, which learns a
visual-concept alignment for both nodes and edges, enabling the model to
recognize unseen categories. For the more challenging settings of
relation-involved open vocabulary SGG, the proposed approach integrates
relation-aware pre-training on image-caption data and retains the learned
visual-concept alignment through knowledge distillation. Comprehensive
experimental results on the Visual Genome benchmark demonstrate the
effectiveness and superiority of the proposed framework.
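
As a concrete illustration of the node/edge visual-concept alignment described above, the sketch below scores visual features against text embeddings of class names, so that unseen categories can be recognized at inference time simply by extending the embedding table. This is a minimal sketch, not the paper's implementation: the feature dimension, class lists, and the random stand-ins for a frozen text encoder (e.g., CLIP or BERT embeddings) are all placeholders.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes and vocabularies; the abstract does not specify these.
feat_dim = 256
node_classes = ["person", "dog", "surfboard"]    # open object vocabulary
edge_classes = ["riding", "holding", "next to"]  # open relation vocabulary

# Stand-ins for frozen text-encoder embeddings of the class names
# (random here purely for illustration).
node_text_emb = F.normalize(torch.randn(len(node_classes), feat_dim), dim=-1)
edge_text_emb = F.normalize(torch.randn(len(edge_classes), feat_dim), dim=-1)

def classify_by_alignment(features, text_emb, temperature=0.07):
    """Score each visual feature against every concept embedding.

    Because classification is a similarity lookup rather than a fixed
    softmax head, new categories can be added at inference time by
    extending `text_emb` with additional class-name embeddings.
    """
    features = F.normalize(features, dim=-1)
    return features @ text_emb.t() / temperature

# Example: 5 detected nodes and 4 candidate edges from a transformer decoder.
node_feats = torch.randn(5, feat_dim)
edge_feats = torch.randn(4, feat_dim)

node_logits = classify_by_alignment(node_feats, node_text_emb)  # (5, 3)
edge_logits = classify_by_alignment(edge_feats, edge_text_emb)  # (4, 3)
print(node_logits.argmax(dim=-1), edge_logits.argmax(dim=-1))
```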
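The abstract also states that visual-concept alignment is retained through knowledge distillation during relation-involved training, but does not give the objective. The cosine-based teacher-student term below is therefore only an assumed form: `relation_distillation_loss`, the tensor shapes, and the frozen-teacher setup are hypothetical.

```python
import torch
import torch.nn.functional as F

def relation_distillation_loss(student_edge_feats, teacher_edge_feats):
    """Assumed distillation term: keep the fine-tuned (student) edge
    features close to those of a frozen, caption-pre-trained teacher, so
    the relation-level visual-concept alignment is not forgotten when
    training on a smaller closed set of annotated predicates."""
    student = F.normalize(student_edge_feats, dim=-1)
    teacher = F.normalize(teacher_edge_feats, dim=-1)
    # 1 - cosine similarity, averaged over candidate edges
    return (1.0 - (student * teacher).sum(dim=-1)).mean()

# Toy usage with placeholder sizes: 4 candidate edges, 256-d features.
student = torch.randn(4, 256, requires_grad=True)
teacher = torch.randn(4, 256)  # frozen teacher output, no gradient
loss = relation_distillation_loss(student, teacher)
loss.backward()
```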