Scene Graph Generation (SGG) remains a challenging visual understanding task
due to its compositional property. Most previous works adopt a bottom-up,
two-stage or point-based, one-stage approach, which often suffers from high
time complexity or suboptimal designs. In this work, we propose a novel SGG
method to address the aforementioned issues, formulating the task as a
bipartite graph construction problem. To address the issues above, we create a
transformer-based end-to-end framework to generate the entity and entity-aware
predicate proposal set, and infer directed edges to form relation triplets.
Moreover, we design a graph assembling module to infer the connectivity of the
bipartite scene graph based on our entity-aware structure, enabling us to
generate the scene graph in an end-to-end manner. Based on bipartite graph
assembling paradigm, we further propose a new technical design to address the
efficacy of entity-aware modeling and optimization stability of graph
assembling. Equipped with the enhanced entity-aware design, our method achieves
optimal performance and time-complexity. Extensive experimental results show
that our design is able to achieve the state-of-the-art or comparable
performance on three challenging benchmarks, surpassing most of the existing
approaches and enjoying higher efficiency in inference. Code is available:
https://github.com/Scarecrow0/SGTRComment: Accepted by TPAMI: https://ieeexplore.ieee.org/document/1031523