Scene Graph Generation by Iterative Message Passing
Understanding a visual scene goes beyond recognizing individual objects in
isolation. Relationships between objects also constitute rich semantic
information about the scene. In this work, we explicitly model the objects and
their relationships using scene graphs, a visually-grounded graphical structure
of an image. We propose a novel end-to-end model that generates such structured
scene representation from an input image. The model solves the scene graph
inference problem using standard RNNs and learns to iteratively improve its
predictions via message passing. Our joint inference model can take advantage
of contextual cues to make better predictions on objects and their
relationships. The experiments show that our model significantly outperforms
previous methods for generating scene graphs using the Visual Genome dataset and
inferring support relations with the NYU Depth v2 dataset.
Comment: CVPR 201
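The iterative refinement described in the abstract above can be illustrated with a toy sketch. This is not the authors' model: a fixed blending step (`mix`) stands in for the learned RNN units, plain averaging stands in for whatever pooling the paper uses, and all function names and the example vectors are hypothetical.

```python
# Toy sketch only: a fixed averaging update stands in for the learned RNN units.

def mean(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def mix(state, message, alpha=0.5):
    """Blend the current state with an incoming message (assumed stand-in for an RNN update)."""
    return [(1 - alpha) * s + alpha * m for s, m in zip(state, message)]


def message_passing(node_states, edge_states, num_iters=2, alpha=0.5):
    """node_states: {node: vector}; edge_states: {(subject, object): vector}."""
    dim = len(next(iter(node_states.values())))
    for _ in range(num_iters):
        # Each node pools the states of the edges it takes part in.
        node_msgs = {
            n: mean([h for (s, o), h in edge_states.items() if n in (s, o)], dim)
            for n in node_states
        }
        # Each edge pools the states of its subject and object nodes.
        edge_msgs = {
            (s, o): mean([node_states[s], node_states[o]], dim)
            for (s, o) in edge_states
        }
        # Both sets of states are updated from the messages of the previous step.
        node_states = {n: mix(h, node_msgs[n], alpha) for n, h in node_states.items()}
        edge_states = {e: mix(h, edge_msgs[e], alpha) for e, h in edge_states.items()}
    return node_states, edge_states


nodes = {"man": [1.0, 0.0], "horse": [0.0, 1.0]}
edges = {("man", "horse"): [0.0, 0.0]}
new_nodes, new_edges = message_passing(nodes, edges, num_iters=1)
```

After one iteration the edge state has absorbed information from both of its endpoints, which is the sense in which object and relationship predictions can inform each other.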
Weakly Supervised Visual Semantic Parsing
Scene Graph Generation (SGG) aims to extract entities, predicates and their
semantic structure from images, enabling deep understanding of visual content,
with many applications such as visual reasoning and image retrieval.
Nevertheless, existing SGG methods require millions of manually annotated
bounding boxes for training, and are computationally inefficient, as they
exhaustively process all pairs of object proposals to detect predicates. In
this paper, we address those two limitations by first proposing a generalized
formulation of SGG, namely Visual Semantic Parsing, which disentangles entity
and predicate recognition, and enables sub-quadratic performance. Then we
propose the Visual Semantic Parsing Network, VSPNet, based on a dynamic,
attention-based, bipartite message passing framework that jointly infers graph
nodes and edges through an iterative process. Additionally, we propose the
first graph-based weakly supervised learning framework, based on a novel graph
alignment algorithm, which enables training without bounding box annotations.
Through extensive experiments, we show that VSPNet outperforms weakly
supervised baselines significantly and approaches fully supervised performance,
while being several times faster. We publicly release the source code of our
method.
Comment: To be presented at CVPR 2020 (oral paper)
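To make the contrast in the abstract above concrete, the sketch below counts the ordered object pairs an exhaustive method would have to score, and shows one possible way to store a scene graph as a bipartite structure of entity nodes and predicate nodes. The role labels `"subject"` and `"object"`, the function names, and the example triples are assumptions made for illustration; the abstract does not specify how VSPNet builds its graph.

```python
from itertools import permutations


def exhaustive_pairs(entities):
    """All ordered (subject, object) pairs: n * (n - 1) candidates for n entities."""
    return list(permutations(entities, 2))


def to_bipartite(triples):
    """Turn (subject, predicate, object) triples into entity nodes,
    predicate nodes, and role-labelled edges between the two node sets."""
    entity_nodes = []
    predicate_nodes = []
    role_edges = []  # (predicate_index, role, entity_index)
    for subj, pred, obj in triples:
        for name in (subj, obj):
            if name not in entity_nodes:
                entity_nodes.append(name)
        predicate_nodes.append(pred)
        p = len(predicate_nodes) - 1
        role_edges.append((p, "subject", entity_nodes.index(subj)))
        role_edges.append((p, "object", entity_nodes.index(obj)))
    return entity_nodes, predicate_nodes, role_edges


triples = [("man", "riding", "horse"), ("horse", "on", "grass")]
entities, predicates, roles = to_bipartite(triples)
pairs = exhaustive_pairs(entities)
```

In this representation the number of role edges grows with the number of predicates rather than with the number of entity pairs, which is one way to read the efficiency argument.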
Target-Tailored Source-Transformation for Scene Graph Generation
Scene graph generation aims to provide a semantic and structural description
of an image, denoting the objects (with nodes) and their relationships (with
edges). The best performing works to date are based on exploiting the context
surrounding objects or relations, e.g., by passing information among objects. In
these approaches, transforming the representation of source objects is a
critical step in extracting information for use by target objects. In
this work, we argue that a source object should give what the target object needs
and give different objects different information rather than contributing
common information to all targets. To achieve this goal, we propose a
Target-Tailored Source-Transformation (TTST) method to efficiently propagate
information among object proposals and relations. Particularly, for a source
object proposal which will contribute information to other target objects, we
transform the source object feature to the target object feature domain by
simultaneously taking both the source and target into account. We further
explore more powerful representations by integrating a language prior with the
visual context in the transformation for scene graph generation. By doing
so, the target object is able to extract target-specific information from the
source object and source relation and refine its representation accordingly. Our
framework is validated on the Visual Genome benchmark and demonstrates
state-of-the-art performance for scene graph generation. The experimental
results show that object detection and visual relationship detection
mutually improve each other's performance under our method.
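The difference between a common message and a target-tailored one, as argued in the abstract above, can be sketched as follows. The element-wise sigmoid gate is a hypothetical stand-in chosen for brevity; it is not the TTST formulation, and the function names and feature vectors are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def common_message(source):
    """The source contributes the same message to every target."""
    return list(source)


def tailored_message(source, target):
    """Gate each source feature with a value computed from both source and target
    (assumed gate; a learned transformation would take its place)."""
    gate = [sigmoid(s * t) for s, t in zip(source, target)]
    return [g * s for g, s in zip(gate, source)]


source = [1.0, -2.0]
target_a = [0.0, 0.0]
target_b = [3.0, 0.0]
msg_a = tailored_message(source, target_a)
msg_b = tailored_message(source, target_b)
```

Here `msg_a` and `msg_b` differ even though they come from the same source, whereas `common_message` would return the same vector for both targets.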