3D Scene Graph Prediction on Point Clouds Using Knowledge Graphs
3D scene graph prediction is a task that aims to concurrently predict object
classes and their relationships within a 3D environment. As these environments
are primarily designed by and for humans, incorporating commonsense knowledge
regarding objects and their relationships can significantly constrain and
enhance the prediction of the scene graph. In this paper, we investigate the
application of commonsense knowledge graphs for 3D scene graph prediction on
point clouds of indoor scenes. Through experiments conducted on a real-world
indoor dataset, we demonstrate that integrating external commonsense knowledge
via the message-passing method leads to a 15.0 % improvement in scene graph
prediction accuracy with external knowledge and with internal
knowledge when compared to state-of-the-art algorithms. We also tested in the
real world, generating scene graphs at 10 frames per second, to show the
usage of the model in a more realistic robotics setting.
Comment: accepted at CASE 202
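The abstract credits the gain to injecting commonsense knowledge "via the message-passing method" but gives no details. The following is a minimal generic sketch of that idea, not the paper's architecture. It assumes each detected object starts with a class-probability vector and that a commonsense knowledge graph has been condensed into a hypothetical class-compatibility matrix `COMPAT`. Neighbours in the scene send messages weighted by that prior, and each node multiplies its own evidence by the aggregated support. The names `refine_with_prior`, `COMPAT`, and the toy scores are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

def normalize(v: List[float]) -> List[float]:
    """Rescale a non-negative vector so it sums to 1."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else v

def refine_with_prior(
    probs: Dict[str, List[float]],
    edges: List[Tuple[str, str]],
    compat: List[List[float]],
    steps: int = 2,
) -> Dict[str, List[float]]:
    """Hypothetical sketch: refine object-class scores with a commonsense prior."""
    num_classes = len(compat)
    # Undirected adjacency lists built from the scene's spatial edges.
    neighbors: Dict[str, List[str]] = {n: [] for n in probs}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    for _ in range(steps):
        updated: Dict[str, List[float]] = {}
        for node, p in probs.items():
            if not neighbors[node]:
                updated[node] = p  # isolated objects keep their own scores
                continue
            # support[c] = sum over neighbours j, classes k of compat[c][k] * p_j[k]
            support = [0.0] * num_classes
            for nb in neighbors[node]:
                q = probs[nb]
                for c in range(num_classes):
                    support[c] += sum(compat[c][k] * q[k] for k in range(num_classes))
            # Combine the node's own evidence with its neighbours' commonsense support.
            updated[node] = normalize([p[c] * support[c] for c in range(num_classes)])
        probs = updated  # synchronous update: every node used the previous step's scores
    return probs

CLASSES = ["bed", "pillow", "toilet"]
# Hypothetical prior: beds and pillows co-occur often, beds and toilets rarely.
COMPAT = [
    [0.2, 0.9, 0.1],
    [0.9, 0.3, 0.1],
    [0.1, 0.1, 0.2],
]
probs = {"obj_a": [0.9, 0.05, 0.05], "obj_b": [0.1, 0.45, 0.45]}
refined = refine_with_prior(probs, [("obj_a", "obj_b")], COMPAT)
for name, p in refined.items():
    print(name, CLASSES[p.index(max(p))])
# obj_a bed
# obj_b pillow
```

In the toy scene, `obj_b` is equally likely to be a pillow or a toilet on its own evidence; its neighbour is confidently a bed, and the prior tips it toward "pillow". A learned model would replace the fixed matrix and the product rule with trained message and update functions.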
Combining ontologies and scene graphs to extract abstract actions
Making a machine understand what is happening in a visual recording is one of the most ambitious goals pursued in computer vision research. This highly complex task requires several different techniques: object detection, the definition of the relationships these objects can have in the scene, and the application of knowledge that allows different sets of relationships to be combined into abstract, compound actions (such as washing clothes or getting ready to go out on the street).
In this context, scene graph techniques have been proposed in the literature. Their approach is to capture the different relations that appear in a scene and aggregate them into a graph that describes the visual scene.
Nowadays, state-of-the-art methods rely heavily on prior knowledge extracted during the training step, and this knowledge is clearly biased toward the training set. Because of that, Scene Graph Generation models have a hard time correctly defining relationships between previously unseen objects. In recent years, a branch of models has emerged that attempts to apply common-sense knowledge techniques to reduce the dependency of Scene Graph Generation models on this prior bias.
This project describes and tests the most recent common-sense techniques applied to scene graph generation, and then proposes a new technique: Generalized Action Graphs (GAG). The work also implements a recently published metric that measures the generalization of a Scene Graph Generation model.
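The abstract describes composing abstract actions from scene-graph relationships with the help of external knowledge, and generalizing to objects unseen in training. The following is a toy rule-based sketch of that combination. It is not the GAG method, whose details the abstract does not give. A scene graph is assumed to be a list of (subject, predicate, object) triples. A hand-written `IS_A` table stands in for an ontology and lifts specific labels such as "shirt" to broader concepts such as "clothes". An action is then inferred when all of its hypothetical relation patterns in `ACTION_RULES` are present. Every table, rule, and function name here is an illustrative assumption.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical is-a table standing in for an ontology: maps specific
# object labels to broader concepts so rules can match unseen labels.
IS_A: Dict[str, str] = {
    "man": "person",
    "woman": "person",
    "shirt": "clothes",
    "sock": "clothes",
    "washer": "washing_machine",
    "jacket": "coat",
}

# Hypothetical action rules: an abstract action is inferred when every
# relation pattern in its set appears in the generalized scene graph.
ACTION_RULES: Dict[str, Set[Triple]] = {
    "washing clothes": {
        ("person", "holding", "clothes"),
        ("person", "near", "washing_machine"),
    },
    "getting ready to go out": {
        ("person", "wearing", "coat"),
        ("person", "holding", "keys"),
    },
}

def generalize(triple: Triple) -> Triple:
    """Replace subject and object labels with their ontology parent, if any."""
    s, p, o = triple
    return (IS_A.get(s, s), p, IS_A.get(o, o))

def infer_actions(scene_graph: List[Triple]) -> List[str]:
    """Return, sorted, the abstract actions whose relation patterns are all present."""
    generalized = {generalize(t) for t in scene_graph}
    return sorted(
        action
        for action, patterns in ACTION_RULES.items()
        if patterns <= generalized  # subset test: every pattern was observed
    )

scene = [
    ("man", "holding", "shirt"),
    ("man", "near", "washer"),
    ("lamp", "on", "table"),
]
print(infer_actions(scene))  # ['washing clothes']
```

Lifting labels before matching is what lets a rule written for "clothes" fire on a "shirt" it never names. This is a crude stand-in for the generalization that common-sense approaches aim for.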
Open-Vocabulary Object Detection via Scene Graph Discovery
In recent years, open-vocabulary (OV) object detection has attracted
increasing research attention. Unlike traditional detection, which only
recognizes fixed-category objects, OV detection aims to detect objects in an
open category set. Previous works often leverage vision-language (VL) training
data (e.g., referring grounding data) to recognize OV objects. However, they
only use pairs of nouns and individual objects in VL data, while these data
usually contain much more information, such as scene graphs, which are also
crucial for OV detection. In this paper, we propose a novel Scene-Graph-Based
Discovery Network (SGDN) that exploits scene graph cues for OV detection.
Firstly, a scene-graph-based decoder (SGDecoder) including sparse
scene-graph-guided attention (SSGA) is presented. It captures scene graphs and
leverages them to discover OV objects. Secondly, we propose scene-graph-based
prediction (SGPred), where we build a scene-graph-based offset regression
(SGOR) mechanism to enable mutual enhancement between scene graph extraction
and object localization. Thirdly, we design a cross-modal learning mechanism in
SGPred. It takes scene graphs as bridges to improve the consistency between
cross-modal embeddings for OV object classification. Experiments on COCO and
LVIS demonstrate the effectiveness of our approach. Moreover, we show that our
model can perform OV scene graph detection, a task that previous OV scene
graph generation methods cannot tackle.
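The abstract names sparse scene-graph-guided attention (SSGA) but does not specify it. One generic reading, assumed here and not taken from the paper, is attention whose pattern is restricted to scene-graph edges, so that each object attends only to the objects it is related to (plus itself). The following is a minimal sketch of scaled dot-product attention with such a mask. The function name `graph_masked_attention`, the toy features, and the adjacency matrix are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def graph_masked_attention(queries: Matrix, keys: Matrix, values: Matrix,
                           adjacency: List[List[bool]]) -> Matrix:
    """Scaled dot-product attention where query i may only attend to key j
    if adjacency[i][j] is True or j == i (the self-loop is always kept)."""
    d = len(queries[0])
    out: Matrix = []
    for i, q in enumerate(queries):
        # Sparsity comes from the scene graph: only related objects are scored.
        allowed = [j for j in range(len(keys)) if adjacency[i][j] or j == i]
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in allowed]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([
            sum(w * values[j][k] for w, j in zip(weights, allowed))
            for k in range(len(values[0]))
        ])
    return out

# Three detected regions; the hypothetical scene graph links regions 0 and 1 only.
feats = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
adjacency = [
    [False, True, False],
    [True, False, False],
    [False, False, False],
]
out = graph_masked_attention(feats, feats, values, adjacency)
print(out[2])  # [5.0, 5.0]: region 2 has no edges, so it only sees itself
```

Because region 2 is not connected to region 0, changing region 2's value vector leaves region 0's output untouched. This isolation is the property a graph-guided mask is meant to provide. A batched implementation would express the same rule as a boolean attention mask over a dense score matrix.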