Combining ontologies and scene graphs to extract abstract actions
Making a machine understand what is happening in a visual recording is one of the most ambitious goals pursued in the field of Computer Vision. This incredibly complex task requires the application of several different techniques: from object detection, through the definition of the relationships that these objects can have in the scene, to the application of knowledge that composes sets of relationships into abstract, compound actions (such as washing clothes or getting ready to go out on the street).
In this context, Scene Graph techniques have been proposed in the literature. Their approach is to capture the different relations that appear in a scene and aggregate them into a graph that defines the visual scene. Nowadays, state-of-the-art methods rely heavily on prior knowledge extracted during the training step, and this knowledge is clearly biased toward the training set. Because of that, Scene Graph Generation models have a hard time correctly identifying relationships between previously unseen objects. In recent years, a branch of models has emerged that applies common-sense knowledge techniques to reduce the dependency of Scene Graph Generation models on this prior bias.
This project describes and tests the most recent common-sense techniques applied to scene graph generation, and then proposes a new technique: Generalized Action Graphs (GAG). The work also implements a recently published metric that measures the generalization of a Scene Graph Generation model.
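The scene graph the abstract refers to can be pictured as a set of (subject, predicate, object) triples aggregated from per-image relation predictions. A minimal sketch of that structure, with purely illustrative object and predicate names (not taken from the thesis):

```python
# Minimal sketch of a scene graph as (subject, predicate, object) triples.
# Object and predicate names below are illustrative examples only.
from collections import defaultdict

class SceneGraph:
    def __init__(self):
        # adjacency map: subject -> list of (predicate, object) pairs
        self.edges = defaultdict(list)

    def add_relation(self, subj, pred, obj):
        self.edges[subj].append((pred, obj))

    def triples(self):
        # flatten the adjacency map back into a list of triples
        return [(s, p, o) for s, rels in self.edges.items() for p, o in rels]

g = SceneGraph()
g.add_relation("person", "holding", "shirt")
g.add_relation("shirt", "inside", "washing_machine")
print(g.triples())
```

Composing such low-level triples into an abstract action (here, something like "washing clothes") is exactly the step where the ontology-based knowledge described above comes in.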
NODIS: Neural Ordinary Differential Scene Understanding
Semantic image understanding is a challenging topic in computer vision. It requires not only detecting all objects in an image, but also identifying all the relations between them. The detected objects, their labels, and the discovered relations can be used to construct a scene graph, which provides an abstract semantic interpretation of an image. In previous works, relations were identified by solving an assignment problem formulated as a Mixed-Integer Linear Program. In this work, we interpret that formulation as an Ordinary Differential Equation (ODE). The proposed architecture performs scene graph inference by solving a neural variant of an ODE with end-to-end learning. It achieves state-of-the-art results on all three benchmark tasks of the Visual Genome benchmark: scene graph generation (SGGen), scene graph classification (SGCls), and visual relationship detection (PredCls).
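The core neural-ODE idea can be sketched as follows: instead of a fixed stack of layers, relation features evolve under a learned dynamics function f, and inference integrates dh/dt = f(h, t) over time. This is a toy illustration, not the NODIS architecture; the weights here are random stand-ins for trained parameters, and a fixed-step Euler solver replaces the adaptive solvers used in practice.

```python
# Toy neural-ODE sketch: relation features h evolve under a learned
# dynamics f, and network "depth" is replaced by integration time.
# W is a random stand-in for parameters learned end to end.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))

def f(h, t):
    # learned dynamics; a single tanh layer in this sketch
    return np.tanh(h @ W)

def odeint_euler(h0, t0=0.0, t1=1.0, steps=20):
    # fixed-step Euler integration of dh/dt = f(h, t)
    h, dt = h0, (t1 - t0) / steps
    for i in range(steps):
        h = h + dt * f(h, t0 + i * dt)
    return h

h0 = rng.normal(size=(3, 8))   # features for 3 candidate relations
h1 = odeint_euler(h0)          # integrated state, same shape as h0
print(h1.shape)
```

A classifier head would then read relation labels off the integrated state h1.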
Learning Material-Aware Local Descriptors for 3D Shapes
Material understanding is critical for design, geometric modeling, and analysis of functional objects. We enable material-aware 3D shape analysis by employing a projective convolutional neural network architecture to learn material-aware descriptors from view-based representations of 3D points, for point-wise material classification or material-aware retrieval. Unfortunately, only a small fraction of shapes in 3D repositories are labeled with physical materials, posing a challenge for learning methods. To address this challenge, we crowdsource a dataset of 3080 3D shapes with part-wise material labels. We focus on furniture models, which exhibit interesting structure and material variability. In addition, we contribute a high-quality expert-labeled benchmark of 115 shapes from Herman-Miller and IKEA for evaluation. We further apply a mesh-aware conditional random field, which incorporates rotational and reflective symmetries, to smooth our local material predictions across neighboring surface patches. We demonstrate the effectiveness of our learned descriptors for automatic texturing, material-aware retrieval, and physical simulation. The dataset and code will be publicly available.
Comment: 3DV 201
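The material-aware retrieval the abstract mentions reduces to nearest-neighbor search in descriptor space. A hedged sketch under the assumption of cosine similarity as the metric; the descriptors below are random stand-ins for the learned network outputs:

```python
# Sketch of descriptor-based retrieval: rank database shapes by cosine
# similarity to a query descriptor. Random vectors stand in for the
# learned material-aware descriptors.
import numpy as np

rng = np.random.default_rng(1)
database = rng.normal(size=(5, 16))               # 5 shapes, 16-dim descriptors
query = database[2] + 0.01 * rng.normal(size=16)  # query near shape index 2

def cosine_rank(query, database):
    # normalize both sides, then rank by dot product (cosine similarity)
    q = query / np.linalg.norm(query)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    return np.argsort(-(d @ q))  # best match first

ranking = cosine_rank(query, database)
print(ranking[0])
```

Because the query is a tiny perturbation of the third database descriptor, index 2 comes back as the top match.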