
    Combining ontologies and scene graphs to extract abstract actions

    Making a machine understand what is happening in a visual recording is one of the most ambitious goals pursued by the Computer Vision research field. This incredibly complex task requires the application of several different techniques, from object detection, through the definition of the relationships these objects have in the scene, to the application of knowledge that combines different sets of relationships into abstract, compound actions (such as washing clothes or getting ready to go out on the street). In this context, Scene Graph techniques have been proposed in the literature. Their approach is to capture the different relations that appear in a scene and aggregate them inside a graph that defines the visual scene. Nowadays, state-of-the-art methods rely heavily on prior knowledge extracted during the training step, and this knowledge is clearly biased toward the training set. Because of that, Scene Graph Generation models have a hard time correctly defining relationships between previously unseen objects. In recent years, a branch of models has emerged that applies common-sense knowledge techniques to reduce the dependency of Scene Graph Generation models on prior bias. This project describes and tests the most recent common-sense techniques applied to scene graph generation, and then proposes a new technique: Generalized Action Graphs (GAG). The work also implements a recently published metric that measures the generalization of a Scene Graph Generation model.
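
    To make the core idea concrete, here is a minimal sketch, assuming a simple triple-based representation: a scene graph stores (subject, predicate, object) relations, and a toy hand-written ontology asserts an abstract action once all of its required relations appear in the graph. The class names, the example triples, and the ontology rules are illustrative assumptions, not the GAG technique proposed in the project.

```python
# Hypothetical illustration of combining a scene graph with an ontology
# to infer abstract actions; not the project's GAG implementation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

class SceneGraph:
    """Aggregates (subject, predicate, object) relations from one scene."""
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add(Triple(subject, predicate, obj))

# Toy "ontology": an abstract action is asserted when all of its
# required relations are present in the scene graph.
ONTOLOGY = {
    "washing clothes": {
        Triple("person", "holding", "clothes"),
        Triple("clothes", "inside", "washing machine"),
    },
}

def abstract_actions(graph: SceneGraph):
    return [action for action, required in ONTOLOGY.items()
            if required <= graph.triples]  # subset test over relations

g = SceneGraph()
g.add("person", "holding", "clothes")
g.add("clothes", "inside", "washing machine")
print(abstract_actions(g))  # ['washing clothes']
```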

    NODIS: Neural Ordinary Differential Scene Understanding

    Semantic image understanding is a challenging topic in computer vision. It requires not only detecting all objects in an image, but also identifying all the relations between them. The detected objects, their labels, and the discovered relations can be used to construct a scene graph, which provides an abstract semantic interpretation of the image. In previous works, relations were identified by solving an assignment problem formulated as a Mixed-Integer Linear Program. In this work, we interpret that formulation as an Ordinary Differential Equation (ODE). The proposed architecture performs scene graph inference by solving a neural variant of an ODE through end-to-end learning. It achieves state-of-the-art results on all three benchmark tasks: scene graph generation (SGGen), classification (SGCls), and visual relationship detection (PredCls) on the Visual Genome benchmark.
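
    The following is a minimal sketch of the neural-ODE idea in PyTorch: instead of classifying each candidate relation in one feed-forward pass, relation features are evolved by a learned dynamics function f(h, t), and the final state is decoded into predicate scores. The layer sizes, the fixed-step Euler solver, and the decoder are illustrative assumptions, not the NODIS architecture itself.

```python
# Hedged sketch of scene graph inference via a learned ODE; an adaptive
# solver (e.g. torchdiffeq) could replace the Euler loop in practice.
import torch
import torch.nn as nn

class RelationODE(nn.Module):
    def __init__(self, dim=256, num_predicates=50, steps=10):
        super().__init__()
        self.dynamics = nn.Sequential(      # f(h, t): learned derivative dh/dt
            nn.Linear(dim + 1, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.decoder = nn.Linear(dim, num_predicates)
        self.steps = steps

    def forward(self, h):                   # h: (num_pairs, dim) object-pair features
        dt = 1.0 / self.steps
        t = torch.zeros(h.size(0), 1)
        for _ in range(self.steps):         # fixed-step Euler integration
            h = h + dt * self.dynamics(torch.cat([h, t], dim=1))
            t = t + dt
        return self.decoder(h)              # predicate logits per object pair

scores = RelationODE()(torch.randn(8, 256))
print(scores.shape)  # torch.Size([8, 50])
```

    Because the whole forward pass is differentiable, the dynamics and the decoder can be trained end-to-end with an ordinary classification loss over the predicate logits.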

    Learning Material-Aware Local Descriptors for 3D Shapes

    Material understanding is critical for design, geometric modeling, and analysis of functional objects. We enable material-aware 3D shape analysis by employing a projective convolutional neural network architecture to learn material-aware descriptors from view-based representations of 3D points for point-wise material classification or material-aware retrieval. Unfortunately, only a small fraction of shapes in 3D repositories are labeled with physical materials, posing a challenge for learning methods. To address this challenge, we crowdsource a dataset of 3080 3D shapes with part-wise material labels. We focus on furniture models, which exhibit interesting structure and material variability. In addition, we contribute a high-quality expert-labeled benchmark of 115 shapes from Herman-Miller and IKEA for evaluation. We further apply a mesh-aware conditional random field, which incorporates rotational and reflective symmetries, to smooth our local material predictions across neighboring surface patches. We demonstrate the effectiveness of our learned descriptors for automatic texturing, material-aware retrieval, and physical simulation. The dataset and code will be publicly available.
    Comment: 3DV 201
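
    As a rough illustration of the smoothing step, here is a minimal sketch in which each surface patch's class probabilities are repeatedly averaged with those of its mesh neighbors. This is a simplified stand-in for the paper's mesh-aware CRF: the symmetry terms and the actual energy formulation are omitted, and the function and parameter names are assumptions.

```python
# Hypothetical neighbor-averaging smoother over a mesh adjacency graph,
# standing in for the paper's CRF inference; not the authors' code.
import numpy as np

def smooth_material_predictions(probs, adjacency, weight=0.5, iterations=10):
    """probs: (num_patches, num_materials) per-patch class probabilities.
    adjacency: dict mapping patch index -> list of neighbor indices."""
    probs = probs.copy()
    for _ in range(iterations):
        updated = probs.copy()              # synchronous update over all patches
        for i, neighbors in adjacency.items():
            if neighbors:
                neighbor_mean = probs[neighbors].mean(axis=0)
                updated[i] = (1 - weight) * probs[i] + weight * neighbor_mean
        updated /= updated.sum(axis=1, keepdims=True)  # renormalize each row
        probs = updated
    return probs

# Three patches in a chain; the middle one has an uncertain prediction
# that its confident neighbors pull toward material 0.
unary = np.array([[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]])
adj = {0: [1], 1: [0, 2], 2: [1]}
print(smooth_material_predictions(unary, adj).argmax(axis=1))  # [0 0 0]
```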