97 research outputs found

    Learning a Disentangled Embedding for Monocular 3D Shape Retrieval and Pose Estimation

    Full text link
    We propose a novel approach to jointly perform 3D shape retrieval and pose estimation from monocular images.In order to make the method robust to real-world image variations, e.g. complex textures and backgrounds, we learn an embedding space from 3D data that only includes the relevant information, namely the shape and pose. Our approach explicitly disentangles a shape vector and a pose vector, which alleviates both pose bias for 3D shape retrieval and categorical bias for pose estimation. We then train a CNN to map the images to this embedding space, and then retrieve the closest 3D shape from the database and estimate the 6D pose of the object. Our method achieves 10.3 median error for pose estimation and 0.592 top-1-accuracy for category agnostic 3D object retrieval on the Pascal3D+ dataset, outperforming the previous state-of-the-art methods on both tasks

    Connecting Look and Feel: Associating the visual and tactile properties of physical materials

    Full text link
    For machines to interact with the physical world, they must understand the physical properties of objects and materials they encounter. We use fabrics as an example of a deformable material with a rich set of mechanical properties. A thin flexible fabric, when draped, tends to look different from a heavy stiff fabric. It also feels different when touched. Using a collection of 118 fabric sample, we captured color and depth images of draped fabrics along with tactile data from a high resolution touch sensor. We then sought to associate the information from vision and touch by jointly training CNNs across the three modalities. Through the CNN, each input, regardless of the modality, generates an embedding vector that records the fabric's physical property. By comparing the embeddings, our system is able to look at a fabric image and predict how it will feel, and vice versa. We also show that a system jointly trained on vision and touch data can outperform a similar system trained only on visual data when tested purely with visual inputs

    Straight to Shapes: Real-time Detection of Encoded Shapes

    Full text link
    Current object detection approaches predict bounding boxes, but these provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to directly regress to objects' shapes in addition to their bounding boxes and categories. It is crucial to find an appropriate shape representation that is compact and decodable, and in which objects can be compared for higher-order concepts such as view similarity, pose variation and occlusion. To achieve this, we use a denoising convolutional auto-encoder to establish an embedding space, and place the decoder after a fast end-to-end network trained to regress directly to the encoded shape vectors. This yields what to the best of our knowledge is the first real-time shape prediction network, running at ~35 FPS on a high-end desktop. With higher-order shape reasoning well-integrated into the network pipeline, the network shows the useful practical quality of generalising to unseen categories similar to the ones in the training set, something that most existing approaches fail to handle.Comment: 16 pages including appendix; Published at CVPR 201

    Variational Autoencoders for Deforming 3D Mesh Models

    Full text link
    3D geometric contents are becoming increasingly popular. In this paper, we study the problem of analyzing deforming 3D meshes using deep neural networks. Deforming 3D meshes are flexible to represent 3D animation sequences as well as collections of objects of the same category, allowing diverse shapes with large-scale non-linear deformations. We propose a novel framework which we call mesh variational autoencoders (mesh VAE), to explore the probabilistic latent space of 3D surfaces. The framework is easy to train, and requires very few training examples. We also propose an extended model which allows flexibly adjusting the significance of different latent variables by altering the prior distribution. Extensive experiments demonstrate that our general framework is able to learn a reasonable representation for a collection of deformable shapes, and produce competitive results for a variety of applications, including shape generation, shape interpolation, shape space embedding and shape exploration, outperforming state-of-the-art methods.Comment: CVPR 201

    Joint Image and 3D Shape Part Representation in Large Collections for Object Blending

    Get PDF
    We propose a new approach to perform object shape retrieval from images, it can handle the shape of the part of the object and combine parts from different sources to find a different 3D shape. Our method creates a common representation for images and 3D models that enables mixing elements from both kinds of inputs. Our approach automatically extracts the desired part and its 3D shape from each source without the need of annotations. There are many applications to combining parts from images and 3D models, for example, performing smart online catalogue searches by selecting the parts that we are looking for from images or 3D models and retrieve a 3D shape that has the desired arrangement of parts. Our approach is capable of obtaining the shape of the parts of an object from an image in the wild, independently of the pose of the object and without the need of annotations of any kind

    3D Shape Knowledge Graph for Cross-domain and Cross-modal 3D Shape Retrieval

    Full text link
    With the development of 3D modeling and fabrication, 3D shape retrieval has become a hot topic. In recent years, several strategies have been put forth to address this retrieval issue. However, it is difficult for them to handle cross-modal 3D shape retrieval because of the natural differences between modalities. In this paper, we propose an innovative concept, namely, geometric words, which is regarded as the basic element to represent any 3D or 2D entity by combination, and assisted by which, we can simultaneously handle cross-domain or cross-modal retrieval problems. First, to construct the knowledge graph, we utilize the geometric word as the node, and then use the category of the 3D shape as well as the attribute of the geometry to bridge the nodes. Second, based on the knowledge graph, we provide a unique way for learning each entity's embedding. Finally, we propose an effective similarity measure to handle the cross-domain and cross-modal 3D shape retrieval. Specifically, every 3D or 2D entity could locate its geometric terms in the 3D knowledge graph, which serve as a link between cross-domain and cross-modal data. Thus, our approach can achieve the cross-domain and cross-modal 3D shape retrieval at the same time. We evaluated our proposed method on the ModelNet40 dataset and ShapeNetCore55 dataset for both the 3D shape retrieval task and cross-domain 3D shape retrieval task. The classic cross-modal dataset (MI3DOR) is utilized to evaluate cross-modal 3D shape retrieval. Experimental results and comparisons with state-of-the-art methods illustrate the superiority of our approach

    3D Face Reconstruction by Learning from Synthetic Data

    Full text link
    Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications. Here, we introduce a learning-based approach for reconstructing a three-dimensional face from a single image. Recent face recovery methods rely on accurate localization of key characteristic points. In contrast, the proposed approach is based on a Convolutional-Neural-Network (CNN) which extracts the face geometry directly from its image. Although such deep architectures outperform other models in complex computer vision problems, training them properly requires a large dataset of annotated examples. In the case of three-dimensional faces, currently, there are no large volume data sets, while acquiring such big-data is a tedious task. As an alternative, we propose to generate random, yet nearly photo-realistic, facial images for which the geometric form is known. The suggested model successfully recovers facial shapes from real images, even for faces with extreme expressions and under various lighting conditions.Comment: The first two authors contributed equally to this wor
    • …
    corecore