Learning a Disentangled Embedding for Monocular 3D Shape Retrieval and Pose Estimation
We propose a novel approach to jointly perform 3D shape retrieval and pose
estimation from monocular images. In order to make the method robust to
real-world image variations, e.g. complex textures and backgrounds, we learn an
embedding space from 3D data that only includes the relevant information,
namely the shape and pose. Our approach explicitly disentangles a shape vector
and a pose vector, which alleviates both pose bias for 3D shape retrieval and
categorical bias for pose estimation. We then train a CNN to map images into
this embedding space, from which we retrieve the closest 3D shape in the
database and estimate the 6D pose of the object. Our method achieves a median
error of 10.3 for pose estimation and a top-1 accuracy of 0.592 for
category-agnostic 3D object retrieval on the Pascal3D+ dataset, outperforming
the previous state-of-the-art methods on both tasks.
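The disentangling idea above can be illustrated with a minimal sketch. All names, dimensions, and database entries below are hypothetical stand-ins, not the paper's actual architecture: the point is only that retrieval reads the shape half of the embedding while pose estimation reads the pose half, so neither task is biased by the other.

```python
import math

def split_embedding(z, shape_dim):
    """Split a joint embedding into (shape_vector, pose_vector)."""
    return z[:shape_dim], z[shape_dim:]

def retrieve_shape(query_shape_vec, database):
    """Nearest-neighbour lookup over shape vectors only."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(database, key=lambda item: dist(item["shape_vec"], query_shape_vec))

# Toy database of 3D models, each indexed by its shape vector.
database = [
    {"id": "chair_01", "shape_vec": [1.0, 0.0, 0.0]},
    {"id": "car_07",   "shape_vec": [0.0, 1.0, 0.0]},
]

# Embedding predicted by the image CNN: first 3 dims = shape, rest = pose.
z = [0.9, 0.1, 0.0, 0.2, 0.5]
shape_vec, pose_vec = split_embedding(z, shape_dim=3)
best = retrieve_shape(shape_vec, database)   # retrieval ignores the pose part
```

In this toy setup the query's shape part lies closest to `chair_01`, and the pose part is passed on unchanged for pose estimation.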
Connecting Look and Feel: Associating the visual and tactile properties of physical materials
For machines to interact with the physical world, they must understand the
physical properties of objects and materials they encounter. We use fabrics as
an example of a deformable material with a rich set of mechanical properties. A
thin flexible fabric, when draped, tends to look different from a heavy stiff
fabric. It also feels different when touched. Using a collection of 118 fabric
samples, we captured color and depth images of draped fabrics along with tactile
data from a high resolution touch sensor. We then sought to associate the
information from vision and touch by jointly training CNNs across the three
modalities. Through the CNN, each input, regardless of the modality, generates
an embedding vector that records the fabric's physical property. By comparing
the embeddings, our system is able to look at a fabric image and predict how it
will feel, and vice versa. We also show that a system jointly trained on vision
and touch data can outperform a similar system trained only on visual data when
tested purely with visual inputs.
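The cross-modal matching step can be sketched as follows. The encoders and embeddings here are illustrative placeholders for the jointly trained CNNs; the sketch shows only the comparison mechanism, matching a tactile embedding to image embeddings in the shared space by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_touch_to_image(touch_emb, image_embs):
    """Return the image whose embedding is closest to the touch embedding."""
    return max(image_embs, key=lambda name: cosine(image_embs[name], touch_emb))

# Toy embeddings that the jointly trained CNNs might produce.
image_embs = {
    "thin_silk.png":  [0.9, 0.1],
    "heavy_wool.png": [0.1, 0.9],
}
touch_emb = [0.85, 0.2]   # embedding of a tactile recording
best = match_touch_to_image(touch_emb, image_embs)
```

Because all modalities share one space, the same comparison works in either direction: predicting feel from an image is just the mirror-image lookup.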
Straight to Shapes: Real-time Detection of Encoded Shapes
Current object detection approaches predict bounding boxes, but these provide
little instance-specific information beyond location, scale and aspect ratio.
In this work, we propose to directly regress to objects' shapes in addition to
their bounding boxes and categories. It is crucial to find an appropriate shape
representation that is compact and decodable, and in which objects can be
compared for higher-order concepts such as view similarity, pose variation and
occlusion. To achieve this, we use a denoising convolutional auto-encoder to
establish an embedding space, and place the decoder after a fast end-to-end
network trained to regress directly to the encoded shape vectors. This yields
what is, to the best of our knowledge, the first real-time shape prediction
network, running at ~35 FPS on a high-end desktop. With higher-order shape
reasoning well-integrated into the network pipeline, the network shows the
useful practical quality of generalising to unseen categories similar to the
ones in the training set, something that most existing approaches fail to
handle.
Comment: 16 pages including appendix; Published at CVPR 201
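The detect-then-decode pipeline can be sketched in miniature. Everything below is a hypothetical stand-in: the detector stub replaces the end-to-end network, and the linear blend of basis masks replaces the paper's convolutional auto-encoder decoder. The sketch shows only the data flow, regressing a compact shape code per detection and decoding it back into a mask.

```python
def detector(image_patch):
    """Stand-in for the end-to-end network: returns a box + encoded shape vector."""
    box = (10, 10, 50, 80)          # x, y, w, h
    shape_code = [0.7, 0.1, 0.2]    # low-dimensional shape embedding
    return box, shape_code

def decode_shape(code, basis_masks):
    """Toy linear decoder: blend basis masks weighted by the code."""
    h, w = len(basis_masks[0]), len(basis_masks[0][0])
    out = [[0.0] * w for _ in range(h)]
    for weight, mask in zip(code, basis_masks):
        for i in range(h):
            for j in range(w):
                out[i][j] += weight * mask[i][j]
    return out

# Tiny 2x2 "basis shapes" learned offline by the auto-encoder (illustrative).
basis_masks = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [1, 1]],
]
box, code = detector(None)
mask = decode_shape(code, basis_masks)
```

Keeping the decoder fixed after the detector is what makes the shape head cheap: at test time the network only regresses a few extra numbers per box.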
Variational Autoencoders for Deforming 3D Mesh Models
3D geometric contents are becoming increasingly popular. In this paper, we
study the problem of analyzing deforming 3D meshes using deep neural networks.
Deforming 3D meshes are flexible to represent 3D animation sequences as well as
collections of objects of the same category, allowing diverse shapes with
large-scale non-linear deformations. We propose a novel framework which we call
mesh variational autoencoders (mesh VAE), to explore the probabilistic latent
space of 3D surfaces. The framework is easy to train, and requires very few
training examples. We also propose an extended model which allows flexibly
adjusting the significance of different latent variables by altering the prior
distribution. Extensive experiments demonstrate that our general framework is
able to learn a reasonable representation for a collection of deformable
shapes, and produce competitive results for a variety of applications,
including shape generation, shape interpolation, shape space embedding and
shape exploration, outperforming state-of-the-art methods.
Comment: CVPR 201
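The generation step of such a model can be sketched as follows. The decoder and deformation fields below are toy placeholders, not the mesh VAE's learned network; the sketch illustrates only the mechanism of sampling a latent vector from the prior, where widening one latent variable's standard deviation increases its significance, as in the paper's adjustable prior, and decoding it into per-vertex deformations of a template mesh.

```python
import random

random.seed(0)

def sample_latent(sigmas):
    """Sample z ~ N(0, diag(sigmas^2)); a larger sigma makes that latent
    variable more influential in the generated shapes."""
    return [random.gauss(0.0, s) for s in sigmas]

def decode(z, template, directions):
    """Toy decoder: each latent coordinate scales one per-vertex direction field."""
    out = []
    for v_idx, vertex in enumerate(template):
        offset = [sum(z[k] * directions[k][v_idx][d] for k in range(len(z)))
                  for d in range(3)]
        out.append([c + o for c, o in zip(vertex, offset)])
    return out

# Two-vertex template mesh and two hand-made deformation directions.
template = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
directions = [
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],   # latent 0 lifts both vertices in y
    [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],   # latent 1 pushes only vertex 0 in z
]
z = sample_latent(sigmas=[1.0, 0.1])
mesh = decode(z, template, directions)
```

Shape interpolation in this picture is just linear interpolation between two latent vectors before decoding.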
Joint Image and 3D Shape Part Representation in Large Collections for Object Blending
We propose a new approach to object shape retrieval from images that can handle the
shapes of individual object parts and combine parts from different sources to form a new 3D shape. Our
method creates a common representation for images and 3D models that enables mixing elements from
both kinds of inputs. Our approach automatically extracts the desired part and its 3D shape from each source
without the need for annotations. Combining parts from images and 3D models has many applications:
for example, performing smart online catalogue searches by selecting the desired parts from
images or 3D models and retrieving a 3D shape that has the desired arrangement of parts. Our approach is
capable of obtaining the shapes of an object's parts from an image in the wild, independently of the pose
of the object and without the need for annotations of any kind.
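The part-mixing retrieval can be illustrated with a minimal sketch. All names and embeddings are hypothetical: each chosen part, whether it came from an image or a 3D model, is assumed to have an embedding in the shared space, and the database shape whose corresponding parts lie closest to the chosen ones is retrieved.

```python
import math

def dist(a, b):
    """Euclidean distance between two part embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve_by_parts(wanted_parts, database):
    """Score each shape by the summed distance over its named parts."""
    def score(shape):
        return sum(dist(shape["parts"][name], emb)
                   for name, emb in wanted_parts.items())
    return min(database, key=score)

# Parts picked from different sources (one from a photo, one from a 3D model).
wanted_parts = {
    "back": [0.9, 0.1],   # from a photo of chair A
    "legs": [0.2, 0.8],   # from a 3D model of chair B
}

# Toy database of shapes with per-part embeddings in the common space.
database = [
    {"id": "chair_X", "parts": {"back": [0.85, 0.1], "legs": [0.25, 0.8]}},
    {"id": "chair_Y", "parts": {"back": [0.1, 0.9],  "legs": [0.9, 0.1]}},
]

best = retrieve_by_parts(wanted_parts, database)
```

Because both input kinds map into the same space, the query can freely mix a part seen in a catalogue image with a part taken from an existing 3D model.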
3D Shape Knowledge Graph for Cross-domain and Cross-modal 3D Shape Retrieval
With the development of 3D modeling and fabrication, 3D shape retrieval has
become a hot topic. In recent years, several strategies have been put forth to
address this retrieval issue. However, it is difficult for them to handle
cross-modal 3D shape retrieval because of the natural differences between
modalities. In this paper, we propose an innovative concept, geometric
words: basic elements that can represent any 3D or 2D entity by combination,
and with whose help we can simultaneously handle cross-domain and
cross-modal retrieval problems. First, to construct the
knowledge graph, we utilize the geometric word as the node, and then use the
category of the 3D shape as well as the attribute of the geometry to bridge the
nodes. Second, based on the knowledge graph, we provide a unique way for
learning each entity's embedding. Finally, we propose an effective similarity
measure to handle the cross-domain and cross-modal 3D shape retrieval.
Specifically, every 3D or 2D entity could locate its geometric terms in the 3D
knowledge graph, which serve as a link between cross-domain and cross-modal
data. Thus, our approach can achieve the cross-domain and cross-modal 3D shape
retrieval at the same time. We evaluated our proposed method on the ModelNet40
dataset and ShapeNetCore55 dataset for both the 3D shape retrieval task and
cross-domain 3D shape retrieval task. The classic cross-modal dataset (MI3DOR)
is utilized to evaluate cross-modal 3D shape retrieval. Experimental results
and comparisons with state-of-the-art methods illustrate the superiority of our
approach.
3D Face Reconstruction by Learning from Synthetic Data
Fast and robust three-dimensional reconstruction of facial geometric
structure from a single image is a challenging task with numerous applications.
Here, we introduce a learning-based approach for reconstructing a
three-dimensional face from a single image. Recent face recovery methods rely
on accurate localization of key characteristic points. In contrast, the
proposed approach is based on a Convolutional-Neural-Network (CNN) which
extracts the face geometry directly from its image. Although such deep
architectures outperform other models in complex computer vision problems,
training them properly requires a large dataset of annotated examples. In the
case of three-dimensional faces, there are currently no large-volume
datasets, and acquiring such big data is a tedious task. As an alternative, we
propose to generate random, yet nearly photo-realistic, facial images for which
the geometric form is known. The suggested model successfully recovers facial
shapes from real images, even for faces with extreme expressions and under
various lighting conditions.
Comment: The first two authors contributed equally to this work.
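The synthetic-data idea can be sketched in a few lines. The parameter model and "renderer" below are toy placeholders, not the paper's morphable model or rendering pipeline; the sketch shows only why synthesis removes the annotation burden: the geometry is known by construction for every generated image, so (image, geometry) training pairs come for free.

```python
import random

random.seed(1)

def sample_face_params():
    """Random coefficients of a toy parametric face model."""
    return [random.uniform(-1.0, 1.0) for _ in range(3)]

def render(params):
    """Placeholder renderer: the image is a deterministic function of geometry."""
    return [p * 0.5 + 0.5 for p in params]   # fake "pixels" in [0, 1]

def make_training_pair():
    params = sample_face_params()   # ground-truth geometry, known exactly
    image = render(params)          # nearly photo-realistic in the paper
    return image, params

image, geometry = make_training_pair()
```

A CNN trained on an endless stream of such pairs never needs a human-annotated 3D face, which is the crux of the approach.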