2,393 research outputs found
Learning a Hierarchical Latent-Variable Model of 3D Shapes
We propose the Variational Shape Learner (VSL), a generative model that
learns the underlying structure of voxelized 3D shapes in an unsupervised
fashion. Through the use of skip-connections, our model can successfully learn
and infer a latent, hierarchical representation of objects. Furthermore,
realistic 3D objects can be easily generated by sampling the VSL's latent
probabilistic manifold. We show that our generative model can be trained
end-to-end from 2D images to perform single image 3D model retrieval.
Experiments show, both quantitatively and qualitatively, the improved
generalization of our proposed model over a range of tasks, performing better
or comparable to various state-of-the-art alternatives.Comment: Accepted as oral presentation at International Conference on 3D
Vision (3DV), 201
Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data
There are threefold challenges in emotion recognition. First, it is difficult
to recognize human's emotional states only considering a single modality.
Second, it is expensive to manually annotate the emotional data. Third,
emotional data often suffers from missing modalities due to unforeseeable
sensor malfunction or configuration issues. In this paper, we address all these
problems under a novel multi-view deep generative framework. Specifically, we
propose to model the statistical relationships of multi-modality emotional data
using multiple modality-specific generative networks with a shared latent
space. By imposing a Gaussian mixture assumption on the posterior approximation
of the shared latent variables, our framework can learn the joint deep
representation from multiple modalities and evaluate the importance of each
modality simultaneously. To solve the labeled-data-scarcity problem, we extend
our multi-view model to semi-supervised learning scenario by casting the
semi-supervised classification problem as a specialized missing data imputation
task. To address the missing-modality problem, we further extend our
semi-supervised multi-view model to deal with incomplete data, where a missing
view is treated as a latent variable and integrated out during inference. This
way, the proposed overall framework can utilize all available (both labeled and
unlabeled, as well as both complete and incomplete) data to improve its
generalization ability. The experiments conducted on two real multi-modal
emotion datasets demonstrated the superiority of our framework.Comment: arXiv admin note: text overlap with arXiv:1704.07548, 2018 ACM
Multimedia Conference (MM'18
Generating 3D faces using Convolutional Mesh Autoencoders
Learned 3D representations of human faces are useful for computer vision
problems such as 3D face tracking and reconstruction from images, as well as
graphics applications such as character generation and animation. Traditional
models learn a latent representation of a face using linear subspaces or
higher-order tensor generalizations. Due to this linearity, they can not
capture extreme deformations and non-linear expressions. To address this, we
introduce a versatile model that learns a non-linear representation of a face
using spectral convolutions on a mesh surface. We introduce mesh sampling
operations that enable a hierarchical mesh representation that captures
non-linear variations in shape and expression at multiple scales within the
model. In a variational setting, our model samples diverse realistic 3D faces
from a multivariate Gaussian distribution. Our training data consists of 20,466
meshes of extreme expressions captured over 12 different subjects. Despite
limited training data, our trained model outperforms state-of-the-art face
models with 50% lower reconstruction error, while using 75% fewer parameters.
We also show that, replacing the expression space of an existing
state-of-the-art face model with our autoencoder, achieves a lower
reconstruction error. Our data, model and code are available at
http://github.com/anuragranj/com
- …