Disentangling Features in 3D Face Shapes for Joint Face Reconstruction and Recognition
This paper proposes an encoder-decoder network to disentangle shape features
during 3D face reconstruction from single 2D images, such that the tasks of
reconstructing accurate 3D face shapes and learning discriminative shape
features for face recognition can be accomplished simultaneously. Unlike
existing 3D face reconstruction methods, our proposed method directly regresses
dense 3D face shapes from single 2D images, and tackles identity and residual
(i.e., non-identity) components in 3D face shapes explicitly and separately
based on a composite 3D face shape model with latent representations. We devise
a training process for the proposed network with a joint loss measuring both
face identification error and 3D face shape reconstruction error. To construct
training data we develop a method for fitting 3D morphable model (3DMM) to
multiple 2D images of a subject. Comprehensive experiments were conducted on
the MICC, BU3DFE, LFW, and YTF databases. The results show that our method expands
the capacity of 3DMM for capturing discriminative shape features and facial
detail, and thus outperforms existing methods both in 3D face reconstruction
accuracy and in face recognition accuracy.
Comment: CVPR 201
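The composite shape model and joint loss described above can be sketched roughly as follows. All dimensions, the linear bases, the loss weight, and the toy softmax identity classifier are illustrative assumptions, not the paper's actual network (which regresses the latent codes with an encoder-decoder):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a dense face shape with V vertices is a 3V vector.
V, d_id, d_res = 100, 8, 4
B_id = rng.standard_normal((3 * V, d_id))    # identity shape basis (assumed linear)
B_res = rng.standard_normal((3 * V, d_res))  # residual (non-identity) basis
mean_shape = rng.standard_normal(3 * V)

def compose_shape(z_id, z_res):
    """Composite 3D face shape: mean plus identity and residual components."""
    return mean_shape + B_id @ z_id + B_res @ z_res

def joint_loss(z_id, z_res, target_shape, id_logits, id_label, w=0.5):
    """Weighted sum of 3D reconstruction error and identification error."""
    recon = np.mean((compose_shape(z_id, z_res) - target_shape) ** 2)
    # softmax cross-entropy on the identity logits
    p = np.exp(id_logits - id_logits.max())
    p /= p.sum()
    ident = -np.log(p[id_label])
    return recon + w * ident
```

Keeping the identity and residual components in separate latent codes is what lets the same network serve both recognition (use `z_id`) and reconstruction (use both).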
Deep generative-contrastive networks for facial expression recognition
As the expressive depth of an emotional face differs across individuals and
expressions, recognizing an expression from a single facial image captured at
one moment is difficult. A relative expression of a query face compared to a reference
face might alleviate this difficulty. In this paper, we propose to utilize
contrastive representation that embeds a distinctive expressive factor for a
discriminative purpose. The contrastive representation is calculated at the
embedding layer of deep networks by comparing a given (query) image with the
reference image. We attempt to utilize a generative reference image that is
estimated based on the given image. Consequently, we deploy deep neural
networks that embed a combination of a generative model, a contrastive model,
and a discriminative model, trained in an end-to-end manner. In our proposed
networks, we disentangle the facial expressive factor in two steps: learning a
generator network and then a contrastive encoder network. We
conducted extensive experiments on publicly available face expression databases
(CK+, MMI, Oulu-CASIA, and in-the-wild databases) that have been widely adopted
in the recent literature. The proposed method outperforms known
state-of-the-art methods in terms of recognition accuracy.
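The contrastive representation described above can be sketched as the difference between the embedding of the query image and the embedding of a reference generated from it. The embedding layer and the "generator" here are toy stand-ins for the paper's deep networks:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 64))  # toy embedding layer (assumed)

def embed(x):
    """Stand-in for the embedding layer of the deep network."""
    return np.tanh(W @ x)

def generate_reference(x):
    """Stand-in for the generator that estimates a reference (e.g. neutral)
    face from the query image; here simply a damped copy."""
    return 0.5 * x

def contrastive_representation(query):
    """Compare the query with its generated reference at the embedding layer."""
    return embed(query) - embed(generate_reference(query))
```

The difference cancels factors shared by query and reference (identity, lighting), leaving a distinctive expressive factor for the discriminative model.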
Disentangled Human Body Embedding Based on Deep Hierarchical Neural Network
Human bodies exhibit various shapes for different identities or poses, but
the body shape has certain similarities in structure and thus can be embedded
in a low-dimensional space. This paper presents an autoencoder-like network
architecture to learn disentangled shape and pose embedding specifically for
the 3D human body. This is inspired by recent progress of deformation-based
latent representation learning. To improve the reconstruction accuracy, we
propose a hierarchical reconstruction pipeline for the disentangling process
and construct a large dataset of human body models with consistent connectivity
for the learning of the neural network. Our learned embedding can not only
achieve superior reconstruction accuracy but also provide great flexibility in
3D human body generation via interpolation, bilinear interpolation, and latent
space sampling. Extensive experiments demonstrate the power of our learned 3D
human body embedding in various applications.
Comment: This manuscript is accepted for publication in the IEEE Transactions
on Visualization and Computer Graphics (IEEE TVCG). Code is available at
https://github.com/Juyong/DHNN_BodyRepresentatio
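Generation by interpolation in a disentangled latent space, as mentioned above, can be illustrated with a toy linear decoder; the real model uses a hierarchical, deformation-based decoder, so everything below is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
D_shape = rng.standard_normal((30, 4))  # toy decoder weights for the shape code
D_pose = rng.standard_normal((30, 3))   # toy decoder weights for the pose code

def decode(z_shape, z_pose):
    """Toy decoder: a body mesh vector from disentangled shape and pose codes."""
    return D_shape @ z_shape + D_pose @ z_pose

def lerp(a, b, t):
    return (1.0 - t) * a + t * b

def shape_interp(z_shape_a, z_shape_b, z_pose, t):
    """Interpolating only the shape code morphs identity while pose stays fixed."""
    return decode(lerp(z_shape_a, z_shape_b, t), z_pose)
```

Bilinear interpolation extends this by blending shape and pose codes along two axes at once; latent-space sampling simply draws new codes from the learned distribution.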
Joint Face Detection and Facial Motion Retargeting for Multiple Faces
Facial motion retargeting is an important problem in both computer graphics
and vision, which involves capturing the performance of a human face and
transferring it to another 3D character. Learning 3D morphable model (3DMM)
parameters from 2D face images using convolutional neural networks is common in
2D face alignment, 3D face reconstruction etc. However, existing methods either
require an additional face detection step before retargeting or use a cascade
of separate networks to perform detection followed by retargeting in a
sequence. In this paper, we present a single end-to-end network to jointly
predict the bounding box locations and 3DMM parameters for multiple faces.
First, we design a novel multitask learning framework that learns a
disentangled representation of 3DMM parameters for a single face. Then, we
leverage the trained single face model to generate ground truth 3DMM parameters
for multiple faces to train another network that performs joint face detection
and motion retargeting for images with multiple faces. Experimental results
show that our joint detection and retargeting network has high face detection
accuracy and is robust to extreme expressions and poses while being faster than
state-of-the-art methods.
Comment: Accepted to CVPR 201
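The joint prediction described above amounts to two heads over one shared feature: a bounding-box head and a 3DMM-parameter head. Feature size, head shapes, and the linear heads are all illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(3)
F = 32                                 # backbone feature size (assumed)
W_box = rng.standard_normal((4, F))    # box head: (x, y, w, h)
W_3dmm = rng.standard_normal((40, F))  # 3DMM head: identity/expression/pose coeffs

def joint_heads(feature):
    """Jointly predict a bounding box and 3DMM parameters from one shared
    per-face feature, instead of cascading a detector and a regressor."""
    return W_box @ feature, W_3dmm @ feature
```

Sharing the backbone is what removes the separate detection step and makes the pipeline faster than a cascade of networks.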
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Talking face generation aims to synthesize a sequence of face images that
correspond to a clip of speech. This is a challenging task because face
appearance variation and semantics of speech are coupled together in the subtle
movements of the talking face regions. Existing works either construct specific
face appearance model on specific subjects or model the transformation between
lip motion and speech. In this work, we integrate both aspects and enable
arbitrary-subject talking face generation by learning disentangled audio-visual
representation. We find that the talking face sequence is actually a
composition of both subject-related information and speech-related information.
These two spaces are then explicitly disentangled through a novel
associative-and-adversarial training process. This disentangled representation
has an advantage where both audio and video can serve as inputs for generation.
Extensive experiments show that the proposed approach generates realistic
talking face sequences on arbitrary subjects with much clearer lip motion
patterns than previous work. We also demonstrate the learned audio-visual
representation is extremely useful for the tasks of automatic lip reading and
audio-video retrieval.
Comment: AAAI Conference on Artificial Intelligence (AAAI 2019) Oral
Presentation. Code, models, and video results are available on our webpage:
https://liuziwei7.github.io/projects/TalkingFace.htm
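The property that either audio or video can drive generation follows from both modalities being encoded into the same disentangled speech-content space. A minimal sketch, with all dimensions and the linear "encoders"/"decoder" as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
d_a, d_v, d_speech, d_subj = 20, 30, 6, 5
E_audio = rng.standard_normal((d_speech, d_a))  # audio -> speech-content code
E_video = rng.standard_normal((d_speech, d_v))  # video -> speech-content code
D = rng.standard_normal((64, d_speech + d_subj))  # decoder to a face frame

def generate_frame(speech_input, subject_code, from_audio=True):
    """Either modality can serve as input, since both encoders target the
    same disentangled speech-related space; the subject code carries the
    speech-independent appearance."""
    enc = E_audio if from_audio else E_video
    z = enc @ speech_input
    return D @ np.concatenate([z, subject_code])
```

In the paper, the two spaces are separated by associative-and-adversarial training rather than being given by construction as here.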
Deep Structure for end-to-end inverse rendering
Inverse rendering denotes recovering the 3D properties of a scene from 2D
input image(s); it is typically done using 3D Morphable Model (3DMM) based
methods on single-view images. These models formulate each face
as a weighted combination of some basis vectors extracted from the training
data. In this paper, a deep framework is proposed in which the coefficients and
basis vectors are computed by training an autoencoder network and a
Convolutional Neural Network (CNN) simultaneously. The idea is to find a common
cause which can be mapped to both the 3D structure and corresponding 2D image
using deep networks. The empirical results verify the power of the deep
framework in finding accurate 3D shapes of human faces from their
corresponding 2D images on synthetic datasets.
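The "common cause" idea above can be sketched as one latent code decoded by two branches, one producing the 3D structure and one the 2D image. The linear decoders and dimensions below are toy assumptions standing in for the autoencoder and CNN:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 10                               # common latent ("cause") dimension, assumed
D3 = rng.standard_normal((60, d))    # toy branch to 3D shape coefficients
D2 = rng.standard_normal((48, d))    # toy branch to 2D image features

def render_both(z):
    """One common latent z is mapped to both the 3D structure and the
    corresponding 2D image, so the two outputs are consistent by design."""
    return D3 @ z, D2 @ z
```

Training both branches simultaneously is what forces the shared code to explain both modalities, rather than fitting a fixed basis as in classical 3DMM.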
Variational Inference of Disentangled Latent Concepts from Unlabeled Observations
Disentangled representations, where the higher level data generative factors
are reflected in disjoint latent dimensions, offer several benefits such as
ease of deriving invariant representations, transferability to other tasks,
interpretability, etc. We consider the problem of unsupervised learning of
disentangled representations from a large pool of unlabeled observations, and
propose a variational inference based approach to infer disentangled latent
factors. We introduce a regularizer on the expectation of the approximate
posterior over observed data that encourages the disentanglement. We also
propose a new disentanglement metric which is better aligned with the
qualitative disentanglement observed in the decoder's output. We empirically
observe significant improvement over existing methods in terms of both
disentanglement and data likelihood (reconstruction quality).
Comment: ICLR 2018 Version
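A regularizer in the spirit described above can be written as a penalty on the covariance of the expected approximate posterior: off-diagonal entries are pushed to zero (decorrelated latents) and diagonal entries toward one. The penalty weights below are illustrative, not the paper's values:

```python
import numpy as np

def disentanglement_regularizer(mu, lam_od=10.0, lam_d=5.0):
    """Penalize the covariance of posterior means over a batch so that
    latent dimensions become decorrelated with unit scale (a sketch of a
    covariance-matching disentanglement term; weights are assumptions)."""
    cov = np.cov(mu, rowvar=False)  # latent-by-latent covariance over the batch
    diag = np.diag(cov)
    off = cov - np.diag(diag)
    return lam_od * np.sum(off ** 2) + lam_d * np.sum((diag - 1.0) ** 2)
```

Added to the usual variational objective, this term encourages each generative factor to occupy its own latent dimension without any labels.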
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set
Recently, deep learning based 3D face reconstruction methods have shown
promising results in both quality and efficiency. However, training deep neural
networks typically requires a large volume of data, whereas face images with
ground-truth 3D face shapes are scarce. In this paper, we propose a novel deep
3D face reconstruction approach that 1) leverages a robust, hybrid loss
function for weakly-supervised learning which takes into account both low-level
and perception-level information for supervision, and 2) performs multi-image
face reconstruction by exploiting complementary information from different
images for shape aggregation. Our method is fast, accurate, and robust to
occlusion and large pose. We provide comprehensive experiments on three
datasets, systematically comparing our method with fifteen recent methods and
demonstrating its state-of-the-art performance.
Comment: minor revision of the layout; update contact information
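The two ingredients above, a hybrid loss and multi-image aggregation, can be sketched as a photometric term plus a perception-level feature distance, and a confidence-weighted average of per-image shape predictions. The exact terms and weights here are simplified assumptions, not the paper's loss:

```python
import numpy as np

def hybrid_loss(img, rendered, feat, feat_rendered, w_perc=0.2):
    """Low-level photometric error plus a perception-level term (cosine
    distance between deep features of the input and the rendered face)."""
    photometric = np.mean(np.abs(img - rendered))
    perceptual = 1.0 - feat @ feat_rendered / (
        np.linalg.norm(feat) * np.linalg.norm(feat_rendered))
    return photometric + w_perc * perceptual

def aggregate_shapes(shape_preds, confidences):
    """Fuse per-image shape predictions with predicted confidence weights,
    so sharper, less-occluded views contribute more."""
    w = np.asarray(confidences, dtype=float)
    w /= w.sum()
    return np.tensordot(w, np.asarray(shape_preds), axes=1)
```

The perception-level term is what supplies supervision when no ground-truth 3D shape exists, since it only requires a pretrained feature extractor.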
Disentangling Content and Style via Unsupervised Geometry Distillation
It is challenging to disentangle an object into two orthogonal spaces of
content and style since each can influence the visual observation differently
and unpredictably. It is rare for one to have access to a large number of data
to help separate the influences. In this paper, we present a novel framework to
learn this disentangled representation in a completely unsupervised manner. We
address this problem in a two-branch Autoencoder framework. For the structural
content branch, we project the latent factor into a soft structured point
tensor and constrain it with losses derived from prior knowledge. This
constraint encourages the branch to distill geometry information. Another
branch learns the complementary style information. The two branches form an
effective framework that can disentangle an object's content-style representation
without any human annotation. We evaluate our approach on four image datasets,
on which we demonstrate superior disentanglement and visual analogy quality on
both synthetic and real-world data. We are able to generate photo-realistic
images at 256x256 resolution that are clearly disentangled in content and
style.
Comment: Accepted to ICLR 2019 Workshop
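The "soft structured point tensor" of the content branch can be pictured as a stack of soft landmark heatmaps, one Gaussian bump per latent point. Grid size and bandwidth below are illustrative assumptions:

```python
import numpy as np

def point_heatmaps(points, size=16, sigma=1.5):
    """Render K soft heatmaps (a toy 'structured point tensor'): one
    Gaussian bump per 2D point, which distills geometry from the latent."""
    ys, xs = np.mgrid[0:size, 0:size]
    maps = []
    for (px, py) in points:
        maps.append(np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2)))
    return np.stack(maps)
```

Constraining one branch to this geometric form leaves the other branch no choice but to encode the complementary style information.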
Unsupervised Part-Based Disentangling of Object Shape and Appearance
Large intra-class variation is the result of changes in multiple object
characteristics. Images, however, only show the superposition of different
variable factors such as appearance or shape. Therefore, learning to
disentangle and represent these different characteristics poses a great
challenge, especially in the unsupervised case. Moreover, large object
articulation calls for a flexible part-based model. We present an unsupervised
approach for disentangling appearance and shape by learning parts consistently
over all instances of a category. Our model for learning an object
representation is trained by simultaneously exploiting invariance and
equivariance constraints between synthetically transformed images. Since no
part annotation or prior information on an object class is required, the
approach is applicable to arbitrary classes. We evaluate our approach on a wide
range of object categories and diverse tasks including pose prediction,
disentangled image synthesis, and video-to-video translation. The approach
outperforms the state-of-the-art on unsupervised keypoint prediction and
compares favorably even against supervised approaches on the task of shape and
appearance transfer.
Comment: CVPR 2019 Oral
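The invariance and equivariance constraints above can be illustrated with toy encoders: under a synthetic spatial transform, the appearance code should not change (invariance) while the shape code should transform along with the image (equivariance). The hand-crafted "encoders" here only stand in for the learned part-based networks:

```python
import numpy as np

def shift_image(img, dx):
    """A synthetic spatial transform: horizontal shift (with wrap-around)."""
    return np.roll(img, dx, axis=1)

def shape_code(img):
    """Toy shape encoder: horizontal centroid of mass (equivariant to shift)."""
    xs = np.arange(img.shape[1])
    return (img.sum(axis=0) * xs).sum() / img.sum()

def appearance_code(img):
    """Toy appearance encoder: sorted pixel values (invariant to shift)."""
    return np.sort(img.ravel())
```

Training with many such transformed pairs, and no part annotations, is what forces the learned parts to be consistent across all instances of a category.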