MVF-Net: Multi-View 3D Face Morphable Model Regression
We address the problem of recovering the 3D geometry of a human face from a
set of facial images in multiple views. While recent studies have shown
impressive progress in 3D Morphable Model (3DMM) based facial reconstruction,
the settings are mostly restricted to a single view. There is an inherent
drawback in the single-view setting: the lack of reliable 3D constraints can
cause unresolvable ambiguities. In this paper, we explore 3DMM-based shape
recovery in a different setting, where a set of multi-view facial images is
given as input. A novel approach is proposed to regress 3DMM parameters from
multi-view inputs with an end-to-end trainable Convolutional Neural Network
(CNN). Multiview geometric constraints are incorporated into the network by
establishing dense correspondences between different views leveraging a novel
self-supervised view alignment loss. The main ingredient of the view alignment
loss is a differentiable dense optical flow estimator that can backpropagate
the alignment errors between an input view and a synthetic rendering from
another input view, which is projected to the target view through the 3D shape
to be inferred. Through minimizing the view alignment loss, better 3D shapes
can be recovered such that the synthetic projections from one view to another
can better align with the observed image. Extensive experiments demonstrate the
superiority of the proposed method over other 3DMM methods.
Comment: 2019 Conference on Computer Vision and Pattern Recognition
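As a rough illustration of the 3DMM setting this abstract refers to, the sketch below builds a face shape from a linear morphable model and projects the same shape into two views with different poses. It is a minimal sketch under stated assumptions, not the MVF-Net implementation: the weak-perspective camera, the random toy basis, and the function names `reconstruct_shape` and `project_weak_perspective` are all hypothetical, and the view alignment loss and optical flow estimator are not shown.

```python
import numpy as np

def reconstruct_shape(mean_shape, basis, params):
    """Linear 3DMM: flattened vertices = mean + basis @ params, returned as (N, 3)."""
    return (mean_shape + basis @ params).reshape(-1, 3)

def project_weak_perspective(vertices, rotation, scale, translation_2d):
    """Assumed camera model: rotate, drop depth, then scale and shift in 2D."""
    rotated = vertices @ rotation.T
    return scale * rotated[:, :2] + translation_2d

def rotation_about_y(angle_rad):
    """Rotation matrix for a head turn (yaw) of angle_rad around the y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Toy model: 5 vertices and 3 shape parameters (real models are far larger).
rng = np.random.default_rng(0)
n_vertices, n_params = 5, 3
mean_shape = rng.normal(size=3 * n_vertices)
basis = rng.normal(size=(3 * n_vertices, n_params))

# One set of shape parameters is shared by every view; only the pose differs.
params = np.zeros(n_params)
vertices = reconstruct_shape(mean_shape, basis, params)
frontal_2d = project_weak_perspective(vertices, np.eye(3), 2.0, np.array([1.0, -1.0]))
side_2d = project_weak_perspective(vertices, rotation_about_y(np.pi / 2), 2.0, np.zeros(2))
```

Because both projections come from one shared shape, an error in the shape parameters shows up as a misalignment in every view, which is the kind of multi-view constraint the abstract describes.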
CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images
Thanks to the power of convolutional neural networks (CNNs), CNN-based face
reconstruction has recently shown promising performance in reconstructing
detailed face shape from 2D face images. The success of CNN-based methods
relies on a large amount of labeled data. The state of the art synthesizes such
data using a coarse morphable face model, which, however, has difficulty
generating detailed photo-realistic images of faces (with wrinkles). This paper
presents a novel face data generation method. Specifically, we render a large
number of photo-realistic face images with different attributes based on
inverse rendering. Furthermore, we construct a fine-detailed face image dataset
by transferring different scales of details from one image to another. We also
construct a large number of video-type adjacent frame pairs by simulating the
distribution of real video data. With these nicely constructed datasets, we
propose a coarse-to-fine learning framework consisting of three convolutional
networks. The networks are trained for real-time detailed 3D face
reconstruction from monocular video as well as from a single image. Extensive
experimental results demonstrate that our framework can produce high-quality
reconstruction with much less computation time than the
state-of-the-art. Moreover, our method is robust to pose, expression and
lighting due to the diversity of data.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence, 201
Emotional Qualities of VR Space
The emotional response a person has to a living space is predominantly
affected by light, color and texture as space-making elements. In order to
verify whether this phenomenon could be replicated in a simulated environment,
we conducted a user study in a six-sided projected immersive display that
utilized equivalent design attributes of brightness, color and texture in order
to assess the extent to which the emotional response in a simulated environment is
affected by the same parameters affecting real environments. Since emotional
response depends upon the context, we evaluated the emotional responses of two
groups of users: inactive (passive) and active (performing a typical daily
activity). The results from the perceptual study generated data from which
design principles for a virtual living space are articulated. Such a space, as
an alternative to expensive built dwellings, could potentially support new,
minimalist lifestyles of occupants, defined as the neo-nomads, aligned with
their work experience in the digital domain through the generation of emotional
experiences of spaces. Data from the experiments confirmed the hypothesis that
perceivable emotional aspects of real-world spaces could be successfully
generated through simulation of design attributes in the virtual space. The
subjective response to the virtual space was consistent with corresponding
responses from real-world color and brightness emotional perception. Our data
could serve the virtual reality (VR) community in its attempt to conceive of
further applications of virtual spaces for well-defined activities.
Comment: 12 figures
Cross-View Image Synthesis using Conditional GANs
Learning to generate natural scenes has always been a challenging task in
computer vision. It is even more painstaking when the generation is conditioned
on images with drastically different views. This is mainly because
understanding, corresponding, and transforming appearance and semantic
information across the views is not trivial. In this paper, we attempt to solve
the novel problem of cross-view image synthesis, aerial to street-view and vice
versa, using conditional generative adversarial networks (cGAN). Two new
architectures called Crossview Fork (X-Fork) and Crossview Sequential (X-Seq)
are proposed to generate scenes with resolutions of 64x64 and 256x256 pixels.
The X-Fork architecture has a single discriminator and a single generator. The
generator hallucinates both the image and its semantic segmentation in the
target view. The X-Seq architecture utilizes two cGANs. The first one generates the
target image which is subsequently fed to the second cGAN for generating its
corresponding semantic segmentation map. The feedback from the second cGAN
helps the first cGAN generate sharper images. Both of our proposed
architectures learn to generate natural images as well as their semantic
segmentation maps. The proposed methods are able to capture and
maintain the true semantics of objects in source and target views better than
the traditional image-to-image translation method which considers only the
visual appearance of the scene. Extensive qualitative and quantitative
evaluations support the effectiveness of our frameworks, compared to two
state-of-the-art methods, for natural scene generation across drastically different
views.
Comment: Accepted at CVPR 201
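The difference in data flow between the two architectures described in this abstract can be sketched as follows. This is only a wiring illustration under stated assumptions: `make_toy_generator` is a hypothetical stand-in (a fixed per-pixel linear map, not a trained network), the segmentation map is assumed to be a 3-channel color image, and the discriminators and adversarial training are omitted entirely.

```python
import numpy as np

def make_toy_generator(in_channels, out_channels, seed):
    """Hypothetical stand-in for a cGAN generator: a fixed per-pixel linear map."""
    weights = np.random.default_rng(seed).normal(size=(in_channels, out_channels))
    return lambda x: np.tanh(x @ weights)

def x_fork(source, generator):
    """X-Fork wiring: one generator emits the image and its segmentation together."""
    out = generator(source)
    return out[..., :3], out[..., 3:]  # first 3 channels: image, rest: segmentation

def x_seq(source, image_generator, seg_generator):
    """X-Seq wiring: the first generator's image is the input to the second."""
    image = image_generator(source)
    segmentation = seg_generator(image)
    return image, segmentation

# A 64x64 RGB source view (e.g. aerial), filled with ones as a placeholder.
source = np.ones((64, 64, 3))

fork_image, fork_seg = x_fork(source, make_toy_generator(3, 6, seed=0))
seq_image, seq_seg = x_seq(
    source,
    make_toy_generator(3, 3, seed=1),
    make_toy_generator(3, 3, seed=2),
)
```

In the sequential wiring the segmentation depends on the generated image, so in a real training setup gradients from the second network can flow back into the first, which matches the feedback the abstract mentions.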