Style Separation and Synthesis via Generative Adversarial Networks
Style synthesis has attracted great interest recently, while few works focus on
its dual problem "style separation". In this paper, we propose the Style
Separation and Synthesis Generative Adversarial Network (S3-GAN) to
simultaneously implement style separation and style synthesis on object
photographs of specific categories. Based on the assumption that the object
photographs lie on a manifold, and the contents and styles are independent, we
employ S3-GAN to build mappings between the manifold and a latent vector space
for separating and synthesizing the contents and styles. The S3-GAN consists of
an encoder network, a generator network, and an adversarial network. The
encoder network performs style separation by mapping an object photograph to a
latent vector. Two halves of the latent vector represent the content and style,
respectively. The generator network performs style synthesis by taking a
concatenated vector as input. The concatenated vector contains the style half
vector of the style target image and the content half vector of the content
target image. Once the images are obtained from the generator network, an
adversarial network is imposed to make them more photo-realistic.
Experiments on the CelebA and UT Zappos 50K datasets demonstrate that S3-GAN
can perform style separation and synthesis simultaneously and can
capture various styles in a single model.
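The latent-vector bookkeeping described in this abstract can be illustrated with a minimal sketch. It only shows how the content half of one latent vector and the style half of another might be concatenated into a generator input. The latent size, the function names, and the random vectors standing in for encoder outputs are all assumptions for illustration; the encoder, generator, and adversarial networks themselves are not implemented here.

```python
import numpy as np

LATENT_DIM = 8           # hypothetical size; the abstract does not state one
HALF = LATENT_DIM // 2   # first half = content, second half = style


def split_latent(z):
    """Split a latent vector into its (content, style) halves."""
    return z[:HALF], z[HALF:]


def swap_style(z_content_img, z_style_img):
    """Build a generator input from the content half of one latent
    vector and the style half of another."""
    content, _ = split_latent(z_content_img)
    _, style = split_latent(z_style_img)
    return np.concatenate([content, style])


rng = np.random.default_rng(0)
z_a = rng.normal(size=LATENT_DIM)  # stand-in for encoder(content target image)
z_b = rng.normal(size=LATENT_DIM)  # stand-in for encoder(style target image)
z_mix = swap_style(z_a, z_b)       # would be passed to the generator
```

In a full model, `z_mix` would be decoded by the generator into an image that keeps the content of the first photograph and takes the style of the second.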
Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
In this work, we propose a novel approach for generating videos of the six
basic facial expressions given a neutral face image. We propose to exploit the
face geometry by modeling the facial landmarks motion as curves encoded as
points on a hypersphere. By proposing a conditional version of manifold-valued
Wasserstein generative adversarial network (GAN) for motion generation on the
hypersphere, we learn the distribution of facial expression dynamics of
different classes, from which we synthesize new facial expression motions. The
resulting motions can be transformed to sequences of landmarks and then to
image sequences by editing the texture information using another conditional
Generative Adversarial Network. To the best of our knowledge, this is the first
work that explores manifold-valued representations with GAN to address the
problem of dynamic facial expression generation. We evaluate our proposed
approach both quantitatively and qualitatively on two public datasets:
Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the
effectiveness of our approach in generating realistic videos with continuous
motion, realistic appearance and identity preservation. We also show the
efficiency of our framework for dynamic facial expression generation, dynamic
facial expression transfer and data augmentation for training improved emotion
recognition models.
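One way a landmark motion curve could be encoded as a point on a hypersphere is sketched below. The abstract does not say which encoding is used, so the square-root-velocity style mapping followed by unit-norm scaling is an assumption, as are the function name and the array shapes.

```python
import numpy as np


def curve_to_sphere_point(curve, eps=1e-12):
    """Map a landmark trajectory to a unit-norm array.

    curve: (T, D) array, T frames of D flattened landmark coordinates.
    Assumed encoding: velocity divided by the square root of its speed,
    then rescaled so the whole array has unit norm.
    """
    velocity = np.gradient(curve, axis=0)                    # finite differences over time
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)  # per-frame speed
    q = velocity / np.sqrt(speed + eps)
    return q / (np.linalg.norm(q) + eps)                     # project onto the unit sphere


rng = np.random.default_rng(0)
# Hypothetical motion: 20 frames of 68 two-dimensional landmarks (random walk).
motion = np.cumsum(rng.normal(size=(20, 136)), axis=0)
point = curve_to_sphere_point(motion)
```

Because every encoded motion has unit norm, all motions lie on the same hypersphere, which is the kind of manifold-valued data a generative model would then have to respect.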
GRASS: Generative Recursive Autoencoders for Shape Structures
We introduce a novel neural network architecture for encoding and synthesis
of 3D shapes, particularly their structures. Our key insight is that 3D shapes
are effectively characterized by their hierarchical organization of parts,
which reflects fundamental intra-shape relationships such as adjacency and
symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a
flat, unlabeled, arbitrary part layout to a compact code. The code effectively
captures hierarchical structures of man-made 3D objects of varying structural
complexities despite being fixed-dimensional: an associated decoder maps a code
back to a full hierarchy. The learned bidirectional mapping is further tuned
using an adversarial setup to yield a generative model of plausible structures,
from which novel structures can be sampled. Finally, our structure synthesis
framework is augmented by a second trained module that produces fine-grained
part geometry, conditioned on global and local structural context, leading to a
full generative pipeline for 3D shapes. We demonstrate that without
supervision, our network learns meaningful structural hierarchies adhering to
perceptual grouping principles, produces compact codes which enable
applications such as shape classification and partial matching, and supports
shape synthesis and interpolation with significant variations in topology and
geometry.
Comment: Corresponding author: Kai Xu
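The idea of folding a part hierarchy of any size into a fixed-dimensional code can be illustrated with a toy recursive encoder. This is only a sketch under simplifying assumptions: the hierarchy is given as nested pairs rather than inferred, a single untrained merge layer replaces the paper's learned encoders, and the code size and all names are hypothetical.

```python
import numpy as np

CODE_DIM = 4  # hypothetical code size
rng = np.random.default_rng(0)
# Untrained merge weights; a real model would learn these.
W = rng.normal(scale=0.5, size=(CODE_DIM, 2 * CODE_DIM))
b = np.zeros(CODE_DIM)


def encode(node):
    """Recursively encode a hierarchy given as nested 2-tuples whose
    leaves are part codes of length CODE_DIM."""
    if isinstance(node, np.ndarray):
        return node                      # leaf: already a part code
    left, right = node
    children = np.concatenate([encode(left), encode(right)])
    return np.tanh(W @ children + b)     # merge two child codes into one


parts = [rng.normal(size=CODE_DIM) for _ in range(4)]
small = (parts[0], parts[1])
large = ((parts[0], parts[1]), (parts[2], parts[3]))
code_small = encode(small)
code_large = encode(large)
```

Both `code_small` and `code_large` have length `CODE_DIM` even though the second hierarchy contains twice as many parts, which mirrors the fixed-dimensional property described in the abstract.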