Recovering Faces from Portraits with Auxiliary Facial Attributes
Recovering a photorealistic face from an artistic portrait is a challenging
task since crucial facial details are often distorted or completely lost in
artistic compositions. To handle this loss, we propose Attribute-guided Face
Recovery from Portraits (AFRP), a method that uses a Face Recovery Network (FRN) and
a Discriminative Network (DN). FRN consists of an autoencoder with residual
block-embedded skip-connections and incorporates facial attribute vectors into
the feature maps of input portraits at the bottleneck of the autoencoder. DN
has multiple convolutional and fully-connected layers, and its role is to
force FRN to generate authentic face images with the facial
attributes dictated by the input attribute vectors. To preserve
identity, we require the recovered and ground-truth faces to share similar
visual features. Specifically, DN determines whether the recovered image looks
like a real face and checks if the facial attributes extracted from the
recovered image are consistent with given attributes. Our method can recover photorealistic
identity-preserving faces with desired attributes from unseen stylized
portraits, artistic paintings, and hand-drawn sketches. On large-scale
synthesized and sketch datasets, we demonstrate that our face recovery method
achieves state-of-the-art results.
Comment: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)
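As an illustration of how attribute vectors might be incorporated into bottleneck feature maps, the following Python sketch tiles each attribute value into a constant plane and appends it as an extra channel. The abstract does not state the fusion operator, so tile-and-concatenate is an assumption here, and the name fuse_attributes and the nested-list layout (channels x height x width) are hypothetical.

```python
from typing import List

# Hypothetical layout: channels x height x width, as nested lists.
FeatureMap = List[List[List[float]]]


def fuse_attributes(features: FeatureMap, attrs: List[float]) -> FeatureMap:
    """Append one constant H x W plane per attribute to the feature channels.

    Assumption: fusion is done by tiling and channel-wise concatenation;
    the source only says attributes are incorporated at the bottleneck.
    """
    if not features or not features[0] or not features[0][0]:
        raise ValueError("features must be a non-empty C x H x W nested list")
    height = len(features[0])
    width = len(features[0][0])
    # Each attribute becomes a plane filled with that attribute's value.
    attr_planes = [[[float(a)] * width for _ in range(height)] for a in attrs]
    # Copy the original channels so the input is not mutated.
    copied = [[row[:] for row in channel] for channel in features]
    return copied + attr_planes


if __name__ == "__main__":
    bottleneck = [[[0.1, 0.2], [0.3, 0.4]]]      # 1 channel, 2 x 2
    fused = fuse_attributes(bottleneck, [1.0, 0.0])  # e.g. two binary attributes
    print(len(fused), "channels after fusion")
```

With C input channels and A attributes, the result has C + A channels of the same spatial size, which a decoder could then consume.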
Semi-supervised FusedGAN for Conditional Image Generation
We present FusedGAN, a deep network for conditional image synthesis with
controllable sampling of diverse images. Fidelity, diversity and controllable
sampling are the main quality measures of a good image generation model. Most
existing models are insufficient in all three aspects. The FusedGAN can perform
controllable sampling of diverse images with very high fidelity. We argue that
controllability can be achieved by disentangling the generation process into
various stages. In contrast to stacked GANs, where multiple stages of GANs are
trained separately with full supervision of labeled intermediate images, the
FusedGAN has a single-stage pipeline with a built-in stacking of GANs. Unlike
existing methods, which require full supervision with paired conditions and
images, the FusedGAN can effectively leverage more abundant images without
corresponding conditions in training, to produce more diverse samples with high
fidelity. We achieve this by fusing two generators: one for unconditional image
generation, and the other for conditional image generation, where the two
partly share a common latent space, thereby disentangling the generation. We
demonstrate the efficacy of the FusedGAN in fine-grained image generation tasks
such as text-to-image and attribute-to-face generation.
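The wiring of two generators that partly share a latent space can be sketched in Python as below. This is a toy illustration only: the random linear maps stand in for real networks, no adversarial training is shown, and the class name FusedGenerators, its dimensions, and its method names are all hypothetical rather than taken from the paper.

```python
import random
from typing import List, Sequence

Vector = List[float]


def linear(weights: List[Vector], x: Sequence[float]) -> Vector:
    """Matrix-vector product with weights stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_weights(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class FusedGenerators:
    """Two heads on one shared stage: unconditional and conditional."""

    def __init__(self, z_dim: int, h_dim: int, y_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_shared = make_weights(h_dim, z_dim, rng)          # shared latent stage
        self.w_uncond = make_weights(out_dim, h_dim, rng)        # unconditional head
        self.w_cond = make_weights(out_dim, h_dim + y_dim, rng)  # conditional head

    def shared(self, z: Sequence[float]) -> Vector:
        return linear(self.w_shared, z)

    def unconditional(self, z: Sequence[float]) -> Vector:
        # Needs no condition, so it could be trained on unpaired images.
        return linear(self.w_uncond, self.shared(z))

    def conditional(self, z: Sequence[float], y: Sequence[float]) -> Vector:
        # Same shared code, with the condition vector concatenated.
        return linear(self.w_cond, self.shared(z) + list(y))


if __name__ == "__main__":
    g = FusedGenerators(z_dim=4, h_dim=3, y_dim=2, out_dim=5)
    z = [0.1, 0.2, 0.3, 0.4]
    print(g.unconditional(z))
    print(g.conditional(z, [1.0, 0.0]))
```

Holding z fixed while varying y changes only the conditional head's input, which is one way to read the controllable sampling described above.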
Improved Sketch-to-Photo Generation Using Filter Aided Generative Adversarial Network
Generating a photographic face image from an input sketch is one of the most challenging tasks in computer vision. Sketches drawn by sketch artists are mainly used for human identification. Sketch-to-photo synthesis has important applications in law enforcement as well as in character design and educational training. In recent years, Generative Adversarial Networks (GANs) have shown excellent performance on the sketch-to-photo synthesis problem. The quality of a hand-drawn sketch affects the quality of the generated photo. When a hand-drawn sketch is handled, accidental contact between the user's hand and the pencil drawing, or similar activity, can introduce noise into the sketch. Likewise, stylistic differences such as shading and the darkness of the pencil used by the sketch artist may add unnecessary noise. Many sketch-to-photo synthesis methods have been proposed in recent years, but they mainly focus on network architecture to improve performance. In this paper, we propose a Filter-aided GAN framework that removes such noise while synthesizing photo images from hand-drawn sketches. We implement and compare different filtering methods with the GAN. Quantitative and qualitative results show that the proposed Filter-aided GAN generates photo images that are visually pleasing and closer to the ground-truth image.
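The idea of filtering a sketch before it reaches the generator can be illustrated with a small Python sketch. The abstract does not say which filters were compared, so the 3x3 median filter here is an assumed example, and the function name median_filter3x3 is hypothetical. Pixel values are floats with 1.0 as white paper and 0.0 as a dark mark.

```python
from statistics import median
from typing import List

Image = List[List[float]]  # rows of grayscale pixel values


def median_filter3x3(img: Image) -> Image:
    """Replace each pixel by the median of its 3x3 neighbourhood.

    Borders are handled by clamping indices (edge replication).
    Assumption: a median filter is just one candidate pre-filter.
    """
    if not img or not img[0]:
        raise ValueError("img must be a non-empty 2D list")
    height, width = len(img), len(img[0])
    out: Image = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                img[min(max(r + dr, 0), height - 1)][min(max(c + dc, 0), width - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


if __name__ == "__main__":
    sketch = [[1.0] * 5 for _ in range(5)]
    sketch[2][2] = 0.0  # an isolated smudge
    cleaned = median_filter3x3(sketch)
    print(cleaned[2][2])
```

An isolated speck is removed because it is outvoted by its neighbours. A filter this aggressive can also thin one-pixel-wide strokes, which may be one reason to compare several filtering methods.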
Cross domain Image Transformation and Generation by Deep Learning
Compared with single-domain learning, cross-domain learning is more challenging due to the large domain variation. In addition, cross-domain image synthesis is more difficult than other cross-domain learning problems, such as correlation analysis, indexing, and retrieval, because it needs to learn a complex function that captures image details for photo-realism. This work investigates cross-domain image synthesis in two common and challenging tasks, i.e., image-to-image and non-image-to-image transfer/synthesis. The image-to-image transfer is investigated in Chapter 2, where we develop a method for transformation between face images and sketch images while preserving the identity. Unlike existing works that conduct domain transfer in a one-pass manner, we design a recurrent bidirectional transformation network (r-BTN), which allows bidirectional domain transfer in an integrated framework. More importantly, it can perceptually compose partial inputs from two domains to simultaneously synthesize face and sketch images with consistent identity. Most existing works synthesize images well only from patches that cover at least 70% of the original image. The proposed r-BTN can yield appealing results from patches that cover less than 10%, because it recursively estimates the missing region in an incremental manner. Extensive experiments have been conducted to demonstrate the superior performance of r-BTN compared to existing solutions. Chapter 3 targets image transformation/synthesis from non-image sources, i.e., generating a talking face from audio input. Existing works either do not consider temporal dependency, thus yielding abrupt facial/lip movement, or are limited to generation for a specific person, thus lacking generalization capacity.
We propose a novel conditional recurrent generation network that incorporates image and audio features in the recurrent unit for temporal dependency, so that smooth transitions can be achieved for lip and facial movements. To achieve image- and video-realism, we adopt a pair of spatial-temporal discriminators. Because accurate lip synchronization is essential to talking face video generation, we construct a lip-reading discriminator to boost lip synchronization accuracy. Extensive experiments demonstrate the superiority of our framework over the state of the art in terms of visual quality, lip sync accuracy, and smooth transitions of lip and facial movement.
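The role of a recurrent unit that takes both image and audio features can be illustrated with a toy Python sketch. The element-wise update rule, the scalar weights, and the names recurrent_step and generate_sequence are all assumptions made for illustration and are not the network described in the thesis.

```python
import math
from typing import List, Sequence


def recurrent_step(
    hidden: Sequence[float],
    image_feat: Sequence[float],
    audio_feat: Sequence[float],
    w_h: float = 0.5,
    w_i: float = 0.3,
    w_a: float = 0.2,
) -> List[float]:
    """One toy update: mix previous state, identity image, and current audio."""
    if not (len(hidden) == len(image_feat) == len(audio_feat)):
        raise ValueError("all feature vectors must have the same length")
    return [
        math.tanh(w_h * h + w_i * i + w_a * a)
        for h, i, a in zip(hidden, image_feat, audio_feat)
    ]


def generate_sequence(
    image_feat: Sequence[float], audio_frames: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Run the cell over all audio frames, carrying the hidden state forward."""
    hidden = [0.0] * len(image_feat)
    states = []
    for audio_feat in audio_frames:
        hidden = recurrent_step(hidden, image_feat, audio_feat)
        states.append(hidden)
    return states


if __name__ == "__main__":
    identity = [0.5, -0.5]                 # features of the reference face image
    audio = [[1.0, 0.0], [1.0, 0.0]]       # the same audio frame twice
    for state in generate_sequence(identity, audio):
        print(state)
```

Even when the same audio frame is repeated, successive states differ because each step depends on the previous hidden state. This is the kind of temporal dependency that a per-frame generator lacks.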