Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs
Face photo-sketch synthesis aims at generating a facial sketch/photo conditioned on a given photo/sketch. It has wide applications, including digital entertainment and law enforcement. Precisely depicting face photos/sketches remains challenging due to the demands of structural realism and textural consistency. While existing methods achieve compelling results, they mostly yield blurring and severe deformation over various facial components, making the synthesized images look unrealistic. To tackle this challenge, in this work we propose to use facial composition information to aid the synthesis of face sketches/photos. Specifically, we propose a novel composition-aided generative adversarial network (CA-GAN) for face photo-sketch synthesis. In CA-GAN, we utilize paired inputs consisting of a face photo/sketch and the corresponding pixel-wise face labels to generate a sketch/photo. In addition, to focus training on hard-to-generate components and delicate facial structures, we propose a compositional reconstruction loss. Finally, we use stacked CA-GANs (SCA-GAN) to further rectify defects and add compelling details. Experimental results show that our method is capable of generating both visually comfortable and identity-preserving face sketches/photos over a wide range of challenging data. Our method achieves state-of-the-art quality, reducing the best previous Fréchet Inception Distance (FID) by a large margin. Moreover, we demonstrate that the proposed method generalizes well. We have made our code and results publicly available: https://fei-hdu.github.io/ca-gan/. Comment: 10 pages, 8 figures, journal
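The abstract does not spell out the form of the compositional reconstruction loss, so the following is only an illustrative sketch of the general idea: weight the per-pixel reconstruction error by facial-component masks (derived from the pixel-wise face labels), so that hard-to-generate components can receive larger weights. The component weights here are assumed hyperparameters, not values from the paper.

```python
# Illustrative sketch (not the authors' code): per-component weighted L1 reconstruction.
import torch

def compositional_reconstruction_loss(fake, real, comp_masks, comp_weights):
    """fake, real:   (B, 3, H, W) synthesized and ground-truth images
       comp_masks:   (B, K, H, W) binary masks, one per facial component
       comp_weights: (K,) importance weight per component (assumed hyperparameters)"""
    per_pixel_l1 = (fake - real).abs().mean(dim=1, keepdim=True)    # (B, 1, H, W)
    loss = 0.0
    for k in range(comp_masks.shape[1]):
        mask = comp_masks[:, k:k + 1]                                # (B, 1, H, W)
        denom = mask.sum().clamp(min=1.0)                            # avoid division by zero
        loss = loss + comp_weights[k] * (per_pixel_l1 * mask).sum() / denom
    return loss

# Toy usage with random tensors:
fake = torch.rand(2, 3, 64, 64); real = torch.rand(2, 3, 64, 64)
masks = (torch.rand(2, 5, 64, 64) > 0.5).float()
weights = torch.tensor([1.0, 2.0, 2.0, 1.5, 1.0])   # e.g. heavier weights on eyes/mouth
print(compositional_reconstruction_loss(fake, real, masks, weights))
```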
r-BTN: Cross-domain Face Composite and Synthesis from Limited Facial Patches
We start by asking an interesting yet challenging question: "If an eyewitness can only recall the eye features of the suspect, such that the forensic artist can only produce a sketch of the eyes (e.g., the top-left sketch shown in Fig. 1), can advanced computer vision techniques help generate the whole face image?" A more general question is: if a large proportion (e.g., more than 50%) of the face/sketch is missing, can a realistic whole face sketch/image still be estimated? Existing face completion and generation methods either do not conduct domain-transfer learning or cannot handle large missing areas. For example, inpainting approaches tend to blur the generated region when the missing area is large (i.e., more than 50%). In this paper, we exploit the potential of deep learning networks for filling large missing regions (e.g., as high as 95% missing) and generating realistic, high-fidelity faces across domains. We propose recursive generation by bidirectional transformation networks (r-BTN), which recursively generates a whole face/sketch from a small sketch/face patch. The large missing area and the cross-domain challenge make it difficult to generate satisfactory results using a unidirectional cross-domain learning structure. In contrast, forward and backward bidirectional learning between the face and sketch domains enables recursive estimation of the missing region in an incremental manner (Fig. 1) and yields appealing results. r-BTN also adopts an adversarial constraint to encourage the generation of realistic faces/sketches. Extensive experiments demonstrate the superior performance of r-BTN compared to existing potential solutions. Comment: Accepted by AAAI 201
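A minimal sketch of the recursive bidirectional idea, under the assumption that two translators (sketch-to-face and face-to-sketch) are applied alternately and the observed pixels are pasted back after each pass so the missing region is filled incrementally. The single convolutions below are placeholders for the paper's generators, not the actual r-BTN architecture.

```python
# Hypothetical interfaces, not the authors' implementation.
import torch
import torch.nn as nn

G_s2f = nn.Conv2d(3, 3, 3, padding=1)   # placeholder for the sketch->face generator
G_f2s = nn.Conv2d(3, 3, 3, padding=1)   # placeholder for the face->sketch generator

def recursive_generation(sketch_patch, known_mask, steps=4):
    """sketch_patch: (B, 3, H, W) sketch with only a small region observed
       known_mask:   (B, 1, H, W) 1 where the sketch is observed"""
    sketch = sketch_patch
    for _ in range(steps):
        face = G_s2f(sketch)            # estimate the whole face from the current sketch
        sketch_est = G_f2s(face)        # map the estimate back to the sketch domain
        # keep the observed pixels, accept the estimate everywhere else
        sketch = known_mask * sketch_patch + (1 - known_mask) * sketch_est
    return G_s2f(sketch), sketch

face, sketch = recursive_generation(torch.rand(1, 3, 64, 64),
                                    (torch.rand(1, 1, 64, 64) > 0.9).float())
print(face.shape, sketch.shape)
```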
Semi-supervised FusedGAN for Conditional Image Generation
We present FusedGAN, a deep network for conditional image synthesis with controllable sampling of diverse images. Fidelity, diversity and controllable sampling are the main quality measures of a good image generation model. Most existing models fall short in one or more of these aspects. The FusedGAN can perform controllable sampling of diverse images with very high fidelity. We argue that controllability can be achieved by disentangling the generation process into various stages. In contrast to stacked GANs, where multiple stages of GANs are trained separately with full supervision of labeled intermediate images, the FusedGAN has a single-stage pipeline with a built-in stacking of GANs. Unlike existing methods, which require full supervision with paired conditions and images, the FusedGAN can effectively leverage more abundant images without corresponding conditions during training to produce more diverse samples with high fidelity. We achieve this by fusing two generators: one for unconditional image generation and the other for conditional image generation, where the two partly share a common latent space, thereby disentangling the generation. We demonstrate the efficacy of the FusedGAN in fine-grained image generation tasks such as text-to-image and attribute-to-face generation.
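A rough sketch of the "fused" design described above, with assumed layer sizes: an unconditional branch maps noise to a shared latent feature, and a conditional branch consumes that shared feature together with a condition embedding, so images without conditions can still train the shared part.

```python
# Architecture details are assumptions, not the FusedGAN implementation.
import torch
import torch.nn as nn

class FusedGenerators(nn.Module):
    def __init__(self, z_dim=64, cond_dim=16, feat_dim=128, out_dim=3 * 32 * 32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(z_dim, feat_dim), nn.ReLU())  # common latent stage
        self.uncond_head = nn.Linear(feat_dim, out_dim)                     # unconditional image
        self.cond_head = nn.Sequential(                                     # conditional image
            nn.Linear(feat_dim + cond_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, out_dim))

    def forward(self, z, cond=None):
        h = self.shared(z)                         # partly shared latent space
        x_uncond = self.uncond_head(h)
        x_cond = self.cond_head(torch.cat([h, cond], dim=1)) if cond is not None else None
        return x_uncond, x_cond

g = FusedGenerators()
x_u, x_c = g(torch.randn(4, 64), torch.randn(4, 16))
print(x_u.shape, x_c.shape)
```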
Network-to-Network Translation with Conditional Invertible Neural Networks
Given the ever-increasing computational costs of modern machine learning models, we need to find new ways to reuse such expert models and thus tap into the resources that have been invested in their creation. Recent work suggests that the power of these massive models is captured by the representations they learn. We therefore seek a model that can relate different existing representations, and propose to solve this task with a conditionally invertible network. This network demonstrates its capability by (i) providing generic transfer between diverse domains, (ii) enabling controlled content synthesis by allowing modification in other domains, and (iii) facilitating diagnosis of existing representations by translating them into interpretable domains such as images. Our domain-transfer network can translate between fixed representations without having to learn or finetune them. This allows users to utilize various existing domain-specific expert models from the literature that were trained with extensive computational resources. Experiments on diverse conditional image synthesis tasks, competitive image modification results, and experiments on image-to-image and text-to-image generation demonstrate the generic applicability of our approach. For example, we translate between BERT and BigGAN, state-of-the-art text and image models, to provide text-to-image generation, which neither expert can perform on its own. Comment: NeurIPS 2020 (oral). Code at https://github.com/CompVis/net2ne
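A schematic example of one conditional affine coupling block, the kind of building element a conditionally invertible translation network could be assembled from; the layer sizes and the idea of conditioning on, say, a text embedding are illustrative assumptions, not the released code.

```python
import torch
import torch.nn as nn

class CondCoupling(nn.Module):
    def __init__(self, dim=128, cond_dim=768):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * self.half))        # predicts scale and shift

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        y2 = x2 * torch.exp(s) + t                # invertible given x1 and cond
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y, cond):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
        x2 = (y2 - t) * torch.exp(-s)
        return torch.cat([y1, x2], dim=1)

block = CondCoupling()
z = torch.randn(2, 128); c = torch.randn(2, 768)   # e.g. a text embedding as the condition
assert torch.allclose(block.inverse(block(z, c), c), z, atol=1e-5)
```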
Attention-Guided Generative Adversarial Networks for Unsupervised Image-to-Image Translation
State-of-the-art approaches in Generative Adversarial Networks (GANs) are able to learn a mapping function from one image domain to another with unpaired image data. However, these methods often produce artifacts and can only convert low-level information, while failing to transfer the high-level semantic parts of images. The main reason is that generators lack the ability to detect the most discriminative semantic parts of images, which results in low-quality generated images. To address this limitation, in this paper we propose a novel Attention-Guided Generative Adversarial Network (AGGAN), which can detect the most discriminative semantic object and minimize changes to unwanted parts for semantic manipulation problems without using extra data or models. The attention-guided generators in AGGAN are able to produce attention masks via a built-in attention mechanism, and then fuse the input image with the attention mask to obtain a high-quality target image. Moreover, we propose a novel attention-guided discriminator which only considers attended regions. The proposed AGGAN is trained in an end-to-end fashion with an adversarial loss, cycle-consistency loss, pixel loss and attention loss. Both qualitative and quantitative results demonstrate that our approach is effective in generating sharper and more accurate images than existing models. The code is available at https://github.com/Ha0Tang/AttentionGAN. Comment: 8 pages, 7 figures, Accepted to IJCNN 201
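A minimal sketch of the attention-guided fusion described above: the generator predicts a content image and an attention mask, and the output keeps the input wherever the mask marks a region as not relevant. The tiny backbone and head layers are placeholders, not the AGGAN architecture.

```python
import torch
import torch.nn as nn

class AttentionGuidedGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, 3, padding=1)       # placeholder backbone
        self.content_head = nn.Conv2d(16, 3, 3, padding=1)
        self.attention_head = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        content = torch.tanh(self.content_head(h))
        mask = torch.sigmoid(self.attention_head(h))          # built-in attention mask
        out = mask * content + (1 - mask) * x                 # fuse with the input image
        return out, mask

g = AttentionGuidedGenerator()
y, m = g(torch.rand(1, 3, 64, 64) * 2 - 1)
print(y.shape, m.shape)
```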
Face Sketch to Image Generation using Generative Adversarial Network
Numerous studies have been conducted in the area of sketch-to-picture conversion and have obtained good results, but they often produce blurry boundaries and mixing of colors, for example between the hair and the face. These problems arise from the plain convolutional neural networks that underlie basic GANs. To overcome these drawbacks we propose a novel generative adversarial network based on a conditional GAN. We convert the original image into a sketch, and both the sketch and the original image, used as a reference, are given as input. We obtain more realistic and sharply colored images compared to other approaches. We focus on feature detection, and the results are good. For experimentation we used the STL-10 dataset. Using a conditional GAN, we overcome the problem of color mixing and obtain distinct colors for hair, lips, and skin, with increased performance and precision compared to a plain CNN model.
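A hedged sketch of the conditional-GAN setup implied above (not the paper's code): the discriminator sees the sketch concatenated with either the real photo or the generated photo, which ties the generated colors to the conditioning sketch. The patch-style discriminator here is a placeholder.

```python
import torch
import torch.nn as nn

D = nn.Sequential(nn.Conv2d(6, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(16, 1, 4, stride=2, padding=1))   # placeholder conditional discriminator
bce = nn.BCEWithLogitsLoss()

def d_loss(sketch, real_photo, fake_photo):
    real_logits = D(torch.cat([sketch, real_photo], dim=1))            # condition + real pair
    fake_logits = D(torch.cat([sketch, fake_photo.detach()], dim=1))   # condition + fake pair
    return bce(real_logits, torch.ones_like(real_logits)) + \
           bce(fake_logits, torch.zeros_like(fake_logits))

s = torch.rand(2, 3, 64, 64); r = torch.rand(2, 3, 64, 64); f = torch.rand(2, 3, 64, 64)
print(d_loss(s, r, f))
```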
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions. The proposed method consists of three components: a StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization. The inversion module maps real images to the latent space of a well-trained StyleGAN. The visual-linguistic similarity module learns text-image matching by mapping images and text into a common embedding space. The instance-level optimization is for identity preservation during manipulation. Our model can produce diverse and high-quality images at an unprecedented resolution of 1024. Using a control mechanism based on style mixing, our TediGAN inherently supports image synthesis with multi-modal inputs, such as sketches or semantic labels, with or without instance guidance. To facilitate text-guided multi-modal synthesis, we propose Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images and corresponding semantic segmentation maps, sketches, and textual descriptions. Extensive experiments on the introduced dataset demonstrate the superior performance of our proposed method. Code and data are available at https://github.com/weihaox/TediGAN. Comment: CVPR 2021. Code: https://github.com/weihaox/TediGAN Data: https://github.com/weihaox/Multi-Modal-CelebA-HQ Video: https://youtu.be/L8Na2f5viA
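An illustrative sketch of instance-level latent optimization of the kind described above: optimize a latent code so the generated image matches the text in a joint embedding space while staying close to the inverted source identity. The callables `stylegan_generator`, `clip_like_similarity` and `identity_features` are hypothetical stand-ins, not TediGAN's actual modules.

```python
import torch

def optimize_latent(w_init, text_emb, source_img,
                    stylegan_generator, clip_like_similarity, identity_features,
                    steps=100, lr=0.01, lam_id=1.0):
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = stylegan_generator(w)
        # pull the image toward the text in the joint embedding space ...
        loss = -clip_like_similarity(img, text_emb)
        # ... while keeping the identity of the inverted source image
        loss = loss + lam_id * (identity_features(img) - identity_features(source_img)).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()

# Dummy stand-ins just to show the call shape:
G = lambda w: w.view(1, 3, 8, 8)
sim = lambda img, t: img.mean() * t.mean()
idf = lambda img: img.mean(dim=(2, 3))
w = optimize_latent(torch.randn(1, 192), torch.randn(1, 16), torch.rand(1, 3, 8, 8),
                    G, sim, idf, steps=5)
print(w.shape)
```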
Synthesizing Programs for Images using Reinforced Adversarial Learning
Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy, since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data; it is trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator's output as a reward signal is the key to allowing the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real-world (MNIST, Omniglot, CelebA) and synthetic 3D datasets. Comment: 12 pages, 13 figures
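A toy sketch of the reward idea only: a policy emits discrete drawing commands, a non-differentiable "renderer" executes them, and the discriminator's score on the rendering is used as a REINFORCE reward. The agent, renderer and discriminator below are trivial placeholders, not SPIRAL's distributed setup.

```python
import torch
import torch.nn as nn

policy = nn.Linear(16, 8)                 # scores over 8 possible drawing commands
discriminator = nn.Linear(64, 1)          # operates on flattened 8x8 "images"

def render(commands):                     # stand-in, non-differentiable graphics engine
    canvas = torch.zeros(commands.shape[0], 64)
    for b, c in enumerate(commands):
        c = int(c)
        canvas[b, c * 8:(c + 1) * 8] = 1.0     # each command paints one row
    return canvas

state = torch.randn(4, 16)
logits = policy(state)
dist = torch.distributions.Categorical(logits=logits)
commands = dist.sample()                  # program tokens
with torch.no_grad():
    reward = torch.sigmoid(discriminator(render(commands))).squeeze(1)   # D's output as reward
loss = -(dist.log_prob(commands) * reward).mean()                        # REINFORCE objective
loss.backward()
```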
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
Photorealistic frontal view synthesis from a single face image has a wide range of applications in the field of face recognition. Although data-driven deep learning methods have been proposed to address this problem by seeking solutions from ample face data, the problem remains challenging because it is intrinsically ill-posed. This paper proposes a Two-Pathway Generative Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by simultaneously perceiving global structures and local details. Four landmark-located patch networks are proposed to attend to local textures in addition to the commonly used global encoder-decoder network. Beyond the novel architecture, we make this ill-posed problem well constrained by introducing a combination of adversarial loss, symmetry loss and identity-preserving loss. The combined loss function leverages both the frontal face distribution and pre-trained discriminative deep face models to guide an identity-preserving inference of frontal views from profiles. Different from previous deep learning methods that mainly rely on intermediate features for recognition, our method directly leverages the synthesized identity-preserving image for downstream tasks such as face recognition and attribute estimation. Experimental results demonstrate that our method not only presents compelling perceptual results but also outperforms state-of-the-art results on large-pose face recognition. Comment: accepted at ICCV 2017, main paper & supplementary material, 11 pages
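A sketch of how the three losses named above could be combined; the weights, the symmetry-loss form and the stand-in feature extractor are assumptions, not the TP-GAN implementation.

```python
import torch
import torch.nn as nn

def symmetry_loss(img):
    # encourage left-right symmetry of the synthesized frontal face
    return (img - torch.flip(img, dims=[3])).abs().mean()

def identity_loss(fake, real, face_features):
    # match features of a pre-trained discriminative face model
    return (face_features(fake) - face_features(real)).pow(2).mean()

def generator_loss(fake, real, d_logits_fake, face_features,
                   lam_adv=1.0, lam_sym=0.3, lam_id=10.0):
    adv = nn.functional.binary_cross_entropy_with_logits(
        d_logits_fake, torch.ones_like(d_logits_fake))      # adversarial term: fool D
    return lam_adv * adv + lam_sym * symmetry_loss(fake) + \
           lam_id * identity_loss(fake, real, face_features)

# Toy usage with a stand-in feature extractor:
feat = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
fake = torch.rand(2, 3, 32, 32); real = torch.rand(2, 3, 32, 32)
print(generator_loss(fake, real, torch.randn(2, 1), feat))
```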
Towards Fine-grained Human Pose Transfer with Detail Replenishing Network
Human pose transfer (HPT) is an emerging research topic with huge potential in fashion design, media production, online advertising and virtual reality. For these applications, the visual realism of fine-grained appearance details is crucial for production quality and user engagement. However, existing HPT methods often suffer from three fundamental issues: detail deficiency, content ambiguity and style inconsistency, which severely degrade the visual quality and realism of generated images. Aiming at real-world applications, we develop a more challenging yet practical HPT setting, termed Fine-grained Human Pose Transfer (FHPT), with a stronger focus on semantic fidelity and detail replenishment. Concretely, we analyze the potential design flaws of existing methods via an illustrative example, and establish the core FHPT methodology by combining the ideas of content synthesis and feature transfer in a mutually guided fashion. Thereafter, we substantiate the proposed methodology with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine model training scheme. Moreover, we build up a complete suite of fine-grained evaluation protocols to address the challenges of FHPT in a comprehensive manner, including semantic analysis, structural detection and perceptual quality assessment. Extensive experiments on the DeepFashion benchmark dataset verify the power of the proposed method against state-of-the-art works, with a 12%-14% gain in top-10 retrieval recall, 5% higher joint localization accuracy, and a nearly 40% gain in face identity preservation. Moreover, the evaluation results offer further insights into the subject matter, which could inspire many promising future works along this direction. Comment: IEEE TIP submission
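A minimal sketch of one of the evaluation quantities mentioned above, top-k retrieval recall: a generated image counts as correct if its ground-truth counterpart appears among its k nearest neighbours in some feature space. The choice of feature space is left open here and is not the paper's exact protocol.

```python
import torch

def top_k_retrieval_recall(gen_feats, real_feats, k=10):
    """gen_feats, real_feats: (N, D); row i of each corresponds to the same sample."""
    dists = torch.cdist(gen_feats, real_feats)              # (N, N) pairwise distances
    knn = dists.topk(k, dim=1, largest=False).indices        # k nearest real samples per generated one
    target = torch.arange(gen_feats.shape[0]).unsqueeze(1)   # ground-truth index per row
    return (knn == target).any(dim=1).float().mean().item()

print(top_k_retrieval_recall(torch.randn(100, 32), torch.randn(100, 32), k=10))
```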