10 research outputs found
Fine-Grained Expression Manipulation via Structured Latent Space
Fine-grained facial expression manipulation is a challenging problem, as
fine-grained expression details are difficult to capture. Most existing
expression manipulation methods resort to discrete expression labels, which
mainly edit global expressions and ignore the manipulation of fine details. To
tackle this limitation, we propose an end-to-end expression-guided generative
adversarial network (EGGAN), which utilizes structured latent codes and
continuous expression labels as input to generate images with expected
expressions. Specifically, we adopt an adversarial autoencoder to map a source
image into a structured latent space. Then, given the source latent code and
the target expression label, we employ a conditional GAN to generate a new
image with the target expression. Moreover, we introduce a perceptual loss and
a multi-scale structural similarity loss to preserve identity and global shape
during generation. Extensive experiments show that our method can manipulate
fine-grained expressions and generate continuous intermediate expressions
between source and target expressions.
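One way to picture the continuous intermediate expressions mentioned above is to interpolate between a source and a target expression label and pass each intermediate label to the generator. The sketch below is a minimal illustration in plain Python under that assumption. The function name `interpolate_labels` and the three-dimensional label vectors are hypothetical, and the abstract does not state that EGGAN uses linear interpolation.

```python
from typing import List


def interpolate_labels(source: List[float], target: List[float], steps: int) -> List[List[float]]:
    """Return `steps` label vectors evenly spaced from `source` to `target`, endpoints included."""
    if len(source) != len(target):
        raise ValueError("source and target labels must have the same length")
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    labels = []
    for i in range(steps):
        t = i / (steps - 1)  # interpolation weight: 0.0 at the source, 1.0 at the target
        labels.append([(1.0 - t) * s + t * g for s, g in zip(source, target)])
    return labels


# Hypothetical continuous expression labels (for example, intensities of three facial cues).
source_label = [0.0, 0.2, 0.0]
target_label = [1.0, 0.2, 0.5]
path = interpolate_labels(source_label, target_label, steps=5)
```

Each entry of `path` would condition one generated frame, so a larger `steps` value gives a smoother transition between the two expressions.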
LEED: Label-Free Expression Editing via Disentanglement
Recent studies on facial expression editing have made very promising
progress. However, existing methods face the constraint of requiring
a large number of expression labels, which are often expensive and
time-consuming to collect. This paper presents an innovative label-free
expression editing via disentanglement (LEED) framework that is capable of
editing the expression of both frontal and profile facial images without
requiring any expression label. The idea is to disentangle the identity and
expression of a facial image in the expression manifold, where the neutral face
captures the identity attribute and the displacement between the neutral image
and the expressive image captures the expression attribute. Two novel losses
are designed for optimal expression disentanglement and consistent synthesis,
including a mutual expression information loss that aims to extract pure
expression-related features and a siamese loss that aims to enhance the
expression similarity between the synthesized image and the reference image.
Extensive experiments over two public facial expression datasets show that LEED
achieves superior facial expression editing qualitatively and quantitatively. Comment: Accepted to ECCV 202
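The disentanglement idea described above, where the neutral face carries identity and the displacement from neutral to expressive carries expression, can be illustrated with simple vector arithmetic. This is only a toy sketch in plain Python: the helper names `displacement` and `apply_expression` and the three-dimensional embeddings are hypothetical, and the actual framework learns these representations with neural encoders and the losses named in the abstract.

```python
from typing import List

Vector = List[float]


def displacement(expressive: Vector, neutral: Vector) -> Vector:
    """Expression attribute: offset of the expressive embedding from the neutral one."""
    return [e - n for e, n in zip(expressive, neutral)]


def apply_expression(neutral: Vector, expression: Vector) -> Vector:
    """Combine an identity (neutral) embedding with an expression displacement."""
    return [n + d for n, d in zip(neutral, expression)]


# Hypothetical embeddings; a real system would obtain these from learned encoders.
source_neutral = [1.0, 0.0, 2.0]        # identity of the face being edited
reference_neutral = [0.5, 1.0, 1.0]     # identity of the reference face
reference_expressive = [1.5, 1.0, 3.0]  # reference face showing the desired expression

reference_expression = displacement(reference_expressive, reference_neutral)
edited = apply_expression(source_neutral, reference_expression)
```

In this toy setting, `edited` keeps the source identity while taking on the reference expression, which is the behaviour the label-free formulation aims for without any expression labels.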
BioGAN: An unpaired GAN-based image to image translation model for microbiological images
Background and objective: A diversified dataset is crucial for training a well-generalized supervised computer vision algorithm. However, in the field of microbiology, generating and annotating a diverse dataset that includes field-taken images is time-consuming, costly, and in some cases impossible. Image-to-image translation frameworks allow us to diversify a dataset by transferring images from one domain to another. However, most existing image translation techniques require a paired dataset (an original image and its corresponding image in the target domain), and such datasets are difficult to collect. In addition, the application of these image translation frameworks in microbiology is rarely discussed. In this study, we aim to develop an unpaired GAN-based (Generative Adversarial Network) image-to-image translation model for microbiological images, and study how it can improve the generalization ability of object detection models. Methods: In this paper, we present an unpaired and unsupervised image translation model that translates laboratory-taken microbiological images into field images, building upon recent advances in GANs and perceptual loss functions. We propose a novel design for a GAN model, BioGAN, which uses adversarial and perceptual losses to transform the high-level features of laboratory-taken images of Prototheca bovis into those of field images while keeping their spatial features. Results: We studied the contribution of the adversarial and perceptual losses to the generation of realistic field images. We used the synthetic field images generated by BioGAN to train an object-detection framework, and compared the results with those of an object-detection framework trained with laboratory images; this resulted in up to 68.1% and 75.3% improvement on F1 score and mAP, respectively. We also present the results of a qualitative evaluation test, performed by experts, of the similarity of BioGAN synthetic images to field images.
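A generator objective that combines an adversarial term with a perceptual term, as the Methods part describes, might be sketched as a weighted sum. The plain-Python example below rests on several assumptions not stated in the abstract: a non-saturating adversarial loss, a mean-squared feature distance as the perceptual loss, and a hypothetical `perceptual_weight` parameter. The feature vectors stand in for activations of a pretrained network.

```python
import math
from typing import List


def adversarial_loss(disc_score: float) -> float:
    """Non-saturating generator loss -log D(G(x)) for a discriminator score in (0, 1]."""
    if not 0.0 < disc_score <= 1.0:
        raise ValueError("disc_score must be in (0, 1]")
    return -math.log(disc_score)


def perceptual_loss(feat_generated: List[float], feat_source: List[float]) -> float:
    """Mean squared distance between feature vectors of the generated and source images."""
    if not feat_source or len(feat_generated) != len(feat_source):
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum((g - s) ** 2 for g, s in zip(feat_generated, feat_source)) / len(feat_source)


def generator_loss(disc_score: float, feat_generated: List[float],
                   feat_source: List[float], perceptual_weight: float = 1.0) -> float:
    """Weighted sum of the adversarial and perceptual terms (the weighting is an assumption)."""
    return (adversarial_loss(disc_score)
            + perceptual_weight * perceptual_loss(feat_generated, feat_source))


# Toy values: the discriminator is fully fooled, and the features differ in one dimension.
total = generator_loss(1.0, [1.0, 2.0], [1.0, 4.0], perceptual_weight=0.5)
```

Raising `perceptual_weight` would push the generator to stay closer to the source image's features, while lowering it would favour images the discriminator judges to be realistic field images.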