Text Style Transfer: A Review and Experimental Evaluation
The stylistic properties of text have intrigued computational linguistics
researchers in recent years. Specifically, researchers have investigated the
Text Style Transfer (TST) task, which aims to change the stylistic properties
of the text while retaining its style-independent content. Over the last few
years, many novel TST algorithms have been developed, while the industry has
leveraged these algorithms to enable exciting TST applications. The field of
TST research has burgeoned because of this symbiosis. This article aims to
provide a comprehensive review of recent research efforts on text style
transfer. More concretely, we create a taxonomy to organize the TST models and
provide a comprehensive summary of the state of the art. We review the existing
evaluation methodologies for TST tasks and conduct a large-scale
reproducibility study where we experimentally benchmark 19 state-of-the-art TST
algorithms on two publicly available datasets. Finally, we expand on current
trends and provide new perspectives on emerging developments in the TST field.
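As a concrete illustration of the kind of evaluation methodology reviewed here, a common recipe scores style transfer strength with a style classifier and content preservation with BLEU against the source sentence. The sketch below follows that recipe under simplifying assumptions; the tiny TF-IDF classifier is a stand-in, not the benchmark's actual evaluator.

```python
# Minimal sketch of a common TST evaluation recipe: style accuracy from a style
# classifier plus content preservation via BLEU against the source sentence.
# The TF-IDF + logistic-regression classifier is an illustrative stand-in;
# published benchmarks typically rely on stronger pretrained classifiers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import sacrebleu

# Toy labelled data for the style classifier (0 = negative, 1 = positive).
train_texts = ["the food was terrible", "awful slow service",
               "the food was wonderful", "friendly and fast service"]
train_labels = [0, 0, 1, 1]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train_texts), train_labels)

def evaluate(sources, transferred, target_label):
    """Return (style accuracy, mean BLEU vs. source) for a batch of transfers."""
    preds = clf.predict(vec.transform(transferred))
    style_acc = float((preds == target_label).mean())
    bleu = sum(sacrebleu.sentence_bleu(hyp, [src]).score
               for hyp, src in zip(transferred, sources)) / len(sources)
    return style_acc, bleu

print(evaluate(["the food was terrible"], ["the food was wonderful"], target_label=1))
```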
S2-Flow: Joint Semantic and Style Editing of Facial Images
The high-quality images yielded by generative adversarial networks (GANs)
have motivated investigations into their application for image editing.
However, GANs are often limited in the control they provide for performing
specific edits. One of the principal challenges is the entangled latent space
of GANs, which is not directly suitable for performing independent and detailed
edits. Recent editing methods allow for either controlled style edits or
controlled semantic edits. In addition, methods that use semantic masks to edit
images have difficulty preserving the identity and are unable to perform
controlled style edits. We propose a method to disentangle a GAN's
latent space into semantic and style spaces, enabling controlled semantic and
style edits for face images independently within the same framework. To achieve
this, we design an encoder-decoder based network architecture (S2-Flow),
which incorporates two proposed inductive biases. We show the suitability of
S2-Flow quantitatively and qualitatively by performing various semantic and
style edits.
Comment: Accepted to BMVC 2022
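The core idea of splitting a GAN latent into separately editable semantic and style codes can be sketched as follows. This is a schematic PyTorch illustration under assumed dimensions and module shapes, not the actual S2-Flow architecture or its inductive biases.

```python
# Schematic sketch: an encoder maps a GAN latent into separate semantic and
# style codes, and a decoder maps them back, so an edit can change one code
# while the other is kept fixed. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class LatentSplitter(nn.Module):
    def __init__(self, w_dim=512, sem_dim=256, style_dim=256):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(w_dim, 512), nn.ReLU(),
                                    nn.Linear(512, sem_dim + style_dim))
        self.decode = nn.Sequential(nn.Linear(sem_dim + style_dim, 512), nn.ReLU(),
                                    nn.Linear(512, w_dim))
        self.sem_dim = sem_dim

    def forward(self, w):
        h = self.encode(w)
        return h[:, :self.sem_dim], h[:, self.sem_dim:]

    def edit(self, w, new_sem=None, new_sty=None):
        sem, sty = self.forward(w)
        sem = new_sem if new_sem is not None else sem   # semantic edit slot
        sty = new_sty if new_sty is not None else sty   # style edit slot
        return self.decode(torch.cat([sem, sty], dim=1))

model = LatentSplitter()
w = torch.randn(4, 512)                                  # latents from a pretrained GAN
w_edited = model.edit(w, new_sty=torch.randn(4, 256))    # style edit, semantics kept
```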
GAN Inversion: A Survey
GAN inversion aims to invert a given image back into the latent space of a
pretrained GAN model, for the image to be faithfully reconstructed from the
inverted code by the generator. As an emerging technique to bridge the real and
fake image domains, GAN inversion plays an essential role in enabling the
pretrained GAN models such as StyleGAN and BigGAN to be used for real image
editing applications. Meanwhile, GAN inversion also provides insights into the
interpretation of the GAN latent space and how realistic images can be
generated. In this paper, we provide an overview of GAN inversion with a focus
on its recent algorithms and applications. We cover important techniques of GAN
inversion and their applications to image restoration and image manipulation.
We further elaborate on some trends and challenges for future directions.
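A minimal sketch of the optimization-based family of inversion methods covered by such surveys: starting from a random latent code, minimize a reconstruction loss between the generated and target image. The `generator` argument stands for any pretrained generator (e.g., a StyleGAN mapping a latent vector to an image) and is assumed rather than provided; practical pipelines also add perceptual losses and latent regularizers.

```python
# Sketch of optimization-based GAN inversion (assumed, simplified setup).
import torch

def invert(generator, target, latent_dim=512, steps=500, lr=0.05):
    """Optimize a latent code so generator(code) reconstructs `target`."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = generator(z)                               # (1, 3, H, W)
        loss = torch.nn.functional.mse_loss(recon, target) # pixel reconstruction
        loss.backward()
        opt.step()
    return z.detach()   # inverted code; edit it and re-generate for manipulation
```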
Deep Learning for Text Style Transfer: A Survey
Text style transfer is an important task in natural language generation,
which aims to control certain attributes in the generated text, such as
politeness, emotion, humor, and many others. It has a long history in the field
of natural language processing, and has recently regained significant
attention thanks to the promising performance of deep neural models. In
this paper, we present a systematic survey of the research on neural text style
transfer, spanning over 100 representative articles since the first neural text
style transfer work in 2017. We discuss the task formulation, existing datasets
and subtasks, evaluation, as well as the rich methodologies in the presence of
parallel and non-parallel data. We also provide discussions on a variety of
important topics regarding the future development of this task. Our curated
paper list is at https://github.com/zhijing-jin/Text_Style_Transfer_Survey
Comment: Computational Linguistics Journal 2022
LatentSwap3D: Semantic Edits on 3D Image GANs
3D GANs have the ability to generate latent codes for entire 3D volumes
rather than only 2D images. These models offer desirable features like
high-quality geometry and multi-view consistency, but, unlike their 2D
counterparts, complex semantic image editing tasks for 3D GANs have only been
partially explored. To address this problem, we propose LatentSwap3D, a
semantic edit approach based on latent space discovery that can be used with
any off-the-shelf 3D or 2D GAN model and on any dataset. LatentSwap3D relies on
identifying the latent code dimensions corresponding to specific attributes by
feature ranking using a random forest classifier. It then performs the edit by
swapping the selected dimensions of the image being edited with the ones from
an automatically selected reference image. Compared to other latent space
control-based edit methods, which were mainly designed for 2D GANs, our method
on 3D GANs provides remarkably consistent semantic edits in a disentangled
manner and outperforms others both qualitatively and quantitatively. We show
results on seven 3D GANs (pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D, StyleNeRF,
and VolumeGAN) and on five datasets (FFHQ, AFHQ, Cats, MetFaces, and CompCars).
Comment: The paper has been accepted by ICCV'23 AI3DC
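The two steps described above, ranking latent dimensions by attribute relevance with a random forest and swapping the top-ranked dimensions with a reference code, can be sketched as follows. Array shapes and the synthetic data are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of attribute-dimension ranking and swap-based editing of latent codes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_dimensions(codes, labels, n_estimators=100):
    """codes: (N, D) latent codes; labels: (N,) binary attribute labels."""
    forest = RandomForestClassifier(n_estimators=n_estimators).fit(codes, labels)
    return np.argsort(forest.feature_importances_)[::-1]   # most relevant first

def swap_edit(code, reference_code, ranked_dims, k=20):
    """Copy the k most attribute-relevant dimensions from the reference code."""
    edited = code.copy()
    edited[ranked_dims[:k]] = reference_code[ranked_dims[:k]]
    return edited

# Usage with synthetic data standing in for codes sampled from a 3D GAN.
codes = np.random.randn(1000, 512)
labels = (codes[:, 7] > 0).astype(int)          # toy attribute label
dims = rank_dimensions(codes, labels)
edited = swap_edit(codes[0], codes[1], dims, k=20)
```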
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
In this work, we propose TediGAN, a novel framework for multi-modal image
generation and manipulation with textual descriptions. The proposed method
consists of three components: StyleGAN inversion module, visual-linguistic
similarity learning, and instance-level optimization. The inversion module maps
real images to the latent space of a well-trained StyleGAN. The
visual-linguistic similarity module learns text-image matching by mapping
images and text into a common embedding space. The instance-level optimization
is for identity preservation in manipulation. Our model can produce diverse and
high-quality images at an unprecedented resolution of 1024 x 1024. Using a control
mechanism based on style-mixing, our TediGAN inherently supports image
synthesis with multi-modal inputs, such as sketches or semantic labels, with or
without instance guidance. To facilitate text-guided multi-modal synthesis, we
propose the Multi-Modal CelebA-HQ, a large-scale dataset consisting of real
face images and corresponding semantic segmentation maps, sketches, and textual
descriptions. Extensive experiments on the introduced dataset demonstrate the
superior performance of our proposed method. Code and data are available at
https://github.com/weihaox/TediGAN.
Comment: CVPR 2021. Code: https://github.com/weihaox/TediGAN Data:
https://github.com/weihaox/Multi-Modal-CelebA-HQ Video:
https://youtu.be/L8Na2f5viA
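The "common embedding space" idea behind visual-linguistic similarity can be sketched as two projection heads trained so that matching image/text pairs have high cosine similarity. The encoders, feature dimensions, and loss below are placeholder assumptions, not TediGAN's actual module.

```python
# Rough sketch of a joint image-text embedding space with a matching loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_feat_dim=2048, txt_feat_dim=768, joint_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_feat_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_feat_dim, joint_dim)

    def forward(self, img_feats, txt_feats):
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img, txt

def matching_loss(img, txt):
    """Pull matching image/text pairs together (cosine similarity toward 1)."""
    return (1.0 - (img * txt).sum(dim=-1)).mean()

model = JointEmbedding()
img, txt = model(torch.randn(8, 2048), torch.randn(8, 768))
loss = matching_loss(img, txt)
```

At manipulation time, a similarity of this kind can steer latent-code optimization so that the generated face better matches a given textual description.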
BSD-GAN: Branched Generative Adversarial Network for Scale-Disentangled Representation Learning and Image Synthesis
We introduce BSD-GAN, a novel multi-branch and scale-disentangled training
method which enables unconditional Generative Adversarial Networks (GANs) to
learn image representations at multiple scales, benefiting a wide range of
generation and editing tasks. The key feature of BSD-GAN is that it is trained
in multiple branches, progressively covering both the breadth and depth of the
network, as resolutions of the training images increase to reveal finer-scale
features. Specifically, each noise vector, as input to the generator network of
BSD-GAN, is deliberately split into several sub-vectors, each corresponding to,
and trained to learn, image representations at a particular scale. During
training, we progressively "de-freeze" the sub-vectors, one at a time, as a new
set of higher-resolution images is employed for training and more network
layers are added. A consequence of such an explicit sub-vector designation is
that we can directly manipulate and even combine latent (sub-vector) codes
which model different feature scales. Extensive experiments demonstrate the
effectiveness of our training method in scale-disentangled learning of image
representations and synthesis of novel image contents, without any extra labels
and without compromising quality of the synthesized high-resolution images. We
further demonstrate several image generation and manipulation applications
enabled or improved by BSD-GAN. Source codes are available at
https://github.com/duxingren14/BSD-GAN.
Comment: 12 pages, 20 figures, accepted to IEEE Transactions on Image
Processing
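The sub-vector scheme can be illustrated with a small sketch: the generator's input noise is the concatenation of per-scale sub-vectors, and only the sub-vectors "de-frozen" so far at a given training stage are sampled, while the rest stay fixed. The number of scales and the dimensions below are assumptions for illustration.

```python
# Sketch of scale-wise sub-vector splitting with progressive de-freezing.
import torch

def make_latent(batch, dims=(64, 64, 128, 256), active=2):
    """Concatenate per-scale sub-vectors; only the first `active` are sampled."""
    parts = []
    for i, d in enumerate(dims):
        if i < active:
            parts.append(torch.randn(batch, d))   # de-frozen: trained at this stage
        else:
            parts.append(torch.zeros(batch, d))   # still frozen
    return torch.cat(parts, dim=1)

z_coarse_only = make_latent(4, active=1)   # early, low-resolution training stage
z_all_scales = make_latent(4, active=4)    # later, full-resolution stage
# Swapping one sub-vector between two codes mixes features at that scale only.
```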
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
Modern 3D-GANs synthesize geometry and texture by training on large-scale
datasets with a consistent structure. Training such models on stylized,
artistic data, which often comes with unknown, highly variable geometry and
camera information, has not yet been shown to be possible. Can we train a 3D
GAN on such
artistic data, while maintaining multi-view consistency and texture quality? To
this end, we propose an adaptation framework, where the source domain is a
pre-trained 3D-GAN, while the target domain is a 2D-GAN trained on artistic
datasets. We then distill the knowledge from a 2D generator to the source 3D
generator. To do that, we first propose an optimization-based method to align
the distributions of camera parameters across domains. Second, we propose
regularizations necessary to learn high-quality texture, while avoiding
degenerate geometric solutions, such as flat shapes. Third, we show a
deformation-based technique for modeling exaggerated geometry of artistic
domains, enabling -- as a byproduct -- personalized geometric editing. Finally,
we propose a novel inversion method for 3D-GANs linking the latent spaces of
the source and the target domains. Our contributions -- for the first time --
allow for the generation, editing, and animation of personalized artistic 3D
avatars on artistic datasets.
Comment: Project Page: https://rameenabdal.github.io/3DAvatarGAN
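A highly simplified sketch of the 2D-to-3D distillation idea: fine-tune a pretrained 3D generator so that its renders at sampled camera poses resemble images produced by a 2D generator trained on the artistic domain. Here `gen3d`, `gen2d`, and `sample_camera` are assumed stand-ins; the actual method additionally aligns camera distributions, regularizes texture and geometry, and links the two latent spaces via a dedicated inversion.

```python
# Simplified single training step for distilling a 2D "teacher" generator into
# a 3D "student" generator (assumed interfaces, not the paper's exact losses).
import torch

def distill_step(gen3d, gen2d, sample_camera, opt, batch=4, latent_dim=512):
    z = torch.randn(batch, latent_dim)
    cam = sample_camera(batch)                 # poses roughly aligned to the 2D data
    student = gen3d(z, cam)                    # multi-view-consistent render
    with torch.no_grad():
        teacher = gen2d(z)                     # artistic-domain target image
    loss = torch.nn.functional.l1_loss(student, teacher)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```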