Text Style Transfer: A Review and Experimental Evaluation
The stylistic properties of text have intrigued computational linguistics
researchers in recent years. Specifically, researchers have investigated the
Text Style Transfer (TST) task, which aims to change the stylistic properties
of the text while retaining its style-independent content. Over the last few
years, many novel TST algorithms have been developed, while the industry has
leveraged these algorithms to enable exciting TST applications. The field of
TST research has burgeoned because of this symbiosis. This article aims to
provide a comprehensive review of recent research efforts on text style
transfer. More concretely, we create a taxonomy to organize the TST models and
provide a comprehensive summary of the state of the art. We review the existing
evaluation methodologies for TST tasks and conduct a large-scale
reproducibility study where we experimentally benchmark 19 state-of-the-art TST
algorithms on two publicly available datasets. Finally, we expand on current
trends and provide new perspectives on emerging developments in the TST field.
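As a concrete illustration of the kind of evaluation methodology reviewed here, a common recipe scores style transfer strength with a style classifier and content preservation with BLEU against the source sentence. The sketch below follows that recipe under simplifying assumptions; the tiny TF-IDF classifier is a stand-in, not the benchmark's actual evaluator.

```python
# Minimal sketch of a common TST evaluation recipe: style accuracy from a style
# classifier plus content preservation via BLEU against the source sentence.
# The TF-IDF + logistic-regression classifier is an illustrative stand-in;
# published benchmarks typically rely on stronger pretrained classifiers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import sacrebleu

# Toy labelled data for the style classifier (0 = negative, 1 = positive).
train_texts = ["the food was terrible", "awful slow service",
               "the food was wonderful", "friendly and fast service"]
train_labels = [0, 0, 1, 1]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train_texts), train_labels)

def evaluate(sources, transferred, target_label):
    """Return (style accuracy, mean BLEU vs. source) for a batch of transfers."""
    preds = clf.predict(vec.transform(transferred))
    style_acc = float((preds == target_label).mean())
    bleu = sum(sacrebleu.sentence_bleu(hyp, [src]).score
               for hyp, src in zip(transferred, sources)) / len(sources)
    return style_acc, bleu

print(evaluate(["the food was terrible"], ["the food was wonderful"], target_label=1))
```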
S2-Flow: Joint Semantic and Style Editing of Facial Images
The high-quality images yielded by generative adversarial networks (GANs)
have motivated investigations into their application for image editing.
However, GANs are often limited in the control they provide for performing
specific edits. One of the principal challenges is the entangled latent space
of GANs, which is not directly suitable for performing independent and detailed
edits. Recent editing methods allow for either controlled style edits or
controlled semantic edits. In addition, methods that use semantic masks to edit
images have difficulty preserving the identity and are unable to perform
controlled style edits. We propose a method to disentangle a GAN's
latent space into semantic and style spaces, enabling controlled semantic and
style edits for face images independently within the same framework. To achieve
this, we design an encoder-decoder based network architecture (S2-Flow),
which incorporates two proposed inductive biases. We show the suitability of
S2-Flow quantitatively and qualitatively by performing various semantic and
style edits.
Comment: Accepted to BMVC 2022
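The core idea of splitting a GAN latent into separately editable semantic and style codes can be sketched as follows. This is a schematic PyTorch illustration under assumed dimensions and module shapes, not the actual S2-Flow architecture or its inductive biases.

```python
# Schematic sketch: an encoder maps a GAN latent into separate semantic and
# style codes, and a decoder maps them back, so an edit can change one code
# while the other is kept fixed. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class LatentSplitter(nn.Module):
    def __init__(self, w_dim=512, sem_dim=256, style_dim=256):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(w_dim, 512), nn.ReLU(),
                                    nn.Linear(512, sem_dim + style_dim))
        self.decode = nn.Sequential(nn.Linear(sem_dim + style_dim, 512), nn.ReLU(),
                                    nn.Linear(512, w_dim))
        self.sem_dim = sem_dim

    def forward(self, w):
        h = self.encode(w)
        return h[:, :self.sem_dim], h[:, self.sem_dim:]

    def edit(self, w, new_sem=None, new_sty=None):
        sem, sty = self.forward(w)
        sem = new_sem if new_sem is not None else sem   # semantic edit slot
        sty = new_sty if new_sty is not None else sty   # style edit slot
        return self.decode(torch.cat([sem, sty], dim=1))

model = LatentSplitter()
w = torch.randn(4, 512)                                  # latents from a pretrained GAN
w_edited = model.edit(w, new_sty=torch.randn(4, 256))    # style edit, semantics kept
```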
GAN Inversion: A Survey
GAN inversion aims to invert a given image back into the latent space of a
pretrained GAN model, for the image to be faithfully reconstructed from the
inverted code by the generator. As an emerging technique to bridge the real and
fake image domains, GAN inversion plays an essential role in enabling the
pretrained GAN models such as StyleGAN and BigGAN to be used for real image
editing applications. Meanwhile, GAN inversion also provides insights into the
interpretation of the GAN latent space and how realistic images can be
generated. In this paper, we provide an overview of GAN inversion with a focus
on its recent algorithms and applications. We cover important techniques of GAN
inversion and their applications to image restoration and image manipulation.
We further elaborate on some trends and challenges for future directions.
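A minimal sketch of the optimization-based family of inversion methods covered by such surveys: starting from a random latent code, minimize a reconstruction loss between the generated and target image. The `generator` argument stands for any pretrained generator (e.g., a StyleGAN mapping a latent vector to an image) and is assumed rather than provided; practical pipelines also add perceptual losses and latent regularizers.

```python
# Sketch of optimization-based GAN inversion (assumed, simplified setup).
import torch

def invert(generator, target, latent_dim=512, steps=500, lr=0.05):
    """Optimize a latent code so generator(code) reconstructs `target`."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = generator(z)                               # (1, 3, H, W)
        loss = torch.nn.functional.mse_loss(recon, target) # pixel reconstruction
        loss.backward()
        opt.step()
    return z.detach()   # inverted code; edit it and re-generate for manipulation
```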
Deep Learning for Text Style Transfer: A Survey
Text style transfer is an important task in natural language generation,
which aims to control certain attributes in the generated text, such as
politeness, emotion, humor, and many others. It has a long history in the field
of natural language processing, and has recently regained significant
attention thanks to the promising performance of deep neural models. In
this paper, we present a systematic survey of the research on neural text style
transfer, spanning over 100 representative articles since the first neural text
style transfer work in 2017. We discuss the task formulation, existing datasets
and subtasks, evaluation, as well as the rich methodologies in the presence of
parallel and non-parallel data. We also provide discussions on a variety of
important topics regarding the future development of this task. Our curated
paper list is at https://github.com/zhijing-jin/Text_Style_Transfer_Survey
Comment: Computational Linguistics Journal 2022
LatentSwap3D: Semantic Edits on 3D Image GANs
3D GANs have the ability to generate latent codes for entire 3D volumes
rather than only 2D images. These models offer desirable features like
high-quality geometry and multi-view consistency, but, unlike their 2D
counterparts, complex semantic image editing tasks for 3D GANs have only been
partially explored. To address this problem, we propose LatentSwap3D, a
semantic edit approach based on latent space discovery that can be used with
any off-the-shelf 3D or 2D GAN model and on any dataset. LatentSwap3D relies on
identifying the latent code dimensions corresponding to specific attributes by
feature ranking using a random forest classifier. It then performs the edit by
swapping the selected dimensions of the image being edited with the ones from
an automatically selected reference image. Compared to other latent space
control-based edit methods, which were mainly designed for 2D GANs, our method
on 3D GANs provides remarkably consistent semantic edits in a disentangled
manner and outperforms others both qualitatively and quantitatively. We show
results on seven 3D GANs (pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D, StyleNeRF,
and VolumeGAN) and on five datasets (FFHQ, AFHQ, Cats, MetFaces, and CompCars).
Comment: The paper has been accepted by ICCV'23 AI3DC
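The two steps described above, ranking latent dimensions by attribute relevance with a random forest and swapping the top-ranked dimensions with a reference code, can be sketched as follows. Array shapes and the synthetic data are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of attribute-dimension ranking and swap-based editing of latent codes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_dimensions(codes, labels, n_estimators=100):
    """codes: (N, D) latent codes; labels: (N,) binary attribute labels."""
    forest = RandomForestClassifier(n_estimators=n_estimators).fit(codes, labels)
    return np.argsort(forest.feature_importances_)[::-1]   # most relevant first

def swap_edit(code, reference_code, ranked_dims, k=20):
    """Copy the k most attribute-relevant dimensions from the reference code."""
    edited = code.copy()
    edited[ranked_dims[:k]] = reference_code[ranked_dims[:k]]
    return edited

# Usage with synthetic data standing in for codes sampled from a 3D GAN.
codes = np.random.randn(1000, 512)
labels = (codes[:, 7] > 0).astype(int)          # toy attribute label
dims = rank_dimensions(codes, labels)
edited = swap_edit(codes[0], codes[1], dims, k=20)
```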
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
In this work, we propose TediGAN, a novel framework for multi-modal image
generation and manipulation with textual descriptions. The proposed method
consists of three components: StyleGAN inversion module, visual-linguistic
similarity learning, and instance-level optimization. The inversion module maps
real images to the latent space of a well-trained StyleGAN. The
visual-linguistic similarity module learns text-image matching by mapping
images and text into a common embedding space. The instance-level optimization
is for identity preservation in manipulation. Our model can produce diverse and
high-quality images at an unprecedented resolution of 1024 x 1024. Using a control
mechanism based on style-mixing, our TediGAN inherently supports image
synthesis with multi-modal inputs, such as sketches or semantic labels, with or
without instance guidance. To facilitate text-guided multi-modal synthesis, we
propose the Multi-Modal CelebA-HQ, a large-scale dataset consisting of real
face images and corresponding semantic segmentation maps, sketches, and textual
descriptions. Extensive experiments on the introduced dataset demonstrate the
superior performance of our proposed method. Code and data are available at
https://github.com/weihaox/TediGAN.
Comment: CVPR 2021. Code: https://github.com/weihaox/TediGAN Data:
https://github.com/weihaox/Multi-Modal-CelebA-HQ Video:
https://youtu.be/L8Na2f5viA
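The "common embedding space" idea behind visual-linguistic similarity can be sketched as two projection heads trained so that matching image/text pairs have high cosine similarity. The encoders, feature dimensions, and loss below are placeholder assumptions, not TediGAN's actual module.

```python
# Rough sketch of a joint image-text embedding space with a matching loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_feat_dim=2048, txt_feat_dim=768, joint_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_feat_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_feat_dim, joint_dim)

    def forward(self, img_feats, txt_feats):
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img, txt

def matching_loss(img, txt):
    """Pull matching image/text pairs together (cosine similarity toward 1)."""
    return (1.0 - (img * txt).sum(dim=-1)).mean()

model = JointEmbedding()
img, txt = model(torch.randn(8, 2048), torch.randn(8, 768))
loss = matching_loss(img, txt)
```

At manipulation time, a similarity of this kind can steer latent-code optimization so that the generated face better matches a given textual description.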
BSD-GAN: Branched Generative Adversarial Network for Scale-Disentangled Representation Learning and Image Synthesis
We introduce BSD-GAN, a novel multi-branch and scale-disentangled training
method which enables unconditional Generative Adversarial Networks (GANs) to
learn image representations at multiple scales, benefiting a wide range of
generation and editing tasks. The key feature of BSD-GAN is that it is trained
in multiple branches, progressively covering both the breadth and depth of the
network, as resolutions of the training images increase to reveal finer-scale
features. Specifically, each noise vector, as input to the generator network of
BSD-GAN, is deliberately split into several sub-vectors, each corresponding to,
and trained to learn, image representations at a particular scale. During
training, we progressively "de-freeze" the sub-vectors, one at a time, as a new
set of higher-resolution images is employed for training and more network
layers are added. A consequence of such an explicit sub-vector designation is
that we can directly manipulate and even combine latent (sub-vector) codes
which model different feature scales. Extensive experiments demonstrate the
effectiveness of our training method in scale-disentangled learning of image
representations and synthesis of novel image contents, without any extra labels
and without compromising quality of the synthesized high-resolution images. We
further demonstrate several image generation and manipulation applications
enabled or improved by BSD-GAN. Source codes are available at
https://github.com/duxingren14/BSD-GAN.
Comment: 12 pages, 20 figures, accepted to IEEE Transactions on Image
Processing
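The sub-vector scheme can be illustrated with a small sketch: the generator's input noise is the concatenation of per-scale sub-vectors, and only the sub-vectors "de-frozen" so far at a given training stage are sampled, while the rest stay fixed. The number of scales and the dimensions below are assumptions for illustration.

```python
# Sketch of scale-wise sub-vector splitting with progressive de-freezing.
import torch

def make_latent(batch, dims=(64, 64, 128, 256), active=2):
    """Concatenate per-scale sub-vectors; only the first `active` are sampled."""
    parts = []
    for i, d in enumerate(dims):
        if i < active:
            parts.append(torch.randn(batch, d))   # de-frozen: trained at this stage
        else:
            parts.append(torch.zeros(batch, d))   # still frozen
    return torch.cat(parts, dim=1)

z_coarse_only = make_latent(4, active=1)   # early, low-resolution training stage
z_all_scales = make_latent(4, active=4)    # later, full-resolution stage
# Swapping one sub-vector between two codes mixes features at that scale only.
```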
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
Modern 3D-GANs synthesize geometry and texture by training on large-scale
datasets with a consistent structure. Training such models on stylized,
artistic data, which often comes with unknown, highly variable geometry and
camera information, has not yet been shown to be possible. Can we train a 3D
GAN on such
artistic data, while maintaining multi-view consistency and texture quality? To
this end, we propose an adaptation framework, where the source domain is a
pre-trained 3D-GAN, while the target domain is a 2D-GAN trained on artistic
datasets. We then distill the knowledge from a 2D generator to the source 3D
generator. To do that, we first propose an optimization-based method to align
the distributions of camera parameters across domains. Second, we propose
regularizations necessary to learn high-quality texture, while avoiding
degenerate geometric solutions, such as flat shapes. Third, we show a
deformation-based technique for modeling exaggerated geometry of artistic
domains, enabling -- as a byproduct -- personalized geometric editing. Finally,
we propose a novel inversion method for 3D-GANs linking the latent spaces of
the source and the target domains. Our contributions -- for the first time --
allow for the generation, editing, and animation of personalized artistic 3D
avatars on artistic datasets.
Comment: Project Page: https://rameenabdal.github.io/3DAvatarGAN
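A highly simplified sketch of the 2D-to-3D distillation idea: fine-tune a pretrained 3D generator so that its renders at sampled camera poses resemble images produced by a 2D generator trained on the artistic domain. Here `gen3d`, `gen2d`, and `sample_camera` are assumed stand-ins; the actual method additionally aligns camera distributions, regularizes texture and geometry, and links the two latent spaces via a dedicated inversion.

```python
# Simplified single training step for distilling a 2D "teacher" generator into
# a 3D "student" generator (assumed interfaces, not the paper's exact losses).
import torch

def distill_step(gen3d, gen2d, sample_camera, opt, batch=4, latent_dim=512):
    z = torch.randn(batch, latent_dim)
    cam = sample_camera(batch)                 # poses roughly aligned to the 2D data
    student = gen3d(z, cam)                    # multi-view-consistent render
    with torch.no_grad():
        teacher = gen2d(z)                     # artistic-domain target image
    loss = torch.nn.functional.l1_loss(student, teacher)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```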