Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation
Facial sketch synthesis (FSS) aims to generate a vivid sketch portrait from a
given facial photo. Existing FSS methods rely solely on 2D representations of
facial semantics or appearance. However, professional artists usually use
outlines or shadings to convey 3D geometry, so facial 3D geometry (e.g., a
depth map) is extremely important for FSS. Moreover, different artists may use
diverse drawing techniques and create multiple styles of sketch, yet the style
is globally consistent within a single sketch. Inspired by these observations,
in this paper we propose a novel Human-Inspired Dynamic Adaptation (HIDA)
method. Specifically, we dynamically modulate neuron activations based on a
joint consideration of both facial 3D geometry and 2D appearance, together
with globally consistent style control. In addition, we use deformable
convolutions at coarse scales to align deep features, for generating abstract
and distinct outlines. Experiments show that HIDA can generate high-quality
sketches in multiple styles and significantly outperforms previous methods
over a large range of challenging faces. HIDA also allows precise style
control of the synthesized sketch, and generalizes well to natural scenes and
other artistic styles. Our code and results have been released online at:
https://github.com/AiArt-HDU/HIDA
Comment: to appear at ICCV 2023.
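The abstract does not spell out the modulation operator; below is a minimal PyTorch sketch of what jointly conditioning activations on depth, 2D appearance, and a globally consistent style code could look like. The module name, shapes, and the instance-norm-plus-affine form are illustrative assumptions, not HIDA's actual design.

```python
# Hypothetical sketch of "dynamic adaptation": spatially varying scale/shift
# come from 3D geometry (depth) plus appearance, while a per-image style code
# modulates channels uniformly. Names and shapes are assumptions.
import torch
import torch.nn as nn

class DynamicModulation(nn.Module):
    def __init__(self, feat_ch, style_dim=64, hidden=128):
        super().__init__()
        # Spatially varying parameters from depth (1 ch) + photo (3 ch).
        self.local = nn.Sequential(
            nn.Conv2d(1 + 3, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * feat_ch, 3, padding=1),
        )
        # Globally consistent parameters from a per-image style code.
        self.globl = nn.Linear(style_dim, 2 * feat_ch)

    def forward(self, feat, depth, photo, style):
        cond = torch.cat([depth, photo], dim=1)
        g_local, b_local = self.local(cond).chunk(2, dim=1)
        g_style, b_style = self.globl(style).chunk(2, dim=1)
        g_style = g_style[..., None, None]   # broadcast over H, W
        b_style = b_style[..., None, None]
        # Modulate normalized activations with both local and global terms.
        feat = nn.functional.instance_norm(feat)
        return feat * (1 + g_local + g_style) + (b_local + b_style)

# Toy usage
mod = DynamicModulation(feat_ch=256)
feat = torch.randn(2, 256, 64, 64)
depth = torch.randn(2, 1, 64, 64)
photo = torch.randn(2, 3, 64, 64)
style = torch.randn(2, 64)
out = mod(feat, depth, photo, style)  # -> (2, 256, 64, 64)
```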
Controllable Neural Synthesis for Natural Images and Vector Art
Neural image synthesis approaches have become increasingly popular in recent years due to their ability to generate photorealistic images useful for many applications, such as digital entertainment, mixed reality, synthetic dataset creation, and computer art. Despite this progress, current approaches lack two important aspects: (a) they often fail to capture long-range interactions in the image and, as a result, fail to generate scenes with complex dependencies between their different objects or parts; and (b) they often ignore the underlying 3D geometry of the shape or scene in the image and, as a result, frequently lose coherency and details.

My thesis proposes novel solutions to the above problems. First, I propose a neural transformer architecture that captures long-range interactions and context for image synthesis at high resolutions, making it possible to synthesize phenomena such as reflections of landscapes onto water, or flora consistent with the rest of the landscape, which could not be generated reliably with previous ConvNet- and transformer-based approaches. The key idea of the architecture is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at a lower image resolution. I present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of the method and its superiority over the state of the art.

Second, I propose a method that generates artistic images with the guidance of input 3D shapes. In contrast to previous methods, the use of a geometric representation of 3D shape enables the synthesis of more precise stylized drawings with fewer artifacts. My method outputs the synthesized images in a vector representation, enabling richer downstream analysis or editing in interactive applications. I also show that the method produces substantially better results than existing image-based methods, both in predicting artists' drawings and in user evaluations of the results.
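As an illustration of the coarse-to-fine sparsification idea, here is a hedged PyTorch sketch: a dense attention map from a low-resolution pass selects the top-k keys each high-resolution query may attend to. The function name and the top-k rule are assumptions, and the sketch still materializes the full score matrix, so it shows the selection logic rather than the memory savings.

```python
# A minimal sketch (assumed, not the thesis code): dense attention computed
# on a downsampled grid decides, per query, which keys survive at full
# resolution, so the high-res attention pattern stays sparse.
import torch
import torch.nn.functional as F

def guided_sparse_attention(q, k, v, low_res_attn, top_k=16):
    """q, k, v: (B, N, D) at high resolution (N tokens).
    low_res_attn: (B, N, N) attention upsampled from the coarse pass."""
    # Keep only the top-k keys per query, as suggested by the coarse pass.
    idx = low_res_attn.topk(top_k, dim=-1).indices          # (B, N, top_k)
    mask = torch.zeros_like(low_res_attn, dtype=torch.bool)
    mask.scatter_(-1, idx, True)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, N, N)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage: the coarse attention here is random; in practice it would be
# computed densely at a lower resolution and upsampled to N x N.
B, N, D = 1, 256, 32
q, k, v = (torch.randn(B, N, D) for _ in range(3))
coarse = torch.rand(B, N, N)
out = guided_sparse_attention(q, k, v, coarse)  # -> (1, 256, 32)
```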
Line drawings for face portraits from photos using global and local structure based GANs
Despite significant effort and notable success in neural style transfer, it remains challenging for highly abstract styles, in particular line drawings. In this paper, we propose APDrawingGAN++, a generative adversarial network (GAN) for transforming face photos into artistic portrait drawings (APDrawings), which addresses substantial challenges including a highly abstract style, different drawing techniques for different facial features, and high perceptual sensitivity to artifacts. To address these, we propose a composite GAN architecture that consists of local networks (to learn effective representations for specific facial features) and a global network (to capture the overall content). We provide a theoretical explanation for the necessity of this composite GAN structure by proving that any GAN with a single generator cannot generate artistic styles like APDrawings. We further introduce a classification-and-synthesis approach for lips and hair, where different drawing styles are used by artists, which applies suitable styles to a given input. To capture the highly abstract art form inherent in APDrawings, we address two challenging operations, (1) coping with lines with small misalignments while penalizing large discrepancies and (2) generating more continuous lines, by introducing two novel loss terms: one is a novel distance transform loss with nonlinear mapping and the other is a novel line continuity loss, both of which improve line quality. We also develop dedicated data augmentation and pre-training to further improve results. Extensive experiments, including a user study, show that our method outperforms state-of-the-art methods, both qualitatively and quantitatively.
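The loss formulas are not given in this abstract; the following is a speculative NumPy/SciPy sketch of a distance-transform line loss with a nonlinear mapping that tolerates small misalignments and penalizes large ones. All names, the threshold, and the exact mapping are assumptions, not the APDrawingGAN++ implementation.

```python
# Hypothetical sketch of a distance-transform line loss: generated strokes
# near a ground-truth line cost little; distant strokes cost a lot via a
# nonlinear (free-within-tolerance, then quadratic) mapping.
import numpy as np
from scipy.ndimage import distance_transform_edt

def dt_line_loss(pred, target, tol=2.0):
    """pred, target: (H, W) arrays in [0, 1]; lines are dark (near 0)."""
    target_lines = target < 0.5
    # Distance from every pixel to the nearest ground-truth line pixel.
    dist = distance_transform_edt(~target_lines)
    # Nonlinear mapping: nearly free within `tol` pixels, quadratic beyond.
    penalty = np.maximum(dist - tol, 0.0) ** 2
    pred_ink = 1.0 - pred          # ink intensity of predicted strokes
    return float((pred_ink * penalty).mean())

# Toy usage: a prediction shifted by one pixel incurs no loss, since the
# misalignment lies within the 2-pixel tolerance.
target = np.ones((64, 64)); target[32, 10:50] = 0.0
pred = np.ones((64, 64)); pred[33, 10:50] = 0.0
print(dt_line_loss(pred, target))  # ~0: within tolerance
```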
From rule-based to learning-based image-conditional image generation
Visual contents, such as movies, animations, computer games, videos and photos, are massively produced and consumed nowadays. Most of these contents are a combination of materials captured from the real world and contents synthesized by computers. In particular, computer-generated visual contents are increasingly indispensable in modern entertainment and production. The generation of visual contents by computers is typically conditioned on real-world materials, driven by the imagination of designers and artists, or a combination of both. However, creating visual contents manually is both challenging and labor-intensive. Therefore, enabling computers to automatically or semi-automatically synthesize the needed visual contents becomes essential. Among these efforts, one stream of research is to generate novel images based on given image priors, e.g., photos and sketches. This research direction is known as image-conditional image generation, which covers a wide range of topics such as image stylization, image completion, image fusion, sketch-to-image generation, and extracting image label maps. In this thesis, a set of novel approaches for image-conditional image generation is presented.
The thesis starts with an exemplar-based method for facial image stylization in Chapter 2. This method involves a unified framework for facial image stylization based on a single style exemplar. A two-phase procedure is employed, where the first phase searches for a dense, semantics-aware correspondence between the input and the exemplar images, and the second phase conducts edge-preserving texture transfer. While this algorithm has the merit of requiring only a single exemplar, it is constrained to face photos.

To perform generalized image-to-image translation, Chapter 3 presents a data-driven, learning-based method. Inspired by the dual learning paradigm designed for natural language translation [115], a novel dual Generative Adversarial Network (DualGAN) mechanism is developed, which enables image translators to be trained from two sets of unlabeled images from two domains.
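To make the dual mechanism concrete, here is a condensed, assumed PyTorch sketch of one generator update: two translators trained on unpaired batches with adversarial terms plus round-trip reconstruction. The tiny networks, loss weights, and omitted critic updates are illustrative placeholders, not the thesis code.

```python
# Assumed sketch of dual-learning for unpaired translation: G_ab and G_ba
# map between domains A and B; each translation must fool the opposite
# critic and reconstruct its input on the round trip.
import torch
import torch.nn as nn

def conv_net(out_ch):
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, out_ch, 3, padding=1))

G_ab, G_ba = conv_net(3), conv_net(3)   # translators A->B, B->A
D_a, D_b = conv_net(1), conv_net(1)     # critics (updates omitted here)
opt_g = torch.optim.Adam(
    list(G_ab.parameters()) + list(G_ba.parameters()), lr=2e-4)

def dual_step(x_a, x_b):
    fake_b, fake_a = G_ab(x_a), G_ba(x_b)
    rec_a, rec_b = G_ba(fake_b), G_ab(fake_a)            # round trips
    l_rec = (rec_a - x_a).abs().mean() + (rec_b - x_b).abs().mean()
    l_adv = -(D_b(fake_b).mean() + D_a(fake_a).mean())   # WGAN-style term
    loss = l_adv + 10.0 * l_rec
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss.item()

print(dual_step(torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)))
```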
This is followed by another data-driven method in Chapter 4, which learns multiscale manifolds from a set of images and then enables synthesizing novel images that mimic the appearance of the target image dataset. The method, named Branched Generative Adversarial Network (BranchGAN), employs a novel training scheme that enables unconditioned generative adversarial networks (GANs) to learn image manifolds at multiple scales. As a result, we can directly manipulate, and even combine, latent manifold codes that are associated with specific feature scales. Finally, to provide users more control over image generation results, Chapter 5 discusses an upgraded version of iGAN [126] (iGANHD) that significantly improves manipulation of high-resolution images by utilizing the multi-scale manifold learned with BranchGAN.
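As an assumed illustration of manipulating per-scale latent codes, the toy PyTorch generator below consumes one code per resolution stage, so coarse structure and fine appearance can be recombined across samples; this mirrors the stated capability, not BranchGAN's actual architecture.

```python
# Toy multiscale generator: one latent code per scale, injected as a
# per-channel bias before each upsampling stage. Purely illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMultiScaleG(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.start = nn.Linear(dim, 16 * 4 * 4)
        self.inject = nn.ModuleList(nn.Linear(dim, 16) for _ in range(3))
        self.stages = nn.ModuleList(
            nn.Conv2d(16, 16, 3, padding=1) for _ in range(3))
        self.to_rgb = nn.Conv2d(16, 3, 1)

    def forward(self, codes):                       # one code per scale
        x = self.start(codes[0]).view(-1, 16, 4, 4)
        for lin, conv, z in zip(self.inject, self.stages, codes[1:]):
            x = F.interpolate(x, scale_factor=2)    # 8, 16, 32 ...
            x = F.relu(conv(x + lin(z)[:, :, None, None]))
        return torch.tanh(self.to_rgb(x))           # (B, 3, 32, 32)

g = ToyMultiScaleG()
a = [torch.randn(1, 64) for _ in range(4)]
b = [torch.randn(1, 64) for _ in range(4)]
mixed = g(a[:2] + b[2:])  # coarse structure from `a`, fine detail from `b`
```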