3 research outputs found
Unpaired High-Resolution and Scalable Style Transfer Using Generative Adversarial Networks
Neural networks have proven their capabilities by outperforming many other
approaches on regression or classification tasks on various kinds of data.
Other astonishing results have been achieved using neural networks as data
generators, especially in the setting of generative adversarial networks (GANs).
One special application is the field of image domain translations. Here, the
goal is to take an image with a certain style (e.g., a photograph) and
transform it into another one (e.g., a painting). If such a task is performed
for unpaired training examples, the corresponding GAN setting is complex, the
neural networks are large, and this leads to high peak memory consumption
during both the training and evaluation phases. This limits the maximum
processable image size. We address this issue by not processing the whole
image at once, but instead training and evaluating the domain translation on
the level of overlapping image subsamples. This new approach not only enables
us to
translate high-resolution images that otherwise cannot be processed by the
neural network at once, but also allows us to work with comparably small neural
networks and with limited hardware resources. Additionally, the number of
images required for the training process is significantly reduced. We present
high-quality results on images with a total resolution of over 50 megapixels
and demonstrate that our method helps to preserve local image details while
also maintaining global consistency.
Comment: 10 pages, 8 figures
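
To make the overlapping-subsample idea concrete, here is a minimal sketch of
tile-based translation with linear blending of the overlaps. The function name
translate_tiled, the tile and overlap sizes, and the translate callable are
illustrative assumptions, not the paper's actual implementation.

import numpy as np

def translate_tiled(image, translate, tile=256, overlap=64):
    # Translate a large (H, W, C) image tile-by-tile, blending the
    # overlapping regions with a linear ramp to hide seams.
    # Assumes 0 < overlap < tile and H, W >= tile (illustrative sketch).
    H, W, C = image.shape
    out = np.zeros((H, W, C), dtype=np.float64)
    weight = np.zeros((H, W, 1), dtype=np.float64)
    step = tile - overlap

    # 1-D weight ramp rising over `overlap` pixels at each tile border.
    ramp = np.minimum(np.arange(1, tile + 1), np.arange(tile, 0, -1))
    ramp = np.clip(ramp / float(overlap), 0.0, 1.0)
    w2d = (ramp[:, None] * ramp[None, :])[..., None]  # (tile, tile, 1)

    for y in range(0, max(H - overlap, 1), step):
        for x in range(0, max(W - overlap, 1), step):
            y0, x0 = min(y, H - tile), min(x, W - tile)
            patch = image[y0:y0 + tile, x0:x0 + tile]
            out[y0:y0 + tile, x0:x0 + tile] += translate(patch) * w2d
            weight[y0:y0 + tile, x0:x0 + tile] += w2d

    return out / weight  # weight > 0 everywhere since the ramp is > 0

In practice the per-tile translate callable would be the trained generator,
applied on hardware that could not hold the full-resolution image at once.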
Collaborative Distillation for Ultra-Resolution Universal Style Transfer
Universal style transfer methods typically leverage rich representations from
deep Convolutional Neural Network (CNN) models (e.g., VGG-19) pre-trained on
large collections of images. Despite their effectiveness, their application is
heavily constrained by the large model size when handling ultra-resolution
images under limited memory. In this work, we present a new knowledge
distillation
method (named Collaborative Distillation) for encoder-decoder based neural
style transfer that reduces the number of convolutional filters. The main idea
is underpinned by the finding that encoder-decoder pairs form an exclusive
collaborative relationship, which is regarded as a new kind of knowledge for
style transfer models. Moreover, to overcome the feature size mismatch when
applying collaborative distillation, a linear embedding loss is introduced to
drive the student network to learn a linear embedding of the teacher's
features. Extensive experiments show the effectiveness of our method when
applied to different universal style transfer approaches (WCT and AdaIN), even
when the model size is reduced by a factor of 15.5. Notably, on WCT with the
compressed models, we achieve ultra-resolution (over 40 megapixels) universal
style transfer on a 12GB GPU for the first time. Further experiments on an
optimization-based stylization scheme show the generality of our algorithm
across different stylization paradigms. Our code and trained models are
available at https://github.com/mingsun-tse/collaborative-distillation.
Comment: Accepted by CVPR 2020, with higher-resolution images than the
camera-ready version
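
As a rough illustration of the linear embedding loss described above, the
following PyTorch sketch projects student features into the teacher's feature
space with a learned 1x1 convolution and penalizes the mismatch. The channel
counts, the direction of the projection, and the MSE penalty are assumptions
for illustration and may differ from the paper's exact formulation.

import torch
import torch.nn as nn

class LinearEmbeddingLoss(nn.Module):
    # Drives the compressed student's features toward a linear embedding
    # of the larger teacher's features. Channel counts are hypothetical.
    def __init__(self, student_channels=64, teacher_channels=512):
        super().__init__()
        # A 1x1 convolution is a per-pixel linear map between feature spaces.
        self.embed = nn.Conv2d(student_channels, teacher_channels,
                               kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        return nn.functional.mse_loss(self.embed(student_feat), teacher_feat)

# Usage sketch: add this term to the ordinary distillation objective.
loss_fn = LinearEmbeddingLoss()
s = torch.randn(2, 64, 32, 32)   # student features (B, C_s, H, W)
t = torch.randn(2, 512, 32, 32)  # teacher features (B, C_t, H, W)
embedding_loss = loss_fn(s, t)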
Rethinking conditional GAN training: An approach using geometrically structured latent manifolds
Conditional GANs (cGANs), in their rudimentary form, suffer from critical
drawbacks such as the lack of diversity in generated outputs and distortion
between the latent and output manifolds. Although efforts have been made to
improve results, these efforts can suffer from unpleasant side effects such as
a topology mismatch between the latent and output spaces. In contrast, we
tackle this
problem from a geometrical perspective and propose a novel training mechanism
that increases both the diversity and the visual quality of a vanilla cGAN, by
systematically encouraging a bi-Lipschitz mapping between the latent and the
output manifolds. We validate the efficacy of our solution on a baseline cGAN
(i.e., Pix2Pix) which lacks diversity, and show that by only modifying its
training mechanism (i.e., with our proposed Pix2Pix-Geo), one can achieve more
diverse and realistic outputs on a broad set of image-to-image translation
tasks. Code is available at https://github.com/samgregoost/Rethinking-CGANs
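
To sketch how a bi-Lipschitz mapping between the latent and output manifolds
might be encouraged during training, the hinge-style penalty below bounds
output distances by latent distances from both sides. The bounds m and M, the
Euclidean distances, and the penalty form are illustrative assumptions rather
than the paper's exact objective.

import torch

def bi_lipschitz_penalty(z1, z2, y1, y2, m=0.1, M=10.0):
    # For latent codes z1, z2 and the corresponding generator outputs
    # y1, y2, encourage m * ||z1 - z2|| <= ||y1 - y2|| <= M * ||z1 - z2||
    # by penalizing violations of either bound (hypothetical bounds m, M).
    dz = (z1 - z2).flatten(1).norm(dim=1)  # per-sample latent distances
    dy = (y1 - y2).flatten(1).norm(dim=1)  # per-sample output distances
    lower = torch.relu(m * dz - dy)        # lower-bound violation
    upper = torch.relu(dy - M * dz)        # upper-bound violation
    return (lower + upper).mean()

# Usage sketch with a hypothetical conditional generator G(x, z):
# z1, z2 = torch.randn(B, zdim), torch.randn(B, zdim)
# loss = adv_loss + lam * bi_lipschitz_penalty(z1, z2, G(x, z1), G(x, z2))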