Disentangling Content and Style via Unsupervised Geometry Distillation
It is challenging to disentangle an object into two orthogonal spaces of
content and style since each can influence the visual observation differently
and unpredictably. It is also rare to have access to a large amount of data
that would help separate the two influences. In this paper, we present a novel
framework to
learn this disentangled representation in a completely unsupervised manner. We
address this problem with a two-branch autoencoder framework. For the structural
content branch, we project the latent factor into a soft structured point
tensor and constrain it with losses derived from prior knowledge. This
constraint encourages the branch to distill geometry information. Another
branch learns the complementary style information. The two branches form an
effective framework that can disentangle an object's content-style
representation without any human annotation. We evaluate our approach on four
image datasets, demonstrating superior disentanglement and visual-analogy
quality on both synthetic and real-world data. We are able to generate
photo-realistic images at 256×256 resolution that are clearly disentangled in
content and style.
Comment: Accepted to ICLR 2019 Workshop
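A minimal PyTorch sketch of the two-branch idea described above, assuming a shared convolutional backbone: one branch projects features to a soft structured point tensor via a spatial softmax, while the other pools a complementary style code. All layer sizes and names here are illustrative assumptions, not the paper's released architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchAE(nn.Module):
    """Sketch of a two-branch autoencoder: a structure branch producing a
    soft keypoint-heatmap tensor and a style branch producing an appearance
    code. Layer sizes are illustrative assumptions."""
    def __init__(self, n_points=16, style_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
        )
        self.to_heatmaps = nn.Conv2d(128, n_points, 1)   # structure branch
        self.to_style = nn.Sequential(                   # style branch
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, style_dim)
        )

    def forward(self, x):
        h = self.backbone(x)
        # Softmax over spatial positions yields a "soft structured point
        # tensor": each channel is a distribution over locations for one point.
        logits = self.to_heatmaps(h)
        b, k, hh, ww = logits.shape
        heatmaps = F.softmax(logits.view(b, k, -1), dim=-1).view(b, k, hh, ww)
        style = self.to_style(h)
        return heatmaps, style

# Usage: heatmaps, style = TwoBranchAE()(torch.randn(1, 3, 64, 64))
```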
TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation aims at learning a mapping between
two visual domains. However, learning a translation across large geometric
variations often ends in failure. In this work, we present a novel
disentangle-and-translate framework to tackle image-to-image translation for
complex objects. Instead of learning the mapping on the image
space directly, we disentangle image space into a Cartesian product of the
appearance and the geometry latent spaces. Specifically, we first introduce a
geometry prior loss and a conditional VAE loss to encourage the network to
learn independent but complementary representations. The translation is then
built on the appearance and geometry spaces separately. Extensive experiments
demonstrate the superior performance of our method over state-of-the-art
approaches, especially on the challenging near-rigid and non-rigid object
translation tasks. In addition, by taking different exemplars as the appearance
references, our method also supports multimodal translation. Project page:
https://wywu.github.io/projects/TGaGa/TGaGa.html
Comment: Accepted to CVPR 2019
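As a rough illustration of the two constraints the abstract names, here is a hedged PyTorch sketch of a combined conditional-VAE and geometry-prior objective; the argument names, the L1 reconstruction term, and the simple L2 form of the geometry prior are assumptions for illustration, not the authors' exact losses.

```python
import torch
import torch.nn.functional as F

def cvae_geometry_losses(x, x_recon, mu, logvar, heatmaps, prior_heatmaps):
    """Sketch objective: conditional-VAE reconstruction + KL terms plus a
    geometry prior keeping predicted structure close to a prior layout."""
    recon = F.l1_loss(x_recon, x)                 # reconstruction term
    kl = -0.5 * torch.mean(                       # KL to a unit Gaussian
        1 + logvar - mu.pow(2) - logvar.exp()
    )
    geo_prior = F.mse_loss(heatmaps, prior_heatmaps)  # geometry prior (assumed L2)
    return recon + kl + geo_prior
```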
Neural Multi-scale Image Compression
This study presents a new lossy image compression method that utilizes the
multi-scale features of natural images. Our model consists of two networks: a
multi-scale lossy autoencoder and a parallel multi-scale lossless coder. The
lossy autoencoder extracts multi-scale image features into quantized variables,
and the lossless coder enables rapid and accurate lossless coding of those
variables by encoding/decoding them in parallel. Our model achieves performance
comparable to the state-of-the-art on the Kodak and RAISE-1k datasets; it
encodes a PNG image in 70 ms with a single GPU and a single CPU process, and
decodes it into a high-fidelity image in approximately 200 ms.
Comment: 15 pages, 15 figures
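The two-network split can be sketched as follows: a hedged PyTorch illustration of a multi-scale encoder with straight-through quantization, where the layer shapes, the number of scales, and the rounding quantizer are all assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Sketch of the multi-scale idea: extract features at several
    resolutions and quantize each scale's code."""
    def __init__(self, scales=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Conv2d(3 if i == 0 else 32, 32, 4, 2, 1) for i in range(scales)]
        )

    @staticmethod
    def quantize(z):
        # Straight-through rounding: hard round forward, identity gradient back.
        return z + (torch.round(z) - z).detach()

    def forward(self, x):
        codes = []
        h = x
        for block in self.blocks:
            h = torch.relu(block(h))
            codes.append(self.quantize(h))  # one quantized code per scale
        return codes

# Usage: codes = MultiScaleEncoder()(torch.randn(1, 3, 64, 64))
```

The quantized codes at each scale would then be entropy-coded; the parallel lossless coder of the paper is not sketched here.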
The Focus-Aspect-Polarity Model for Predicting Subjective Noun Attributes in Images
Subjective visual interpretation is a challenging yet important topic in
computer vision. Many approaches reduce this problem to the prediction of
adjective or attribute labels from images. However, most of these do not take
attribute semantics into account, or process the image only in a holistic
manner. Furthermore, there is a lack of relevant datasets with fine-grained
subjective labels. In this paper, we propose the Focus-Aspect-Polarity model
to structure the process of capturing subjectivity in image processing, and we
introduce a novel dataset that follows this way of modeling. We run experiments
on this dataset to compare several deep learning methods and find that
incorporating context information via tensor multiplication outperforms the
default fusion method (concatenation) in several cases.
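To make the fusion comparison concrete, here is a hedged PyTorch sketch of fusing an image feature with a context (aspect) embedding by tensor multiplication (an outer product) rather than concatenation; the feature dimensions and the projection layer are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class TensorFusion(nn.Module):
    """Sketch: fuse image and context features via their outer product,
    capturing every pairwise interaction between the two representations,
    unlike concatenation, which keeps them side by side."""
    def __init__(self, img_dim=512, ctx_dim=32, out_dim=256):
        super().__init__()
        self.proj = nn.Linear(img_dim * ctx_dim, out_dim)

    def forward(self, img_feat, ctx_feat):
        # (b, img_dim) x (b, ctx_dim) -> (b, img_dim, ctx_dim)
        fused = torch.einsum('bi,bj->bij', img_feat, ctx_feat)
        return self.proj(fused.flatten(1))

# Usage: out = TensorFusion()(torch.randn(4, 512), torch.randn(4, 32))
```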