Disentangling Content and Style via Unsupervised Geometry Distillation
It is challenging to disentangle an object into two orthogonal spaces of
content and style since each can influence the visual observation differently
and unpredictably. It is also rare to have access to a large amount of data
that would help separate the two influences. In this paper, we present a novel
framework to
learn this disentangled representation in a completely unsupervised manner. We
address this problem with a two-branch autoencoder framework. For the structural
content branch, we project the latent factor into a soft structured point
tensor and constrain it with losses derived from prior knowledge. This
constraint encourages the branch to distill geometry information. Another
branch learns the complementary style information. The two branches form an
effective framework that can disentangle an object's content-style
representation without any human annotation. We evaluate our approach on four
image datasets, demonstrating superior disentanglement and visual-analogy
quality on both synthetic and real-world data. We are able to generate
photo-realistic images at 256×256 resolution that are clearly disentangled in
content and style.
Comment: Accepted to ICLR 2019 Workshop
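A minimal PyTorch sketch of the two-branch idea described above, assuming a shared convolutional backbone: one branch projects features to a soft structured point tensor via a spatial softmax, while the other pools a complementary style code. All layer sizes and names here are illustrative assumptions, not the paper's released architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchAE(nn.Module):
    """Sketch of a two-branch autoencoder: a structure branch producing a
    soft keypoint-heatmap tensor and a style branch producing an appearance
    code. Layer sizes are illustrative assumptions."""
    def __init__(self, n_points=16, style_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
        )
        self.to_heatmaps = nn.Conv2d(128, n_points, 1)   # structure branch
        self.to_style = nn.Sequential(                   # style branch
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, style_dim)
        )

    def forward(self, x):
        h = self.backbone(x)
        # Softmax over spatial positions yields a "soft structured point
        # tensor": each channel is a distribution over locations for one point.
        logits = self.to_heatmaps(h)
        b, k, hh, ww = logits.shape
        heatmaps = F.softmax(logits.view(b, k, -1), dim=-1).view(b, k, hh, ww)
        style = self.to_style(h)
        return heatmaps, style

# Usage: heatmaps, style = TwoBranchAE()(torch.randn(1, 3, 64, 64))
```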
TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation aims at learning a mapping between
two visual domains. However, learning a translation across large geometric
variations often ends in failure. In this work, we present a novel
disentangle-and-translate framework to tackle image-to-image translation for
complex objects. Instead of learning the mapping on the image
space directly, we disentangle image space into a Cartesian product of the
appearance and the geometry latent spaces. Specifically, we first introduce a
geometry prior loss and a conditional VAE loss to encourage the network to
learn independent but complementary representations. The translation is then
built on the appearance and geometry spaces separately. Extensive experiments
demonstrate the superior performance of our method over state-of-the-art
approaches, especially on the challenging near-rigid and non-rigid object
translation tasks. In addition, by taking different exemplars as the appearance
references, our method also supports multimodal translation. Project page:
https://wywu.github.io/projects/TGaGa/TGaGa.html
Comment: Accepted to CVPR 2019
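As a rough illustration of the two constraints the abstract names, here is a hedged PyTorch sketch of a combined conditional-VAE and geometry-prior objective; the argument names, the L1 reconstruction term, and the simple L2 form of the geometry prior are assumptions for illustration, not the authors' exact losses.

```python
import torch
import torch.nn.functional as F

def cvae_geometry_losses(x, x_recon, mu, logvar, heatmaps, prior_heatmaps):
    """Sketch objective: conditional-VAE reconstruction + KL terms plus a
    geometry prior keeping predicted structure close to a prior layout."""
    recon = F.l1_loss(x_recon, x)                 # reconstruction term
    kl = -0.5 * torch.mean(                       # KL to a unit Gaussian
        1 + logvar - mu.pow(2) - logvar.exp()
    )
    geo_prior = F.mse_loss(heatmaps, prior_heatmaps)  # geometry prior (assumed L2)
    return recon + kl + geo_prior
```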
Neural Multi-scale Image Compression
This study presents a new lossy image compression method that utilizes the
multi-scale features of natural images. Our model consists of two networks: a
multi-scale lossy autoencoder and a parallel multi-scale lossless coder. The
lossy autoencoder extracts multi-scale image features into quantized variables,
and the lossless coder enables rapid and accurate lossless coding of those
variables by encoding/decoding them in parallel. Our model achieves performance
comparable to the state-of-the-art on the Kodak and RAISE-1k datasets; it
encodes a PNG image in 70 ms with a single GPU and a single CPU process, and
decodes it into a high-fidelity image in approximately 200 ms.
Comment: 15 pages, 15 figures
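The two-network split can be sketched as follows: a hedged PyTorch illustration of a multi-scale encoder with straight-through quantization, where the layer shapes, the number of scales, and the rounding quantizer are all assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Sketch of the multi-scale idea: extract features at several
    resolutions and quantize each scale's code."""
    def __init__(self, scales=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Conv2d(3 if i == 0 else 32, 32, 4, 2, 1) for i in range(scales)]
        )

    @staticmethod
    def quantize(z):
        # Straight-through rounding: hard round forward, identity gradient back.
        return z + (torch.round(z) - z).detach()

    def forward(self, x):
        codes = []
        h = x
        for block in self.blocks:
            h = torch.relu(block(h))
            codes.append(self.quantize(h))  # one quantized code per scale
        return codes

# Usage: codes = MultiScaleEncoder()(torch.randn(1, 3, 64, 64))
```

The quantized codes at each scale would then be entropy-coded; the parallel lossless coder of the paper is not sketched here.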
The Focus-Aspect-Polarity Model for Predicting Subjective Noun Attributes in Images
Subjective visual interpretation is a challenging yet important topic in
computer vision. Many approaches reduce this problem to the prediction of
adjective or attribute labels from images. However, most of these do not take
attribute semantics into account, or process the image only in a holistic
manner. Furthermore, there is a lack of relevant datasets with fine-grained
subjective labels. In this paper, we propose the Focus-Aspect-Polarity model
to structure the process of capturing subjectivity in image processing, and we
introduce a novel dataset that follows this way of modeling. We run experiments
on this dataset to compare several deep learning methods and find that
incorporating context information via tensor multiplication outperforms the
default fusion method (concatenation) in several cases.
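To make the fusion comparison concrete, here is a hedged PyTorch sketch of fusing an image feature with a context (aspect) embedding by tensor multiplication (an outer product) rather than concatenation; the feature dimensions and the projection layer are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class TensorFusion(nn.Module):
    """Sketch: fuse image and context features via their outer product,
    capturing every pairwise interaction between the two representations,
    unlike concatenation, which keeps them side by side."""
    def __init__(self, img_dim=512, ctx_dim=32, out_dim=256):
        super().__init__()
        self.proj = nn.Linear(img_dim * ctx_dim, out_dim)

    def forward(self, img_feat, ctx_feat):
        # (b, img_dim) x (b, ctx_dim) -> (b, img_dim, ctx_dim)
        fused = torch.einsum('bi,bj->bij', img_feat, ctx_feat)
        return self.proj(fused.flatten(1))

# Usage: out = TensorFusion()(torch.randn(4, 512), torch.randn(4, 32))
```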