We present a novel approach for synthesizing photo-realistic images of people
in arbitrary poses using generative adversarial learning. Given an input image
of a person and a desired pose represented by a 2D skeleton, our model renders
the image of the same person under the new pose, synthesizing novel views of
the parts visible in the input image and hallucinating those that are not seen.
This problem has recently been addressed in a supervised manner, i.e., during
training, the ground-truth images under the new poses are given to the network.
We go beyond these approaches by proposing a fully unsupervised strategy. We
tackle this challenging scenario by splitting the problem into two principal
subtasks. First, we consider a pose-conditioned bidirectional generator that
maps the initially rendered image back to the original pose, making it
directly comparable to the input image without requiring any ground-truth
image of the target pose.
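As a rough illustration of this cycle idea, the sketch below uses a toy PyTorch generator applied twice; the class name, the 18-channel skeleton heatmaps, and the L1 cycle term are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class PoseConditionedGenerator(nn.Module):
    """Toy stand-in for a generator G(image, pose) -> image."""
    def __init__(self, img_channels=3, pose_channels=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + pose_channels, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, img_channels, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, image, pose_maps):
        # Condition on the pose by concatenating skeleton heatmaps
        # with the image along the channel dimension.
        return self.net(torch.cat([image, pose_maps], dim=1))

G = PoseConditionedGenerator()
x = torch.randn(1, 3, 128, 128)         # input image
p_orig = torch.randn(1, 18, 128, 128)   # original-pose heatmaps
p_new = torch.randn(1, 18, 128, 128)    # desired-pose heatmaps

x_new = G(x, p_new)        # render the person in the unseen pose
x_back = G(x_new, p_orig)  # map the rendering back to the original pose
cycle_loss = (x_back - x).abs().mean()  # compare to the input; no ground truth needed
```

Because the same generator performs both mappings, the supervision signal comes entirely from the input image itself.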
Second, we devise a novel loss function that incorporates content and style
terms and aims at producing images of high perceptual quality.
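The abstract does not spell out these terms; a common recipe for combined content and style losses, shown here purely as a hedged sketch, compares VGG-19 feature maps for content and their Gram matrices for style (the layer choices and style weight below are arbitrary assumptions, not the paper's values):

```python
import torch
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the loss network stays fixed

def vgg_features(x, layers=(3, 8, 17)):
    """Collect activations at a few arbitrarily chosen VGG layers."""
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats.append(x)
    return feats

def gram(f):
    # Gram matrix of a feature map, normalized by its size.
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_style_loss(pred, ref, style_weight=1e3):
    fp, fr = vgg_features(pred), vgg_features(ref)
    content = sum((a - b).pow(2).mean() for a, b in zip(fp, fr))
    style = sum((gram(a) - gram(b)).pow(2).mean() for a, b in zip(fp, fr))
    return content + style_weight * style
```

Matching raw features preserves semantic content, while matching Gram statistics encourages similar textures, which is what such losses use to drive perceptual quality.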
Extensive experiments conducted on the DeepFashion dataset demonstrate that
the images rendered by our model are very close in appearance to those
obtained by fully supervised approaches.