We present a novel approach for synthesizing photo-realistic images of people
in arbitrary poses using generative adversarial learning. Given an input image
of a person and a desired pose represented by a 2D skeleton, our model renders
the image of the same person under the new pose, synthesizing novel views of
the parts visible in the input image and hallucinating those that are not seen.
This problem has recently been addressed in a supervised manner, i.e., during
training, the ground-truth images under the new poses are given to the network.
We go beyond these approaches by proposing a fully unsupervised strategy. We
tackle this challenging scenario by splitting the problem into two principal
subtasks. First, we consider a pose-conditioned bidirectional generator that
maps the initially rendered image back to the original pose, making it
directly comparable to the input image without requiring any ground-truth
image of the target pose.
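As a rough illustration of this cycle idea, the sketch below uses a toy PyTorch generator applied twice; the class name, the 18-channel skeleton heatmaps, and the L1 cycle term are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class PoseConditionedGenerator(nn.Module):
    """Toy stand-in for a generator G(image, pose) -> image."""
    def __init__(self, img_channels=3, pose_channels=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + pose_channels, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, img_channels, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, image, pose_maps):
        # Condition on the pose by concatenating skeleton heatmaps
        # with the image along the channel dimension.
        return self.net(torch.cat([image, pose_maps], dim=1))

G = PoseConditionedGenerator()
x = torch.randn(1, 3, 128, 128)         # input image
p_orig = torch.randn(1, 18, 128, 128)   # original-pose heatmaps
p_new = torch.randn(1, 18, 128, 128)    # desired-pose heatmaps

x_new = G(x, p_new)        # render the person in the unseen pose
x_back = G(x_new, p_orig)  # map the rendering back to the original pose
cycle_loss = (x_back - x).abs().mean()  # compare to the input; no ground truth needed
```

Because the same generator performs both mappings, the supervision signal comes entirely from the input image itself.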
Second, we devise a novel loss function that incorporates content and style
terms and aims at producing images of high perceptual quality.
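The abstract does not spell out these terms; a common recipe for combined content and style losses, shown here purely as a hedged sketch, compares VGG-19 feature maps for content and their Gram matrices for style (the layer choices and style weight below are arbitrary assumptions, not the paper's values):

```python
import torch
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the loss network stays fixed

def vgg_features(x, layers=(3, 8, 17)):
    """Collect activations at a few arbitrarily chosen VGG layers."""
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats.append(x)
    return feats

def gram(f):
    # Gram matrix of a feature map, normalized by its size.
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_style_loss(pred, ref, style_weight=1e3):
    fp, fr = vgg_features(pred), vgg_features(ref)
    content = sum((a - b).pow(2).mean() for a, b in zip(fp, fr))
    style = sum((gram(a) - gram(b)).pow(2).mean() for a, b in zip(fp, fr))
    return content + style_weight * style
```

Matching raw features preserves semantic content, while matching Gram statistics encourages similar textures, which is what such losses use to drive perceptual quality.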
Extensive experiments conducted on the DeepFashion dataset demonstrate that
the images rendered by our model are very close in appearance to those
obtained by fully supervised approaches.