Cross-view image translation is challenging because it involves images with
drastically different views and severe deformation. In this paper, we propose a
novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that
makes it possible to generate images of natural scenes in arbitrary viewpoints,
based on an image of the scene and a novel semantic map. The proposed
SelectionGAN explicitly utilizes the semantic information and consists of two
stages. In the first stage, the condition image and the target semantic map are
fed into a cycled semantic-guided generation network to produce initial coarse
results. In the second stage, we refine the initial results by using a
multi-channel attention selection mechanism. Moreover, uncertainty maps
automatically learned from attentions are used to guide the pixel loss for
better network optimization. Extensive experiments on Dayton, CVUSA and Ego2Top
datasets show that our model is able to generate significantly better results
than the state-of-the-art methods. The source code, data and trained models are
available at https://github.com/Ha0Tang/SelectionGAN.

Comment: 20 pages, 16 figures, accepted to CVPR 2019 as an oral paper.
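To make the second-stage refinement concrete, below is a minimal PyTorch sketch of a multi-channel attention selection step together with an uncertainty-weighted pixel loss. It is an illustrative reading of the abstract, not the authors' released implementation: the module names, the number of candidate channels, and the exact loss formulation are assumptions.

```python
# Minimal sketch (assumption, not the official SelectionGAN code) of the idea
# described above: produce N candidate images and N attention maps, blend the
# candidates with softmax attention, and use a learned uncertainty map to
# re-weight the per-pixel reconstruction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionSelection(nn.Module):
    def __init__(self, in_channels, num_candidates=10):
        super().__init__()
        self.num_candidates = num_candidates
        # Hypothetical prediction heads: candidate RGB images, attention maps,
        # and a single-channel uncertainty map.
        self.to_candidates = nn.Conv2d(in_channels, 3 * num_candidates, 3, padding=1)
        self.to_attention = nn.Conv2d(in_channels, num_candidates, 3, padding=1)
        self.to_uncertainty = nn.Conv2d(in_channels, 1, 3, padding=1)

    def forward(self, feat):
        b, _, h, w = feat.shape
        # N candidate images in [-1, 1].
        candidates = torch.tanh(self.to_candidates(feat)).view(b, self.num_candidates, 3, h, w)
        # Softmax over the candidate dimension gives per-pixel selection weights.
        attention = F.softmax(self.to_attention(feat), dim=1).unsqueeze(2)  # (b, N, 1, h, w)
        refined = (candidates * attention).sum(dim=1)                       # blended output
        uncertainty = torch.sigmoid(self.to_uncertainty(feat))              # (b, 1, h, w)
        return refined, uncertainty


def uncertainty_weighted_l1(pred, target, uncertainty, eps=1e-6):
    # One plausible way to let the uncertainty map guide the pixel loss:
    # down-weight uncertain pixels while penalizing high uncertainty overall.
    per_pixel = (pred - target).abs().mean(dim=1, keepdim=True)
    return (per_pixel / (uncertainty + eps) + torch.log(uncertainty + eps)).mean()


if __name__ == "__main__":
    feat = torch.randn(2, 64, 64, 64)      # features from the second-stage generator
    target = torch.rand(2, 3, 64, 64) * 2 - 1
    module = AttentionSelection(in_channels=64)
    refined, uncertainty = module(feat)
    loss = uncertainty_weighted_l1(refined, target, uncertainty)
    print(refined.shape, uncertainty.shape, loss.item())
```

The key design choice this sketch illustrates is that the network does not commit to a single generated image; instead it proposes several candidates and learns per-pixel weights to select among them, which is what the abstract refers to as multi-channel attention selection.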