We present a deformable generator model to disentangle the appearance and
geometric information for both image and video data in a purely unsupervised
manner. The appearance generator network models the information related to
appearance, including color, illumination, identity or category, while the
geometric generator performs geometric warping, such as rotation and
stretching, by generating a deformation field that warps the generated
appearance to obtain the final image or video sequence. The two generators
take independent latent vectors as input, which allows the model to
disentangle appearance and geometric information in images and video
sequences. For video data, a nonlinear transition model is introduced into
both the appearance and geometric generators to capture the dynamics over
time. The proposed scheme is
general and can be easily integrated into different generative models. An
extensive set of qualitative and quantitative experiments shows that the
appearance and geometric information can be well disentangled, and the learned
geometric generator can be conveniently transferred to other image datasets to
facilitate knowledge transfer tasks.
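To make the composition of the two generators concrete, the following is a minimal NumPy sketch. It assumes a backward-warping convention, where each output pixel is sampled from the appearance image at its own position plus a per-pixel displacement, using bilinear interpolation with border clamping. The toy linear generators, the latent sizes, and the names `warp`, `appearance_generator`, `geometric_generator` and `deformable_generator` are hypothetical illustrations only, not the networks or the deformation parameterization used in this work.

```python
import numpy as np

def warp(appearance: np.ndarray, displacement: np.ndarray) -> np.ndarray:
    """Backward-warp `appearance` (H, W, C) with `displacement` (H, W, 2).

    Output pixel (y, x) is bilinearly sampled from `appearance` at
    (y + dy, x + dx); sample coordinates are clamped to the image border.
    """
    h, w, _ = appearance.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sy = np.clip(ys + displacement[..., 0], 0, h - 1)
    sx = np.clip(xs + displacement[..., 1], 0, w - 1)
    # Integer corners surrounding each sample point.
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Fractional offsets, broadcast over the channel axis.
    wy = (sy - y0)[..., None]
    wx = (sx - x0)[..., None]
    top = (1 - wx) * appearance[y0, x0] + wx * appearance[y0, x1]
    bottom = (1 - wx) * appearance[y1, x0] + wx * appearance[y1, x1]
    return (1 - wy) * top + wy * bottom

# Hypothetical toy generators: fixed random linear maps, one per latent vector.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 3
Z_APP, Z_GEO = 4, 2
W_APP = rng.normal(size=(Z_APP, H * W * C))
W_GEO = rng.normal(size=(Z_GEO, H * W * 2))

def appearance_generator(z_app: np.ndarray) -> np.ndarray:
    # Appearance image (color, identity, ...) from its own latent vector.
    return np.tanh(z_app @ W_APP).reshape(H, W, C)

def geometric_generator(z_geo: np.ndarray) -> np.ndarray:
    # Deformation (displacement) field from an independent latent vector.
    return (z_geo @ W_GEO).reshape(H, W, 2)

def deformable_generator(z_app: np.ndarray, z_geo: np.ndarray) -> np.ndarray:
    # Final image: generated appearance warped by the generated deformation.
    return warp(appearance_generator(z_app), geometric_generator(z_geo))
```

Because the two latent vectors feed separate generators, varying `z_geo` while holding `z_app` fixed changes only the geometry of the output, and a zero `z_geo` leaves the appearance unwarped. In the actual model both generators would be learned networks rather than fixed random maps.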