We offer a new perspective on the task of video generation.
Instead of directly synthesizing a sequence of frames, we propose to render a
video by warping one static image with a generative deformation field (GenDeF).
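To make this rendering pipeline concrete, below is a minimal PyTorch sketch of the warping step. The function name, tensor shapes, and the use of grid_sample are our illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn.functional as F

    def render_video(canonical, deformation):
        # canonical:   (1, C, H, W) static image from an image generator
        # deformation: (T, H, W, 2) per-frame sampling offsets in the
        #              normalized [-1, 1] coordinates used by grid_sample
        T, H, W, _ = deformation.shape
        # Identity sampling grid in normalized coordinates, one per frame.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
        )
        grid = torch.stack((xs, ys), dim=-1).expand(T, H, W, 2)
        # Each frame samples the shared canonical image at displaced locations.
        frames = F.grid_sample(
            canonical.expand(T, -1, -1, -1), grid + deformation,
            mode="bilinear", padding_mode="border", align_corners=True,
        )
        return frames  # (T, C, H, W)

Under this convention, a zero deformation field simply reproduces the canonical image in every frame.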
Such a pipeline enjoys three appealing advantages. First, we can fully reuse a well-trained image generator to synthesize the static image (also called the canonical image), alleviating the difficulty of producing a video directly and thereby improving visual quality. Second, a deformation field can be easily converted to optical flows (see the sketch after this paragraph), making it possible to apply explicit structural regularizations to motion modeling and yielding temporally consistent results. Third, the disentanglement between content and motion allows users to manipulate a synthesized video by processing its corresponding static image, without any tuning, facilitating applications such as video editing, keypoint tracking, and video segmentation.
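As a rough illustration of the flow conversion mentioned above (a hedged sketch under our own assumptions, not the paper's exact formulation), the normalized deformation offsets can be rescaled to pixel units and regularized for temporal smoothness; the function flow_regularizer and the difference-based flow approximation are hypothetical:

    import torch

    def flow_regularizer(deformation):
        # deformation: (T, H, W, 2) offsets in normalized [-1, 1] coordinates
        T, H, W, _ = deformation.shape
        # Rescale normalized offsets to pixel units, yielding a flow-like
        # field that relates every frame to the shared canonical image.
        scale = deformation.new_tensor([(W - 1) / 2.0, (H - 1) / 2.0])
        flow_to_canonical = deformation * scale  # (T, H, W, 2), in pixels
        # Approximate frame-to-frame motion as the difference of consecutive
        # canonical-to-frame displacements (ignoring resampling effects).
        frame_flow = flow_to_canonical[1:] - flow_to_canonical[:-1]
        # One possible structural regularization: penalize motion
        # acceleration so the synthesized motion stays temporally smooth.
        accel = frame_flow[1:] - frame_flow[:-1]
        return accel.pow(2).mean()

Because every frame is rendered from the same canonical image, processing that single image (for example, segmenting or editing it) and re-running the warp propagates the result to all frames, which is the mechanism behind the third advantage.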
Both qualitative and quantitative results on three common video generation benchmarks demonstrate the superiority of our GenDeF method.

Project page: https://aim-uofa.github.io/GenDeF