Real-world objects perform complex motions that involve multiple independent
motion components. For example, while talking, a person continuously changes
their expressions, head, and body pose. In this work, we propose a novel method
to decompose motion in videos by using a pretrained image GAN model. We
discover disentangled motion subspaces in the latent space of widely used
style-based GAN models; each subspace is semantically meaningful and controls a single
explainable motion component. The proposed method uses only a few (≈10)
ground truth video sequences to obtain such subspaces. We extensively evaluate
the disentanglement properties of motion subspaces on face and car datasets,
quantitatively and qualitatively. Further, we present results for multiple
downstream tasks such as motion editing and selective motion transfer, e.g.,
transferring only facial expressions without training for it.

Comment: AI for content creation, CVPRW-202