We address, for the first time, the problem of gait transfer. In
contrast to motion transfer, the objective here is not to imitate the source's
motions, but rather to transform the source's motion into the target's typical
gait pattern. Using gait recognition models, we demonstrate that existing
techniques yield a discrepancy that is easily detected. We introduce a novel
model, Cycle Transformers GAN (CTrGAN), that can successfully generate the
target's natural gait. CTrGAN's generators consist of an encoder and a decoder,
both Transformers, in which attention is applied in the temporal domain between
complete images rather than in the spatial domain between patches. While recent
Transformer studies in computer vision have focused mainly on discriminative
tasks, we introduce an architecture that can also be applied to synthesis tasks.
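
To illustrate the idea of temporal rather than spatial attention, the
PyTorch-style block below is a minimal sketch of our own (not the paper's
implementation): each token is an embedding of a complete frame, so
self-attention relates whole images across time instead of patches within an
image. The module name, dimensions, and hyperparameters are illustrative
assumptions.

    import torch
    import torch.nn as nn

    class TemporalAttentionBlock(nn.Module):
        # Illustrative sketch: each token embeds a whole frame, so
        # attention relates complete images across time rather than
        # spatial patches within a single image.
        def __init__(self, dim=512, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x):  # x: (batch, num_frames, dim)
            h = self.norm1(x)
            x = x + self.attn(h, h, h)[0]   # attention over the time axis
            return x + self.mlp(self.norm2(x))

    frames = torch.randn(2, 16, 512)        # batch of 16-frame sequences
    out = TemporalAttentionBlock()(frames)  # shape preserved: (2, 16, 512)
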
Using a widely used gait recognition dataset, we demonstrate that our approach
produces personalized gaits that are over an order of magnitude more realistic
than those of existing methods, even for sources that were unseen during
training.