Controllable Animation of Fluid Elements in Still Images
We propose a method to interactively control the animation of fluid elements
in still images to generate cinemagraphs. Specifically, we focus on the
animation of fluid elements such as water, smoke, and fire, which have the properties
of repeating textures and continuous fluid motion. Taking inspiration from
prior works, we represent the motion of such fluid elements in the image in the
form of a constant 2D optical flow map. To this end, we allow the user to
provide any number of arrow directions and their associated speeds along with a
mask of the regions the user wants to animate. The user-provided arrow
directions, their corresponding speed values, and the mask are then converted
into a dense, constant optical flow map (FD). We observe that FD, obtained
using simple exponential operations, can closely approximate the plausible
motion of elements in the image. We further refine the computed dense optical
flow map FD using a generative adversarial network (GAN) to obtain a more
realistic flow map. We devise a novel UNet-based architecture to
autoregressively generate future frames using the refined optical flow map by
forward-warping the input image features at different resolutions. We conduct
extensive experiments on a publicly available dataset and show that our method
is superior to the baselines in terms of qualitative and quantitative metrics.
In addition, we show qualitative animations of objects moving in directions
that did not exist in the training set, providing a way to synthesize videos
that would not otherwise exist in the real world.
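The conversion from sparse arrows to a dense flow map described above can be illustrated with a small sketch. The abstract only says that FD is obtained with "simple exponential operations", so the exact formula here is an assumption: each arrow's velocity is spread over the image with weights that decay exponentially with distance from the arrow, and the weights are normalized per pixel. The function name `dense_flow_from_arrows` and the falloff parameter `sigma` are hypothetical and do not come from the paper.

```python
import numpy as np

def dense_flow_from_arrows(mask, points, directions, speeds, sigma=20.0):
    """Hypothetical sketch: spread sparse arrow hints into a dense, constant flow map.

    mask:       (H, W) bool array, True where the user wants motion
    points:     (N, 2) arrow origins as (x, y) pixel coordinates
    directions: (N, 2) direction vectors (need not be unit length)
    speeds:     (N,) speed of each arrow in pixels per frame
    sigma:      assumed falloff scale in pixels (not taken from the paper)
    """
    mask = np.asarray(mask, dtype=bool)
    points = np.asarray(points, dtype=float)
    directions = np.asarray(directions, dtype=float)
    speeds = np.asarray(speeds, dtype=float)
    h, w = mask.shape

    # One velocity per arrow: unit direction scaled by the requested speed.
    norms = np.linalg.norm(directions, axis=1, keepdims=True)
    velocities = directions / np.maximum(norms, 1e-8) * speeds[:, None]

    # Pixel grid in (x, y) order, shape (H, W, 2).
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs, ys], axis=-1).astype(float)

    # Distance from every pixel to every arrow origin, shape (H, W, N).
    dist = np.linalg.norm(grid[:, :, None, :] - points[None, None, :, :], axis=-1)

    # Exponential falloff, normalized per pixel (shifted for numerical stability).
    logits = -dist / sigma
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted average of arrow velocities: (H, W, N) @ (N, 2) -> (H, W, 2).
    flow = weights @ velocities
    flow[~mask] = 0.0  # pixels outside the mask stay static
    return flow
```

With a single arrow, every masked pixel simply receives that arrow's velocity; with several arrows, each pixel blends them, favoring the nearest one. In the method described above, a map of this kind is only a first approximation that the GAN then refines.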
Synthesizing Artistic Cinemagraphs from Text
We introduce Artistic Cinemagraph, a fully automated method for creating
cinemagraphs from text descriptions, an especially challenging task when
prompts feature imaginary elements and artistic styles, given the complexity of
interpreting the semantics and motions of these images. Existing single-image
animation methods fall short on artistic inputs, and recent text-based video
methods frequently introduce temporal inconsistencies, struggling to keep
certain regions static. To address these challenges, we propose
synthesizing image twins from a single text prompt: a pair consisting of an
artistic image and its pixel-aligned, natural-looking twin. While the
artistic image depicts the style and appearance detailed in our text prompt,
the realistic counterpart greatly simplifies layout and motion analysis.
Leveraging existing natural image and video datasets, we can accurately segment
the realistic image and predict plausible motion given the semantic
information. The predicted motion can then be transferred to the artistic image
to create the final cinemagraph. Our method outperforms existing approaches in
creating cinemagraphs for natural landscapes as well as artistic and
other-worldly scenes, as validated by automated metrics and user studies.
Finally, we demonstrate two extensions: animating existing paintings and
controlling motion directions using text.
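Because the two images are pixel-aligned, a flow map predicted on the realistic twin can be applied to the artistic image without any remapping. The sketch below illustrates that idea under assumptions the abstract does not state: the flow is treated as constant over time and Euler-integrated into per-frame displacements, and pixels are moved by nearest-neighbor forward splatting. The function names `integrate_constant_flow` and `forward_warp_nearest` are hypothetical.

```python
import numpy as np

def integrate_constant_flow(flow, num_frames):
    """Euler-integrate a time-invariant flow map (H, W, 2) into per-frame displacements."""
    flow = np.asarray(flow, dtype=float)
    h, w, _ = flow.shape
    ys, xs = np.mgrid[0:h, 0:w]
    disp = np.zeros_like(flow)
    displacements = []
    for _ in range(num_frames):
        # Look up the flow at each pixel's current position (nearest neighbor, clamped).
        px = np.clip(np.rint(xs + disp[..., 0]), 0, w - 1).astype(int)
        py = np.clip(np.rint(ys + disp[..., 1]), 0, h - 1).astype(int)
        disp = disp + flow[py, px]
        displacements.append(disp.copy())
    return displacements

def forward_warp_nearest(image, disp):
    """Move each pixel of `image` by `disp`; holes stay zero, collisions keep the last write."""
    image = np.asarray(image)
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.rint(xs + disp[..., 0]).astype(int)
    ty = np.rint(ys + disp[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    out = np.zeros_like(image)
    out[ty[valid], tx[valid]] = image[valid]
    return out

# Usage: the flow comes from the realistic twin, the pixels from the artistic image.
artistic = np.arange(24, dtype=float).reshape(4, 6)
flow_from_twin = np.zeros((4, 6, 2))
flow_from_twin[..., 0] = 1.0  # uniform motion of one pixel per frame to the right
frames = [forward_warp_nearest(artistic, d)
          for d in integrate_constant_flow(flow_from_twin, 2)]
```

Nearest-neighbor splatting leaves holes where no source pixel lands, so this is only a toy illustration of the transfer step, not a description of how the final cinemagraph frames are rendered.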