Reference-based Image Composition with Sketch via Structure-aware Diffusion Model
Recent remarkable improvements in large-scale text-to-image generative models
have shown promising results in generating high-fidelity images. To further
enhance editability and enable fine-grained generation, we introduce a
multi-input-conditioned image composition model that incorporates a sketch as a
novel modality alongside a reference image. Thanks to the edge-level
controllability using sketches, our method enables a user to edit or complete
an image sub-part with a desired structure (i.e., sketch) and content (i.e.,
reference image). Our framework fine-tunes a pre-trained diffusion model to
complete missing regions using the reference image while maintaining sketch
guidance. Despite its simplicity, this approach opens up wide opportunities to
meet users' needs for obtaining desired images. Through extensive experiments, we
demonstrate that our proposed method offers unique use cases for image
manipulation, enabling user-driven modifications of arbitrary scenes.
Comment: 7 pages; Code URL: https://github.com/kangyeolk/Paint-by-Sketc
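The core idea, conditioning an inpainting diffusion model on both a sketch (structure) and a reference image (content), can be illustrated at a high level. The following is a minimal NumPy sketch of how such conditioning inputs and the final composition might be assembled; the function names, channel layout, and blending scheme are assumptions for illustration, not the authors' actual implementation.

```python
import numpy as np

def compose_inputs(image, mask, sketch):
    """Assemble per-pixel conditions for a hypothetical
    structure-aware inpainting diffusion model.
    image:  (H, W, 3) float array in [0, 1]
    mask:   (H, W, 1) binary array, 1 = region to complete
    sketch: (H, W, 1) edge map guiding structure inside the hole
    """
    masked_image = image * (1.0 - mask)      # hide the region to edit
    # Channel-wise concat: 3 (masked image) + 1 (mask) + 1 (sketch)
    return np.concatenate([masked_image, mask, sketch], axis=-1)

def blend_output(model_output, image, mask):
    """Keep original pixels outside the mask; use generated
    pixels only inside the edited region."""
    return model_output * mask + image * (1.0 - mask)

H, W = 4, 4
image = np.random.rand(H, W, 3)
mask = np.zeros((H, W, 1))
mask[1:3, 1:3] = 1.0                          # region to complete
sketch = np.random.rand(H, W, 1)              # stand-in edge map

cond = compose_inputs(image, mask, sketch)    # (4, 4, 5) condition tensor
fake_output = np.random.rand(H, W, 3)         # stand-in for the denoiser output
result = blend_output(fake_output, image, mask)
```

In a real system, `cond` would be fed to a fine-tuned denoising network alongside a reference-image embedding; the blending step reflects the abstract's goal of completing only the missing region while preserving the rest of the scene.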
Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation
Video generation models often operate under the assumption of fixed frame
rates, which leads to suboptimal performance when handling flexible
frame rates (e.g., increasing the frame rate of the more dynamic portion of the
video as well as handling missing video frames). To overcome existing video
generation models' restricted ability to handle arbitrary timesteps, we propose
continuous-time video generation by combining a neural ODE (Vid-ODE) with
pixel-level video processing techniques. As its encoder, Vid-ODE uses
ODE-ConvGRU, a convolutional version of the recently proposed neural ODE, which
enables it to learn the continuous-time spatio-temporal dynamics of input
videos with flexible frame rates. The decoder
integrates the learned dynamics function to synthesize video frames at any
given timestep, where a pixel-level composition technique is used to
maintain the sharpness of individual frames. With extensive experiments on four
real-world video datasets, we verify that the proposed Vid-ODE outperforms
state-of-the-art approaches under various video generation settings, both
within the trained time range (interpolation) and beyond the range
(extrapolation). To the best of our knowledge, Vid-ODE is the first work
successfully performing continuous-time video generation using real-world
videos.
Comment: Accepted to AAAI 2021, 22 pages
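The decoder's ability to synthesize frames at arbitrary timesteps comes from integrating a learned dynamics function over continuous time. The following is a minimal sketch using a fixed-step Euler integrator and toy linear dynamics; both are assumptions for illustration, whereas the paper uses a learned ConvGRU dynamics function and standard ODE solvers.

```python
import numpy as np

def euler_odeint(f, h0, ts, dt=0.01):
    """Integrate dh/dt = f(h, t) with fixed-step Euler and return
    the latent state at each requested (possibly irregular) time.
    This mimics how an ODE decoder can be queried at arbitrary
    timesteps, including times between or beyond training frames."""
    states, h, t = [], h0.copy(), ts[0]
    for t_target in ts:
        while t < t_target:
            step = min(dt, t_target - t)   # do not overshoot the query time
            h = h + step * f(h, t)
            t += step
        states.append(h.copy())
    return np.stack(states)

# Toy linear dynamics: exponential decay of a tiny latent "frame" state.
decay = lambda h, t: -0.5 * h
h0 = np.ones((2, 2))                       # 2x2 latent map at time ts[0]
ts = [0.0, 0.3, 1.0, 2.5]                  # irregular query timesteps
latents = euler_odeint(decay, h0, ts)      # shape (4, 2, 2)
```

Because the solver advances the state continuously, the query times `ts` need not be equally spaced, which is exactly what allows interpolation within the trained time range and extrapolation beyond it.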