Reference-based Image Composition with Sketch via Structure-aware Diffusion Model
Recent remarkable improvements in large-scale text-to-image generative models
have shown promising results in generating high-fidelity images. To further
enhance editability and enable fine-grained generation, we introduce a
multi-input-conditioned image composition model that incorporates a sketch as a
novel modality alongside a reference image. Thanks to the edge-level
controllability using sketches, our method enables a user to edit or complete
an image sub-part with a desired structure (i.e., sketch) and content (i.e.,
reference image). Our framework fine-tunes a pre-trained diffusion model to
complete missing regions using the reference image while maintaining sketch
guidance. Despite its simplicity, this approach opens up wide opportunities to
meet users' needs for obtaining desired images. Through extensive experiments, we
demonstrate that our proposed method offers unique use cases for image
manipulation, enabling user-driven modifications of arbitrary scenes.
Comment: 7 pages; Code URL: https://github.com/kangyeolk/Paint-by-Sketc
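The core idea, conditioning an inpainting diffusion model on both a sketch (structure) and a reference image (content), can be illustrated at a high level. The following is a minimal NumPy sketch of how such conditioning inputs and the final composition might be assembled; the function names, channel layout, and blending scheme are assumptions for illustration, not the authors' actual implementation.

```python
import numpy as np

def compose_inputs(image, mask, sketch):
    """Assemble per-pixel conditions for a hypothetical
    structure-aware inpainting diffusion model.
    image:  (H, W, 3) float array in [0, 1]
    mask:   (H, W, 1) binary array, 1 = region to complete
    sketch: (H, W, 1) edge map guiding structure inside the hole
    """
    masked_image = image * (1.0 - mask)      # hide the region to edit
    # Channel-wise concat: 3 (masked image) + 1 (mask) + 1 (sketch)
    return np.concatenate([masked_image, mask, sketch], axis=-1)

def blend_output(model_output, image, mask):
    """Keep original pixels outside the mask; use generated
    pixels only inside the edited region."""
    return model_output * mask + image * (1.0 - mask)

H, W = 4, 4
image = np.random.rand(H, W, 3)
mask = np.zeros((H, W, 1))
mask[1:3, 1:3] = 1.0                          # region to complete
sketch = np.random.rand(H, W, 1)              # stand-in edge map

cond = compose_inputs(image, mask, sketch)    # (4, 4, 5) condition tensor
fake_output = np.random.rand(H, W, 3)         # stand-in for the denoiser output
result = blend_output(fake_output, image, mask)
```

In a real system, `cond` would be fed to a fine-tuned denoising network alongside a reference-image embedding; the blending step reflects the abstract's goal of completing only the missing region while preserving the rest of the scene.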
Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation
Video generation models often operate under the assumption of fixed frame
rates, which leads to suboptimal performance when handling flexible
frame rates (e.g., increasing the frame rate of the more dynamic portion of the
video as well as handling missing video frames). To overcome existing video
generation models' restricted ability to handle arbitrary timesteps, we propose
continuous-time video generation by combining a neural ODE (Vid-ODE) with
pixel-level video processing techniques. As its encoder, Vid-ODE uses
ODE-ConvGRU, a convolutional version of the recently proposed neural ODE, which
enables it to learn the continuous-time spatio-temporal dynamics of input
videos with flexible frame rates. The decoder
integrates the learned dynamics function to synthesize video frames at any
given timestep, where a pixel-level composition technique is used to
maintain the sharpness of individual frames. With extensive experiments on four
real-world video datasets, we verify that the proposed Vid-ODE outperforms
state-of-the-art approaches under various video generation settings, both
within the trained time range (interpolation) and beyond the range
(extrapolation). To the best of our knowledge, Vid-ODE is the first work
successfully performing continuous-time video generation using real-world
videos.
Comment: Accepted to AAAI 2021, 22 pages
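The decoder's ability to synthesize frames at arbitrary timesteps comes from integrating a learned dynamics function over continuous time. The following is a minimal sketch using a fixed-step Euler integrator and toy linear dynamics; both are assumptions for illustration, whereas the paper uses a learned ConvGRU dynamics function and standard ODE solvers.

```python
import numpy as np

def euler_odeint(f, h0, ts, dt=0.01):
    """Integrate dh/dt = f(h, t) with fixed-step Euler and return
    the latent state at each requested (possibly irregular) time.
    This mimics how an ODE decoder can be queried at arbitrary
    timesteps, including times between or beyond training frames."""
    states, h, t = [], h0.copy(), ts[0]
    for t_target in ts:
        while t < t_target:
            step = min(dt, t_target - t)   # do not overshoot the query time
            h = h + step * f(h, t)
            t += step
        states.append(h.copy())
    return np.stack(states)

# Toy linear dynamics: exponential decay of a tiny latent "frame" state.
decay = lambda h, t: -0.5 * h
h0 = np.ones((2, 2))                       # 2x2 latent map at time ts[0]
ts = [0.0, 0.3, 1.0, 2.5]                  # irregular query timesteps
latents = euler_odeint(decay, h0, ts)      # shape (4, 2, 2)
```

Because the solver advances the state continuously, the query times `ts` need not be equally spaced, which is exactly what allows interpolation within the trained time range and extrapolation beyond it.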