Sketch-Guided Scenery Image Outpainting
The outpainting results produced by existing approaches are often too random
to meet users' requirements. In this work, we take image outpainting one step
forward by allowing users to obtain personalized, custom outpainting results
using sketches as guidance. To this end, we propose an encoder-decoder based
network to conduct sketch-guided outpainting, where two alignment modules are
adopted to encourage the generated content to be realistic and consistent
with the provided sketches. First, we apply a holistic alignment module to
make the synthesized part similar to the real one from a global view. Second,
we reversely produce sketches from the synthesized part and encourage them to
be consistent with the ground-truth ones using a sketch alignment module. In
this way, the learned generator is pushed to pay more attention to fine
details and to be sensitive to the guiding sketches. To our knowledge, this
work is the first attempt to explore the challenging yet meaningful task of
conditional scenery image outpainting. We conduct extensive experiments on
two collected benchmarks to qualitatively and quantitatively validate the
effectiveness of our approach compared with other state-of-the-art generative
models.
Comment: Accepted by TI
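As a rough illustration of the two alignment objectives sketched in this abstract, the snippet below combines a global adversarial term with an L1 loss between a re-extracted sketch and the guiding sketch. The module interfaces (`generator`, `discriminator`, `sketch_extractor`) and the unweighted sum are assumptions for illustration, not the paper's exact losses.

```python
# Hedged sketch of the two alignment losses; interfaces are assumed, not the paper's code.
import torch.nn.functional as F

def alignment_losses(generator, discriminator, sketch_extractor,
                     known_region, guide_sketch):
    """known_region: observed part of the scene; guide_sketch: user sketch for
    the missing part. The discriminator update is omitted for brevity."""
    fake_full_image = generator(known_region, guide_sketch)

    # Holistic alignment: a global discriminator pulls the whole synthesized
    # image toward the distribution of real scenes (non-saturating GAN loss).
    holistic_loss = F.softplus(-discriminator(fake_full_image)).mean()

    # Sketch alignment: re-derive a sketch from the synthesized half and match
    # it to the guiding sketch, making the generator sensitive to the guide.
    synthesized_half = fake_full_image[..., fake_full_image.shape[-1] // 2:]
    sketch_loss = F.l1_loss(sketch_extractor(synthesized_half), guide_sketch)

    return holistic_loss + sketch_loss
```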
Very Long Natural Scenery Image Prediction by Outpainting
Compared to image inpainting, image outpainting receives less attention due
to two challenges. The first challenge is how to keep spatial and content
consistency between generated images and the original input. The second
challenge is how to maintain high quality in the generated results,
especially for multi-step generation in which generated regions are spatially
far away from the initial input. To address these two problems, we devise two
novel modules, named Skip Horizontal Connection and Recurrent Content
Transfer, and integrate them into our designed encoder-decoder structure.
With this design, our network can generate highly realistic outpainting
predictions effectively and efficiently. Moreover, our method can generate
very long new images while keeping the same style and semantic content as
the given input. To test the effectiveness of the proposed architecture, we
collect a new scenery dataset with diverse, complicated natural scenes. The
experimental results on this dataset demonstrate the efficacy of our proposed
network. The code and dataset are available from
https://github.com/z-x-yang/NS-Outpainting.
Comment: ICCV-1
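As a loose sketch of the multi-step generation this abstract describes, the loop below repeatedly feeds the most recently generated strip back into a one-step generator so the scene can be extended to arbitrary length. The `generator` interface and the strip width are assumptions for illustration, not the released NS-Outpainting code.

```python
# Hedged sketch of multi-step outpainting by feeding generated strips back in.
import torch

@torch.no_grad()
def outpaint_long(generator, image, steps, strip_width):
    """image: (N, C, H, W) seed scene; extends the scene to the right by
    `steps` strips, each predicted from the most recent known strip."""
    canvas = image
    for _ in range(steps):
        context = canvas[..., -strip_width:]   # rightmost known columns
        new_strip = generator(context)         # one-step prediction (assumed interface)
        canvas = torch.cat([canvas, new_strip], dim=-1)
    return canvas
```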
A Unified Prompt-Guided In-Context Inpainting Framework for Reference-based Image Manipulations
Recent advancements in Text-to-Image (T2I) generative models have yielded
impressive results in generating high-fidelity images consistent with text
prompts. However, there is growing interest in exploring the potential of
these models for more diverse reference-based image manipulation tasks that
require spatial understanding and visual context. Previous approaches have
achieved this by incorporating additional control modules or fine-tuning the
generative models specifically for each task until convergence. In this paper,
we propose a different perspective. We conjecture that current large-scale T2I
generative models already possess the capability to perform these tasks but are
not fully activated within the standard generation process. To unlock these
capabilities, we introduce a unified Prompt-Guided In-Context inpainting (PGIC)
framework, which leverages large-scale T2I models to re-formulate and solve
reference-guided image manipulations. In the PGIC framework, the reference and
the masked target are stitched together as a new input for the generative
model, so that filling the masked regions produces the final results. Furthermore,
we demonstrate that the self-attention modules in T2I models are well-suited
for establishing spatial correlations and efficiently addressing challenging
reference-guided manipulations. These large T2I models can be effectively
driven by task-specific prompts with minimal training cost or even with frozen
backbones. We evaluate the effectiveness of the proposed PGIC
framework across various tasks, including reference-guided image inpainting,
faithful inpainting, outpainting, local super-resolution, and novel view
synthesis. Our results show that PGIC achieves significantly better performance
while requiring less computation than other fine-tuning-based approaches.
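A minimal sketch of the in-context stitching idea described above: the reference and the masked target are concatenated into one canvas, and a mask exposes only the target side to an off-the-shelf T2I inpainting model. The helper name, the horizontal layout, and the `inpaint` call in the usage comment are assumptions for illustration.

```python
# Hedged sketch of stitching a reference and a masked target into one canvas.
from PIL import Image

def pgic_input(reference: Image.Image, target: Image.Image, target_mask: Image.Image):
    """Place the reference on the left and the target on the right; the returned
    mask exposes only the target's missing region to the inpainting model."""
    w, h = reference.size
    canvas = Image.new("RGB", (2 * w, h))
    canvas.paste(reference, (0, 0))
    canvas.paste(target, (w, 0))

    mask = Image.new("L", (2 * w, h), 0)   # 0 = keep, 255 = fill
    mask.paste(target_mask, (w, 0))        # only the target side is editable
    return canvas, mask

# Usage sketch (the `inpaint` callable and prompt are assumptions):
# canvas, mask = pgic_input(reference, target, target_mask)
# result = inpaint(prompt="task-specific prompt", image=canvas, mask_image=mask)
```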
Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance
Image outpainting technology generates visually plausible content regardless
of its authenticity, making it unreliable for practical applications. Thus,
we propose a reliable image outpainting task, introducing sparse depth from
LiDARs to extrapolate authentic RGB scenes. The large field of view of
LiDARs allows this task to serve data enhancement and further multimodal
tasks.
Concretely, we propose a Depth-Guided Outpainting Network to model different
feature representations of two modalities and learn the structure-aware
cross-modal fusion. Two components are designed: 1) The Multimodal Learning
Module produces unique depth and RGB feature representations from the
perspectives of different modal characteristics. 2) The Depth Guidance Fusion
Module leverages the complete depth modality to guide the establishment of RGB
contents by progressive multimodal feature fusion. Furthermore, we specially
design an additional constraint strategy consisting of Cross-modal Loss and
Edge Loss to enhance ambiguous contours and expedite reliable content
generation. Extensive experiments on the KITTI and Waymo datasets demonstrate
our superiority over state-of-the-art methods, both quantitatively and
qualitatively.
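As a rough sketch of depth-guided fusion in the spirit of this abstract, the module below lets depth features gate the RGB features at one scale before merging the two streams. The channel layout and the sigmoid gate are assumptions for illustration, not the paper's exact Depth Guidance Fusion Module.

```python
# Hedged sketch of a depth-guided fusion block; shapes and gating are assumed.
import torch
import torch.nn as nn

class DepthGuidedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depth features predict a spatial gate over the RGB features.
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                  nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb_feat, depth_feat):
        # Depth decides where extrapolated RGB content should be trusted.
        gated_rgb = rgb_feat * self.gate(depth_feat)
        return self.merge(torch.cat([gated_rgb, depth_feat], dim=1))
```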
Cylin-Painting: Seamless 360° Panoramic Image Outpainting and Beyond with Cylinder-Style Convolutions
Image outpainting has received increasing attention because it can generate a
complete scene from a partial view, providing a valuable way to construct
360° panoramic images. As image outpainting suffers from the intrinsic issue
of a unidirectional completion flow, previous methods convert the original
problem into inpainting, which allows a bidirectional flow. However, we find
that inpainting has its own limitations and is inferior to outpainting in
certain situations. How the two may be combined for the best of both remains
under-explored. In this paper, we provide a deep
analysis of the differences between inpainting and outpainting, which
essentially depends on how the source pixels contribute to the unknown regions
under different spatial arrangements. Motivated by this analysis, we present a
Cylin-Painting framework that involves meaningful collaborations between
inpainting and outpainting and efficiently fuses the different arrangements,
with a view to leveraging their complementary benefits on a consistent and
seamless cylinder. Nevertheless, directly applying the cylinder-style
convolution often generates visually unpleasing results as it could discard
important positional information. To address this issue, we further present a
learnable positional embedding strategy and incorporate the missing component
of positional encoding into the cylinder convolution, which significantly
improves the panoramic results. Note that while developed for image
outpainting, the proposed solution can be effectively extended to other
panoramic vision tasks, such as object detection, depth estimation, and image
super-resolution.
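As a rough sketch of a cylinder-style convolution with a learnable positional embedding, the layer below wraps features circularly along the horizontal (longitude) axis so the left and right borders meet seamlessly, and adds a per-column embedding to restore the positional cue that circular padding removes. The shapes and the embedding form are assumptions for illustration, not the paper's exact Cylin-Painting layers.

```python
# Hedged sketch of a cylinder-style convolution with a learnable positional embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CylinderConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, width=256):
        super().__init__()
        self.pad = kernel_size // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size)
        # One learnable embedding per horizontal position, broadcast over height.
        self.pos = nn.Parameter(torch.zeros(1, in_ch, 1, width))

    def forward(self, x):  # x: (N, C, H, W), W == width
        x = x + self.pos
        x = F.pad(x, (self.pad, self.pad, 0, 0), mode="circular")   # wrap the left/right seam
        x = F.pad(x, (0, 0, self.pad, self.pad), mode="replicate")  # ordinary top/bottom pad
        return self.conv(x)
```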