NeRF-Enhanced Outpainting for Faithful Field-of-View Extrapolation
In various applications, such as robotic navigation and remote visual
assistance, expanding the field of view (FOV) of the camera proves beneficial
for enhancing environmental perception. Unlike image outpainting techniques
aimed solely at generating aesthetically pleasing visuals, these applications
demand an extended view that faithfully represents the scene. To achieve this,
we formulate a new problem of faithful FOV extrapolation that utilizes a set of
pre-captured images as prior knowledge of the scene. To address this problem,
we present a simple yet effective solution called NeRF-Enhanced Outpainting
(NEO) that uses extended-FOV images generated through NeRF to train a
scene-specific image outpainting model. To assess the performance of NEO, we
conduct comprehensive evaluations on three photorealistic datasets and one
real-world dataset. Extensive experiments on the benchmark datasets showcase
the robustness and potential of our method in addressing this challenge. We
believe our work lays a strong foundation for future exploration within the
research community.
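As a rough illustration of what extending the FOV implies for image size, the sketch below assumes an ideal pinhole camera with no lens distortion, which the abstract does not state. The function name `extended_width` and its parameters are hypothetical and are not taken from NEO.

```python
import math

def extended_width(width_px, fov_deg, new_fov_deg):
    """Image width (pixels) needed to cover new_fov_deg at the same focal length.

    Assumption: ideal pinhole camera, horizontal FOV, no lens distortion.
    """
    if not (0 < fov_deg < 180 and 0 < new_fov_deg < 180):
        raise ValueError("pinhole FOV must lie strictly between 0 and 180 degrees")
    # Pinhole model: half the image width equals focal_length * tan(fov / 2).
    focal_px = (width_px / 2) / math.tan(math.radians(fov_deg) / 2)
    return 2 * focal_px * math.tan(math.radians(new_fov_deg) / 2)
```

Under these assumptions, widening a 640-pixel, 60-degree view to 90 degrees requires a canvas of about 1108.5 pixels, so roughly 234 pixels must be extrapolated on each side.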
Outpainting by Queries
Image outpainting, which is well studied with Convolutional Neural Network
(CNN) based frameworks, has recently drawn more attention in computer vision.
However, CNNs rely on inherent inductive biases to achieve effective sample
learning, which may degrade the performance ceiling. In this paper, motivated
by the flexible self-attention mechanism with minimal inductive biases in
transformer architecture, we reframe the generalised image outpainting problem
as a patch-wise sequence-to-sequence autoregression problem, enabling
query-based image outpainting. Specifically, we propose a novel hybrid
vision-transformer-based encoder-decoder framework, named Query
Outpainting TRansformer (QueryOTR), for
extrapolating visual context on all sides of a given image. The patch-wise mode's
global modeling capacity allows us to extrapolate images from the attention
mechanism's query standpoint. A novel Query Expansion Module (QEM) is designed
to integrate information from the predicted queries based on the encoder's
output, hence accelerating the convergence of the pure transformer even with a
relatively small dataset. To further enhance connectivity between each patch,
the proposed Patch Smoothing Module (PSM) re-allocates and averages the
overlapped regions, thus providing seamless predicted images. We show
experimentally that QueryOTR generates smooth, realistic, and visually
appealing results compared with state-of-the-art image outpainting approaches.
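The idea of averaging overlapped patch regions can be sketched as follows. This is a generic overlap-averaging routine in plain Python, not the authors' Patch Smoothing Module; the function name `blend_patches`, the list-of-lists image layout, and the `(top, left)` position format are assumptions made for illustration.

```python
def blend_patches(patches, positions, height, width):
    """Paste patches onto a height x width canvas, averaging where they overlap.

    patches:   list of 2D lists (rows of pixel values)
    positions: list of (top, left) offsets, one per patch (hypothetical format)
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for patch, (top, left) in zip(patches, positions):
        for r, row in enumerate(patch):
            for c, value in enumerate(row):
                total[top + r][left + c] += value
                count[top + r][left + c] += 1
    # Pixels covered by no patch are left at 0.0.
    return [
        [total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]
```

For example, two 2x2 patches with values 1.0 and 3.0 placed one column apart share a middle column, which receives their mean of 2.0.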
IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model
Generating complete 360-degree panoramas from narrow field of view images is
ongoing research as omnidirectional RGB data is not readily available. Existing
GAN-based approaches face some barriers to achieving higher quality output, and
have poor generalization performance over different mask types. In this paper,
we present our 360-degree indoor RGB panorama outpainting model using latent
diffusion models (LDM), called IPO-LDM. We introduce a new bi-modal latent
diffusion structure that utilizes both RGB and depth panoramic data during
training, but works surprisingly well to outpaint normal depth-free RGB images
during inference. We further propose a novel technique of introducing
progressive camera rotations during each diffusion denoising step, which leads
to substantial improvement in achieving panorama wraparound consistency.
Results show that our IPO-LDM not only significantly outperforms
state-of-the-art methods on RGB panorama outpainting, but can also produce
multiple and diverse well-structured results for different types of masks.
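To see why camera rotation relates to wraparound consistency, the sketch below assumes the panorama is stored in equirectangular projection and that the rotation is about the vertical axis (yaw) only; neither detail is given in the abstract. In that setting a rotation is a circular shift of the image columns, so the left and right borders move into the image interior. The function `rotate_yaw` is hypothetical, and no denoising step is modeled.

```python
def rotate_yaw(pano, degrees):
    """Rotate an equirectangular panorama about the vertical axis.

    pano is a list of rows. A yaw rotation corresponds to a circular shift
    of the columns; the shift is rounded to a whole number of pixels.
    """
    width = len(pano[0])
    shift = round(degrees / 360 * width) % width
    if shift == 0:
        return [list(row) for row in pano]
    # Columns leaving the right edge re-enter on the left.
    return [list(row[-shift:]) + list(row[:-shift]) for row in pano]
```

Applying the rotation in equal increments that sum to 360 degrees returns the original panorama, so a process that rotates a little at every step can end aligned with the input view.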
Hierarchical Masked 3D Diffusion Model for Video Outpainting
Video outpainting aims to adequately complete missing areas at the edges of
video frames. Compared to image outpainting, it presents an additional
challenge as the model should maintain the temporal consistency of the filled
area. In this paper, we introduce a masked 3D diffusion model for video
outpainting. We use the technique of mask modeling to train the 3D diffusion
model. This allows us to use multiple guide frames to connect the results of
multiple video clip inferences, thus ensuring temporal consistency and reducing
jitter between adjacent frames. Meanwhile, we extract the global frames of the
video as prompts and guide the model to obtain information other than the
current video clip using cross-attention. We also introduce a hybrid
coarse-to-fine inference pipeline to alleviate the artifact accumulation
problem. The existing coarse-to-fine pipeline only uses the infilling strategy,
which brings degradation because the time interval of the sparse frames is too
large. Our pipeline benefits from bidirectional learning of the mask modeling
and thus can employ a hybrid strategy of infilling and interpolation when
generating sparse frames. Experiments show that our method achieves
state-of-the-art results in video outpainting tasks. More results are provided
at https://fanfanda.github.io/M3DDM/. Comment: ACM MM 2023 accepted
PaintSeg: Training-free Segmentation via Painting
The paper introduces PaintSeg, a new unsupervised method for segmenting
objects without any training. We propose an adversarial masked contrastive
painting (AMCP) process, which creates a contrast between the original image
and a painted image in which a masked area is painted using off-the-shelf
generative models. During the painting process, inpainting and outpainting are
alternated, with the former masking the foreground and filling in the
background, and the latter masking the background while recovering the missing
part of the foreground object. Inpainting and outpainting, also referred to as
I-step and O-step, allow our method to gradually advance the target
segmentation mask toward the ground truth without supervision or training.
PaintSeg can be configured to work with a variety of prompts, e.g. coarse
masks, boxes, scribbles, and points. Our experimental results demonstrate that
PaintSeg outperforms existing approaches in coarse mask-prompt, box-prompt, and
point-prompt segmentation tasks, providing a training-free solution suitable
for unsupervised segmentation.
Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance
Image outpainting technology generates visually plausible content regardless
of authenticity, making it unreliable in practice. Thus, we
propose a reliable image outpainting task, introducing sparse depth from
LiDARs to extrapolate authentic RGB scenes. The large field of view of LiDARs
allows them to serve data enhancement and further multimodal tasks.
Concretely, we propose a Depth-Guided Outpainting Network to model different
feature representations of two modalities and learn the structure-aware
cross-modal fusion. Two components are designed: 1) The Multimodal Learning
Module produces unique depth and RGB feature representations from the
perspectives of different modal characteristics. 2) The Depth Guidance Fusion
Module leverages the complete depth modality to guide the establishment of RGB
contents by progressive multimodal feature fusion. Furthermore, we specially
design an additional constraint strategy consisting of Cross-modal Loss and
Edge Loss to enhance ambiguous contours and expedite reliable content
generation. Extensive experiments on KITTI and Waymo datasets demonstrate our
superiority over the state-of-the-art method, quantitatively and qualitatively.