Image Inpainting with Learnable Feature Imputation
A regular convolution layer applying a filter in the same way over known and
unknown areas causes visual artifacts in the inpainted image. Several studies
address this issue with feature re-normalization on the output of the
convolution. However, these models use a significant number of learnable
parameters for feature re-normalization, or assume a binary representation of
the certainty of an output. We propose (layer-wise) feature imputation of the
missing input values to a convolution. In contrast to learned feature
re-normalization, our method is efficient and introduces a minimal number of
parameters. Furthermore, we propose a revised gradient penalty for image
inpainting, and a novel GAN architecture trained exclusively on adversarial
loss. Our quantitative evaluation on the FDF dataset shows that our revised
gradient penalty and alternative convolution significantly improve generated
image quality. We present comparisons against the current state of the art on
CelebA-HQ and Places2 to validate our model.
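The imputation idea can be illustrated with a short sketch: missing inputs to a convolution are filled with a learnable per-channel value before the filter is applied, and the mask is propagated for the next layer. This is a minimal reading of the abstract, not the authors' released code; the class name, the per-channel fill parameter, and the max-pool mask update are assumptions.

```python
# Minimal sketch of layer-wise feature imputation before a convolution.
# Assumes a binary mask (1 = known, 0 = missing); the class name and the
# per-channel learnable fill value are illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImputedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        # One learnable imputation value per input channel: far fewer
        # parameters than a separate re-normalization branch.
        self.fill = nn.Parameter(torch.zeros(1, in_ch, 1, 1))
        self.kernel_size = kernel_size
        self.padding = padding

    def forward(self, x, mask):
        # Impute missing inputs, then convolve known and filled values alike.
        x = x * mask + self.fill * (1.0 - mask)
        out = self.conv(x)
        # Propagate the mask: a location counts as "known" once any known
        # pixel falls inside its receptive field.
        mask = F.max_pool2d(mask, self.kernel_size, stride=1,
                            padding=self.padding)
        return out, mask
```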
Delving Globally into Texture and Structure for Image Inpainting
Image inpainting has achieved remarkable progress and inspired abundant
methods, where the critical bottleneck is how to fill the masked regions with
semantically consistent high-frequency structure and low-frequency texture
information. Deep models are powerful at capturing such information, yet they
are typically constrained to local spatial regions. In this paper, we delve
globally into texture and structure information to capture the semantics for
image inpainting. Unlike existing methods confined to independent local
patches, the texture information of each patch is reconstructed from all other
patches across the whole image to match the coarsely filled information,
especially the structure information over the masked regions. Unlike current
decoder-only transformers that operate at the pixel level for image inpainting,
our model adopts a transformer pipeline with both an encoder and a decoder. On
one hand, the encoder captures the texture semantic correlations of all patches
across the image via a self-attention module. On the other hand, an adaptive
patch vocabulary is dynamically established in the decoder for the filled
patches over the masked regions. Building on this, a structure-texture matching
attention module anchored on the known regions combines the best of both worlds
for progressive inpainting via a probabilistic diffusion process. Our model is
orthogonal to popular approaches such as Convolutional Neural Networks (CNNs),
attention, and Transformer models, from the perspective of texture and
structure information for image inpainting. Extensive experiments on standard
benchmarks validate its superiority. Our code is available at
https://github.com/htyjers/DGTS-Inpainting.
Comment: 9 pages, 10 figures, accepted by ACM Multimedia 202
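As a rough illustration of the global texture reconstruction this abstract describes, each patch embedding can attend to all other patches via self-attention. The sketch below is a stand-in, not the DGTS implementation; the patch dimension, head count, and function name are invented for illustration.

```python
# Illustrative sketch only: texture features of every patch attend to all
# other patches across the image, as the abstract describes. Patch size,
# dimensions, and names are assumptions, not the released DGTS code.
import torch
import torch.nn as nn

patch_dim, num_heads = 256, 8
attn = nn.MultiheadAttention(patch_dim, num_heads, batch_first=True)

def global_texture(patches):
    # patches: (B, N, patch_dim) embeddings of all N image patches.
    # Each patch's texture is reconstructed from every other patch,
    # rather than from an independent local neighborhood.
    out, _ = attn(patches, patches, patches)
    return out

tokens = torch.randn(2, 196, patch_dim)   # e.g. a 14x14 patch grid
print(global_texture(tokens).shape)       # torch.Size([2, 196, 256])
```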
DIFAI: Diverse Facial Inpainting using StyleGAN Inversion
Image inpainting is an old problem in computer vision that restores occluded
regions and completes damaged images. In the case of facial image inpainting,
most of the methods generate only one result for each masked image, even though
there are other reasonable possibilities. To prevent any potential biases and
unnatural constraints stemming from generating only one image, we propose a
novel framework for diverse facial inpainting exploiting the embedding space of
StyleGAN. Our framework employs the pSp encoder and the SeFa algorithm to
identify semantic components of the StyleGAN embeddings and feeds them into our
proposed SPARN decoder, which adopts region normalization for plausible
inpainting. We demonstrate that our proposed method outperforms several
state-of-the-art methods.
Comment: ICIP 202
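Region normalization, which the SPARN decoder adopts, can be sketched as normalizing the known and masked regions with their own statistics so that corrupted means and variances do not leak across the boundary. The following is a hedged sketch under that reading; shapes, the eps value, and the function name are assumptions rather than the DIFAI code.

```python
# Hedged sketch of region normalization: known and masked regions are
# normalized separately with their own per-channel statistics.
import torch

def region_norm(x, mask, eps=1e-5):
    # x: (B, C, H, W) features; mask: (B, 1, H, W), 1.0 = known region.
    out = torch.zeros_like(x)
    for region in (mask, 1.0 - mask):            # known, then masked
        cnt = region.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
        mean = (x * region).sum(dim=(2, 3), keepdim=True) / cnt
        var = ((x - mean) ** 2 * region).sum(dim=(2, 3), keepdim=True) / cnt
        out = out + region * (x - mean) / torch.sqrt(var + eps)
    return out
```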
Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform
With the growing demand for immersive digital applications, the need to
understand and reconstruct 3D scenes has significantly increased. In this
context, inpainting indoor environments from a single image plays a crucial
role in modeling the internal structure of interior spaces as it enables the
creation of textured and clutter-free reconstructions. While recent methods
have shown significant progress in room modeling, they rely on constraining
layout estimators to guide the reconstruction process. These methods are highly
dependent on the performance of the structure estimator and its generative
ability in heavily occluded environments. In response to these issues, we
propose an innovative approach based on a U-Former architecture and a new
Windowed-FourierMixer block, resulting in a unified, single-phase network
capable of effectively handling human-made periodic structures such as indoor
spaces. This new architecture proves advantageous for tasks involving indoor
scenes where symmetry is prevalent, allowing the model to effectively capture
features such as horizon/ceiling height lines and cuboid-shaped rooms.
Experiments show the proposed approach outperforms current state-of-the-art
methods on the Structured3D dataset, demonstrating superior performance in both
quantitative metrics and qualitative results. Code and models will be made
publicly available.
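The core of a Fourier-based mixing block of this kind can be sketched as: transform the feature map with a 2D FFT, apply a learnable frequency-domain filter, and invert the transform, so every output location aggregates information from the whole spatial grid, which suits periodic indoor structure. The sketch omits the paper's windowing and exact block layout; the class name and initialization are assumptions.

```python
# Minimal sketch of Fourier-domain feature mixing of the kind the
# Windowed-FourierMixer block builds on. Not the paper's architecture.
import torch
import torch.nn as nn

class FourierMixer(nn.Module):
    def __init__(self, channels, height, width):
        super().__init__()
        # Learnable complex filter over the rfft2 spectrum.
        self.weight = nn.Parameter(
            torch.randn(channels, height, width // 2 + 1,
                        dtype=torch.cfloat) * 0.02)

    def forward(self, x):
        # x: (B, C, H, W). Each frequency bin mixes the whole spatial grid,
        # giving global receptive fields in a single block.
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = spec * self.weight
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

mixer = FourierMixer(64, 32, 32)
print(mixer(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```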
Mask Face Inpainting Based on Improved Generative Adversarial Network
Face recognition technology has been widely used in many aspects of people's lives. However, the accuracy of face recognition is greatly reduced when faces are obscured by objects such as masks and sunglasses. Wearing masks in public has been a crucial approach to preventing illness, especially since the COVID-19 outbreak, and this poses challenges to applications such as face recognition. Therefore, the removal of masks via image inpainting has become a hot topic in computer vision. Deep learning-based image inpainting techniques have achieved notable results, but the restored images still suffer from problems such as blurring and inconsistency. To address these problems, this paper proposes an improved inpainting model based on a generative adversarial network: the model adds attention mechanisms to the sampling modules of a pix2pix network, and the residual module is improved by adding convolutional branches. The improved inpainting model can not only effectively restore faces obscured by face masks, but also inpaint randomly occluded face images. To further validate the generality of the inpainting model, tests are conducted on the CelebA, Paris StreetView, and Places2 datasets, and the experimental results show that both SSIM and PSNR improve significantly.
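Of the two architectural changes the abstract names, the residual-module change can be sketched roughly as below: a second convolutional branch is added alongside the usual residual path. This is an illustrative guess; channel counts, kernel sizes, and the class name are assumptions, not the paper's configuration.

```python
# Hedged sketch of a residual block with an extra convolutional branch.
import torch
import torch.nn as nn

class BranchedResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
        # Extra convolutional branch added to the standard residual block.
        self.branch = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return torch.relu(x + self.main(x) + self.branch(x))
```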
A Highlight Removal Method for Capsule Endoscopy Images
The images captured by Wireless Capsule Endoscopy (WCE) always exhibit
specular reflections, and removing highlights while preserving the color and
texture in the region remains a challenge. To address this issue, this paper
proposes a highlight removal method for capsule endoscopy images. Firstly, the
confidence and feature terms of the highlight region's edge pixels are
computed: the confidence is the ratio of known pixels in the R channel to those
in the B channel of RGB space, within a window centered on each edge pixel of
the highlight region, and the feature term is obtained by multiplying the
gradient vector at the edge pixel with the iso-intensity line direction.
Subsequently,
the confidence and feature terms are assigned different weights and summed to
obtain the priority of all highlight region's edge pixels, and the pixel with
the highest priority is identified. Then, the variance of the highlight
region's edge pixels is used to adjust the size of the sample block window, and
the best-matching block is searched in the known region based on the RGB color
similarity and distance between the sample block and the window centered on the
pixel with the highest priority. Finally, the pixels in the best-matching block
are copied to the highest priority highlight removal region to achieve the goal
of removing the highlight region. Experimental results demonstrate that the
proposed method effectively removes highlights from WCE images, with a lower
coefficient of variation in the highlight removal region compared to the
Criminisi algorithm and the DeepGIN method. Additionally, the color and texture
in the highlight removal region are similar to those in the surrounding areas,
and the texture is continuous.
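The priority computation this pipeline describes follows the familiar exemplar-based (Criminisi-style) scheme: each boundary pixel of the highlight region receives a confidence term and a feature term, combined by fixed weights, and the highest-priority pixel is filled first. Below is a minimal sketch under simplifying assumptions: the confidence is reduced to the share of known pixels in the window (rather than the paper's R/B channel ratio), the weights and window size are invented, and gradients and boundary normals are assumed precomputed.

```python
# Hedged sketch of the priority step in an exemplar-based pipeline.
import numpy as np

def priorities(grad_x, grad_y, normal_x, normal_y, edge_pixels, known,
               win=4, w_conf=0.7, w_feat=0.3):
    # known: float array, 1.0 where the pixel is valid, 0.0 inside the
    # highlight region. edge_pixels: (y, x) boundary coordinates, assumed
    # at least `win` pixels away from the image border.
    scores = {}
    for (y, x) in edge_pixels:
        patch = known[y - win:y + win + 1, x - win:x + win + 1]
        confidence = patch.mean()                 # share of known pixels
        # Feature term: iso-intensity direction (the rotated gradient)
        # dotted with the boundary normal, as in exemplar-based filling.
        iso = np.array([-grad_y[y, x], grad_x[y, x]])
        n = np.array([normal_x[y, x], normal_y[y, x]])
        feature = abs(iso @ n) / 255.0            # assumes 8-bit gradients
        scores[(y, x)] = w_conf * confidence + w_feat * feature
    best = max(scores, key=scores.get)            # highest-priority pixel
    return best, scores
```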
FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer
Limited by hardware cost and system size, a camera's Field-of-View (FoV) is
not always satisfactory. However, from a spatio-temporal perspective,
information beyond the camera's physical FoV is readily available and can
actually be obtained "for free" from the past. In this paper, we propose a
novel task termed Beyond-FoV Estimation, which aims to exploit past visual cues
and bidirectionally break through the physical FoV of a camera. We put forward
a FlowLens
architecture to expand the FoV by achieving feature propagation explicitly by
optical flow and implicitly by a novel clip-recurrent transformer, which has
two appealing features: 1) FlowLens comprises a newly proposed Clip-Recurrent
Hub with 3D-Decoupled Cross Attention (DDCA) to progressively process global
information accumulated in the temporal dimension. 2) A multi-branch Mix Fusion
Feed Forward Network (MixF3N) is integrated to enhance the spatially-precise
flow of local features. To foster training and evaluation, we establish
KITTI360-EX, a dataset for outer- and inner-FoV expansion. Extensive
experiments on both video inpainting and beyond-FoV estimation tasks show that
FlowLens achieves state-of-the-art performance. Code will be made publicly
available at https://github.com/MasterHow/FlowLens.
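The explicit, flow-based half of this propagation can be illustrated with standard backward warping: features from a past frame are resampled into the current view along the optical flow before a transformer refines them. The sketch below uses generic grid_sample warping; the function name and tensor layout are assumptions, not the FlowLens API.

```python
# Minimal sketch of flow-guided feature propagation via backward warping.
import torch
import torch.nn.functional as F

def flow_warp(feat, flow):
    # feat: (B, C, H, W) past-frame features; flow: (B, 2, H, W) optical
    # flow in pixels (x, y). Builds a sampling grid normalized to [-1, 1].
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat)      # (2, H, W)
    coords = grid.unsqueeze(0) + flow                         # shifted coords
    coords_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    norm = torch.stack((coords_x, coords_y), dim=-1)          # (B, H, W, 2)
    return F.grid_sample(feat, norm, align_corners=True)
```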