271 research outputs found

    Image Inpainting with Learnable Feature Imputation

    A regular convolution layer applying a filter in the same way over known and unknown areas causes visual artifacts in the inpainted image. Several studies address this issue with feature re-normalization on the output of the convolution. However, these models use a significant number of learnable parameters for feature re-normalization, or assume a binary representation of the certainty of an output. We propose (layer-wise) feature imputation of the missing input values to a convolution. In contrast to learned feature re-normalization, our method is efficient and introduces a minimal number of parameters. Furthermore, we propose a revised gradient penalty for image inpainting, and a novel GAN architecture trained exclusively on adversarial loss. Our quantitative evaluation on the FDF dataset shows that our revised gradient penalty and alternative convolution improve generated image quality significantly. We present comparisons with the current state of the art on CelebA-HQ and Places2 to validate our model.
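
    As a rough illustration of the idea above, the sketch below imputes missing inputs to a standard convolution with a learned per-channel fill value and then propagates the known/unknown mask; the layer name, fill strategy, and mask update are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImputedConv2d(nn.Module):
    """Standard convolution preceded by feature imputation: unknown inputs
    are replaced by a learned per-channel fill value, adding only C extra
    parameters (a simplified reading, not the authors' implementation)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.fill = nn.Parameter(torch.zeros(1, in_ch, 1, 1))  # per-channel imputation value

    def forward(self, x, mask):
        # x: (B, C, H, W) features; mask: (B, 1, H, W) with 1 = known, 0 = missing.
        x = x * mask + self.fill * (1.0 - mask)          # impute missing features
        out = self.conv(x)
        # A position counts as "known" once any known pixel enters its receptive field.
        with torch.no_grad():
            mask = (F.max_pool2d(mask, kernel_size=3, stride=1, padding=1) > 0).float()
        return out, mask
```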

    Delving Globally into Texture and Structure for Image Inpainting

    Image inpainting has achieved remarkable progress and inspired abundant methods, where the critical bottleneck is identified as how to fill the masked regions with semantically consistent high-frequency structure and low-frequency texture information. Deep models are powerful at capturing this information, yet they are constrained to local spatial regions. In this paper, we delve globally into texture and structure information to capture the semantics for image inpainting. In contrast to existing methods restricted to independent local patches, the texture information of each patch is reconstructed from all other patches across the whole image, to match the coarsely filled information, especially the structure information over the masked regions. Unlike current decoder-only transformers operating at the pixel level for image inpainting, our model adopts a transformer pipeline with both an encoder and a decoder. On one hand, the encoder captures the texture semantic correlations of all patches across the image via a self-attention module. On the other hand, an adaptive patch vocabulary is dynamically established in the decoder for the filled patches over the masked regions. Building on this, a structure-texture matching attention module anchored on the known regions is introduced to marry the best of these two worlds for progressive inpainting via a probabilistic diffusion process. Our model is orthogonal to popular architectures such as Convolutional Neural Networks (CNNs), attention, and Transformer models, from the perspective of texture and structure information for image inpainting. Extensive experiments over the benchmarks validate its superiority. Our code is available at https://github.com/htyjers/DGTS-Inpainting. Comment: 9 pages, 10 figures, accepted by ACM Multimedia 202
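
    The core mechanism described above, reconstructing each patch's texture from all other patches of the image, can be approximated with plain self-attention over patch tokens. The sketch below is only that approximation; the DGTS encoder/decoder, adaptive patch vocabulary, and matching attention module are not shown, and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GlobalPatchAttention(nn.Module):
    """Each patch token attends to every other patch of the image, so its
    texture representation is re-estimated globally rather than from a
    local neighbourhood (illustrative stand-in, not the DGTS module)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, dim), one token per image patch.
        out, _ = self.attn(patch_tokens, patch_tokens, patch_tokens)
        return out
```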

    DIFAI: Diverse Facial Inpainting using StyleGAN Inversion

    Image inpainting is a long-standing problem in computer vision that restores occluded regions and completes damaged images. In the case of facial image inpainting, most methods generate only one result for each masked image, even though there are other reasonable possibilities. To prevent potential biases and unnatural constraints stemming from generating only one image, we propose a novel framework for diverse facial inpainting exploiting the embedding space of StyleGAN. Our framework employs the pSp encoder and the SeFa algorithm to identify semantic components of the StyleGAN embeddings and feeds them into our proposed SPARN decoder, which adopts region normalization for plausible inpainting. We demonstrate that our proposed method outperforms several state-of-the-art methods. Comment: ICIP 202
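
    For intuition, diversity in this setting comes from moving the inverted latent along semantic directions. The sketch below shows a SeFa-style closed-form factorization plus a hypothetical helper that perturbs a pSp latent along the resulting directions; the shapes, scales, and the helper itself are assumptions, not the DIFAI pipeline.

```python
import torch

def sefa_directions(style_weight, k=5):
    """SeFa-style closed-form factorization: the top eigenvectors of A^T A,
    where A is the weight of the generator's style-projection layer(s),
    serve as semantic editing directions (illustrative)."""
    evals, evecs = torch.linalg.eigh(style_weight.t() @ style_weight)
    return evecs[:, -k:].flip(-1)            # (latent_dim, k), strongest first

def diverse_latents(w_plus, directions, scales=(-3.0, 0.0, 3.0)):
    """Hypothetical helper: produce several latent variants by shifting the
    pSp-inverted latent along each semantic direction."""
    return [w_plus + s * d for d in directions.t() for s in scales]
```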

    Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform

    With the growing demand for immersive digital applications, the need to understand and reconstruct 3D scenes has significantly increased. In this context, inpainting indoor environments from a single image plays a crucial role in modeling the internal structure of interior spaces, as it enables the creation of textured and clutter-free reconstructions. While recent methods have shown significant progress in room modeling, they rely on constraining layout estimators to guide the reconstruction process. These methods are highly dependent on the performance of the structure estimator and its generative ability in heavily occluded environments. In response to these issues, we propose an innovative approach based on a U-Former architecture and a new Windowed-FourierMixer block, resulting in a unified, single-phase network capable of effectively handling human-made periodic structures such as indoor spaces. This new architecture proves advantageous for tasks involving indoor scenes where symmetry is prevalent, allowing the model to effectively capture features such as horizon/ceiling height lines and cuboid-shaped rooms. Experiments show the proposed approach outperforms current state-of-the-art methods on the Structured3D dataset, demonstrating superior performance in both quantitative metrics and qualitative results. Code and models will be made publicly available.
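
    The general FFT-based mixing this block builds on can be sketched as a learnable filter applied in the frequency domain; the window partitioning and U-Former integration of the actual Windowed-FourierMixer are omitted, and all shapes below are assumptions.

```python
import torch
import torch.nn as nn

class FourierMixerBlock(nn.Module):
    """Illustrative frequency-domain mixer: transform the feature map with a
    real FFT, multiply by a learnable complex filter (global mixing that
    naturally favours periodic, symmetric structures), and transform back."""
    def __init__(self, channels, height, width):
        super().__init__()
        self.filter = nn.Parameter(
            0.02 * torch.randn(channels, height, width // 2 + 1, dtype=torch.cfloat)
        )

    def forward(self, x):
        # x: (B, C, H, W) real-valued feature map.
        freq = torch.fft.rfft2(x, norm="ortho")
        freq = freq * self.filter
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")
```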

    Mask Face Inpainting Based on Improved Generative Adversarial Network

    Face recognition technology has been widely used in all aspects of people's lives. However, the accuracy of face recognition is greatly reduced when faces are occluded by objects such as masks and sunglasses. Wearing masks in public has been a crucial approach to preventing illness, especially since the COVID-19 outbreak, but this poses challenges to applications such as face recognition. Therefore, the removal of masks via image inpainting has become a hot topic in the field of computer vision. Deep learning-based image inpainting techniques have achieved notable results, but the restored images still suffer from problems such as blurring and inconsistency. To address these problems, this paper proposes an improved inpainting model based on a generative adversarial network: the model adds attention mechanisms to the sampling module of the pix2pix network, and the residual module is improved by adding convolutional branches. The improved inpainting model can not only effectively restore faces obscured by face masks, but also inpaint randomly occluded face images. To further validate the generality of the inpainting model, tests are conducted on the CelebA, Paris StreetView, and Places2 datasets, and the experimental results show significant improvements in both SSIM and PSNR.
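
    The "residual module improved by adding convolutional branches" can be pictured as a residual block with an extra parallel convolution; the sketch below is a rough reading with guessed kernel sizes and channel counts, not the paper's configuration, and the attention added to the sampling module is not shown.

```python
import torch.nn as nn

class BranchedResBlock(nn.Module):
    """Residual block with an additional 1x1 convolutional branch alongside
    the usual 3x3 path (illustrative reading of the described improvement)."""
    def __init__(self, ch):
        super().__init__()
        self.branch3 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.branch1 = nn.Conv2d(ch, ch, 1)   # added convolutional branch
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.branch3(x) + self.branch1(x))
```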

    A Highlight Removal Method for Capsule Endoscopy Images

    The images captured by Wireless Capsule Endoscopy (WCE) always exhibit specular reflections, and removing highlights while preserving the color and texture of the affected region remains a challenge. To address this issue, this paper proposes a highlight removal method for capsule endoscopy images. First, the confidence and feature terms of the highlight region's edge pixels are computed: the confidence is obtained from the ratio of known pixels in the RGB R channel to the B channel within a window centered on the edge pixel, and the feature term is obtained by multiplying the gradient vector of the edge pixel with the iso-intensity line. Next, the confidence and feature terms are assigned different weights and summed to obtain the priority of all edge pixels of the highlight region, and the pixel with the highest priority is identified. Then, the variance of the highlight region's edge pixels is used to adjust the size of the sample-block window, and the best-matching block is searched in the known region based on the RGB color similarity and the distance between the sample block and the window centered on the highest-priority pixel. Finally, the pixels in the best-matching block are copied to the highest-priority highlight region to remove the highlight. Experimental results demonstrate that the proposed method effectively removes highlights from WCE images, with a lower coefficient of variation in the highlight removal region than the Criminisi algorithm and the DeepGIN method. Additionally, the color and texture in the highlight removal region are similar to those of the surrounding areas, and the texture is continuous.
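
    The pipeline is exemplar-based in the Criminisi style: score boundary pixels, pick the highest-priority one, then search the known region for the best-matching block. The sketch below simplifies both steps (priority as a weighted sum, matching by SSD over known pixels only); the R/B-channel confidence, variance-adaptive window, and distance term are not reproduced, and the weights are placeholders.

```python
import numpy as np

def priority(confidence, gradient, normal, alpha=0.7, beta=0.3):
    """Weighted sum of a confidence term and a data (feature) term for a
    boundary pixel; alpha and beta are placeholder weights."""
    return alpha * confidence + beta * abs(float(np.dot(gradient, normal)))

def best_match(image, known_mask, center, patch_size):
    """Exhaustive search for the best fully-known block, scored by RGB SSD
    over the target patch's known pixels (simplified matching criterion)."""
    half = patch_size // 2
    y0, x0 = center
    target = image[y0-half:y0+half+1, x0-half:x0+half+1].astype(np.float32)
    valid = known_mask[y0-half:y0+half+1, x0-half:x0+half+1].astype(bool)
    h, w = known_mask.shape
    best, best_cost = None, np.inf
    for y in range(half, h - half):
        for x in range(half, w - half):
            if not known_mask[y-half:y+half+1, x-half:x+half+1].all():
                continue                      # candidate must be fully known
            cand = image[y-half:y+half+1, x-half:x+half+1].astype(np.float32)
            cost = ((cand - target) ** 2)[valid].sum()
            if cost < best_cost:
                best, best_cost = (y, x), cost
    return best
```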

    FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer

    Limited by hardware cost and system size, a camera's Field-of-View (FoV) is not always satisfactory. However, from a spatio-temporal perspective, information beyond the camera's physical FoV is off-the-shelf and can actually be obtained "for free" from the past. In this paper, we propose a novel task termed Beyond-FoV Estimation, aiming to exploit past visual cues and bidirectionally break through the physical FoV of a camera. We put forward a FlowLens architecture to expand the FoV by achieving feature propagation explicitly by optical flow and implicitly by a novel clip-recurrent transformer, which has two appealing features: 1) FlowLens comprises a newly proposed Clip-Recurrent Hub with 3D-Decoupled Cross Attention (DDCA) to progressively process global information accumulated in the temporal dimension. 2) A multi-branch Mix Fusion Feed Forward Network (MixF3N) is integrated to enhance the spatially precise flow of local features. To foster training and evaluation, we establish KITTI360-EX, a dataset for outer- and inner-FoV expansion. Extensive experiments on both video inpainting and beyond-FoV estimation tasks show that FlowLens achieves state-of-the-art performance. Code will be made publicly available at https://github.com/MasterHow/FlowLens.
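
    The explicit propagation path can be pictured as backward warping of past-frame features into the current view with optical flow; the sketch below shows only that warping step, with tensor layouts assumed, and omits the clip-recurrent transformer and DDCA entirely.

```python
import torch
import torch.nn.functional as F

def warp_features(past_feat, flow):
    """Backward-warp features from a past frame into the current frame's
    coordinates using a dense flow field given in pixels (illustrative)."""
    b, c, h, w = past_feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow   # (B, 2, H, W)
    # Normalise sampling coordinates to [-1, 1] for grid_sample.
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(past_feat, torch.stack((gx, gy), dim=-1),
                         align_corners=True)
```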