12 research outputs found
Scene restoration from scaffold occlusion using deep learning-based methods
The occlusion issues of computer vision (CV) applications in construction
have attracted significant attention, especially those caused by the
wide-coverage, crisscrossed, and immovable scaffold. Intuitively, removing the
scaffold and restoring the occluded visual information can provide CV agents
with clearer site views and thus help them better understand the construction
scenes. Therefore, this study proposes a novel two-step method combining
pixel-level segmentation and image inpainting for restoring construction scenes
from scaffold occlusion. A low-cost data synthesis method based only on
unlabeled data is developed to address the shortage of labeled data.
Experiments on the synthesized test data show that the proposed method achieves
performances of 92% mean intersection over union (MIoU) for scaffold
segmentation and over 82% structural similarity (SSIM) for scene restoration
from scaffold occlusion.
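As a rough illustration of the segmentation metric reported above, the following minimal sketch computes mean intersection over union on flattened label masks. The function name `mean_iou` and the toy masks are hypothetical and are not taken from the study.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Flattened toy masks (made-up values): 1 = scaffold, 0 = background
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
miou = mean_iou(pred, target, num_classes=2)
```

In this toy case each class has an intersection of 2 pixels and a union of 4, so both per-class IoU values, and their mean, are 0.5.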
Generative Modeling in Structural-Hankel Domain for Color Image Inpainting
In recent years, some researchers have focused on using a single image to obtain a
large number of samples through multi-scale features. This study proposes a
brand-new idea that requires only ten or even fewer samples to construct the
low-rank structural-Hankel matrices-assisted score-based generative model
(SHGM) for the color image inpainting task. During the prior learning process, a
certain number of internal-middle patches are first extracted from several
images and then the structural-Hankel matrices are constructed from these
patches. To better apply the score-based generative model to learn the internal
statistical distribution within patches, the large-scale Hankel matrices are
finally folded into the higher dimensional tensors for prior learning. During
the iterative inpainting process, SHGM views the inpainting problem as a
conditional generation procedure in a low-rank environment. As a result, the
intermediate restored image is acquired by alternately performing the
stochastic differential equation solver, alternating direction method of
multipliers, and data consistency steps. Experimental results demonstrated the
remarkable performance and diversity of SHGM. Comment: 11 pages, 10 figures
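The abstract does not spell out how the structural-Hankel matrices are built. One common way to obtain a Hankel-structured matrix from a patch, shown here only as an assumption, is to flatten every k-by-k sliding window into one row, so that overlapping windows repeat entries across rows. The helper `hankel_from_patch` is hypothetical and the exact SHGM construction may differ.

```python
def hankel_from_patch(patch, k):
    """Stack every flattened k x k sliding window of a 2D patch as a row.

    Assumed construction for illustration; not confirmed by the abstract.
    """
    h, w = len(patch), len(patch[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = [patch[i + di][j + dj] for di in range(k) for dj in range(k)]
            rows.append(window)
    return rows

# Toy 4 x 4 single-channel patch with entries 0..15
patch = [[r * 4 + c for c in range(4)] for r in range(4)]
H = hankel_from_patch(patch, k=2)
```

Because neighboring windows overlap, the same pixel values recur in shifted positions across rows, which is the redundancy that low-rank methods exploit.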
Zoom-to-Inpaint: Image Inpainting with High-Frequency Details
Although deep learning has enabled a huge leap forward in image inpainting,
current methods are often unable to synthesize realistic high-frequency
details. In this paper, we propose applying super-resolution to coarsely
reconstructed outputs, refining them at high resolution, and then downscaling
the output to the original resolution. By introducing high-resolution images to
the refinement network, our framework is able to reconstruct finer details that
are usually smoothed out due to spectral bias - the tendency of neural networks
to reconstruct low frequencies better than high frequencies. To assist training
the refinement network on large upscaled holes, we propose a progressive
learning technique in which the size of the missing regions increases as
training progresses. Our zoom-in, refine and zoom-out strategy, combined with
high-resolution supervision and progressive learning, constitutes a
framework-agnostic approach for enhancing high-frequency details that can be
applied to any CNN-based inpainting method. We provide qualitative and
quantitative evaluations along with an ablation analysis to show the
effectiveness of our approach. This seemingly simple yet powerful approach
outperforms state-of-the-art inpainting methods.
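The progressive learning idea above, in which missing regions grow as training proceeds, can be sketched as a schedule that maps a training step to a hole size. The linear shape, the function name `hole_size`, and the sizes used below are assumptions for illustration; the abstract does not specify the actual schedule.

```python
def hole_size(step, total_steps, min_size, max_size):
    """Linearly grow the hole side length from min_size to max_size.

    The linear ramp is an assumed schedule, not the one from the paper.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return round(min_size + frac * (max_size - min_size))

# Hypothetical settings: holes grow from 16 to 128 pixels over 100 steps
schedule = [hole_size(s, 100, 16, 128) for s in (0, 50, 100, 200)]
```

A training loop would call `hole_size` at each step and sample a mask of that size, so the refinement network sees small holes first and large upscaled holes later.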
Unpaired Depth Super-Resolution in the Wild
Depth maps captured with commodity sensors are often of low quality and
resolution; these maps need to be enhanced to be used in many applications.
State-of-the-art data-driven methods of depth map super-resolution rely on
registered pairs of low- and high-resolution depth maps of the same scenes.
Acquisition of real-world paired data requires specialized setups. The
alternative, generating low-resolution maps from high-resolution maps by
subsampling, adding noise and other artificial degradation methods, does not
fully capture the characteristics of real-world low-resolution images. As a
consequence, supervised learning methods trained on such artificial paired data
may not perform well on real-world low-resolution inputs. We consider an
approach to depth super-resolution based on learning from unpaired data. While
many techniques for unpaired image-to-image translation have been proposed,
most fail to deliver effective hole-filling or reconstruct accurate surfaces
using depth maps. We propose an unpaired learning method for depth
super-resolution, which is based on a learnable degradation model, enhancement
component and surface normal estimates as features to produce more accurate
depth maps. We propose a benchmark for unpaired depth SR and demonstrate that
our method outperforms existing unpaired methods and performs on par with
paired
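For context, the artificial degradation that the abstract argues is insufficient (subsampling plus added noise) can be sketched as follows. The function `degrade_depth`, the Gaussian noise model, and all parameter values are assumptions for illustration, not the authors' learnable degradation model.

```python
import random

def degrade_depth(depth, factor, noise_std, seed=0):
    """Subsample a depth map by `factor` and add Gaussian noise.

    Hand-crafted degradation of the kind the abstract contrasts with
    real sensor data; the parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    low = []
    for row in depth[::factor]:
        low.append([d + rng.gauss(0.0, noise_std) for d in row[::factor]])
    return low

# Toy 8 x 8 high-resolution depth map (made-up values, in meters)
depth_hr = [[1.0 + 0.1 * (r + c) for c in range(8)] for r in range(8)]
depth_lr = degrade_depth(depth_hr, factor=4, noise_std=0.01)
```

Pairs such as `(depth_lr, depth_hr)` are what supervised methods typically train on; real sensor output also contains holes and structured artifacts that this simple model does not reproduce.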
Deep panoramic depth prediction and completion for indoor scenes
We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to jointly process dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine-tuning.