Deep Mean-Shift Priors for Image Restoration
In this paper we introduce a natural image prior that directly represents a
Gaussian-smoothed version of the natural image distribution. We include our
prior in a formulation of image restoration as a Bayes estimator that also
allows us to solve noise-blind image restoration problems. We show that the
gradient of our prior corresponds to the mean-shift vector on the natural image
distribution. In addition, we learn the mean-shift vector field using denoising
autoencoders, and use it in a gradient descent approach to perform Bayes risk
minimization. We demonstrate competitive results for noise-blind deblurring,
super-resolution, and demosaicing.
Comment: NIPS 201
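The link between the prior's gradient and the mean-shift vector can be checked on a toy case. The sketch below is not the paper's implementation: it assumes a one-dimensional Gaussian data distribution so that the optimal denoiser is known in closed form, whereas the paper learns it with a denoising autoencoder. All function names and parameters (`optimal_denoiser`, `data_std`, `noise_std`, `step`) are hypothetical and chosen for illustration only.

```python
def optimal_denoiser(x, data_std, noise_std):
    """Closed-form MMSE denoiser for data ~ N(0, data_std^2) corrupted by
    Gaussian noise of std noise_std (a toy stand-in for a trained
    denoising autoencoder)."""
    shrink = data_std ** 2 / (data_std ** 2 + noise_std ** 2)
    return shrink * x


def mean_shift(x, data_std, noise_std):
    """Mean-shift vector: denoiser output minus its input."""
    return optimal_denoiser(x, data_std, noise_std) - x


def smoothed_log_prior_grad(x, data_std, noise_std):
    """Analytic gradient of the log of the Gaussian-smoothed density,
    which in this toy case is N(0, data_std^2 + noise_std^2)."""
    return -x / (data_std ** 2 + noise_std ** 2)


def ascend_prior(x0, data_std, noise_std, step=0.1, iters=200):
    """Gradient ascent on the smoothed log-prior, using only the denoiser:
    grad log p_sigma(x) = mean_shift(x) / noise_std^2."""
    x = x0
    for _ in range(iters):
        x += step * mean_shift(x, data_std, noise_std) / noise_std ** 2
    return x
```

In this toy setting the mean-shift vector divided by the noise variance matches the analytic gradient, and the ascent moves a sample toward the mode at zero. In a restoration problem this prior gradient would be combined with a data-fidelity term, which the sketch leaves out.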
Masked Vision-Language Transformers for Scene Text Recognition
Scene text recognition (STR) enables computers to recognize and read the text
in various real-world scenes. Recent STR models benefit from taking linguistic
information in addition to visual cues into consideration. We propose a novel
Masked Vision-Language Transformers (MVLT) to capture both the explicit and the
implicit linguistic information. Our encoder is a Vision Transformer, and our
decoder is a multi-modal Transformer. MVLT is trained in two stages: in the
first stage, we design an STR-tailored pretraining method based on a masking
strategy; in the second stage, we fine-tune our model and adopt an iterative
correction method to improve the performance. MVLT attains superior results
compared to state-of-the-art STR models on several benchmarks. Our code and
model are available at https://github.com/onealwj/MVLT.
Comment: Accepted at the 33rd British Machine Vision Conference (BMVC 2022)
CPO: Change Robust Panorama to Point Cloud Localization
We present CPO, a fast and robust algorithm that localizes a 2D panorama with
respect to a 3D point cloud of a scene possibly containing changes. To robustly
handle scene changes, our approach deviates from conventional feature point
matching, and focuses on the spatial context provided from panorama images.
Specifically, we propose efficient color histogram generation and subsequent
robust localization using score maps. By utilizing the unique equivariance of
spherical projections, we propose very fast color histogram generation for a
large number of camera poses without explicitly rendering images for all
candidate poses. We accumulate the regional consistency of the panorama and
point cloud as 2D/3D score maps, and use them to weigh the input color values
to further increase robustness. The weighted color distribution quickly finds
good initial poses and achieves stable convergence for gradient-based
optimization. CPO is lightweight and achieves effective localization in all
tested scenarios, showing stable performance despite scene changes, repetitive
structures, or featureless regions, which are typical challenges for visual
localization with perspective cameras.
Comment: Accepted to ECCV 202
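The idea of weighting color values by a score map before comparing color distributions can be illustrated with a small sketch. This is not CPO's implementation: the uniform RGB binning, the normalization, and the use of histogram intersection as the similarity measure are all assumptions made here, and the function names are hypothetical.

```python
def weighted_color_histogram(colors, weights, bins=4):
    """Build a normalized RGB histogram in which each pixel or point
    contributes its score-map weight instead of a count of 1.

    colors:  list of (r, g, b) tuples with values in 0..255
    weights: list of non-negative per-sample scores (assumed given)
    """
    hist = [0.0] * (bins ** 3)
    width = 256 / bins
    for (r, g, b), w in zip(colors, weights):
        idx = (int(r // width) * bins + int(g // width)) * bins + int(b // width)
        hist[idx] += w
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

With this sketch, a panorama region that conflicts with the point cloud (for example a moved object) can be given a low weight so that it barely affects the comparison, which is the kind of robustness to scene changes the abstract describes.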
RELLISUR: A Real Low-Light Image Super-Resolution Dataset
The RELLISUR dataset contains real low-light, low-resolution images paired with normal-light, high-resolution reference images. It aims to fill the gap between low-light image enhancement and low-resolution image enhancement (super-resolution, SR), which the literature currently addresses only separately, even though the visibility of real-world images is often limited by both low light and low resolution. The dataset contains 12750 paired images of different resolutions and degrees of low-light illumination, to facilitate the training of deep-learning-based models that map degraded, low-visibility images directly to detail-rich, high-resolution images.