Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images
Modeling statistical regularity plays an essential role in ill-posed image
processing problems. Recently, deep learning based methods have been presented
to implicitly learn statistical representation of pixel distributions in
natural images and leverage it as a constraint to facilitate subsequent tasks,
such as color constancy and image dehazing. However, existing CNN
architectures are sensitive to the variability and diversity of pixel intensity within
and between local regions, which may result in inaccurate statistical
representation. To address this problem, this paper presents a novel fully
point-wise CNN architecture for modeling statistical regularities in natural
images. Specifically, we propose to randomly shuffle the pixels in the original
images and leverage the shuffled image as input to make the CNN more concerned with
the statistical properties. Moreover, since the pixels in the shuffled image
are independent and identically distributed, we can replace all the large
convolution kernels in the CNN with point-wise (1×1) convolution kernels while
maintaining the representation ability. Experimental results on two
applications: color constancy and image dehazing, demonstrate the superiority
of our proposed network over the existing architectures, i.e., using
1/10 to 1/100 of the network parameters and computational cost while achieving
comparable performance.
Comment: 9 pages, 7 figures. To appear in ACM MM 201
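The shuffling idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' network: the toy image, the weights, and the helper names `pointwise_conv` and `shuffle_pixels` are assumptions made for illustration. It only shows that a 1×1 convolution treats every pixel independently, so shuffling pixel positions permutes its output without changing any per-pixel value.

```python
import random


def pointwise_conv(pixels, weights, bias):
    """Apply a 1x1 convolution to a flattened image.

    pixels:  list of pixels, each a list of input-channel values
    weights: one row of input-channel weights per output channel
    bias:    one bias value per output channel
    """
    out = []
    for px in pixels:
        # Each output pixel depends only on the input pixel at the same position.
        out.append([
            sum(w * c for w, c in zip(row, px)) + b
            for row, b in zip(weights, bias)
        ])
    return out


def shuffle_pixels(pixels, seed):
    """Return the pixels in a random order together with the permutation used."""
    rng = random.Random(seed)
    perm = list(range(len(pixels)))
    rng.shuffle(perm)
    return [pixels[i] for i in perm], perm


# Hypothetical 2x2 RGB image, flattened to four pixels.
image = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.2, 0.4, 0.6]]
weights = [[0.5, -0.25, 0.0], [1.0, 1.0, 1.0]]  # maps 3 channels to 2
bias = [0.0, 0.1]

shuffled, perm = shuffle_pixels(image, seed=0)
out_shuffled = pointwise_conv(shuffled, weights, bias)
out_original = pointwise_conv(image, weights, bias)

# Convolving the shuffled image equals shuffling the convolved image.
assert out_shuffled == [out_original[i] for i in perm]
```

A kernel larger than 1×1 would not pass this check, because its output at one position mixes neighbouring pixels whose arrangement the shuffle destroys.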
A deep learning framework for quality assessment and restoration in video endoscopy
Endoscopy is a routine imaging technique used for both diagnosis and
minimally invasive surgical treatment. Artifacts such as motion blur, bubbles,
specular reflections, floating objects and pixel saturation impede the visual
interpretation and the automated analysis of endoscopy videos. Given the
widespread use of endoscopy in different clinical applications, we contend that
the robust and reliable identification of such artifacts and the automated
restoration of corrupted video frames is a fundamental medical imaging problem.
Existing state-of-the-art methods only deal with the detection and restoration
of selected artifacts. However, endoscopy videos typically contain numerous
artifacts, which motivates a comprehensive solution.
We propose a fully automatic framework that can: 1) detect and classify six
different primary artifacts, 2) provide a quality score for each frame and 3)
restore mildly corrupted frames. To detect different artifacts, our framework
exploits a fast multi-scale, single-stage convolutional neural network detector.
We introduce a quality metric to assess frame quality and predict image
restoration success. Generative adversarial networks with carefully chosen
regularization are finally used to restore corrupted frames.
Our detector yields the highest mean average precision (mAP at 5% threshold)
of 49.0 and the lowest computational time of 88 ms allowing for accurate
real-time processing. Our restoration models for blind deblurring, saturation
correction and inpainting demonstrate significant improvements over previous
methods. On a set of 10 test videos, we show that our approach preserves an
average of 68.7% of frames, which is 25% more frames than are retained from the raw
videos.
Comment: 14 page
Lightweight HDR Camera ISP for Robust Perception in Dynamic Illumination Conditions via Fourier Adversarial Networks
The limited dynamic range of commercial compact camera sensors results in an
inaccurate representation of scenes with varying illumination conditions,
adversely affecting image quality and subsequently limiting the performance of
underlying image processing algorithms. Current state-of-the-art (SoTA)
convolutional neural networks (CNN) are developed as post-processing techniques
to independently recover under-/over-exposed images. However, when applied to
images containing real-world degradations such as glare, high-beam, color
bleeding with varying noise intensity, these algorithms amplify the
degradations, further degrading image quality. We propose a lightweight
two-stage image enhancement algorithm sequentially balancing illumination and
noise removal using frequency priors for structural guidance to overcome these
limitations. Furthermore, to ensure realistic image quality, we leverage the
relationship between frequency and spatial domain properties of an image and
propose a Fourier spectrum-based adversarial framework (AFNet) for consistent
image enhancement under varying illumination conditions. While current
formulations of image enhancement are envisioned as post-processing techniques,
we examine if such an algorithm could be extended to integrate the
functionality of the Image Signal Processing (ISP) pipeline within the camera
sensor benefiting from RAW sensor data and lightweight CNN architecture. Based
on quantitative and qualitative evaluations, we also examine the practicality
and effects of image enhancement techniques on the performance of common
perception tasks such as object detection and semantic segmentation in varying
illumination conditions.
Comment: Accepted in BMVC 202
Enlighten-anything: When Segment Anything Model Meets Low-light Image Enhancement
Image restoration is a low-level visual task, and most CNN methods are
designed as black boxes, lacking transparency and intrinsic aesthetics. Many
unsupervised approaches ignore the degradation of visible information in
low-light scenes, which will seriously affect the aggregation of complementary
information and also make the fusion algorithm unable to produce satisfactory
fusion results under extreme conditions. In this paper, we propose
Enlighten-anything, which is able to enhance and fuse the semantic intent of
SAM segmentation with low-light images to obtain fused images with good visual
perception. The generalization ability of unsupervised learning is greatly
improved, and experiments on the LOL dataset show that our method
improves PSNR by 3 dB and SSIM by 8 over the baseline. Zero-shot learning of SAM
introduces a powerful aid for unsupervised low-light enhancement. The source
code of Rethink-Diffusion can be obtained from
https://github.com/zhangbaijin/enlighten-anythin
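For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX² / MSE). It is not taken from the paper's code; the function name `psnr` and the toy pixel values are assumptions, and intensities are assumed to lie in [0, 1].

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical intensities: every estimated pixel is off by about 0.1.
reference = [0.0, 0.5, 1.0, 0.25]
estimate = [0.1, 0.6, 0.9, 0.35]
score = psnr(reference, estimate)  # MSE is about 0.01, so roughly 20 dB
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error, since 10 log10(2) ≈ 3.01.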
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement
Enhancing images in low-light scenes is a challenging but widely studied
task in computer vision. Mainstream learning-based methods mainly
acquire the enhancement model by learning the data distribution from specific
scenes, causing poor adaptability (even failure) when meeting real-world
scenarios that have never been encountered before. The main obstacle lies in
the modeling conundrum from distribution discrepancy across different scenes.
To remedy this, we first explore relationships between diverse low-light scenes
based on statistical analysis, i.e., the network parameters of the encoder
trained in different data distributions are close. We introduce the bilevel
paradigm to model the above latent correspondence from the perspective of
hyperparameter optimization. A bilevel learning framework is constructed to
endow the encoder with scene-irrelevant generality towards diverse scenes
(i.e., freezing the encoder in the adaptation and testing phases). Further, we
define a reinforced bilevel learning framework to provide a meta-initialization
for the scene-specific decoder to further improve visual quality. Moreover, to
improve the practicability, we establish a Retinex-induced architecture with
adaptive denoising and apply our built learning framework to acquire its
parameters by using two training losses including supervised and unsupervised
forms. Extensive experimental evaluations on multiple datasets verify the
adaptability of our method and its competitive performance against existing state-of-the-art
works. The code and datasets will be available at
https://github.com/vis-opt-group/BL
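The abstract above mentions a Retinex-induced architecture without giving details. As background only, here is a minimal sketch of the classical Retinex assumption that an image is the element-wise product of reflectance and illumination. It is not the authors' model: the max-channel illumination estimate, the gamma brightening, and the names `retinex_decompose` and `enhance` are all assumptions chosen for illustration.

```python
def retinex_decompose(pixels, eps=1e-6):
    """Split each RGB pixel into reflectance and a scalar illumination.

    Illumination is estimated as the largest channel value (an assumed,
    commonly used initial estimate), so pixel = reflectance * illumination.
    """
    illumination = [max(max(px), eps) for px in pixels]
    reflectance = [[c / l for c in px] for px, l in zip(pixels, illumination)]
    return reflectance, illumination


def enhance(pixels, gamma=0.5):
    """Brighten the illumination with a gamma curve and recombine."""
    reflectance, illumination = retinex_decompose(pixels)
    brightened = [l ** gamma for l in illumination]
    return [[r * l for r in refl] for refl, l in zip(reflectance, brightened)]


# Hypothetical dark pixel: illumination 0.04 is lifted to about 0.2.
dark = [[0.04, 0.02, 0.01]]
result = enhance(dark)
```

The channel ratios of each pixel are preserved, so only brightness changes while the colour stays the same under this simple model.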