Learning Enriched Features for Real Image Restoration and Enhancement
With the goal of recovering high-quality image content from its degraded
version, image restoration enjoys numerous applications, such as in
surveillance, computational photography, medical imaging, and remote sensing.
Recently, convolutional neural networks (CNNs) have achieved dramatic
improvements over conventional approaches for image restoration tasks. Existing
CNN-based methods typically operate either on full-resolution or on
progressively low-resolution representations. In the former case, spatially
precise but contextually less robust results are achieved, while in the latter
case, semantically reliable but spatially less accurate outputs are generated.
In this paper, we present a novel architecture with the collective goals of
maintaining spatially-precise high-resolution representations through the
entire network and receiving strong contextual information from the
low-resolution representations. The core of our approach is a multi-scale
residual block containing several key elements: (a) parallel multi-resolution
convolution streams for extracting multi-scale features, (b) information
exchange across the multi-resolution streams, (c) spatial and channel attention
mechanisms for capturing contextual information, and (d) attention based
multi-scale feature aggregation. In a nutshell, our approach learns an enriched
set of features that combines contextual information from multiple scales,
while simultaneously preserving the high-resolution spatial details. Extensive
experiments on five real image benchmark datasets demonstrate that our method,
named MIRNet, achieves state-of-the-art results for a variety of image
processing tasks, including image denoising, super-resolution, and image
enhancement. The source code and pre-trained models are available at
https://github.com/swz30/MIRNet.
Comment: Accepted for publication at ECCV 2020
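As a rough illustration of the multi-scale residual block described in this abstract, here is a minimal PyTorch sketch. The stream count, the SKFF-style fused attention, and all module names are simplified assumptions for exposition, not the authors' released MIRNet code.

```python
# Minimal sketch of a multi-scale residual block with attention-based fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SKFF(nn.Module):
    """Attention-based aggregation of multi-resolution features (illustrative)."""
    def __init__(self, channels, n_streams):
        super().__init__()
        reduced = max(channels // 8, 4)
        self.squeeze = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(channels, reduced, 1), nn.ReLU())
        self.heads = nn.ModuleList(nn.Conv2d(reduced, channels, 1)
                                   for _ in range(n_streams))

    def forward(self, feats):                      # feats: list of (B,C,H,W), same size
        fused = sum(feats)                         # aggregate streams
        z = self.squeeze(fused)                    # global descriptor
        attn = torch.stack([h(z) for h in self.heads], dim=0)
        attn = torch.softmax(attn, dim=0)          # per-stream attention weights
        return sum(w * f for w, f in zip(attn, feats))

class MultiScaleResidualBlock(nn.Module):
    """Parallel full/half/quarter-resolution streams with cross-stream exchange."""
    def __init__(self, channels):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3))
        self.fuse = SKFF(channels, 3)

    def forward(self, x):
        sizes = [1.0, 0.5, 0.25]
        # (a) parallel multi-resolution convolution streams
        feats = [F.interpolate(x, scale_factor=s, mode="bilinear",
                               align_corners=False) if s != 1.0 else x
                 for s in sizes]
        feats = [F.relu(conv(f)) for conv, f in zip(self.convs, feats)]
        # (b) information exchange: bring all streams back to full resolution
        feats = [F.interpolate(f, size=x.shape[-2:], mode="bilinear",
                               align_corners=False) for f in feats]
        # (c)+(d) attention-based multi-scale aggregation, plus residual connection
        return x + self.fuse(feats)

x = torch.randn(1, 32, 64, 64)
print(MultiScaleResidualBlock(32)(x).shape)        # torch.Size([1, 32, 64, 64])
```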
Noise-to-Norm Reconstruction for Industrial Anomaly Detection and Localization
Anomaly detection has a wide range of applications and is especially
important in industrial quality inspection. Currently, many top-performing
anomaly-detection models rely on feature-embedding methods. However, these
methods do not perform well on datasets with large variations in object
locations. Reconstruction-based methods use reconstruction errors to detect
anomalies without considering positional differences between samples. In this
study, a reconstruction-based method using the noise-to-norm paradigm is
proposed, which avoids the invariant reconstruction of anomalous regions. Our
reconstruction network is based on M-net and incorporates multiscale fusion and
residual attention modules to enable end-to-end anomaly detection and
localization. Experiments demonstrate that the method is effective in
reconstructing anomalous regions into normal patterns and achieving accurate
anomaly detection and localization. On the MPDD and VisA datasets, our proposed
method achieves more competitive results than recent methods and sets a new
state of the art on the MPDD dataset.
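To make the noise-to-norm idea concrete, the following hedged sketch corrupts an input with noise, reconstructs it toward a normal appearance with a tiny autoencoder standing in for the paper's M-net-based network, and scores anomalies by per-pixel reconstruction error. All names and sizes are illustrative assumptions.

```python
# Sketch of reconstruction-based anomaly scoring under a noise-to-norm paradigm.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyReconNet(nn.Module):
    """Toy autoencoder standing in for the paper's M-net-based network."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_map(model, image, noise_std=0.2):
    """Reconstruct a noised image and score each pixel by reconstruction error."""
    noised = image + noise_std * torch.randn_like(image)   # noise-to-norm input
    with torch.no_grad():
        recon = model(noised)                              # reconstructed "normal" pattern
    err = (image - recon).abs().mean(dim=1, keepdim=True)  # per-pixel error
    return F.avg_pool2d(err, 7, stride=1, padding=3)       # smooth the score map

model = TinyReconNet().eval()
img = torch.rand(1, 3, 128, 128)
print(anomaly_map(model, img).shape)                       # torch.Size([1, 1, 128, 128])
```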
LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved Wavelet Attention and Reverse Diffusion
Wireless capsule endoscopy (WCE) is a painless and non-invasive diagnostic
tool for gastrointestinal (GI) diseases. However, due to GI anatomical
constraints and hardware manufacturing limitations, WCE vision signals may
suffer from insufficient illumination, leading to a complicated screening and
examination procedure. Deep learning-based low-light image enhancement (LLIE)
in the medical field has been attracting growing research attention. Given the rapid
development of the denoising diffusion probabilistic model (DDPM) in computer
vision, we introduce a WCE LLIE framework based on the multi-scale
convolutional neural network (CNN) and reverse diffusion process. The
multi-scale design allows the model to preserve high-resolution representations
while gathering contextual information from low-resolution features, and the
proposed curved wavelet attention (CWA) block captures high-frequency and local
features. Furthermore, we apply the reverse diffusion procedure to further
refine the shallow output and generate the most realistic image. The proposed method is
compared with ten state-of-the-art (SOTA) LLIE methods and significantly
outperforms them both quantitatively and qualitatively. The superior performance on GI
disease segmentation further demonstrates the clinical potential of our
proposed model. Our code is publicly accessible.
Comment: To appear in MICCAI 2023. Code availability:
https://github.com/longbai1006/LLCaps
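The combination of a shallow CNN output with reverse diffusion refinement can be sketched as below. The toy conditional denoiser and the ten-step DDPM schedule are stand-ins under stated assumptions, not the LLCaps model.

```python
# Sketch: refine a shallow CNN output via DDPM-style ancestral sampling.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Predicts the noise in x_t, conditioned on the shallow CNN output."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, x_t, cond):
        return self.net(torch.cat([x_t, cond], dim=1))

@torch.no_grad()
def reverse_diffusion(denoiser, shallow_out, steps=10):
    """Reverse diffusion process conditioned on (and refining) the shallow output."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(shallow_out)               # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, shallow_out)              # predicted noise at step t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])   # DDPM posterior mean
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

shallow = torch.rand(1, 3, 64, 64)                  # output of a multi-scale CNN
refined = reverse_diffusion(ToyDenoiser(), shallow)
print(refined.shape)                                # torch.Size([1, 3, 64, 64])
```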
Automatic Signboard Recognition in Low Quality Night Images
An essential requirement for driver assistance systems and autonomous driving
technology is implementing a robust system for detecting and recognizing
traffic signs. This system enables the vehicle to autonomously analyze the
environment and make appropriate decisions regarding its movement, even when
operating at higher frame rates. However, traffic sign images captured in
inadequate lighting and adverse weather conditions are poorly visible, blurred,
faded, and damaged. Consequently, the recognition of traffic signs in such
circumstances becomes inherently difficult. This paper addresses the challenges
of recognizing traffic signs in images affected by low light, noise, and
blur. To achieve this goal, a two-step methodology is employed. The
first step involves enhancing traffic sign images by applying a modified MIRNet
model and producing enhanced images. In the second step, the Yolov4 model
recognizes the traffic signs in an unconstrained environment. The proposed
method achieves a 5.40% increase in mAP@0.5 for low-quality images with
Yolov4. An overall mAP@0.5 of 96.75% is achieved on the GTSRB dataset.
It also attains an mAP@0.5 of 100% on the GTSDB dataset for the broad
categories, comparable with the state-of-the-art work.
Comment: 13 pages, CVIP 202
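A hedged sketch of the enhance-then-detect pipeline is given below. The `enhancer` and `detector` callables are hypothetical placeholders for the modified MIRNet model and the Yolov4 detector, and the detection output format is an assumption for illustration.

```python
# Sketch of a two-step pipeline: low-light enhancement followed by detection.
import torch

@torch.no_grad()
def recognize_signboards(image, enhancer, detector, conf_thresh=0.5):
    """Step 1: enhance the low-light image. Step 2: detect traffic signs."""
    enhanced = enhancer(image)                       # modified-MIRNet-style model (placeholder)
    enhanced = enhanced.clamp(0.0, 1.0)              # keep a valid image range
    detections = detector(enhanced)                  # Yolov4-style detector (placeholder)
    # Assumed detection format: (N, 6) rows of x1, y1, x2, y2, confidence, class.
    return detections[detections[:, 4] >= conf_thresh]

# Toy stand-ins so the sketch runs end to end.
enhancer = lambda x: x * 1.5                         # brighten as a placeholder
detector = lambda x: torch.tensor([[10., 10., 50., 50., 0.9, 3.],
                                   [5., 5., 20., 20., 0.3, 1.]])
img = torch.rand(1, 3, 416, 416)
print(recognize_signboards(img, enhancer, detector))  # keeps only the 0.9-confidence box
```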
A Unified Conditional Framework for Diffusion-based Image Restoration
Diffusion Probabilistic Models (DPMs) have recently shown remarkable
performance in image generation tasks, which are capable of generating highly
realistic images. When adopting DPMs for image restoration tasks, the crucial
aspect lies in how to integrate the conditional information to guide the DPMs
to generate accurate and natural output, which has been largely overlooked in
existing works. In this paper, we present a unified conditional framework based
on diffusion models for image restoration. We leverage a lightweight UNet to
predict initial guidance and the diffusion model to learn the residual of the
guidance. By carefully designing the basic module and integration module for
the diffusion model block, we integrate the guidance and other auxiliary
conditional information into every block of the diffusion model to achieve
spatially-adaptive generation conditioning. To handle high-resolution images,
we propose a simple yet effective inter-step patch-splitting strategy to
produce arbitrary-resolution images without grid artifacts. We evaluate our
conditional framework on three challenging tasks: extreme low-light denoising,
deblurring, and JPEG restoration, demonstrating significant improvements in
perceptual quality and strong generalization across restoration tasks.
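The inter-step patch-splitting idea can be illustrated with a small sketch: each diffusion step processes overlapping patches and blends the results back, so no fixed grid seam accumulates across steps. The patch size, stride, and uniform blending weights below are assumptions, not the paper's exact scheme.

```python
# Sketch: apply a per-step function over overlapping patches and blend outputs.
import torch

def patched_apply(fn, x, patch=64, stride=48):
    """Apply `fn` to overlapping patches of x (B,C,H,W); assumes H, W >= patch."""
    B, C, H, W = x.shape
    out = torch.zeros_like(x)
    weight = torch.zeros(1, 1, H, W, device=x.device)
    ys = list(range(0, H - patch + 1, stride))
    xs = list(range(0, W - patch + 1, stride))
    if ys[-1] != H - patch: ys.append(H - patch)     # cover the bottom edge
    if xs[-1] != W - patch: xs.append(W - patch)     # cover the right edge
    for y in ys:
        for xo in xs:
            tile = x[:, :, y:y+patch, xo:xo+patch]
            out[:, :, y:y+patch, xo:xo+patch] += fn(tile)
            weight[:, :, y:y+patch, xo:xo+patch] += 1.0
    return out / weight                              # average overlapping regions

step = lambda t: t * 0.9             # stand-in for one conditional diffusion step
x = torch.rand(1, 3, 160, 160)
print(patched_apply(step, x).shape)  # torch.Size([1, 3, 160, 160])
```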
Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression
Night images suffer not only from low light, but also from uneven
distributions of light. Most existing night visibility enhancement methods
focus mainly on enhancing low-light regions. This inevitably leads to
over-enhancement and saturation in bright regions, such as those affected by
light effects (glare, floodlights, etc.). To address this problem, we need to
suppress the light effects in bright regions while, at the same time, boosting
the intensity of dark regions. With this idea in mind, we introduce an
unsupervised method that integrates a layer decomposition network and a
light-effects suppression network. Given a single night image as input, our
decomposition network learns to decompose shading, reflectance and
light-effects layers, guided by unsupervised layer-specific prior losses. Our
light-effects suppression network further suppresses the light effects and, at
the same time, enhances the illumination in dark regions. This light-effects
suppression network exploits the estimated light-effects layer as the guidance
to focus on the light-effects regions. To recover the background details and
reduce hallucination/artefacts, we propose structure and high-frequency
consistency losses. Our quantitative and qualitative evaluations on real images
show that our method outperforms state-of-the-art methods in suppressing night
light effects and boosting the intensity of dark regions.
Comment: Accepted to ECCV 2022
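A minimal sketch of the layer-decomposition idea follows, assuming a simple composition model I ≈ R·S + G and toy layer-specific priors (sparsity and smoothness on the light-effects layer G); the paper's actual network and unsupervised prior losses are more elaborate.

```python
# Sketch: decompose a night image into reflectance, shading, and light effects.
import torch
import torch.nn as nn

class DecompNet(nn.Module):
    """Toy decomposition network; predicts all three layers from one image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(32, 7, 3, padding=1))

    def forward(self, img):
        out = self.backbone(img)
        R = torch.sigmoid(out[:, 0:3])       # reflectance
        S = torch.sigmoid(out[:, 3:4])       # shading (single channel)
        G = torch.relu(out[:, 4:7])          # additive light-effects layer
        return R, S, G

def decomposition_loss(img, R, S, G):
    recon = R * S + G                                    # assumed composition model
    rec_loss = (recon - img).abs().mean()                # reproduce the input
    # Toy layer-specific priors: light effects should be sparse and smooth.
    sparsity = G.abs().mean()
    smooth = (G[..., 1:] - G[..., :-1]).abs().mean()
    return rec_loss + 0.1 * sparsity + 0.1 * smooth

net = DecompNet()
img = torch.rand(2, 3, 64, 64)
loss = decomposition_loss(img, *net(img))
loss.backward()                                          # trains without ground truth
print(float(loss))
```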
Cross Aggregation Transformer for Image Restoration
Recently, the Transformer architecture has been introduced into image restoration
to replace convolutional neural networks (CNNs), with surprising results.
Considering the high computational complexity of Transformer with global
attention, some methods use the local square window to limit the scope of
self-attention. However, these methods lack direct interaction among different
windows, which limits the establishment of long-range dependencies. To address
the above issue, we propose a new image restoration model, Cross Aggregation
Transformer (CAT). The core of our CAT is the Rectangle-Window Self-Attention
(Rwin-SA), which applies horizontal and vertical rectangle-window attention in
different heads in parallel to expand the attention area and aggregate
features across different windows. We also introduce the Axial-Shift operation
for different window interactions. Furthermore, we propose the Locality
Complementary Module to complement the self-attention mechanism, which
incorporates the inductive bias of CNN (e.g., translation invariance and
locality) into Transformer, enabling global-local coupling. Extensive
experiments demonstrate that our CAT outperforms recent state-of-the-art
methods on several image restoration applications. The code and models are
available at https://github.com/zhengchen1999/CAT.
Comment: Accepted to NeurIPS 2022. Code is available at
https://github.com/zhengchen1999/CAT
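A simplified sketch of rectangle-window self-attention: half of the channels attend inside wide horizontal rectangles and half inside tall vertical ones, covering complementary directions. The window shape, channel split, and single output projection are illustrative choices, not the CAT implementation (the Axial-Shift operation and Locality Complementary Module are omitted).

```python
# Sketch: self-attention within horizontal vs. vertical rectangle windows.
import torch
import torch.nn as nn

def window_attention(x, wh, ww):
    """Self-attention restricted to non-overlapping (wh x ww) windows."""
    B, C, H, W = x.shape
    x = x.reshape(B, C, H // wh, wh, W // ww, ww)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, wh * ww, C)   # windows as sequences
    attn = torch.softmax(x @ x.transpose(1, 2) / C ** 0.5, dim=-1)
    x = attn @ x
    x = x.reshape(B, H // wh, W // ww, wh, ww, C).permute(0, 5, 1, 3, 2, 4)
    return x.reshape(B, C, H, W)

class RwinAttention(nn.Module):
    """Horizontal and vertical rectangle windows applied to two channel groups."""
    def __init__(self, channels, rect=(4, 16)):
        super().__init__()
        self.rect = rect
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        half = x.shape[1] // 2
        h, w = self.rect
        horiz = window_attention(x[:, :half], h, w)   # wide, short windows
        vert = window_attention(x[:, half:], w, h)    # tall, narrow windows
        return self.proj(torch.cat([horiz, vert], dim=1))

x = torch.randn(1, 32, 64, 64)
print(RwinAttention(32)(x).shape)                     # torch.Size([1, 32, 64, 64])
```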