DEEP LEARNING-BASED APPROACHES FOR IMAGE RESTORATION
Image restoration is the task of taking a corrupted or degraded low-quality image and estimating a high-quality clean image that is free of degradations. The most common degradations that affect image quality are blur, atmospheric turbulence, adverse weather conditions (such as rain, haze, and snow), and noise. Images captured under such corruptions can significantly degrade the performance of subsequent computer vision algorithms such as segmentation, recognition, object detection, and tracking. With these algorithms becoming vital components of applications such as autonomous navigation and video surveillance, it is increasingly important to develop sophisticated algorithms that remove the degradations and recover high-quality clean images. These reasons have motivated a plethora of research on single-image restoration methods.
Recently, following the success of deep learning-based convolutional neural networks, many approaches have been proposed to remove degradations from corrupted images. We study the following single-image restoration problems: (i) atmospheric turbulence removal, (ii) deblurring, (iii) removing distortions introduced by adverse weather conditions such as rain, haze, and snow, and (iv) removing noise. However, existing single-image restoration techniques suffer from the following major limitations: (i) they construct global priors without taking into account that degradations can affect different local regions of the image differently; (ii) they use synthetic datasets for training, which often results in sub-optimal performance on real-world images, typically because of the distributional shift between synthetic and real-world degraded images; and (iii) existing semi-supervised approaches do not account for the effect of unlabeled or real-world degraded images on semi-supervised performance.
To address the first limitation, we propose supervised image restoration techniques that use uncertainty to improve restoration performance. To overcome the second limitation, we propose a Gaussian process-based pseudo-labeling approach that leverages real-world rain information and trains the deraining network in a semi-supervised fashion. Furthermore, to address the third limitation, we theoretically study the effect of unlabeled images on semi-supervised performance and propose an adaptive rejection technique to boost semi-supervised performance.
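To make the Gaussian-process pseudo-labeling idea concrete, below is a minimal sketch assuming GP regression with an RBF kernel over latent features of labeled (synthetic) and unlabeled (real-world) rainy images; all names and defaults are illustrative, not the thesis implementation, and the posterior variance stands in for the uncertainty signal that an adaptive rejection step could threshold on.

```python
import torch

def gp_pseudo_labels(z_lab, y_lab, z_unlab, length_scale=1.0, noise=1e-2):
    # z_lab: (n, d) latent features of labeled (synthetic) rainy images
    # y_lab: (n, d) latent features of their clean ground truths
    # z_unlab: (m, d) latent features of unlabeled real-world rainy images
    def rbf(a, b):
        # RBF kernel matrix between two sets of feature vectors
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * length_scale ** 2))

    K = rbf(z_lab, z_lab) + noise * torch.eye(z_lab.size(0))  # (n, n)
    k_star = rbf(z_unlab, z_lab)                              # (m, n)
    # GP posterior mean: pseudo clean latents for the unlabeled images
    mean = k_star @ torch.linalg.solve(K, y_lab)
    # GP posterior variance: per-sample uncertainty, usable for rejecting
    # unreliable unlabeled samples in the semi-supervised loss
    var = 1.0 - (k_star * torch.linalg.solve(K, k_star.T).T).sum(dim=1)
    return mean, var
```

The posterior mean supplies pseudo-targets for the unlabeled branch of the semi-supervised objective, while high-variance samples can be rejected adaptively.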
Finally, we recognize that existing supervised and semi-supervised methods need some form of paired labeled data to train the network, and training on any kind of synthetic paired clean-degraded images may not completely close the domain gap between synthetic and real-world degraded image distributions.
Thus, we propose a self-supervised transformer-based approach for image denoising: given a noisy image, we generate multiple down-sampled images and learn the joint relation between these down-sampled images using a Gaussian process to denoise the image.
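As a rough sketch of the down-sampling idea, the snippet below pairs two randomly sub-sampled half-resolution images from a single noisy input and trains a denoiser to map one to the other. This is a Neighbor2Neighbor-style simplification: the transformer backbone and the Gaussian-process relation modeling are replaced by a small CNN and a plain L2 consistency loss, and every name is hypothetical.

```python
import torch
import torch.nn.functional as F

def diagonal_subsample(noisy):
    # Split each 2x2 cell of a (B, C, H, W) noisy image (H, W even) into two
    # half-resolution sub-images by randomly assigning its diagonal pixels.
    B, C, H, W = noisy.shape
    cells = noisy.reshape(B, C, H // 2, 2, W // 2, 2)
    a = cells[:, :, :, 0, :, 0]          # top-left pixel of each cell
    d = cells[:, :, :, 1, :, 1]          # bottom-right pixel of each cell
    flip = torch.randint(0, 2, (B, 1, H // 2, W // 2), device=noisy.device).bool()
    return torch.where(flip, a, d), torch.where(flip, d, a)

# Stand-in CNN denoiser (the thesis uses a transformer instead).
denoiser = torch.nn.Sequential(
    torch.nn.Conv2d(3, 48, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(48, 3, 3, padding=1),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

noisy = torch.rand(4, 3, 64, 64)         # toy noisy batch, no clean targets
sub1, sub2 = diagonal_subsample(noisy)
loss = F.mse_loss(denoiser(sub1), sub2)  # one sub-image predicts the other
opt.zero_grad(); loss.backward(); opt.step()
```

Because the two sub-images share content but carry (approximately) independent noise, predicting one from the other drives the network toward the clean signal without ever seeing a clean target.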
NBD-GAP: Non-Blind Image Deblurring Without Clean Target Images
In recent years, deep neural network-based restoration methods have achieved
state-of-the-art results in various image deblurring tasks. However, one major
drawback of deep learning-based deblurring networks is that large amounts of
blurry-clean image pairs are required for training to achieve good performance.
Moreover, deep networks often fail to perform well when the blurry images and
the blur kernels during testing are very different from the ones used during
training. This happens mainly because of the overfitting of the network
parameters to the training data. In this work, we present a method that
addresses these issues. We view the non-blind image deblurring problem as a
denoising problem. To do so, we perform Wiener filtering on a pair of blurry
images with the corresponding blur kernels. This results in a pair of images
with colored noise. Hence, the deblurring problem is translated into a
denoising problem. We then solve the denoising problem without using explicit
clean target images. Extensive experiments are conducted to show that our
method achieves results that are on par with the state-of-the-art non-blind
deblurring works.
Comment: Accepted at ICIP 2022
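A minimal NumPy sketch of the Wiener-filtering step described above, assuming a known blur kernel and a scalar noise-to-signal ratio; the function name and defaults are illustrative. Deconvolving each blurry observation leaves a residual that behaves like colored noise, so two such outputs of the same scene can be trained against each other without clean targets.

```python
import numpy as np

def wiener_deconvolve(blurry, kernel, nsr=1e-2):
    # Non-blind Wiener deconvolution in the Fourier domain.
    # blurry: (H, W) observation, kernel: (h, w) known blur kernel,
    # nsr: scalar noise-to-signal ratio (a simplifying assumption).
    H, W = blurry.shape
    K = np.fft.fft2(kernel, s=(H, W))            # zero-padded kernel spectrum
    G = np.conj(K) / (np.abs(K) ** 2 + nsr)      # Wiener filter
    x_hat = np.real(np.fft.ifft2(G * np.fft.fft2(blurry)))
    # Compensate the circular shift from placing the kernel at the origin.
    return np.roll(x_hat, (-(kernel.shape[0] // 2), -(kernel.shape[1] // 2)),
                   axis=(0, 1))
```

Two blurry views of the same scene, each deconvolved with its own kernel, then form an input/target pair corrupted by roughly independent colored noise, which is the denoising problem the method solves without explicit clean images.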
TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions
Removing adverse weather conditions like rain, fog, and snow from images is
an important problem in many applications. Most methods proposed in the
literature have been designed to deal with just removing one type of
degradation. Recently, a CNN-based method using neural architecture search
(All-in-One) was proposed to remove all the weather conditions at once.
However, it has a large number of parameters as it uses multiple encoders to
cater to each weather removal task and still has scope for improvement in its
performance. In this work, we focus on developing an efficient solution to the
all-adverse-weather removal problem. To this end, we propose TransWeather, a
transformer-based end-to-end model with just a single encoder and a decoder
that can restore an image degraded by any weather condition. Specifically, we
utilize a novel transformer encoder using intra-patch transformer blocks to
enhance attention inside the patches to effectively remove smaller weather
degradations. We also introduce a transformer decoder with learnable weather
type embeddings to adjust to the weather degradation at hand. TransWeather
achieves improvements across multiple test datasets over both the All-in-One
network and methods fine-tuned for specific tasks. TransWeather is also
validated on real-world test images and found to be more effective than
previous methods. Implementation code can be accessed at
https://github.com/jeya-maria-jose/TransWeather
Comment: CVPR 2022
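As a rough illustration of the decoder idea, the sketch below uses learnable weather-type queries that cross-attend to encoder tokens; it is a minimal stand-in with illustrative names and sizes, not the released TransWeather code.

```python
import torch
import torch.nn as nn

class WeatherTypeDecoder(nn.Module):
    # Learnable weather-type queries cross-attend to encoder features so the
    # decoder can adapt to whichever degradation is present in the input.
    def __init__(self, dim=256, num_queries=48, num_heads=8):
        super().__init__()
        self.weather_queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, enc_tokens):                    # enc_tokens: (B, N, dim)
        q = self.weather_queries.unsqueeze(0).expand(enc_tokens.size(0), -1, -1)
        attended, _ = self.cross_attn(q, enc_tokens, enc_tokens)
        return attended + self.ffn(attended)          # (B, num_queries, dim)

dec = WeatherTypeDecoder()
task_tokens = dec(torch.randn(2, 196, 256))           # toy encoder output
```

Because the queries are learned parameters rather than fixed task flags, a single decoder can specialize its attention to rain, haze, or snow without separate per-task encoders.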
MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation
We propose MAMo, a novel memory and attention framework for monocular video
depth estimation. MAMo can augment and improve any single-image depth
estimation network into a video depth estimation model, enabling it to take
advantage of temporal information to predict more accurate depth. In MAMo,
we augment the model with memory, which aids the depth prediction as the model
streams through the video. Specifically, the memory stores learned visual and
displacement tokens of the previous time instances. This allows the depth
network to cross-reference relevant features from the past when predicting
depth on the current frame. We introduce a novel scheme to continuously update
the memory, optimizing it to keep tokens that correspond to both the past and
the present visual information. We adopt an attention-based approach to process
memory features, where we first learn the spatio-temporal relations among the
resultant visual and displacement memory tokens using a self-attention module.
Further, the output features of self-attention are aggregated with the current
visual features through cross-attention. The cross-attended features are
finally given to a decoder to predict depth on the current frame. Through
extensive experiments on several benchmarks, including KITTI, NYU-Depth V2, and
DDAD, we show that MAMo consistently improves monocular depth estimation
networks and sets new state-of-the-art (SOTA) accuracy. Notably, our MAMo video
depth estimation provides higher accuracy with lower latency compared to
SOTA cost-volume-based video depth models.
Comment: Accepted at ICCV 2023