PromptIR: Prompting for All-in-One Blind Image Restoration
Image restoration involves recovering a high-quality clean image from its
degraded version. Deep learning-based methods have significantly improved image
restoration performance; however, they have limited ability to generalize across
degradation types and levels. This restricts their real-world application, since
it requires training an individual model for each specific degradation and
knowing the input degradation type in order to apply the relevant model.
We present a prompt-based learning approach, PromptIR, for All-In-One image
restoration that can effectively restore images from various types and levels
of degradation. In particular, our method uses prompts to encode
degradation-specific information, which is then used to dynamically guide the
restoration network. This allows our method to generalize to different
degradation types and levels, while still achieving state-of-the-art results on
image denoising, deraining, and dehazing. Overall, PromptIR offers a generic
and efficient plugin module with a few lightweight prompts that can be used to
restore images of various types and levels of degradation with no prior
information on the corruptions present in the image. Our code and pretrained
models are available here: https://github.com/va1shn9v/PromptIR
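
As a rough illustration of how degradation-conditioned prompting can be wired into a restoration network, here is a minimal sketch; the module name, the weighting head, and the fusion scheme are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptBlock(nn.Module):
    """Illustrative prompt module: a small bank of learnable prompt
    components is weighted by a prediction from the incoming features
    and fused back into them."""
    def __init__(self, channels, num_prompts=5, prompt_size=16):
        super().__init__()
        # Learnable prompt components shared across all inputs.
        self.prompts = nn.Parameter(
            torch.randn(num_prompts, channels, prompt_size, prompt_size))
        # Predicts per-prompt weights from globally pooled statistics.
        self.weight_head = nn.Linear(channels, num_prompts)
        # Fuses the weighted prompt with the feature map.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Degradation-aware weights from global average pooling.
        weights = F.softmax(self.weight_head(x.mean(dim=(2, 3))), dim=1)
        # Weighted sum of prompt components, resized to the feature map.
        prompt = torch.einsum('bn,nchw->bchw', weights, self.prompts)
        prompt = F.interpolate(prompt, size=(h, w), mode='bilinear',
                               align_corners=False)
        return self.fuse(torch.cat([x, prompt], dim=1))

feats = torch.randn(2, 64, 32, 32)      # decoder features at one level
print(PromptBlock(64)(feats).shape)     # torch.Size([2, 64, 32, 32])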
Striking the Right Balance with Uncertainty
Learning unbiased models on imbalanced datasets is a significant challenge.
Rare classes tend to get a concentrated representation in the classification
space, which hampers the generalization of learned boundaries to new test
examples. In this paper, we demonstrate that Bayesian uncertainty estimates
directly correlate with the rarity of classes and the difficulty level of
individual samples. Subsequently, we present a novel framework for uncertainty
based class imbalance learning that follows two key insights: First,
classification boundaries should be extended further away from a more uncertain
(rare) class to avoid overfitting and enhance its generalization. Second, each
sample should be modeled as a multivariate Gaussian distribution with a mean
vector and a covariance matrix defined by the sample's uncertainty. The learned
boundaries should respect not only the individual samples but also their
distribution in the feature space. Our proposed approach efficiently utilizes
sample and class uncertainty information to learn robust features and more
generalizable classifiers. We systematically study the class imbalance problem
and derive a novel loss formulation for max-margin learning based on Bayesian
uncertainty measure. The proposed method shows significant performance
improvements on six benchmark datasets for face verification, attribute
prediction, digit/object classification, and skin lesion detection.
Comment: CVPR 2019
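
To make the first insight concrete, below is a minimal sketch of a max-margin loss whose per-class margin grows with an externally supplied uncertainty estimate; the function name, margin schedule, and scale parameter are illustrative rather than the paper's exact formulation.

import torch
import torch.nn.functional as F

def uncertainty_margin_loss(logits, targets, class_uncertainty, scale=1.0):
    # class_uncertainty: (num_classes,) tensor; higher = rarer/more uncertain.
    # Penalizing the true-class logit by a larger margin pushes the learned
    # boundary further away from uncertain (rare) classes.
    margins = scale * class_uncertainty[targets]
    adjusted = logits.clone()
    adjusted[torch.arange(len(targets)), targets] -= margins
    return F.cross_entropy(adjusted, targets)

# Toy usage: class 2 is rare, so it receives the widest margin.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 1, 2, 2])
uncertainty = torch.tensor([0.1, 0.2, 0.9])
print(uncertainty_margin_loss(logits, targets, uncertainty).item())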
Burstormer: Burst Image Restoration and Enhancement Transformer
On a shutter press, modern handheld cameras capture multiple images in rapid
succession and merge them to generate a single image. However, individual
frames in a burst are misaligned due to inevitable motions and contain multiple
degradations. The challenge is to properly align the successive image shots and
merge their complementary information to achieve high-quality outputs. Towards
this direction, we propose Burstormer: a novel transformer-based architecture
for burst image restoration and enhancement. In comparison to existing works,
our approach exploits multi-scale local and non-local features to achieve
improved alignment and feature fusion. Our key idea is to enable inter-frame
communication in the burst neighborhoods for information aggregation and
progressive fusion while modeling the burst-wide context. However, the input
burst frames need to be properly aligned before fusing their information.
Therefore, we propose an enhanced deformable alignment module for aligning
burst features with respect to the reference frame. Unlike existing methods,
the proposed alignment module not only aligns burst features but also exchanges
feature information and maintains focused communication with the reference
frame through the proposed reference-based feature enrichment mechanism, which
facilitates handling complex motions. After multi-level alignment and
enrichment, we re-emphasize inter-frame communication within the burst using a
cyclic burst sampling module. Finally, the inter-frame information is
aggregated using the proposed burst feature fusion module followed by
progressive upsampling. Our Burstormer outperforms state-of-the-art methods on
burst super-resolution, burst denoising and burst low-light enhancement. Our
codes and pretrained models are available at https://github.com/akshaydudhane16/Burstormer
Comment: Accepted at CVPR 2023
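
As a rough sketch of reference-based deformable alignment, the snippet below predicts sampling offsets from the concatenated frame and reference features and applies them with torchvision's deform_conv2d; the layer layout and names are assumptions, not Burstormer's enhanced alignment module.

import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformAlign(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        # Predicts 2 offsets (x, y) per kernel position from frame+reference.
        self.offset_conv = nn.Conv2d(2 * channels,
                                     2 * kernel_size * kernel_size,
                                     kernel_size=3, padding=1)
        self.weight = nn.Parameter(
            0.01 * torch.randn(channels, channels, kernel_size, kernel_size))

    def forward(self, frame_feat, ref_feat):
        offsets = self.offset_conv(torch.cat([frame_feat, ref_feat], dim=1))
        # Sample the frame's features at offset locations so they line up
        # with the reference frame.
        return deform_conv2d(frame_feat, offsets, self.weight,
                             padding=self.kernel_size // 2)

ref = torch.randn(1, 32, 48, 48)          # reference-frame features
frame = torch.randn(1, 32, 48, 48)        # features of another burst frame
print(DeformAlign(32)(frame, ref).shape)  # torch.Size([1, 32, 48, 48])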
Vision models for wide color gamut imaging in cinema
Gamut mapping is the problem of transforming the colors of image or video content so as to fully exploit the color palette of the display device where the content will be shown, while preserving the artistic intent of the original content's creator. In particular, in the cinema industry, the rapid advancement in display technologies has created a pressing need to develop automatic and fast gamut mapping algorithms. In this article, we propose a novel framework that is based on vision science models, performs both gamut reduction and gamut extension, is of low computational complexity, produces results that are free from artifacts, and outperforms state-of-the-art methods according to psychophysical tests. Our experiments also highlight the limitations of existing objective metrics for the gamut mapping problem.
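
For intuition only, the following is a naive hue-preserving gamut-reduction baseline, not the vision-model framework proposed in the article: out-of-gamut colors are pulled toward the gray axis until they fit the target RGB cube, with an illustrative soft-compression factor. It also assumes luminance is already within range.

import numpy as np

def compress_to_gamut(rgb, knee=0.8):
    # rgb: (..., 3) array already converted to the target primaries, so
    # out-of-gamut pixels fall outside [0, 1] in this linear space.
    gray = rgb.mean(axis=-1, keepdims=True)   # achromatic anchor
    chroma = rgb - gray                       # direction of the hue line
    # Largest scale s with gray + s*chroma inside the unit cube.
    with np.errstate(divide='ignore', invalid='ignore'):
        hi = np.where(chroma > 0, (1.0 - gray) / chroma, np.inf)
        lo = np.where(chroma < 0, (0.0 - gray) / chroma, np.inf)
    s_max = np.clip(np.min(np.minimum(hi, lo), axis=-1, keepdims=True), 0, 1)
    # In-gamut pixels (s_max >= 1) pass through; out-of-gamut pixels are
    # compressed to a fraction `knee` of the maximal in-gamut chroma.
    s = np.where(s_max >= 1.0, 1.0, knee * s_max)
    return gray + s * chroma

pixel = np.array([[1.4, 0.2, -0.1]])   # out-of-gamut wide-gamut color
print(compress_to_gamut(pixel))        # now inside [0, 1]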
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement
Burst image processing has become increasingly popular in recent years.
However, it is a challenging task, since individual burst images undergo
multiple degradations and often have mutual misalignments resulting in ghosting
and zipper artifacts. Existing burst restoration methods usually do not
consider the mutual correlation and non-local contextual information among
burst frames, which tends to limit these approaches in challenging cases.
Another key challenge lies in the robust up-sampling of burst frames.
Existing up-sampling methods cannot simultaneously exploit the advantages of
single-stage and progressive up-sampling strategies, whether implemented with
conventional or recent up-samplers. To address these challenges, we propose a
novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a
spatially precise high-quality image from a burst of low-quality raw images.
GMTNet consists of three modules optimized for burst processing tasks:
Multi-scale Burst Feature Alignment (MBFA) for feature denoising and alignment,
Transposed-Attention Feature Merging (TAFM) for multi-frame feature
aggregation, and Resolution Transfer Feature Up-sampler (RTFU) to up-scale
merged features and construct a high-quality output image. Detailed
experimental analysis on five datasets validates our approach and sets a new
state-of-the-art for burst super-resolution, burst denoising, and low-light
burst enhancement.
Comment: Accepted at CVPR 2023
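
As a hedged sketch of what transposed-attention merging in the spirit of TAFM can look like, attention below is computed across channels rather than pixels, so its cost stays linear in spatial size; the class is an assumption-level illustration, not the paper's module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.qkv = nn.Conv2d(channels, 3 * channels, kernel_size=1)
        self.out = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).reshape(b, 3, c, h * w).unbind(dim=1)
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1), dim=-1)  # (b, c, c)
        return self.out((attn @ v).reshape(b, c, h, w)) + x

# Usage: apply to per-frame features before fusion, or to burst features
# stacked along the channel axis.
x = torch.randn(2, 64, 32, 32)
print(TransposedAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])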
CycleISP: Real Image Restoration via Improved Data Synthesis
The availability of large-scale datasets has helped unleash the true
potential of deep convolutional neural networks (CNNs). However, for the
single-image denoising problem, capturing a real dataset is an unacceptably
expensive and cumbersome procedure. Consequently, image denoising algorithms
are mostly developed and evaluated on synthetic data that is usually generated
with a widespread assumption of additive white Gaussian noise (AWGN). While the
CNNs achieve impressive results on these synthetic datasets, they do not
perform well when applied to real camera images, as reported in recent
benchmark datasets. This is mainly because AWGN does not adequately model
real camera noise, which is signal-dependent and heavily
transformed by the camera imaging pipeline. In this paper, we present a
framework that models the camera imaging pipeline in forward and reverse
directions. It allows us to produce any number of realistic image pairs for
denoising both in RAW and sRGB spaces. By training a new image denoising
network on realistic synthetic data, we achieve the state-of-the-art
performance on real camera benchmark datasets. Our model has ~5 times fewer
parameters than the previous best method for RAW denoising. Furthermore, we
demonstrate that the proposed framework generalizes beyond the image denoising
problem, e.g., to color matching in stereoscopic cinema. The source code and
pre-trained models are available at https://github.com/swz30/CycleISP.
Comment: CVPR 2020 (Oral)
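
To illustrate the general idea of modeling the pipeline in reverse for data synthesis, here is a minimal "unprocessing"-style sketch, not CycleISP's learned model: sRGB images are mapped back toward RAW and signal-dependent noise is injected to form training pairs. The gamma value, Bayer layout, and noise constants are assumptions.

import numpy as np

def srgb_to_approx_raw(srgb):
    # Invert the display gamma (rough sRGB approximation), then mosaic
    # to an RGGB Bayer pattern to approximate sensor RAW data.
    linear = np.clip(srgb, 0, 1) ** 2.2
    raw = np.zeros(linear.shape[:2])
    raw[0::2, 0::2] = linear[0::2, 0::2, 0]   # R
    raw[0::2, 1::2] = linear[0::2, 1::2, 1]   # G
    raw[1::2, 0::2] = linear[1::2, 0::2, 1]   # G
    raw[1::2, 1::2] = linear[1::2, 1::2, 2]   # B
    return raw

def add_shot_read_noise(raw, shot=0.01, read=0.0005, rng=None):
    # Heteroscedastic Gaussian model: noise variance grows with signal
    # level, unlike AWGN.
    rng = rng or np.random.default_rng(0)
    sigma = np.sqrt(shot * raw + read)
    return np.clip(raw + rng.normal(0.0, 1.0, raw.shape) * sigma, 0, 1)

clean_srgb = np.random.rand(64, 64, 3)
clean_raw = srgb_to_approx_raw(clean_srgb)
noisy_raw = add_shot_read_noise(clean_raw)  # training pair: (noisy, clean)
print(clean_raw.shape, noisy_raw.shape)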