The Perception-Distortion Tradeoff
Image restoration algorithms are typically evaluated by some distortion
measure (e.g., PSNR, SSIM, IFC, VIF) or by human opinion scores that quantify
perceived quality. In this paper, we prove mathematically that distortion and
perceptual quality are at odds with each other. Specifically, we study the
optimal probability of correctly discriminating the outputs of an image
restoration algorithm from real images. We show that as the mean distortion
decreases, this probability must increase (indicating worse perceptual
quality). Contrary to common belief, this result holds for any distortion
measure, and is not merely a shortcoming of the PSNR or SSIM criteria. We also
show that generative adversarial networks (GANs) provide a principled way to
approach the perception-distortion bound, which lends theoretical support to
their observed success in low-level vision tasks. Based on our analysis, we
propose a new methodology for evaluating image restoration methods, and use it
to perform an extensive comparison between recent super-resolution algorithms.
Comment: CVPR 2018 (long oral presentation), see talk at:
https://youtu.be/_aXbGqdEkjk?t=39m43
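A concrete reading of the result, sketched from the paper's formulation
(here Delta denotes the chosen distortion measure and d(.,.) a divergence
between the distribution p_X of natural images and the distribution of the
algorithm's outputs; treat the notation as a paraphrase, not a verbatim
quote):

    P(D) = \min_{p_{\hat{X} \mid Y}} d\big(p_X, p_{\hat{X}}\big)
    \quad \text{subject to} \quad
    \mathbb{E}\big[\Delta(X, \hat{X})\big] \le D

Under the paper's assumptions (d convex in its second argument), P(D) is
non-increasing and convex, so pushing the mean distortion D down necessarily
pushes the divergence up, and with it the optimal probability of telling the
algorithm's outputs apart from real images.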
Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations
Although CNNs are believed to be invariant to translations, recent works have
shown this is not the case, due to aliasing effects that stem from
downsampling layers. Existing architectural solutions to prevent aliasing are
only partial, since they do not address the aliasing that originates in the
non-linearities. We propose an extended anti-aliasing method that tackles both
downsampling and non-linear layers, thus creating truly alias-free,
shift-invariant CNNs. We show that the presented model is invariant to integer
as well as fractional (i.e., sub-pixel) translations, thus outperforming other
shift-invariant methods in terms of robustness to adversarial translations.
Comment: The paper was accepted to CVPR 2023. Our code is available at
https://github.com/hmichaeli/alias_free_convnets
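The principle behind the non-linear part of the method can be sketched in a
few lines of numpy. This is not the authors' implementation (their code is
linked above), but a 1-D illustration of the underlying idea: a degree-p
polynomial expands a bandlimited signal's bandwidth by at most a factor of p,
so upsampling by p, applying the polynomial pointwise, and ideally low-pass
filtering back down gives a nonlinearity that commutes with fractional shifts.
All function names and the choice of polynomial are for illustration only.

    import numpy as np

    def ideal_upsample(x, factor):
        # Sinc (ideal) upsampling: zero-pad the spectrum, so the new
        # samples lie on the bandlimited interpolation of the old ones.
        n = len(x)
        X = np.fft.rfft(x)
        X_up = np.zeros(factor * n // 2 + 1, dtype=complex)
        X_up[:len(X)] = X
        return np.fft.irfft(X_up, n=factor * n) * factor

    def ideal_downsample(x, factor):
        # Ideal low-pass filtering + decimation in one step: truncate the
        # spectrum to the new band (dropping the new Nyquist bin, which a
        # real signal cannot carry unambiguously).
        n = len(x)
        m = n // factor
        X = np.fft.rfft(x)[:m // 2 + 1].copy()
        X[-1] = 0.0
        return np.fft.irfft(X, n=m) / factor

    def frac_shift(x, s):
        # Shift a periodic bandlimited signal by s samples (s may be
        # fractional) via a phase ramp in the frequency domain.
        n = len(x)
        k = np.fft.rfftfreq(n, d=1.0 / n)
        return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * k * s / n), n=n)

    def alias_free_poly_act(x, coeffs=(0.0, 1.0, 0.5)):
        # Polynomial activation (here x + 0.5*x^2) applied without
        # aliasing: evaluate it on a p-times-oversampled grid, where p is
        # the polynomial degree, then filter back down.
        p = len(coeffs) - 1
        y_up = np.polyval(coeffs[::-1], ideal_upsample(x, p))
        return ideal_downsample(y_up, p)

    # Demo: a signal occupying 3/4 of the band (naive squaring would
    # alias it), checked for commutation with a half-pixel shift.
    rng = np.random.default_rng(0)
    X = np.zeros(33, dtype=complex)
    X[:25] = rng.standard_normal(25) + 1j * rng.standard_normal(25)
    X[0] = X[0].real  # DC bin of a real signal must be real
    x = np.fft.irfft(X, n=64)

    shift_then_act = alias_free_poly_act(frac_shift(x, 0.5))
    act_then_shift = frac_shift(alias_free_poly_act(x), 0.5)
    assert np.allclose(shift_then_act, act_then_shift)

A ReLU, by contrast, has unbounded bandwidth expansion, which is what makes
exact anti-aliasing tractable for polynomial activations in the first place.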
xUnit: Learning a Spatial Activation Function for Efficient Image Restoration
In recent years, deep neural networks (DNNs) have achieved unprecedented
performance in many low-level vision tasks. However, state-of-the-art results
are typically achieved by very deep networks, which can reach tens of layers
with tens of millions of parameters. To make DNNs implementable on platforms
with limited resources, it is necessary to weaken the tradeoff between
performance and efficiency. In this paper, we propose a new activation unit
which is particularly suitable for image restoration problems. In contrast to
the widespread per-pixel activation units, such as ReLUs and sigmoids, our
unit implements a learnable nonlinear function with spatial connections. This
enables the net to capture much more complex features, and thus to reach the
same performance with significantly fewer layers. We illustrate the
effectiveness of our units through experiments with state-of-the-art nets for
denoising, de-raining, and super-resolution, which are already considered to
be very small. With our approach, we are able to further reduce these models
by nearly 50% without incurring any degradation in performance.
Comment: Conference on Computer Vision and Pattern Recognition (CVPR), 2018
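The gist of such a spatial activation can be sketched in PyTorch. The layer
below follows the description above (a multiplicative gate computed from a
spatial neighborhood via a depthwise convolution, squashed through a
Gaussian); details such as the kernel size and the exact ordering of the
normalization layers are assumptions for illustration, not a verbatim copy of
the paper's unit.

    import torch
    import torch.nn as nn

    class SpatialGateActivation(nn.Module):
        """xUnit-style activation y = x * g(x): unlike a ReLU, which gates
        each pixel by its own value, the gate g is computed from a spatial
        neighborhood, so the nonlinearity has learnable spatial support."""

        def __init__(self, channels, kernel_size=9):
            super().__init__()
            self.gate = nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                # Depthwise convolution: spatial connections, per channel.
                nn.Conv2d(channels, channels, kernel_size,
                          padding=kernel_size // 2, groups=channels),
                nn.BatchNorm2d(channels),
            )

        def forward(self, x):
            d = self.gate(x)
            return x * torch.exp(-d * d)  # Gaussian gate in (0, 1]

    # Usage: a drop-in replacement for ReLU in a small restoration net.
    x = torch.randn(1, 64, 32, 32)
    act = SpatialGateActivation(64)
    assert act(x).shape == x.shape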
Semi-Supervised Single- and Multi-Domain Regression with Multi-Domain Training
We address the problems of multi-domain and single-domain regression based on
distinct and unpaired labeled training sets for each of the domains and a
large unlabeled training set from all domains. We formulate these problems as
Bayesian estimation with partial knowledge of statistical relations. We
propose a worst-case design strategy and study the resulting estimators. Our
analysis explicitly accounts for the cardinality of the labeled sets and
includes the special cases in which one of the labeled sets is very large or,
at the other extreme, completely missing. We demonstrate our estimators in the
context of removing expressions from facial images and in the context of
audio-visual word recognition, and provide comparisons to several recently
proposed multi-modal learning algorithms.
Comment: 24 pages, 6 figures, 2 tables
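The worst-case design strategy admits a compact schematic reading as a minimax
problem. The program below is a generic sketch of such a design; the
squared-error loss and the notation for the uncertainty set are illustrative
assumptions, not necessarily the paper's exact formulation:

    \hat{f} = \arg\min_{f} \, \max_{p \in \mathcal{P}} \,
    \mathbb{E}_{p}\!\left[ \| X - f(Y) \|^{2} \right]

Here \mathcal{P} is the set of joint distributions of the label X and the
measurement Y that are consistent with the partial statistical knowledge
extracted from the small labeled sets and the large unlabeled set; the
estimator is then designed against the least favorable member of that set.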