The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
While it is nearly effortless for humans to quickly assess the perceptual
similarity between two images, the underlying processes are thought to be quite
complex. Despite this, the most widely used perceptual metrics today, such as
PSNR and SSIM, are simple, shallow functions, and fail to account for many
nuances of human perception. Recently, the deep learning community has found
that features of the VGG network trained on ImageNet classification have been
remarkably useful as a training loss for image synthesis. But how perceptual
are these so-called "perceptual losses"? What elements are critical for their
success? To answer these questions, we introduce a new dataset of human
perceptual similarity judgments. We systematically evaluate deep features
across different architectures and tasks and compare them with classic metrics.
We find that deep features outperform all previous metrics by large margins on
our dataset. More surprisingly, this result is not restricted to
ImageNet-trained VGG features, but holds across different deep architectures
and levels of supervision (supervised, self-supervised, or even unsupervised).
Our results suggest that perceptual similarity is an emergent property shared
across deep visual representations.
Comment: Accepted to CVPR 2018; Code and data available at
https://www.github.com/richzhang/PerceptualSimilarit
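To make the contrast between a shallow metric and a feature-based one concrete, here is a minimal sketch in plain Python. `psnr` follows the standard definition. `feature_distance` is only an illustration of the general idea of comparing images in a deep feature space: the abstract does not specify the distance, so the channel-wise unit normalization, the squared difference, and the spatial averaging are assumptions, and the feature lists stand in for the output of a hypothetical feature extractor.

```python
import math

def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale
    images given as lists of rows."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(img_a, img_b)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def unit_normalize(vec, eps=1e-10):
    """Scale a channel vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def feature_distance(feats_a, feats_b):
    """Squared distance between unit-normalized channel vectors, averaged
    over spatial positions. Each argument is a list of per-position channel
    vectors from one layer of a hypothetical feature extractor (assumption)."""
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        na, nb = unit_normalize(fa), unit_normalize(fb)
        total += sum((x - y) ** 2 for x, y in zip(na, nb))
    return total / len(feats_a)
```

Because of the normalization, `feature_distance` in this sketch ignores the magnitude of the feature vectors and responds only to their direction, whereas `psnr` depends directly on pixel-wise intensity differences.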
Motion Offset for Blur Modeling
Motion blur, caused by relative movement between the camera and the subject, is often an undesirable degradation of image quality. Most conventional deblurring methods estimate a blur kernel and use it for image deconvolution. Because this problem is ill-posed, predefined priors are introduced to constrain the solution. However, these predefined priors can only handle some specific situations. To achieve better deblurring performance on dynamic scenes, deep-learning-based methods have been proposed that learn a mapping function to restore the sharp image from a blurry image. In these methods the blur may be modelled implicitly in the feature extraction module. However, blur modelled from a paired dataset does not generalize well to some real-world scenes. In summary, an accurate and dynamic blur model that more closely approximates real-world blur is needed.
By revisiting the principle of camera exposure, we can model the blur with the displacements between sharp pixels and the exposed pixel, namely motion offsets. Given specific physical constraints, motion offsets can form different exposure trajectories (i.e. linear or quadratic). Compared with a conventional blur kernel, our proposed motion offsets are a more rigorous approximation of real-world blur, since they can constitute a non-linear and non-uniform motion field. By learning from a dynamic-scene dataset, an accurate and spatially variant motion offset field is obtained.
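The exposure model described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the constant-velocity and constant-acceleration parameterization, the sampling of times in [0, 1], nearest-neighbour sampling, border clamping, and all function names are hypothetical choices made for the example.

```python
def trajectory_offsets(velocity, acceleration=(0.0, 0.0), num_samples=5):
    """Offsets (dx, dy) at times t in [0, 1], computed as v*t + 0.5*a*t^2.
    Zero acceleration gives a linear trajectory, non-zero a quadratic one."""
    offsets = []
    for k in range(num_samples):
        t = k / (num_samples - 1) if num_samples > 1 else 0.0
        dx = velocity[0] * t + 0.5 * acceleration[0] * t * t
        dy = velocity[1] * t + 0.5 * acceleration[1] * t * t
        offsets.append((dx, dy))
    return offsets

def blur_pixel(sharp, x, y, offsets):
    """Average the sharp pixels reached from (x, y) by each offset,
    using nearest-neighbour sampling and clamping at the image border."""
    h, w = len(sharp), len(sharp[0])
    total = 0.0
    for dx, dy in offsets:
        sx = min(max(int(round(x + dx)), 0), w - 1)
        sy = min(max(int(round(y + dy)), 0), h - 1)
        total += sharp[sy][sx]
    return total / len(offsets)

def blur_image(sharp, offset_field):
    """offset_field[y][x] holds the offsets for pixel (x, y), so every
    pixel can follow its own trajectory (a spatially variant blur)."""
    return [[blur_pixel(sharp, x, y, offset_field[y][x])
             for x in range(len(sharp[0]))]
            for y in range(len(sharp))]
```

For example, giving only the leftmost pixel of a one-row image a horizontal trajectory blurs that pixel and leaves the others untouched:

```python
sharp = [[0, 0, 10, 0, 0]]
field = [[trajectory_offsets((4.0, 0.0)) if x == 0 else [(0.0, 0.0)]
          for x in range(5)]]
blurred = blur_image(sharp, field)
```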
With accurate motion information and a compact blur modeling method, we explore ways of using motion information to facilitate multiple blur-related tasks. Using the recovered motion offsets, we build a motion-aware and spatially variant convolution. For extracting a video clip from a blurry image, motion offsets provide an explicit (non-)linear motion trajectory for interpolation. We also work towards better image deblurring performance in real-world scenarios by improving the generalization ability of the deblurring model.
Take a Prior from Other Tasks for Severe Blur Removal
Recovering clear structures from severely blurry inputs is a challenging
problem due to the large movements between the camera and the scene. Although
some works apply segmentation maps on human face images for deblurring, they
cannot handle natural scenes because objects and degradation are more complex,
and inaccurate segmentation maps lead to a loss of details. For general scene
deblurring, the features of a blurry image and its corresponding sharp image
are closer under a high-level vision task, which inspires us to rely on other
tasks (e.g. classification) to learn a comprehensive prior for severe blur
removal. We propose a cross-level feature learning strategy based on
knowledge distillation to learn the priors, which include global contexts and
sharp local structures for recovering potential details. In addition, we
propose a semantic prior embedding layer with multi-level aggregation and
semantic attention transformation to integrate the priors effectively. We
introduce the proposed priors to various models, including the UNet and other
mainstream deblurring baselines, leading to better performance on severe blur
removal. Extensive experiments on natural image deblurring benchmarks and
real-world images, such as GoPro and RealBlur datasets, demonstrate our
method's effectiveness and generalization ability.
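As a rough illustration of what a distillation-based feature objective can look like, the sketch below computes a generic feature-matching loss: a weighted sum of mean squared errors between student and teacher features at several levels. This is a common form of feature distillation, not the cross-level strategy proposed in the abstract, whose details are not given here. The function names, the flattened-list feature format, and the per-level weights are all assumptions.

```python
def level_mse(student_feats, teacher_feats):
    """Mean squared error between two flattened feature vectors."""
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2
               for s, t in zip(student_feats, teacher_feats)) / len(student_feats)

def distillation_loss(student_levels, teacher_levels, weights=None):
    """Weighted sum of per-level MSE between student features (for example,
    from a blurry input) and teacher features (for example, from the sharp
    image under a high-level task). Weights default to 1.0 per level."""
    if weights is None:
        weights = [1.0] * len(student_levels)
    return sum(w * level_mse(s, t)
               for w, s, t in zip(weights, student_levels, teacher_levels))
```

In a real training setup the feature lists would be tensors from a deep-learning framework and the teacher would be kept fixed. Plain lists are used here only to keep the sketch self-contained.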