The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
While it is nearly effortless for humans to quickly assess the perceptual
similarity between two images, the underlying processes are thought to be quite
complex. Despite this, the most widely used perceptual metrics today, such as
PSNR and SSIM, are simple, shallow functions, and fail to account for many
nuances of human perception. Recently, the deep learning community has found
that features of the VGG network trained on ImageNet classification have been
remarkably useful as a training loss for image synthesis. But how perceptual
are these so-called "perceptual losses"? What elements are critical for their
success? To answer these questions, we introduce a new dataset of human
perceptual similarity judgments. We systematically evaluate deep features
across different architectures and tasks and compare them with classic metrics.
We find that deep features outperform all previous metrics by large margins on
our dataset. More surprisingly, this result is not restricted to
ImageNet-trained VGG features, but holds across different deep architectures
and levels of supervision (supervised, self-supervised, or even unsupervised).
Our results suggest that perceptual similarity is an emergent property shared
across deep visual representations.
Comment: Accepted to CVPR 2018; Code and data available at
https://www.github.com/richzhang/PerceptualSimilarit
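To make concrete why the abstract above calls PSNR a "simple, shallow" function, here is a minimal sketch of PSNR in plain Python. It is not taken from the paper's code; the function name `psnr`, the list-of-rows image layout, and the default `max_value` of 255 are assumptions chosen for illustration.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio between two equally sized grayscale images.

    Both images are lists of rows of pixel intensities (an assumed layout).
    """
    total = 0.0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        for r, d in zip(ref_row, dist_row):
            total += (r - d) ** 2
            count += 1
    mse = total / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[10, 20], [30, 40]]
shifted = [[11, 21], [31, 41]]  # every pixel off by one intensity level
print(round(psnr(reference, shifted), 2))
```

The score depends only on the average per-pixel squared error, so any two distortions with the same mean squared error receive the same PSNR, regardless of how differently they look to a human observer.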
AT-DDPM: Restoring Faces degraded by Atmospheric Turbulence using Denoising Diffusion Probabilistic Models
Although many long-range imaging systems are designed to support extended
vision applications, a natural obstacle to their operation is degradation due
to atmospheric turbulence. Atmospheric turbulence causes significant
degradation to image quality by introducing blur and geometric distortion. In
recent years, various deep learning-based single image atmospheric turbulence
mitigation methods, including CNN-based and GAN inversion-based, have been
proposed in the literature which attempt to remove the distortion in the image.
However, some of these methods are difficult to train and often fail to
reconstruct facial features and produce unrealistic results especially in the
case of high turbulence. Denoising Diffusion Probabilistic Models (DDPMs) have
recently gained some traction because of their stable training process and
their ability to generate high quality images. In this paper, we propose the
first DDPM-based solution for the problem of atmospheric turbulence mitigation.
We also propose a fast sampling technique for reducing the inference times for
conditional DDPMs. Extensive experiments are conducted on synthetic and
real-world data to show the significance of our model. To facilitate further
research, all code and pretrained models are publicly available at
http://github.com/Nithin-GK/AT-DDPM
Comment: Accepted to IEEE WACV 202
Spatiotemporal Video Quality Assessment Method via Multiple Feature Mappings
Advanced video quality assessment (VQA) methods aim to evaluate the perceptual quality of videos in many applications, but they often increase computational complexity. The difficulty stems from the complexity of distorted videos, which is of significant concern in the communication industry, and from the spatiotemporal content of the two-fold (spatial and temporal) distortion. The findings of this study indicate that the information in spatiotemporal slice (STS) images is useful for measuring video distortion. This paper focuses on developing a full-reference video quality assessment estimator that integrates several features of the spatiotemporal slices (STSs) of frames to form a high-performance video quality measure. The work evaluates video quality on several VQA databases through the following steps: (1) We first arrange the reference and test video sequences into a spatiotemporal slice representation. A collection of spatiotemporal feature maps is computed for each reference-test video pair. These response features are then processed using Structural Similarity (SSIM) to form a local frame quality score. (2) To further enhance the quality assessment, we combine the spatial feature maps with the spatiotemporal feature maps and propose a VQA model named multiple map similarity feature deviation (MMSFD-STS). (3) We apply a sequential pooling strategy to assemble the quality indices of frames into a video quality score. (4) Extensive evaluations on video quality databases show that the proposed VQA algorithm achieves better or competitive performance compared with other state-of-the-art methods.
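The spatiotemporal slice representation mentioned in step (1) can be sketched as follows. This is an illustrative assumption about how such slices are commonly formed, not the authors' implementation: the function names `horizontal_sts` and `vertical_sts` and the frames-as-lists-of-rows layout are hypothetical, and the feature maps and SSIM stage are not reproduced.

```python
def horizontal_sts(video, row):
    """Stack the same pixel row from every frame.

    The result has one line per frame, so its shape is time x width.
    """
    return [list(frame[row]) for frame in video]


def vertical_sts(video, col):
    """Stack the same pixel column from every frame.

    The result has one line per frame, so its shape is time x height.
    """
    return [[frame_row[col] for frame_row in frame] for frame in video]


# A toy video: 2 frames, each 2 rows by 3 columns (assumed layout).
video = [
    [[1, 2, 3],
     [4, 5, 6]],
    [[7, 8, 9],
     [10, 11, 12]],
]

print(horizontal_sts(video, row=0))  # row 0 of each frame, over time
print(vertical_sts(video, col=2))    # column 2 of each frame, over time
```

Each slice is an ordinary 2-D image in which one axis is time, which is why a still-image measure can be applied to slices taken from the reference and test videos.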
SpatioTemporal Feature Integration and Model Fusion for Full Reference Video Quality Assessment
Perceptual video quality assessment models are either frame-based or
video-based, i.e., they apply spatiotemporal filtering or motion estimation to
capture temporal video distortions. Despite their good performance on video
quality databases, video-based approaches are time-consuming and harder to
efficiently deploy. To balance between high performance and computational
efficiency, Netflix developed the Video Multi-method Assessment Fusion (VMAF)
framework, which integrates multiple quality-aware features to predict video
quality. Nevertheless, this fusion framework does not fully exploit temporal
video quality measurements which are relevant to temporal video distortions. To
this end, we propose two improvements to the VMAF framework: SpatioTemporal
VMAF and Ensemble VMAF. Both algorithms exploit efficient temporal video
features which are fed into a single or multiple regression models. To train
our models, we designed a large subjective database and evaluated the proposed
models against state-of-the-art approaches. The compared algorithms will be
made available as part of the open source package in
https://github.com/Netflix/vmaf
Deep Learning frameworks for Image Quality Assessment
Technology is advancing with the arrival of deep learning, which also finds wide application in image
processing. Deep learning by itself is sufficient to outperform all the statistical methods. In this
research work, I implemented image quality assessment techniques using deep learning. I
propose two full-reference image quality assessment algorithms and two no-reference image quality
assessment algorithms. Of the two algorithms of each type, one is supervised and the other is
unsupervised.
The first proposed method is full-reference image quality assessment using an autoencoder. Existing
literature shows that the statistical features of pristine images are altered in the presence of distortion.
It is more advantageous if the algorithm itself learns the distortion-discriminating features, although
complexity grows with the feature length. An autoencoder is therefore trained on a large number of
pristine images, since an autoencoder gives the best lower-dimensional representation of the input.
It is shown that the encoded distance features have good distortion discrimination properties. The
proposed algorithm delivers competitive performance on standard databases.
Giving both the reference and distorted images to a model that learns by itself and outputs the
scores reduces the burden of extracting features and doing post-processing. The model, however,
must be capable of discriminating the features by itself. The second method I propose is
full-reference and no-reference image quality assessment using deep convolutional neural networks.
A network is trained in a supervised manner with subjective scores as targets. The algorithm
performs efficiently for the distortions that are learned while training the model.
The last proposed method is classification-based no-reference image quality assessment. The distortion
level in an image may vary from one region to another. Distortion may not be visible
in some parts of the image yet be present in others. A classification model can tell whether a
given input patch is of low quality or high quality. It is shown that the aggregate of the patch quality
scores has a high correlation with the subjective scores.
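The patch-based aggregation described in the last paragraph can be sketched as below. This is only an assumed outline: `extract_patches`, `image_quality`, and `toy_classifier` are hypothetical names, non-overlapping patches and mean pooling are assumptions, and the toy classifier is a stand-in for the trained classification network, which the abstract does not specify.

```python
def extract_patches(image, patch_size):
    """Split an image (list of rows) into non-overlapping square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches.append([row[left:left + patch_size]
                            for row in image[top:top + patch_size]])
    return patches


def image_quality(image, patch_size, patch_classifier):
    """Average per-patch labels (1 = high quality, 0 = low quality)."""
    labels = [patch_classifier(p) for p in extract_patches(image, patch_size)]
    return sum(labels) / len(labels)


def toy_classifier(patch, threshold=50):
    """Hypothetical stand-in for a trained network.

    Labels a patch high quality when its intensity range reaches the
    threshold; a real model would be learned from data.
    """
    pixels = [p for row in patch for p in row]
    return 1 if max(pixels) - min(pixels) >= threshold else 0


image = [
    [0, 100, 50, 50],
    [100, 0, 50, 50],
    [0, 90, 10, 80],
    [90, 0, 80, 10],
]
print(image_quality(image, patch_size=2, patch_classifier=toy_classifier))
```

Scoring patches independently lets the image-level score reflect distortion that is confined to one region, which is the motivation given in the abstract.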