    The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

    While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.

    Comment: Accepted to CVPR 2018; Code and data available at https://www.github.com/richzhang/PerceptualSimilarit
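    The abstract does not spell out how a deep-feature distance is computed. A minimal sketch, assuming an LPIPS-style formulation (per-layer unit normalization along the channel axis, optional non-negative per-channel weights, squared differences summed over channels, averaged over space, and summed over layers), is shown below next to PSNR for comparison. The feature maps are assumed to be precomputed (C, H, W) NumPy arrays from some network; the function names are illustrative and are not the API of the released code.

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def deep_feature_distance(feats_ref, feats_test, weights=None, eps=1e-10):
    """LPIPS-style distance between two lists of (C, H, W) feature maps.

    Assumption: features were already extracted by some network; no
    network is run here. `weights` (hypothetical) is an optional list of
    non-negative per-channel weight vectors, one per layer.
    """
    total = 0.0
    for i, (fr, ft) in enumerate(zip(feats_ref, feats_test)):
        # Unit-normalize each spatial position's feature vector across channels.
        fr = fr / (np.linalg.norm(fr, axis=0, keepdims=True) + eps)
        ft = ft / (np.linalg.norm(ft, axis=0, keepdims=True) + eps)
        diff = (fr - ft) ** 2
        if weights is not None:
            diff = diff * weights[i][:, None, None]  # per-channel scaling
        # Sum over channels, average over space, accumulate over layers.
        total += float(diff.sum(axis=0).mean())
    return total
```

    Identical features give a distance of 0; with unit-normalized features and no weights, each layer contributes at most 4.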

    AT-DDPM: Restoring Faces degraded by Atmospheric Turbulence using Denoising Diffusion Probabilistic Models

    Although many long-range imaging systems are designed to support extended vision applications, a natural obstacle to their operation is degradation due to atmospheric turbulence. Atmospheric turbulence significantly degrades image quality by introducing blur and geometric distortion. In recent years, various deep learning-based single-image atmospheric turbulence mitigation methods, including CNN-based and GAN inversion-based methods, have been proposed in the literature to remove this distortion. However, some of these methods are difficult to train and often fail to reconstruct facial features, producing unrealistic results, especially under strong turbulence. Denoising Diffusion Probabilistic Models (DDPMs) have recently gained traction because of their stable training process and their ability to generate high-quality images. In this paper, we propose the first DDPM-based solution for the problem of atmospheric turbulence mitigation. We also propose a fast sampling technique for reducing the inference time of conditional DDPMs. Extensive experiments on synthetic and real-world data demonstrate the effectiveness of our model. To facilitate further research, all code and pretrained models are publicly available at http://github.com/Nithin-GK/AT-DDPM

    Comment: Accepted to IEEE WACV 202
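    For readers unfamiliar with DDPMs, a minimal NumPy sketch of the standard forward (noising) process follows, using the linear variance schedule from the original DDPM formulation (beta from 1e-4 to 0.02 over 1000 steps). It does not reproduce AT-DDPM's conditioning or its fast sampling technique, neither of which the abstract describes; the function names are illustrative.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear per-step noise variances beta_1..beta_T."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, noise):
    """Draw x_t ~ q(x_t | x_0) in closed form (t is a 0-based step index):

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    """
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def noise_prediction_loss(eps_pred, eps_true):
    """Simplified DDPM training objective: MSE between predicted and true noise."""
    return float(np.mean((eps_pred - eps_true) ** 2))

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s)
```

    By the last step, alpha_bar is tiny, so x_T is almost pure Gaussian noise; a trained network reverses this process step by step, which is why reducing the number of reverse steps matters for inference time.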

    Spatiotemporal Video Quality Assessment Method via Multiple Feature Mappings

    Advanced video quality assessment (VQA) methods aim to evaluate the perceptual quality of videos in many applications, but often at the cost of increased computational complexity. The difficulty stems from the complexity of distorted videos, which are of significant concern in the communication industry, and from the two-fold (spatial and temporal) nature of their distortion. The findings of the study indicate that the information in spatiotemporal slice (STS) images is useful for measuring video distortion. This paper focuses on developing a full-reference video quality assessment algorithm that integrates several features of spatiotemporal slices (STSS) of frames into a high-performance video quality estimator. The method is evaluated on several VQA databases and proceeds in the following steps: (1) The reference and test video sequences are first arranged into a spatiotemporal slice representation, and a collection of spatiotemporal feature maps is computed for each reference-test video pair. These response features are then processed using Structural Similarity (SSIM) to form a local frame quality. (2) To further enhance the quality assessment, the spatial feature maps are combined with the spatiotemporal feature maps, yielding the proposed VQA model, named multiple map similarity feature deviation (MMSFD-STS). (3) A sequential pooling strategy assembles the per-frame quality indices into the video quality score. (4) Extensive evaluations on video quality databases show that the proposed VQA algorithm achieves better or competitive performance compared with other state-of-the-art methods.
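    A simplified sketch of step (1), assuming a grayscale video stored as a (T, H, W) array with values in [0, 1]: a horizontal STS image fixes one row and stacks it over time, and a vertical STS image fixes one column. Each reference/test slice pair is compared with a single-window SSIM (the standard SSIM uses a sliding Gaussian window), and the scores are simply averaged. The mean pooling is a stand-in for the paper's sequential pooling, and the MMSFD-STS feature maps and deviation measure are not reproduced; all names are illustrative.

```python
import numpy as np

def sts_slices(video, axis="horizontal"):
    """Spatiotemporal slices of a (T, H, W) video.

    Horizontal: fix row y -> H images of shape (T, W).
    Vertical:   fix column x -> W images of shape (T, H).
    """
    if axis == "horizontal":
        return [video[:, y, :] for y in range(video.shape[1])]
    return [video[:, :, x] for x in range(video.shape[2])]

def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over the whole array (simplification, no sliding window)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(num / den)

def sts_quality(ref, test):
    """Average SSIM over horizontal and vertical STS image pairs (mean pooling stand-in)."""
    scores = []
    for axis in ("horizontal", "vertical"):
        for r, t in zip(sts_slices(ref, axis), sts_slices(test, axis)):
            scores.append(global_ssim(r, t))
    return float(np.mean(scores))
```

    Because each STS image has time as one axis, temporal distortions such as flicker or jerky motion show up as spatial structure in these slices, which is what lets a 2-D similarity index like SSIM respond to them.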

    SpatioTemporal Feature Integration and Model Fusion for Full Reference Video Quality Assessment

    Perceptual video quality assessment models are either frame-based or video-based; video-based models apply spatiotemporal filtering or motion estimation to capture temporal video distortions. Despite their good performance on video quality databases, video-based approaches are time-consuming and harder to deploy efficiently. To balance high performance and computational efficiency, Netflix developed the Video Multi-method Assessment Fusion (VMAF) framework, which integrates multiple quality-aware features to predict video quality. Nevertheless, this fusion framework does not fully exploit temporal video quality measurements, which are relevant to temporal video distortions. To this end, we propose two improvements to the VMAF framework: SpatioTemporal VMAF and Ensemble VMAF. Both algorithms exploit efficient temporal video features that are fed into one or more regression models. To train our models, we designed a large subjective database and evaluated the proposed models against state-of-the-art approaches. The compared algorithms will be made available as part of the open-source package at https://github.com/Netflix/vmaf
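    A hedged sketch of the fusion idea: per-video features, including a simple temporal feature (here the mean absolute difference between consecutive frames), are mapped to subjective scores by a regression model, and an ensemble averages the predictions of several such models. Ridge regression stands in for the regressor (VMAF itself uses support vector regression); the temporal feature and all helper names are hypothetical and are not the actual SpatioTemporal VMAF or Ensemble VMAF features.

```python
import numpy as np

def temporal_feature(video):
    """Hypothetical temporal feature: mean |frame_t - frame_{t-1}| of a (T, H, W) video."""
    return float(np.abs(np.diff(video.astype(np.float64), axis=0)).mean())

def _with_bias(features):
    return np.hstack([features, np.ones((features.shape[0], 1))])

def fit_linear_fusion(features, scores, l2=1e-3):
    """Ridge-regression fusion of an (N, F) feature matrix into N subjective scores."""
    X = _with_bias(features)
    reg = l2 * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    return np.linalg.solve(X.T @ X + reg, X.T @ scores)

def predict(weights, features):
    return _with_bias(features) @ weights

def ensemble_predict(models, features):
    """Average predictions of several (feature_column_indices, weights) models."""
    return np.mean([predict(w, features[:, cols]) for cols, w in models], axis=0)
```

    In this sketch, an ensemble member can be trained on a different subset of feature columns, so spatial-only and spatiotemporal regressors can be combined simply by averaging their outputs.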

    Deep Learning frameworks for Image Quality Assessment

    Technology is advancing with the arrival of deep learning, which has found wide application in image processing. Deep learning alone can outperform the traditional statistical methods. In this research work, I implemented image quality assessment techniques using deep learning. I propose two full-reference and two no-reference image quality assessment algorithms; within each category, one algorithm is supervised and the other is unsupervised. The first proposed method is full-reference image quality assessment using an autoencoder. Existing literature shows that the statistical features of pristine images are altered in the presence of distortion. It is more advantageous if the algorithm itself learns distortion-discriminating features, but learning becomes more complex as the feature length grows. Therefore, an autoencoder is trained on a large number of pristine images, since an autoencoder learns a lower-dimensional representation of its input. It is shown that distances between the encoded features have good distortion-discrimination properties. The proposed algorithm delivers competitive performance on standard databases. If both the reference and distorted images are given to the model and the model learns by itself to output the scores, the burden of feature extraction and post-processing is reduced; however, the model must then be capable of discriminating the features by itself. The second proposed method is full-reference and no-reference image quality assessment using deep convolutional neural networks. A network is trained in a supervised manner with subjective scores as targets. The algorithm performs efficiently for the distortions learned during training. The last proposed method is classification-based no-reference image quality assessment. The distortion level in an image may vary from one region to another: distortion may not be visible in some parts but may be present in others. A classification model determines whether a given input patch is of low or high quality. It is shown that the aggregate of the patch quality scores correlates highly with the subjective scores.
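    The encoded-distance idea of the first method can be sketched with a linear autoencoder. Because a linear autoencoder trained with squared error learns the same subspace as principal component analysis, the sketch below uses a PCA basis fitted on pristine patches as a stand-in for the trained (nonlinear) autoencoder. The patch size, code dimension, and mean-distance pooling are illustrative assumptions, not the settings of the original work.

```python
import numpy as np

def extract_patches(img, size=8):
    """Non-overlapping size x size patches of a 2-D image, flattened to rows."""
    h = (img.shape[0] // size) * size
    w = (img.shape[1] // size) * size
    img = img[:h, :w]
    patches = img.reshape(h // size, size, w // size, size).swapaxes(1, 2)
    return patches.reshape(-1, size * size)

def fit_linear_encoder(pristine_patches, code_dim=8):
    """PCA basis as a stand-in for an autoencoder trained on pristine patches."""
    mean = pristine_patches.mean(axis=0)
    _, _, vt = np.linalg.svd(pristine_patches - mean, full_matrices=False)
    return mean, vt[:code_dim]

def encode(patches, mean, basis):
    """Project patches onto the learned low-dimensional code space."""
    return (patches - mean) @ basis.T

def encoded_distance(ref, dist, mean, basis, size=8):
    """Mean Euclidean distance between codes of co-located reference/distorted patches."""
    zr = encode(extract_patches(ref, size), mean, basis)
    zd = encode(extract_patches(dist, size), mean, basis)
    return float(np.linalg.norm(zr - zd, axis=1).mean())
```

    A larger encoded distance indicates that the distorted image has moved further from the structure captured by the pristine-image representation; mapping this distance to a subjective score would still require a separate calibration step.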