3,641 research outputs found

    Optimization of the Block-level Bit Allocation in Perceptual Video Coding based on MINMAX

    Full text link
    In video coding, the encoder is expected to adaptively select encoding parameters (e.g., the quantization parameter) to optimize the bit allocation across sources under a given constraint. However, in hybrid video coding, the dependency between sources makes bit allocation optimization highly complex, especially at the block level, and existing optimization methods mostly focus on frame-level bit allocation. In this paper, we propose a macroblock (MB) level bit allocation method based on the minimum maximum (MINMAX) criterion, which has acceptable encoding complexity for offline applications. An iterative algorithm, namely maximum distortion descend (MDD), is developed to reduce quality fluctuation among MBs within a frame, where the Structural SIMilarity (SSIM) index is used to measure the perceptual distortion of MBs. Our extensive experimental results on benchmark video sequences show that the proposed method greatly enhances encoding performance in terms of both bit savings and perceptual quality improvement. Comment: 11 pages, 17 figures
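    A minimal sketch of the MINMAX/MDD idea described above: repeatedly find the macroblock with the worst perceptual distortion and spend more bits there (lower its QP) until the maximum distortion can no longer descend. The toy_distortion model below is an illustrative stand-in for re-encoding a macroblock and measuring 1 - SSIM.

        # Toy sketch of a maximum-distortion-descend (MDD) style MINMAX loop.
        # toy_distortion() is a stand-in for re-encoding a macroblock at a
        # given QP and measuring 1 - SSIM; distortion grows with QP and
        # content complexity in this toy model.

        def toy_distortion(complexity, qp):
            return complexity * qp / 51.0

        def mdd_allocate(complexities, qp_init=32, qp_min=10, iters=200):
            qp = [qp_init] * len(complexities)
            for _ in range(iters):
                dist = [toy_distortion(c, q) for c, q in zip(complexities, qp)]
                worst = dist.index(max(dist))
                if qp[worst] <= qp_min:
                    break  # the maximum distortion cannot descend further
                qp[worst] -= 1  # spend more bits on the currently worst MB
            return qp

        print(mdd_allocate([0.4, 1.0, 2.5, 0.7]))  # complex MBs end up with lower QP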

    Image Quality Assessment: Unifying Structure and Texture Similarity

    Full text link
    Objective measures of image quality generally operate by comparing pixels of a "degraded" image to those of the original. Relative to human observers, these measures are overly sensitive to resampling of texture regions (e.g., replacing one patch of grass with another). Here, we develop the first full-reference image quality model with explicit tolerance to texture resampling. Using a convolutional neural network, we construct an injective and differentiable function that transforms images to multi-scale overcomplete representations. We demonstrate empirically that the spatial averages of the feature maps in this representation capture texture appearance, in that they provide a set of sufficient statistical constraints to synthesize a wide variety of texture patterns. We then describe an image quality method that combines correlations of these spatial averages ("texture similarity") with correlations of the feature maps ("structure similarity"). The parameters of the proposed measure are jointly optimized to match human ratings of image quality, while minimizing the reported distances between subimages cropped from the same texture images. Experiments show that the optimized method explains human perceptual scores on conventional image quality databases as well as on texture databases. The measure also offers competitive performance on related tasks such as texture classification and retrieval. Finally, we show that our method is relatively insensitive to geometric transformations (e.g., translation and dilation), without use of any specialized training or data augmentation. Code is available at https://github.com/dingkeyan93/DISTS
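    A minimal numpy sketch of the texture/structure split described above, assuming plain 2D arrays in place of the learned multi-scale VGG feature maps: spatial means are compared for texture and normalized correlation for structure, in SSIM-like ratio form.

        # Compare two "feature maps": means capture texture appearance,
        # normalized covariance captures structure. Real DISTS applies this
        # per channel and per scale with learned weights.

        import numpy as np

        def texture_structure_similarity(x, y, c=1e-6):
            mx, my = x.mean(), y.mean()
            cov = ((x - mx) * (y - my)).mean()
            texture = (2 * mx * my + c) / (mx**2 + my**2 + c)
            structure = (2 * cov + c) / (x.var() + y.var() + c)
            return texture, structure

        x = np.random.rand(32, 32)
        print(texture_structure_similarity(x, x))  # identical maps -> (1.0, 1.0)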

    Stochastic Bottleneck: Rateless Auto-Encoder for Flexible Dimensionality Reduction

    Full text link
    We propose a new concept of rateless auto-encoders (RL-AEs) that enable a flexible latent dimensionality, which can be seamlessly adjusted for varying distortion and dimensionality requirements. In the proposed RL-AEs, instead of a deterministic bottleneck architecture, we use an over-complete representation that is stochastically regularized with weighted dropouts, in a manner analogous to sparse AEs (SAEs). Unlike SAEs, our RL-AEs employ monotonically increasing dropout rates across the latent representation nodes such that the latent variables become sorted by importance, as in principal component analysis (PCA). This is motivated by the rateless property of conventional PCA, where the least important principal components can be discarded to realize variable-rate dimensionality reduction that gracefully degrades the distortion. In contrast, since the latent variables of conventional AEs are equally important for data reconstruction, they cannot simply be discarded to further reduce the dimensionality after the AE model is trained. Our proposed stochastic bottleneck framework enables seamless rate adaptation with high reconstruction performance, without requiring a predetermined latent dimensionality at training. We experimentally demonstrate that the proposed RL-AEs can achieve variable dimensionality reduction while achieving low distortion compared to conventional AEs. Comment: 14 pages, 12 figures, accepted to ISIT 2020
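    A sketch of the stochastic bottleneck regularizer in numpy, assuming a simple linear dropout schedule (the exact schedule is a design choice, not taken from the paper): node j is dropped with a rate that increases monotonically in j, so earlier latent variables learn to carry more important information, as in PCA.

        # During training, later latent nodes are dropped more often, which
        # sorts the representation by importance; at inference the tail can
        # simply be truncated for variable-rate dimensionality reduction.

        import numpy as np

        def stochastic_bottleneck(z, p_max=0.9, rng=np.random.default_rng(0)):
            d = z.shape[-1]
            drop = np.linspace(0.0, p_max, d)  # monotonically increasing rates
            keep = rng.random(d) >= drop       # later nodes dropped more often
            return z * keep / (1.0 - drop)     # inverted-dropout rescaling

        z = np.ones(8)
        print(stochastic_bottleneck(z))  # tail dimensions are zeroed more often
        # After training, z[:k] for any k <= d realizes a lower-rate code.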

    Multi-measures fusion based on multi-objective genetic programming for full-reference image quality assessment

    Full text link
    In this paper, we combine the flexibility of multi-objective fitness functions and the model structure selection ability of standard genetic programming (GP) with the parameter estimation power of classical regression, via multi-gene genetic programming (MGGP), to propose a new fusion technique for image quality assessment (IQA) called Multi-measures Fusion based on Multi-Objective Genetic Programming (MFMOGP). This technique automatically selects the most significant measures, from 16 full-reference IQA measures, to use in aggregation, and finds the weights of a weighted sum of their outputs while simultaneously optimizing for both accuracy and complexity. The resulting fusion of IQA measures is evaluated on the four largest publicly available image databases and compared against state-of-the-art full-reference IQA approaches. The comparison reveals that the proposed approach outperforms other recently developed state-of-the-art fusion approaches.
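    The fused predictor's final form, a weighted sum over a selected subset of measures, can be sketched directly; the subset and weights below are made-up placeholders, whereas MFMOGP searches for them with multi-gene GP under joint accuracy/complexity objectives.

        # Weighted-sum fusion of full-reference IQA measure outputs.
        # The selected measures and weights are illustrative only; MFMOGP
        # evolves them automatically from a pool of 16 measures.

        def fused_quality(measure_scores, selected, weights, bias=0.0):
            return bias + sum(w * measure_scores[m] for m, w in zip(selected, weights))

        scores = {"SSIM": 0.92, "VIF": 0.81, "FSIM": 0.95}
        print(fused_quality(scores, ["SSIM", "FSIM"], [0.6, 0.4]))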

    Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos

    Full text link
    Omnidirectional images (also referred to as static 360° panoramas) impose viewing conditions much different from those of regular 2D images. How humans perceive image distortions in immersive virtual reality (VR) environments is an important problem that has received little attention. We argue that, apart from the distorted panorama itself, two types of VR viewing conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama: the starting point and the exploration time. We first carry out a psychophysical experiment to investigate the interplay among the VR viewing conditions, the user viewing behaviors, and the perceived quality of 360° images. Then, we provide a thorough analysis of the collected human data, leading to several interesting findings. Moreover, we propose a computational framework for objective quality assessment of 360° images, embodying viewing conditions and behaviors in a delightful way. Specifically, we first transform an omnidirectional image to several video representations using different user viewing behaviors under different viewing conditions. We then leverage advanced 2D full-reference video quality models to compute the perceived quality. We construct a set of specific quality measures within the proposed framework, and demonstrate their promise on three VR quality databases. Comment: 11 pages, 11 figures, 9 tables. This paper has been accepted by IEEE Transactions on Visualization and Computer Graphics.
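    A rough sketch of the panorama-to-video transformation, assuming a hypothetical scanpath of (longitude, latitude) fixations and a crude equirectangular crop in place of proper rectilinear viewport projection; the resulting frame sequence would then be scored by any 2D full-reference VQA model.

        # Sample a viewport sequence along a viewing scanpath from an
        # equirectangular panorama. Cropping in equirectangular coordinates
        # is a simplification of true viewport rendering.

        import numpy as np

        def scanpath_to_video(pano, path, vp_h=64, vp_w=64):
            H, W = pano.shape[:2]
            frames = []
            for lon, lat in path:  # longitude in [0, 1), latitude in [0, 1]
                cy, cx = int(lat * (H - vp_h)), int(lon * W)
                cols = np.arange(cx, cx + vp_w) % W  # wrap around 360 degrees
                frames.append(pano[cy:cy + vp_h, cols])
            return np.stack(frames)

        pano = np.random.rand(256, 512)
        path = [(t / 30.0, 0.5) for t in range(30)]  # pan along the equator
        print(scanpath_to_video(pano, path).shape)   # (30, 64, 64)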

    Multi-Scale Recursive and Perception-Distortion Controllable Image Super-Resolution

    Full text link
    We describe our solution for the PIRM Super-Resolution Challenge 2018, where we achieved the 2nd best perceptual quality for average RMSE<=16, 5th best for RMSE<=12.5, and 7th best for RMSE<=11.5. We modify a recently proposed Multi-Grid Back-Projection (MGBP) architecture to work as a generative system with an input parameter that controls the amount of artificial detail in the output. We propose a discriminator for adversarial training with the following novel properties: it is multi-scale, resembling a progressive GAN; it is recursive, balancing the architecture of the generator; and it includes a new layer to capture significant statistics of natural images. Finally, we propose a training strategy that avoids conflicts between reconstruction and perceptual losses. Our configuration uses only 281k parameters and upscales each image of the competition in 0.2s on average. Comment: In ECCV 2018 Workshops. Won 2nd place in Region 3 of the PIRM-SR Challenge 2018. Code and models are available at https://github.com/pnavarre/pirm-sr-201
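    One way to picture a perception-distortion control parameter is as a blend between a distortion-optimized output and an adversarially trained detail residual; this blending form is only an illustration, not the actual MGBP mechanism, which feeds the parameter into the generator itself.

        # A scalar alpha in [0, 1] trades distortion for perceptual detail.
        # base_sr would come from reconstruction-loss training, and
        # detail_residual from adversarial training (both hypothetical here).

        import numpy as np

        def controllable_output(base_sr, detail_residual, alpha):
            return base_sr + alpha * detail_residual

        base, detail = np.zeros((4, 4)), 0.1 * np.random.randn(4, 4)
        print(np.abs(controllable_output(base, detail, 0.0)).mean())  # low distortion
        print(np.abs(controllable_output(base, detail, 1.0)).mean())  # more detail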

    A ParaBoost Stereoscopic Image Quality Assessment (PBSIQA) System

    Full text link
    The problem of stereoscopic image quality assessment, which finds applications in 3D visual content delivery such as 3DTV, is investigated in this work. Specifically, we propose a new ParaBoost (parallel-boosting) stereoscopic image quality assessment (PBSIQA) system. The system consists of two stages. In the first stage, various distortions are classified into a few types, and individual quality scorers targeting specific distortion types are developed. These scorers offer complementary performance in the face of a database consisting of heterogeneous distortion types. In the second stage, scores from multiple quality scorers are fused to achieve the best overall performance, where the fuser is designed based on the parallel-boosting idea borrowed from machine learning. Extensive experiments are conducted to compare the performance of the proposed PBSIQA system with that of existing stereo image quality assessment (SIQA) metrics. The developed quality metric can serve as an objective function to optimize the performance of a 3D content delivery system.
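    The two-stage structure can be sketched with a toy fuser, assuming precomputed outputs from distortion-specific scorers; an ordinary least-squares fit stands in for the parallel-boosting fuser the paper actually learns.

        # Stage 1 scorers each rate an image; stage 2 fuses their scores.
        # Least squares is an illustrative stand-in for parallel boosting.

        import numpy as np

        def fit_fuser(scorer_outputs, mos):
            X = np.hstack([scorer_outputs, np.ones((len(mos), 1))])
            w, *_ = np.linalg.lstsq(X, mos, rcond=None)
            return w

        def fuse(scores, w):
            return np.append(scores, 1.0) @ w

        S = np.random.rand(50, 3)                  # 3 distortion-specific scorers
        mos = S @ np.array([0.5, 0.3, 0.2]) + 1.0  # toy subjective scores
        w = fit_fuser(S, mos)
        print(fuse(S[0], w), mos[0])               # fused score tracks MOS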

    Blind Predicting Similar Quality Map for Image Quality Assessment

    Full text link
    A key problem in blind image quality assessment (BIQA) is how to effectively model the properties of the human visual system in a data-driven manner. In this paper, we propose a simple and efficient BIQA model based on a novel framework consisting of a fully convolutional neural network (FCNN) and a pooling network. In principle, the FCNN can predict a pixel-by-pixel similar quality map from only a distorted image, by learning from the intermediate similarity maps derived from conventional full-reference image quality assessment methods. The predicted pixel-by-pixel quality maps are consistent with the distortion correlations between the reference and distorted images. Finally, a deep pooling network regresses the quality map into a score. Experiments demonstrate that our predictions outperform many state-of-the-art BIQA methods.
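    A minimal PyTorch sketch of the two-part design, with illustrative layer sizes that are not taken from the paper: a fully convolutional network maps the distorted image to a per-pixel quality map, and a small pooling head regresses the map to a scalar score.

        # FCNN predicts a quality map from the distorted image alone; a
        # pooling network turns the map into a score. Sizes are illustrative.

        import torch
        import torch.nn as nn

        class QualityMapNet(nn.Module):
            def __init__(self):
                super().__init__()
                self.fcnn = nn.Sequential(
                    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
                self.pool = nn.Sequential(
                    nn.AdaptiveAvgPool2d(8), nn.Flatten(), nn.Linear(64, 1))

            def forward(self, x):
                qmap = self.fcnn(x)            # (N, 1, H, W) quality map
                return qmap, self.pool(qmap)   # map and scalar score

        qmap, score = QualityMapNet()(torch.rand(1, 3, 64, 64))
        print(qmap.shape, score.shape)         # (1, 1, 64, 64) and (1, 1)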

    Deep Optimization model for Screen Content Image Quality Assessment using Neural Networks

    Full text link
    In this paper, we propose a novel quadratic optimized model based on a deep convolutional neural network (QODCNN) for full-reference and no-reference screen content image (SCI) quality assessment. Unlike traditional CNN methods, which take all image patches as training data and use average quality pooling, our model is optimized in three steps to be more effective. In the first step, an end-to-end deep CNN is trained to preliminarily predict image visual quality, with batch normalization (BN) layers and l2 regularization employed to improve the speed and performance of network fitting. In the second step, the pretrained model is fine-tuned, guided by analysis of the raw training data, to achieve better performance. In the third step, an adaptive weighting method is proposed to fuse local quality, inspired by the perceptual property of the human visual system (HVS) that it is sensitive to image patches containing texture and edge information. The novelty of our algorithm can be summarized as follows: 1) considering the correlation between local quality and the subjective differential mean opinion score (DMOS), the Euclidean distance is used to measure the effectiveness of image patches, and the pretrained model is fine-tuned with the more effective training data; 2) an adaptive pooling approach is employed to fuse the patch quality of textual and pictorial regions, whose features, extracted only from distorted images, are robust to noise and effective for both FR and NR IQA; 3) considering the characteristics of SCIs, a deep and valid network architecture is designed for both NR and FR visual quality evaluation of SCIs. Experimental results verify that our model outperforms both current no-reference and full-reference image quality assessment methods on the benchmark screen content image quality assessment database (SIQAD). Comment: 12 pages, 9 figures
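    The adaptive pooling step can be sketched by weighting patch scores with local texture/edge energy; gradient magnitude is an illustrative saliency choice, not the paper's exact weighting.

        # Patches rich in texture/edges (high gradient energy) get larger
        # weights in the pooled score, mimicking HVS sensitivity.

        import numpy as np

        def adaptive_pool(patches, patch_scores, eps=1e-6):
            weights = []
            for p in patches:
                gy, gx = np.gradient(p.astype(float))
                weights.append(np.hypot(gx, gy).mean())  # edge/texture energy
            w = np.asarray(weights) + eps
            return float(np.dot(w, patch_scores) / w.sum())

        patches = [np.random.rand(32, 32), np.ones((32, 32))]  # textured vs. flat
        print(adaptive_pool(patches, np.array([0.4, 0.9])))    # pulled toward 0.4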

    Generative adversarial network-based image super-resolution using perceptual content losses

    Full text link
    In this paper, we propose a deep generative adversarial network for super-resolution that considers the trade-off between perception and distortion. Building on the good performance of a recently developed model for super-resolution, the deep residual network using enhanced upscale modules (EUSR), the proposed model is trained to improve perceptual performance with only a slight increase in distortion. For this purpose, together with the conventional content loss, i.e., a reconstruction loss such as L1 or L2, we consider additional losses in the training phase: a discrete cosine transform (DCT) coefficients loss and a differential content loss. These account for the perceptual part of the content loss, i.e., proper treatment of high-frequency components, which helps with the trade-off problem in super-resolution. The experimental results show that our proposed model performs well for both perception and distortion, and is effective in perceptual super-resolution applications. Comment: To appear in ECCV 2018 workshop. Won the 2nd place for Region 1 in the PIRM Challenge on Perceptual Super Resolution at ECCV 2018. Github at https://github.com/manricheon/eusr-pcl-t
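    The DCT coefficients loss named above admits a short sketch: transform both images with a 2D DCT and penalize the coefficient difference, which makes the reproduction of high-frequency content explicit. Using a plain L1 over all coefficients is an assumption for illustration.

        # DCT-domain content loss between a super-resolved image and its
        # ground truth; diverging high-frequency coefficients raise the loss.

        import numpy as np
        from scipy.fft import dctn

        def dct_loss(sr, hr):
            return float(np.abs(dctn(sr, norm="ortho") - dctn(hr, norm="ortho")).mean())

        hr = np.random.rand(64, 64)
        sr = hr + 0.05 * np.random.randn(64, 64)
        print(dct_loss(sr, hr))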