Optimization of the Block-level Bit Allocation in Perceptual Video Coding based on MINMAX
In video coding, the encoder is expected to adaptively select encoding
parameters (e.g., the quantization parameter) to optimize the allocation of
bits to different sources under a given constraint. In hybrid video coding,
however, the dependency between sources makes bit allocation optimization
highly complex, especially at the block level, and existing optimization
methods mostly focus on frame-level bit allocation. In this paper, we propose
a macroblock (MB) level bit allocation method based on the minimum maximum
(MINMAX) criterion, which has acceptable encoding complexity for offline
applications. An iterative algorithm, named maximum distortion descend (MDD),
is developed to reduce quality fluctuation among MBs within a frame, where the
Structural SIMilarity (SSIM) index is used to measure the perceptual
distortion of MBs. Our extensive experimental results on benchmark video
sequences show that the proposed method greatly enhances encoding performance
in terms of both bit savings and perceptual quality improvement.
Comment: 11 pages, 17 figures
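As a rough illustration of the MINMAX idea, the sketch below iterates a maximum-distortion-descend step over macroblock QPs; the distortion model is a hypothetical stand-in for the SSIM-based measurement a real encoder would perform.

```python
# A minimal sketch of a MINMAX-style maximum-distortion-descend (MDD) loop.
# `ssim_distortion` is a toy stand-in (hypothetical); a real encoder would
# measure 1 - SSIM after encoding each macroblock at the candidate QP.
import numpy as np

def ssim_distortion(mb_activity, qp):
    # Toy monotone model: distortion grows with QP, scaled by block activity.
    return mb_activity * (qp / 51.0) ** 2

def mdd_allocate(activities, qp_init=32, steps=200, qp_min=10, qp_max=51):
    qps = np.full(len(activities), qp_init, dtype=int)
    for _ in range(steps):
        d = np.array([ssim_distortion(a, q) for a, q in zip(activities, qps)])
        worst, best = int(np.argmax(d)), int(np.argmin(d))
        if d[worst] - d[best] < 1e-3:   # quality fluctuation small enough
            break
        # Descend the maximum distortion; keep the bit budget roughly
        # constant by trading QP between the worst and best macroblocks.
        if qps[worst] > qp_min and qps[best] < qp_max:
            qps[worst] -= 1
            qps[best] += 1
        else:
            break
    return qps

print(mdd_allocate(np.array([0.2, 1.0, 0.5, 0.8])))
```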
Image Quality Assessment: Unifying Structure and Texture Similarity
Objective measures of image quality generally operate by comparing pixels of
a "degraded" image to those of the original. Relative to human observers, these
measures are overly sensitive to resampling of texture regions (e.g., replacing
one patch of grass with another). Here, we develop the first full-reference
image quality model with explicit tolerance to texture resampling. Using a
convolutional neural network, we construct an injective and differentiable
function that transforms images to multi-scale overcomplete representations. We
demonstrate empirically that the spatial averages of the feature maps in this
representation capture texture appearance, in that they provide a set of
sufficient statistical constraints to synthesize a wide variety of texture
patterns. We then describe an image quality method that combines correlations
of these spatial averages ("texture similarity") with correlations of the
feature maps ("structure similarity"). The parameters of the proposed measure
are jointly optimized to match human ratings of image quality, while minimizing
the reported distances between subimages cropped from the same texture images.
Experiments show that the optimized method explains human perceptual scores
on conventional image quality databases as well as on texture databases.
The measure also offers competitive performance on related tasks such as
texture classification and retrieval. Finally, we show that our method is
relatively insensitive to geometric transformations (e.g., translation and
dilation), without use of any specialized training or data augmentation. Code
is available at https://github.com/dingkeyan93/DISTS
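As a loose illustration of the combination described above, the sketch below computes a single-stage texture term from spatial feature means and a structure term from feature-map correlations; the real DISTS model uses multi-scale VGG-based features and learned per-channel weights.

```python
# Simplified, single-stage sketch of combining "texture" (spatial means) and
# "structure" (feature-map correlation) terms; equal weighting is assumed.
import numpy as np

def texture_structure_score(fx, fy, c1=1e-6, c2=1e-6):
    # fx, fy: feature maps of shape (channels, H, W) for reference/distorted.
    mx, my = fx.mean(axis=(1, 2)), fy.mean(axis=(1, 2))   # spatial averages
    vx, vy = fx.var(axis=(1, 2)), fy.var(axis=(1, 2))
    cov = ((fx - mx[:, None, None]) * (fy - my[:, None, None])).mean(axis=(1, 2))
    texture = (2 * mx * my + c1) / (mx**2 + my**2 + c1)   # mean matching
    structure = (2 * cov + c2) / (vx + vy + c2)           # correlation term
    return float(np.mean(0.5 * texture + 0.5 * structure))

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 32, 32))
print(texture_structure_score(f, f + 0.1 * rng.standard_normal(f.shape)))
```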
Stochastic Bottleneck: Rateless Auto-Encoder for Flexible Dimensionality Reduction
We propose a new concept of rateless auto-encoders (RL-AEs) that enable a
flexible latent dimensionality, which can be seamlessly adjusted for varying
distortion and dimensionality requirements. In the proposed RL-AEs, instead of
a deterministic bottleneck architecture, we use an over-complete representation
that is stochastically regularized with weighted dropouts, in a manner
analogous to sparse AE (SAE). Unlike SAEs, our RL-AEs employ monotonically
increasing dropout rates across the latent representation nodes such that the
latent variables become sorted by importance, as in principal component
analysis (PCA). This is motivated by the rateless property of conventional PCA,
where the least important principal components can be discarded to realize
variable rate dimensionality reduction that gracefully degrades the distortion.
In contrast, since the latent variables of conventional AEs are equally
important for data reconstruction, they cannot be simply discarded to further
reduce the dimensionality after the AE model is trained. Our proposed
stochastic bottleneck framework enables seamless rate adaptation with high
reconstruction performance, without requiring predetermined latent
dimensionality at training. We experimentally demonstrate that the proposed
RL-AEs achieve variable dimensionality reduction while attaining lower
distortion than conventional AEs.
Comment: 14 pages, 12 figures, ISIT 2020 accepted
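A minimal sketch of the stochastic bottleneck's key ingredient, assuming a linear keep-rate schedule (the paper's exact dropout-rate profile may differ): higher-indexed latent nodes are dropped more often during training, so the trained code can simply be truncated at test time.

```python
# Rateless-dropout sketch: node i is kept with a monotonically decreasing
# probability, so earlier nodes learn to carry more information (PCA-style).
# The linear schedule below is an assumption, not the paper's exact choice.
import numpy as np

def rateless_dropout_mask(latent_dim, rng, p_min=0.2):
    keep = np.linspace(1.0, p_min, latent_dim)   # node 0 almost always kept
    mask = rng.random(latent_dim) < keep
    return mask / keep                           # inverted-dropout scaling

rng = np.random.default_rng(0)
z = rng.standard_normal(16)                      # latent code from an encoder
z_train = z * rateless_dropout_mask(16, rng)     # stochastic bottleneck (train)
z_rate8 = np.where(np.arange(16) < 8, z, 0.0)    # test-time rate adaptation
```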
Multi-measures fusion based on multi-objective genetic programming for full-reference image quality assessment
In this paper, we exploit the flexibility of multi-objective fitness
functions and combine the model structure selection ability of standard
genetic programming (GP) with the parameter estimation power of classical
regression via multi-gene genetic programming (MGGP) to propose a new fusion
technique for image quality assessment (IQA), called Multi-measures Fusion
based on Multi-Objective Genetic Programming (MFMOGP). This technique
automatically selects the most significant measures, from 16 full-reference
IQA measures, to use in aggregation and finds the weights of a weighted sum of
their outputs while simultaneously optimizing for both accuracy and
complexity. The resulting fusions of IQA measures are evaluated on the four
largest publicly available image databases and compared against
state-of-the-art full-reference IQA approaches. The comparison reveals that
the proposed approach outperforms other recently developed state-of-the-art
fusion approaches.
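Since the fusion ultimately reduces to a weighted sum over selected measures, the toy sketch below fits such weights by least squares on synthetic data; the paper instead evolves the selection and weights with multi-objective genetic programming, so this is only a baseline illustration with hypothetical inputs.

```python
# Baseline weighted-sum fusion of FR-IQA measure outputs against subjective
# scores; all data here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((200, 16))            # 200 images x 16 FR-IQA measures
mos = scores @ rng.random(16) + 0.05 * rng.standard_normal(200)  # toy targets

w, *_ = np.linalg.lstsq(scores, mos, rcond=None)   # fused weights
selected = np.argsort(-np.abs(w))[:5]              # keep the 5 strongest measures
fused = scores[:, selected] @ w[selected]          # sparse weighted-sum fusion
```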
Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos
Omnidirectional images (also referred to as static 360° panoramas) impose
viewing conditions much different from those of regular 2D images. How humans
perceive image distortions in immersive virtual reality (VR) environments is
an important problem that has received little attention. We argue
that, apart from the distorted panorama itself, two types of VR viewing
conditions are crucial in determining the viewing behaviors of users and the
perceived quality of the panorama: the starting point and the exploration time.
We first carry out a psychophysical experiment to investigate the interplay
among the VR viewing conditions, the user viewing behaviors, and the perceived
quality of 360° images. Then, we provide a thorough analysis of the collected
human data, leading to several interesting findings. Moreover, we propose a
computational framework for objective quality assessment of 360° images that
embodies viewing conditions and behaviors directly. Specifically, we first
transform an omnidirectional image into several video representations using
different user viewing behaviors under different viewing conditions. We then
leverage advanced 2D full-reference video quality models to compute the
perceived quality. We construct a set of specific quality measures within the
proposed framework and demonstrate their promise on three VR quality
databases.
Comment: 11 pages, 11 figures, 9 tables. This paper has been accepted by IEEE
Transactions on Visualization and Computer Graphics
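A rough sketch of the panorama-to-video transformation, assuming a constant-latitude horizontal scanpath and plain cropping in place of true gnomonic viewport projection; the starting point and scan length stand in for the paper's viewing-condition parameters.

```python
# Sample viewports along a scanpath from an equirectangular image and stack
# them as frames for an off-the-shelf 2D video quality model. Real viewport
# extraction uses gnomonic projection; plain cropping is a simplification.
import numpy as np

def panorama_to_video(equirect, start_lon, n_frames, fov_px):
    h, w = equirect.shape[:2]
    x0 = int(start_lon / 360.0 * w)              # starting point matters
    frames = []
    for t in range(n_frames):
        x = (x0 + t * (fov_px // 4)) % w         # slow horizontal scan
        cols = np.arange(x, x + fov_px) % w      # wrap around 360 degrees
        frames.append(equirect[h // 3 : h // 3 + fov_px, :][:, cols])
    return np.stack(frames)

video = panorama_to_video(np.zeros((512, 1024, 3)), start_lon=90,
                          n_frames=30, fov_px=256)
```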
Multi-Scale Recursive and Perception-Distortion Controllable Image Super-Resolution
We describe our solution for the PIRM Super-Resolution Challenge 2018, where
we achieved the 2nd best perceptual quality for average RMSE<=16, 5th best for
RMSE<=12.5, and 7th best for RMSE<=11.5. We modify a recently proposed
Multi-Grid Back-Projection (MGBP) architecture to work as a generative system
with an input parameter that controls the amount of artificial detail in the
output. We propose a discriminator for adversarial training with the
following novel properties: it is multi-scale, resembling a progressive GAN;
it is recursive, balancing the architecture of the generator; and it includes
a new layer to capture significant statistics of natural images. Finally, we
propose a training strategy that avoids conflicts between reconstruction and
perceptual losses. Our configuration uses only 281k parameters and upscales
each image of the competition in 0.2 s on average.
Comment: In ECCV 2018 Workshops. Won 2nd place in Region 3 of the PIRM-SR
Challenge 2018. Code and models are available at
https://github.com/pnavarre/pirm-sr-201
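One simple way to expose such a perception-distortion control knob is to interpolate between a distortion-oriented and a perception-oriented output, as sketched below; the paper instead feeds a noise-amplitude parameter into its MGBP generator, so this is an illustrative stand-in rather than their mechanism.

```python
# Illustrative perception-distortion control: blend a distortion-optimized
# result with a perception-optimized one. Inputs are placeholders.
import numpy as np

def controllable_output(sr_distortion, sr_perceptual, alpha):
    """alpha in [0, 1]: 0 = lowest RMSE, 1 = most artificial detail."""
    return (1.0 - alpha) * sr_distortion + alpha * sr_perceptual

a = np.zeros((64, 64, 3))
b = np.ones((64, 64, 3))
mid = controllable_output(a, b, alpha=0.5)   # one point on the trade-off curve
```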
A ParaBoost Stereoscopic Image Quality Assessment (PBSIQA) System
The problem of stereoscopic image quality assessment, which finds
applications in 3D visual content delivery such as 3DTV, is investigated in
this work. Specifically, we propose a new ParaBoost (parallel-boosting)
stereoscopic image quality assessment (PBSIQA) system. The system consists of
two stages. In the first stage, various distortions are classified into a few
types, and individual quality scorers targeting a specific distortion type
are developed. These scorers offer complementary performance in the face of a
database consisting of heterogeneous distortion types. In the second stage,
scores from multiple quality scorers are fused to achieve the best overall
performance, where the fuser is designed based on the parallel boosting idea
borrowed from machine learning. Extensive experiments are conducted to
compare the performance of the proposed PBSIQA system with those of existing
stereo image quality assessment (SIQA) metrics. The developed quality metric
can serve as an objective function to optimize the performance of a 3D content
delivery system.
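A compact two-stage sketch of the ParaBoost idea, with placeholder stage-one scorer outputs and scikit-learn's gradient boosting standing in for the fuser (the paper's fuser design may differ in detail):

```python
# Stage 1: distortion-specific scorers produce complementary scores (here
# synthetic). Stage 2: a boosted regressor fuses them into one estimate.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 300
scorer_outputs = rng.random((n, 5))      # 5 distortion-type scorers
mos = scorer_outputs @ np.array([0.4, 0.1, 0.2, 0.2, 0.1]) \
      + 0.05 * rng.standard_normal(n)    # toy subjective scores

fuser = GradientBoostingRegressor(n_estimators=100)
fuser.fit(scorer_outputs[:200], mos[:200])
print(fuser.predict(scorer_outputs[200:])[:3])
```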
Blind Predicting Similar Quality Map for Image Quality Assessment
A key problem in blind image quality assessment (BIQA) is how to effectively
model the properties of the human visual system in a data-driven manner. In this
paper, we propose a simple and efficient BIQA model based on a novel framework
which consists of a fully convolutional neural network (FCNN) and a pooling
network to solve this problem. In principle, FCNN is capable of predicting a
pixel-by-pixel similar quality map only from a distorted image by using the
intermediate similarity maps derived from conventional full-reference image
quality assessment methods. The predicted pixel-by-pixel quality maps have good
consistency with the distortion correlations between the reference and
distorted images. Finally, a deep pooling network regresses the quality map
into a score. Experiments demonstrate that our predictions outperform many
state-of-the-art BIQA methods.
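A compact PyTorch sketch of the two-part design, with illustrative layer sizes rather than the paper's architecture: a fully convolutional network produces a pixel-wise quality map from a distorted image, and a small pooling network regresses the map to a scalar score.

```python
import torch
import torch.nn as nn

class QualityMapFCNN(nn.Module):
    """FCNN: distorted image -> pixel-wise quality map (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # per-pixel quality
        )

    def forward(self, x):
        return self.body(x)

class PoolingNet(nn.Module):
    """Pooling network: quality map -> scalar quality score."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(), nn.Linear(64, 1)
        )

    def forward(self, qmap):
        return self.head(qmap)

x = torch.rand(1, 3, 64, 64)                  # distorted image (toy input)
score = PoolingNet()(QualityMapFCNN()(x))
```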
Deep Optimization model for Screen Content Image Quality Assessment using Neural Networks
In this paper, we propose a novel quadratic optimized model based on the deep
convolutional neural network (QODCNN) for full-reference and no-reference
screen content image (SCI) quality assessment. Unlike traditional CNN
methods, which take all image patches as training data and use average
quality pooling, our model is optimized in three steps to obtain a more
effective model. In the first step, an end-to-end deep CNN is trained to
preliminarily predict the image visual quality, with batch normalization (BN)
layers and l2 regularization employed to improve the speed and performance of
network fitting. In the second step, the pretrained model is fine-tuned to
achieve better performance based on an analysis of the raw training data. In
the third step, an adaptive weighting method is proposed to fuse local
quality, inspired by the perceptual property of the human visual system (HVS)
that it is sensitive to image patches containing texture and edge
information. The novelty of our algorithm can be summarized as follows: 1)
considering the correlation between local quality and the subjective
differential mean opinion score (DMOS), the Euclidean distance is used to
measure the effectiveness of image patches, and the pretrained model is
fine-tuned with more effective training data; 2) an adaptive pooling approach
is employed to fuse the patch quality of textual and pictorial regions, whose
features, extracted only from distorted images, are strongly noise-robust and
effective for both FR and NR IQA; 3) considering the characteristics of SCIs,
a deep and effective network architecture is designed for both NR and FR
visual quality evaluation of SCIs. Experimental results verify that our model
outperforms both current no-reference and full-reference image quality
assessment methods on the benchmark screen content image quality assessment
database (SIQAD).
Comment: 12 pages, 9 figures
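The adaptive pooling step might look like the sketch below, which weights patch scores by a simple gradient-energy measure of texture/edge activity; the paper's weighting additionally relies on the DMOS-correlation analysis described above, so this shows only the HVS-inspired pooling.

```python
# HVS-inspired adaptive pooling: textured/edge-rich patches dominate the
# final score. Patch scores stand in for the CNN's local predictions.
import numpy as np

def adaptive_pool(patches, patch_scores, eps=1e-8):
    # patches: (N, h, w) grayscale patches; patch_scores: (N,) predictions.
    gy, gx = np.gradient(patches.astype(float), axis=(1, 2))
    activity = np.sqrt(gx**2 + gy**2).mean(axis=(1, 2))  # edge/texture energy
    weights = activity / (activity.sum() + eps)
    return float(np.sum(weights * patch_scores))

rng = np.random.default_rng(0)
print(adaptive_pool(rng.random((10, 32, 32)), rng.random(10)))
```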
Generative adversarial network-based image super-resolution using perceptual content losses
In this paper, we propose a deep generative adversarial network for
super-resolution considering the trade-off between perception and distortion.
Based on the good performance of a recently developed model for
super-resolution, i.e., the deep residual network using enhanced upscale
modules (EUSR), the proposed model is trained to improve perceptual
performance with only a slight increase in distortion. For this purpose,
together with the conventional content loss, i.e., a reconstruction loss such
as L1 or L2, we consider additional losses in the training phase: a discrete
cosine transform (DCT) coefficients loss and a differential content loss.
These address the perceptual part of the content loss, i.e., the proper
treatment of high-frequency components, which is helpful for the
perception-distortion trade-off in super-resolution. The experimental results
show that our proposed model performs well in terms of both perception and
distortion and is effective in perceptual super-resolution applications.
Comment: To appear in ECCV 2018 workshop. Won 2nd place for Region 1 in the
PIRM Challenge on Perceptual Super Resolution at ECCV 2018. GitHub at
https://github.com/manricheon/eusr-pcl-t
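A sketch of a DCT-coefficient content loss of the kind described above, computed over 8x8 blocks (block size and uniform weighting are assumptions); the differential content loss would similarly compare image gradients.

```python
# Penalize SR/HR differences in the frequency domain so high-frequency
# components are treated explicitly in the content loss.
import numpy as np
from scipy.fft import dctn

def dct_loss(sr, hr, block=8):
    h, w = sr.shape
    loss, n = 0.0, 0
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            d = dctn(sr[i:i+block, j:j+block], norm="ortho") \
              - dctn(hr[i:i+block, j:j+block], norm="ortho")
            loss += float(np.mean(d**2))
            n += 1
    return loss / max(n, 1)

rng = np.random.default_rng(0)
hr = rng.random((64, 64))
print(dct_loss(hr + 0.01 * rng.standard_normal(hr.shape), hr))
```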