Bandwidth Extension on Raw Audio via Generative Adversarial Networks
Neural network-based methods have recently demonstrated state-of-the-art
results on image synthesis and super-resolution tasks, in particular by using
variants of generative adversarial networks (GANs) with supervised feature
losses. Nevertheless, previous feature loss formulations rely on the
availability of large auxiliary classifier networks, and labeled datasets that
enable such classifiers to be trained. Furthermore, there has been
comparatively little work to explore the applicability of GAN-based methods to
domains other than images and video. In this work we explore a GAN-based method
for audio processing, and develop a convolutional neural network architecture
to perform audio super-resolution. In addition to several new architectural
building blocks for audio processing, a key component of our approach is the
use of an autoencoder-based loss that enables training in the GAN framework,
with feature losses derived from unlabeled data. We explore the impact of our
architectural choices, and demonstrate significant improvements over previous
work in terms of both objective and perceptual quality.
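For illustration, a minimal PyTorch sketch of the feature-loss idea this abstract describes: encoder activations from an autoencoder pretrained on unlabeled audio replace the labeled-classifier features of conventional feature losses. The encoder below is a toy stand-in, not the paper's architecture, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Toy 1-D convolutional encoder standing in for a pretrained autoencoder's encoder."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv1d(cin, cout, 9, stride=2, padding=4),
                          nn.LeakyReLU(0.2))
            for cin, cout in [(1, 32), (32, 64), (64, 128)]
        ])

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)  # collect multi-scale activations
        return feats

def feature_loss(encoder, generated, target):
    """L1 distance between encoder activations of generated and target audio."""
    with torch.no_grad():
        target_feats = encoder(target)
    generated_feats = encoder(generated)
    return sum(torch.mean(torch.abs(g - t))
               for g, t in zip(generated_feats, target_feats))

# Usage: add lambda * feature_loss(...) to the adversarial generator loss.
encoder = ConvEncoder().eval()
loss = feature_loss(encoder, torch.randn(4, 1, 8192), torch.randn(4, 1, 8192))
```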
Deep Generative Adversarial Compression Artifact Removal
Compression artifacts arise in images whenever a lossy compression algorithm
is applied. These artifacts eliminate details present in the original image, or
add noise and small structures; because of these effects they make images less
pleasant for the human eye, and may also lead to decreased performance of
computer vision algorithms such as object detectors. To eliminate such
artifacts, when decompressing an image, it is required to recover the original
image from a disturbed version. To this end, we present a feed-forward fully
convolutional residual network model trained using a generative adversarial
framework. To provide a baseline, we show that our model can also be trained by
optimizing the Structural Similarity (SSIM) index, a better loss than the
simpler Mean Squared Error (MSE). Our GAN is able to produce images with more
photorealistic details than MSE- or SSIM-based networks. Moreover, we show that
our approach can be used as a pre-processing step for object detection when
images are degraded by compression to the point that state-of-the-art detectors
fail. In this task, our GAN method obtains better performance than MSE- or
SSIM-trained networks.
Comment: ICCV 2017 Camera Ready + Acknowledgement
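A hedged sketch of a differentiable SSIM training loss of the kind the abstract contrasts with MSE. This simplified version uses a uniform averaging window rather than the Gaussian window of the original SSIM, and is not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=11):
    """1 - mean local SSIM over uniform windows (inputs assumed in [0, 1])."""
    pad = win // 2
    mu_x = F.avg_pool2d(x, win, 1, pad)          # local means
    mu_y = F.avg_pool2d(y, win, 1, pad)
    var_x = F.avg_pool2d(x * x, win, 1, pad) - mu_x ** 2   # local variances
    var_y = F.avg_pool2d(y * y, win, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, pad) - mu_x * mu_y   # local covariance
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim.mean()

# Usage: restored = net(compressed); loss = ssim_loss(restored, original)
```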
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content
Recent years have witnessed an explosion of user-generated content (UGC)
videos shared and streamed over the Internet, thanks to the evolution of
affordable and reliable consumer capture devices, and the tremendous popularity
of social media platforms. Accordingly, there is a great need for accurate
video quality assessment (VQA) models for UGC/consumer videos to monitor,
control, and optimize this vast content. Blind quality prediction of
in-the-wild videos is quite challenging, since the quality degradations of UGC
content are unpredictable, complicated, and often commingled. Here we
contribute to advancing the UGC-VQA problem by conducting a comprehensive
evaluation of leading no-reference/blind VQA (BVQA) features and models on a
fixed evaluation architecture, yielding new empirical insights on both
subjective video quality studies and VQA model design. By employing a feature
selection strategy on top of leading VQA model features, we are able to extract
60 of the 763 statistical features used by the leading models to create a new
fusion-based BVQA model, dubbed the VIDeo quality EVALuator (VIDEVAL), which
effectively balances the trade-off between
VQA performance and efficiency. Our experimental results show that VIDEVAL
achieves state-of-the-art performance at considerably lower computational cost
than other leading models. Our study protocol also defines a reliable benchmark
for the UGC-VQA problem, which we believe will facilitate further research on
deep learning-based VQA modeling, as well as perceptually-optimized efficient
UGC video processing, transcoding, and streaming. To promote reproducible
research and public evaluation, an implementation of VIDEVAL has been made
available online: https://github.com/tu184044109/VIDEVAL_release.
Comment: 13 pages, 11 figures, 11 tables
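As a rough sketch of the fusion recipe the abstract outlines (select a 60-of-763 feature subset, then regress to quality scores), the scikit-learn pipeline below is illustrative only: VIDEVAL's actual feature-selection strategy and regressor settings differ, and the data here are placeholders.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X = np.random.randn(200, 763)   # placeholder feature matrix (videos x features)
y = np.random.rand(200) * 100   # placeholder MOS labels

model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_regression, k=60),  # keep 60 of the 763 statistical features
    SVR(kernel="rbf", C=10.0),        # fusion regressor onto subjective scores
)
model.fit(X, y)
pred = model.predict(X)
```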
Multi-measures fusion based on multi-objective genetic programming for full-reference image quality assessment
In this paper, we combine the flexibility of multi-objective fitness functions
and the model-structure selection ability of standard genetic programming (GP)
with the parameter estimation power of classical regression, via multi-gene
genetic programming (MGGP), to propose a new fusion technique for image quality
assessment (IQA) called Multi-measures Fusion based on Multi-Objective Genetic
Programming (MFMOGP). This technique automatically selects the most suitable
measures, from a pool of 16 full-reference IQA measures, to use in aggregation,
and finds the weights of a weighted sum of their outputs, while simultaneously
optimizing for both accuracy and complexity. The resulting fusion of IQA
measures is evaluated on four of the largest publicly available image databases
and compared against state-of-the-art full-reference IQA approaches. The
comparison reveals that the proposed approach outperforms other recently
developed state-of-the-art fusion approaches.
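The end product MFMOGP searches for is a weighted sum over a selected subset of measure outputs. A minimal sketch of that final form, with the GP search replaced by an ordinary least-squares fit purely for illustration; the subset indices and data are placeholders.

```python
import numpy as np

scores = np.random.rand(500, 16)   # placeholder: 16 FR-IQA measure outputs per image
mos = np.random.rand(500)          # placeholder subjective scores

selected = [0, 3, 7, 12]           # hypothetical subset a GP run might retain
# Fit weights of the weighted sum (stand-in for the MGGP parameter estimation).
w, *_ = np.linalg.lstsq(scores[:, selected], mos, rcond=None)
fused = scores[:, selected] @ w    # fused quality prediction
```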
Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos
Omnidirectional images (also referred to as static 360° panoramas) impose
viewing conditions much different from those of regular 2D images. How humans
perceive image distortions in immersive virtual reality (VR) environments is an
important problem that has received comparatively little attention. We argue
that, apart from the distorted panorama itself, two types of VR viewing
conditions are crucial in determining the viewing behaviors of users and the
perceived quality of the panorama: the starting point and the exploration time.
We first carry out a psychophysical experiment to investigate the interplay
among the VR viewing conditions, the user viewing behaviors, and the perceived
quality of 360° images. Then, we provide a thorough analysis of the
collected human data, leading to several interesting findings. Moreover, we
propose a computational framework for objective quality assessment of 360°
images, embodying viewing conditions and behaviors in a principled way.
Specifically, we first transform an omnidirectional image to several video
representations using different user viewing behaviors under different viewing
conditions. We then leverage advanced 2D full-reference video quality models to
compute the perceived quality. We construct a set of specific quality measures
within the proposed framework, and demonstrate their promise on three VR
quality databases.
Comment: 11 pages, 11 figures, 9 tables. This paper has been accepted by IEEE Transactions on Visualization and Computer Graphics
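A toy sketch of the panorama-to-video idea: sample viewports along a hypothetical scanpath and stack them into frames for a 2D video quality model. Real implementations use rectilinear (gnomonic) projection of the sphere; the longitude crop below is a crude equator-only approximation, and the scanpath is invented.

```python
import numpy as np

def viewport(erp, lon_deg, fov_deg=90):
    """Crop a horizontal field of view around longitude lon_deg (equator only)."""
    h, w, _ = erp.shape
    vw = int(w * fov_deg / 360)                      # viewport width in pixels
    cx = int((lon_deg % 360) / 360 * w)              # center column for this longitude
    cols = np.arange(cx - vw // 2, cx + vw // 2) % w # wrap around 360°
    return erp[:, cols, :]

erp_image = np.zeros((512, 1024, 3), dtype=np.uint8)  # placeholder panorama
scanpath = np.linspace(0, 180, num=30)                # starting point + slow drift
frames = [viewport(erp_image, lon) for lon in scanpath]
video = np.stack(frames)  # feed frame pairs to a 2D full-reference VQA model
```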
Learning to Predict Streaming Video QoE: Distortions, Rebuffering and Memory
Mobile streaming video data accounts for a large and increasing percentage of
wireless network traffic. The available bandwidths of modern wireless networks
are often unstable, leading to difficulties in delivering smooth, high-quality
video. Streaming service providers such as Netflix and YouTube attempt to adapt
their systems to adjust in response to these bandwidth limitations by changing
the video bitrate or, failing that, allowing playback interruptions
(rebuffering). Being able to predict end users' quality of experience (QoE)
resulting from these adjustments could lead to perceptually-driven network
resource allocation strategies that would deliver streaming content of higher
quality to clients, while being cost effective for providers. Existing
objective QoE models only consider the effects on user QoE of video quality
changes or playback interruptions. For streaming applications, adaptive network
strategies may involve a combination of dynamic bitrate allocation along with
playback interruptions when the available bandwidth reaches a very low value.
Towards effectively predicting user QoE, we propose Video Assessment of
TemporaL Artifacts and Stalls (Video ATLAS): a machine learning framework where
we combine a number of QoE-related features, including objective quality
features, rebuffering-aware features and memory-driven features to make QoE
predictions. We evaluated our learning-based QoE prediction model on the
recently designed LIVE-Netflix Video QoE Database which consists of practical
playout patterns, where the videos are afflicted by both quality changes and
rebuffering events, and found that it provides improved performance over
state-of-the-art video quality metrics while generalizing well on different
datasets. The proposed algorithm is made publicly available at
http://live.ece.utexas.edu/research/Quality/VideoATLAS release_v2.rar.
Comment: under review in Transactions on Image Processing
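A hedged sketch of the feature-combination step the abstract describes: concatenate objective-quality, rebuffering-aware, and memory-driven features, then fit a regressor to subjective QoE. The feature choices, the SVR learner, and all data below are illustrative placeholders, not the released Video ATLAS code.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def qoe_features(quality_ts, stall_durations, time_since_last_impairment):
    """Concatenate the three feature families named in the abstract."""
    return np.array([
        np.mean(quality_ts),             # objective quality feature
        len(stall_durations),            # rebuffering-aware: stall count
        float(sum(stall_durations)),     # rebuffering-aware: total stall time (s)
        time_since_last_impairment,      # memory-driven: recency of impairment (s)
    ])

# Placeholder training data: 100 streaming sessions with subjective QoE labels.
rng = np.random.default_rng(0)
X = np.stack([qoe_features(rng.random(300),
                           rng.random(rng.integers(0, 4)),
                           60 * rng.random())
              for _ in range(100)])
y = 100 * rng.random(100)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf")).fit(X, y)
qoe_pred = model.predict(X[:5])
```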
A Deep Journey into Super-resolution: A survey
Deep convolutional network-based super-resolution is a fast-growing field
with numerous practical applications. In this exposition, we extensively
compare 30+ state-of-the-art super-resolution Convolutional Neural Networks
(CNNs) over three classical and three recently introduced challenging datasets
to benchmark single image super-resolution. We introduce a taxonomy for
deep-learning based super-resolution networks that groups existing methods into
nine categories including linear, residual, multi-branch, recursive,
progressive, attention-based and adversarial designs. We also provide
comparisons between the models in terms of network complexity, memory
footprint, model input and output, learning details, the type of network losses
and important architectural differences (e.g., depth, skip-connections,
filters). The extensive evaluation shows consistent and rapid growth in
accuracy over the past few years, along with a corresponding boost in model
complexity and the availability of large-scale datasets. It is also
observed that the pioneering methods identified as the benchmark have been
significantly outperformed by the current contenders. Despite the progress in
recent years, we identify several shortcomings of existing techniques and
provide future research directions towards the solution of these open problems.
Comment: Accepted in ACM Computing Surveys
Uncertainty-Aware Blind Image Quality Assessment in the Laboratory and Wild
Performance of blind image quality assessment (BIQA) models has been
significantly boosted by end-to-end optimization of feature engineering and
quality regression. Nevertheless, due to the distributional shift between
images simulated in the laboratory and captured in the wild, models trained on
databases with synthetic distortions remain particularly weak at handling
realistic distortions (and vice versa). To confront the
cross-distortion-scenario challenge, we develop a \textit{unified} BIQA model
and an approach of training it for both synthetic and realistic distortions. We
first sample pairs of images from individual IQA databases, and compute a
probability that the first image of each pair is of higher quality. We then
employ the fidelity loss to optimize a deep neural network for BIQA over a
large number of such image pairs. We also explicitly enforce a hinge constraint
to regularize uncertainty estimation during optimization. Extensive experiments
on six IQA databases show the promise of the learned method in blindly
assessing image quality in the laboratory and wild. In addition, we demonstrate
the universality of the proposed training strategy by using it to improve
existing BIQA models.
Comment: Accepted to IEEE TIP. The implementations are available at https://github.com/zwx8981/UNIQU
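A small sketch of the pairwise training objective described above, assuming a Thurstone-style model in which the network outputs a quality mean and uncertainty per image; the probability that one image beats the other is a Gaussian CDF. The fidelity loss shown is the standard formulation; the paper's hinge constraint on uncertainty is omitted here.

```python
import torch

def prob_first_better(mu_a, sigma_a, mu_b, sigma_b):
    """Thurstone-style probability that image a has higher quality than image b."""
    z = (mu_a - mu_b) / torch.sqrt(sigma_a ** 2 + sigma_b ** 2 + 1e-8)
    return 0.5 * (1.0 + torch.erf(z / 2.0 ** 0.5))   # Gaussian CDF

def fidelity_loss(p_hat, p, eps=1e-8):
    """Fidelity loss between predicted and target pairwise probabilities."""
    return (1.0 - torch.sqrt(p * p_hat + eps)
                - torch.sqrt((1.0 - p) * (1.0 - p_hat) + eps)).mean()

# Usage with placeholder network outputs (mean and uncertainty per image):
mu_a, s_a = torch.randn(16), torch.rand(16) + 0.1
mu_b, s_b = torch.randn(16), torch.rand(16) + 0.1
p_target = (torch.randn(16) > 0).float()   # placeholder pairwise labels
loss = fidelity_loss(prob_first_better(mu_a, s_a, mu_b, s_b), p_target)
```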
DNN-Based Source Enhancement to Increase Objective Sound Quality Assessment Score
We propose a training method for deep neural network (DNN)-based source enhancement to increase objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squared error (MSE). Since OSQA scores have been widely used for sound quality evaluation, constructing DNNs to increase OSQA scores would be better than using the minimum MSE to create high-quality output signals. However, since most OSQA scores are not analytically tractable, i.e., they are black boxes, the gradient of the objective function cannot be calculated by simply applying back-propagation. To calculate the gradient of the OSQA-based objective function, we formulated a DNN optimization scheme on the basis of black-box optimization, which is used for training a computer that plays a game. For the black-box optimization scheme, we adopt the policy gradient method, which calculates the gradient on the basis of a sampling algorithm. To simulate output signals using the sampling algorithm, DNNs are used to estimate the probability density function of the output signals that maximize OSQA scores. The OSQA scores are calculated from the simulated output signals, and the DNNs are trained to increase the probability of generating the simulated output signals that achieve high OSQA scores. Through several experiments, we found that OSQA scores significantly increased by applying the proposed method, even though the MSE was not minimized.
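A compact sketch of the black-box policy-gradient idea, assuming the DNN parameterizes a Gaussian over output signals: sample signals, score them with a non-differentiable OSQA stub, and apply the REINFORCE estimator. `osqa_score` is a stand-in; a real setup would call, e.g., an external PESQ tool.

```python
import torch

def osqa_score(signals):
    """Stand-in for a black-box metric such as PESQ; one score per signal."""
    return torch.rand(signals.shape[0])

# Placeholder "DNN outputs": mean and log-std of the output-signal distribution.
mu = torch.zeros(4, 256, requires_grad=True)
log_sigma = torch.zeros(4, 256, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-3)

n_samples = 8
dist = torch.distributions.Normal(mu, log_sigma.exp())
samples = dist.sample((n_samples,))              # sampled output signals (no grad)
log_probs = dist.log_prob(samples).sum(dim=-1)   # grad flows to mu, log_sigma
scores = torch.stack([osqa_score(s) for s in samples])
advantage = scores - scores.mean(dim=0, keepdim=True)  # baseline for variance
loss = -(advantage * log_probs).mean()           # REINFORCE estimator
opt.zero_grad()
loss.backward()
opt.step()
```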
JND-SalCAR: A Novel JND-based Saliency-Channel Attention Residual Network for Image Quality Prediction
In image quality enhancement processing, it is most important to predict how
humans perceive processed images, since human observers are the ultimate
receivers of the images. Thus, objective image quality assessment (IQA) methods
based on human visual sensitivity from psychophysical experiments have been
extensively studied. Thanks to the power of deep convolutional neural networks
(CNNs), many CNN-based IQA models have been studied. However, previous
CNN-based IQA models have not fully utilized the characteristics of the human
visual system (HVS), instead entrusting everything to the CNN, which is often
trained simply as a regressor to predict subjective quality scores obtained
from IQA datasets. In this
paper, we propose a novel JND-based saliency-channel attention residual network
for image quality assessment, called JND-SalCAR, where the human psychophysical
characteristics such as visual saliency and just noticeable difference (JND)
are effectively incorporated. We newly propose a SalCAR block so that
perceptually important features can be extracted by using a saliency-based
spatial attention and a channel attention. In addition, the visual saliency map
is further used as a guideline for predicting the patch weight map in order to
enable stable end-to-end training of the JND-SalCAR. To the best of our
knowledge, our work is the first HVS-inspired trainable IQA network that
considers both the visual saliency and JND characteristics of the HVS. We
evaluate the proposed JND-SalCAR on large IQA datasets, where it outperforms
all recent state-of-the-art IQA methods.
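A hedged PyTorch sketch of a saliency-channel attention residual block in the spirit of the SalCAR block described above, combining squeeze-and-excitation channel attention with a spatial attention map derived from a saliency input; layer sizes and details are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class SalCARBlock(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        # Channel attention (squeeze-and-excitation style).
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        # Spatial attention conditioned on the visual saliency map.
        self.sa = nn.Sequential(nn.Conv2d(1, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x, saliency):
        out = self.conv(x)
        out = out * self.ca(out)        # reweight channels
        out = out * self.sa(saliency)   # reweight spatial positions by saliency
        return x + out                  # residual connection

block = SalCARBlock(64)
feat = torch.randn(2, 64, 32, 32)
sal = torch.rand(2, 1, 32, 32)          # saliency map, same spatial size
y = block(feat, sal)
```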