No-Reference Quality Assessment of Contrast-Distorted Images using Contrast Enhancement
No-reference image quality assessment (NR-IQA) aims to measure image quality without a reference image. However, contrast distortion has been largely overlooked in current NR-IQA research. In this paper, we propose a very simple but effective metric for predicting the quality of contrast-altered images, based on the observation that a high-contrast image is often more similar to its contrast-enhanced version. Specifically, we first generate an enhanced image through histogram equalization. We then compute the similarity between the original image and the enhanced one using the structural similarity index (SSIM) as the first feature. Further, we compute the histogram-based entropies of the original and enhanced images, as well as the cross-entropies between them, yielding four additional features. Finally, we learn a regression module that fuses these five features to infer the quality score. Experiments on four publicly available databases validate the superiority and efficiency of the proposed technique.
Comment: Draft version
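A minimal sketch of this five-feature pipeline, assuming scikit-image and scikit-learn are available; the support vector regressor and the 256-bin histograms are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np
from skimage import exposure
from skimage.metrics import structural_similarity as ssim
from sklearn.svm import SVR

def contrast_features(gray):
    """Five features for a grayscale image with values in [0, 1]."""
    enhanced = exposure.equalize_hist(gray)          # histogram-equalized version
    f_ssim = ssim(gray, enhanced, data_range=1.0)    # feature 1: similarity to enhanced image

    def hist(x):
        h, _ = np.histogram(x, bins=256, range=(0.0, 1.0))
        h = h.astype(np.float64) + 1e-12             # avoid log(0)
        return h / h.sum()

    p, q = hist(gray), hist(enhanced)
    ent_p = -np.sum(p * np.log2(p))                  # feature 2: entropy of original
    ent_q = -np.sum(q * np.log2(q))                  # feature 3: entropy of enhanced
    ce_pq = -np.sum(p * np.log2(q))                  # feature 4: cross-entropy (original vs. enhanced)
    ce_qp = -np.sum(q * np.log2(p))                  # feature 5: cross-entropy (enhanced vs. original)
    return [f_ssim, ent_p, ent_q, ce_pq, ce_qp]

# Regression module fusing the five features (SVR is an assumed choice of regressor).
# X_train: list of grayscale images; y_train: subjective quality scores (MOS).
# model = SVR().fit([contrast_features(im) for im in X_train], y_train)
# quality = model.predict([contrast_features(test_image)])
```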
A Probabilistic Quality Representation Approach to Deep Blind Image Quality Prediction
Blind image quality assessment (BIQA) remains a very challenging problem due
to the unavailability of a reference image. Deep learning based BIQA methods
have been attracting increasing attention in recent years, yet it remains a
difficult task to train a robust deep BIQA model because of the very limited
number of training samples with human subjective scores. Most existing methods
learn a regression network to minimize the prediction error of a scalar image
quality score. However, such a scheme ignores the fact that an image will
receive divergent subjective scores from different subjects, which cannot be
adequately represented by a single scalar number. This is particularly true on
complex, real-world distorted images. Moreover, images may broadly differ in
their distributions of assigned subjective scores. Recognizing this, we propose
a new representation of perceptual image quality, called probabilistic quality
representation (PQR), to describe the image subjective score distribution,
whereby a more robust loss function can be employed to train a deep BIQA model.
The proposed PQR method is shown not only to speed up the convergence of deep model training, but also to greatly improve the achievable level of quality prediction accuracy relative to scalar quality score regression methods. The source code is available at https://github.com/HuiZeng/BIQA_Toolbox.
Comment: Add the link of source code
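As a rough sketch of the idea, assuming PyTorch, the loss below matches a predicted distribution over discrete quality levels to the empirical distribution of per-subject scores; the binning and the cross-entropy form are assumptions for illustration, not the paper's exact PQR construction.

```python
import torch
import torch.nn.functional as F

def pqr_style_loss(logits, target_dist):
    """Distribution-matching loss for blind IQA.

    logits:      (batch, num_levels) raw network outputs over quality levels
    target_dist: (batch, num_levels) normalized histogram of subjective scores
    """
    log_pred = F.log_softmax(logits, dim=1)
    return -(target_dist * log_pred).sum(dim=1).mean()

# A scalar quality estimate can still be read out as an expectation over levels:
# levels = torch.linspace(1.0, 5.0, steps=logits.size(1))
# predicted_mos = (F.softmax(logits, dim=1) * levels).sum(dim=1)
```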
dipIQ: Blind Image Quality Assessment by Learning-to-Rank Discriminable Image Pairs
Objective assessment of image quality is fundamentally important in many
image processing tasks. In this work, we focus on learning blind image quality
assessment (BIQA) models which predict the quality of a digital image with no
access to its original pristine-quality counterpart as reference. One of the
biggest challenges in learning BIQA models is the conflict between the gigantic image space (whose dimensionality equals the number of image pixels) and the extremely limited reliable ground-truth data available for training. Such data are
typically collected via subjective testing, which is cumbersome, slow, and
expensive. Here we first show that a vast amount of reliable training data in
the form of quality-discriminable image pairs (DIP) can be obtained
automatically at low cost by exploiting large-scale databases with diverse
image content. We then learn an opinion-unaware BIQA (OU-BIQA, meaning that no
subjective opinions are used for training) model using RankNet, a pairwise
learning-to-rank (L2R) algorithm, from millions of DIPs, each associated with a
perceptual uncertainty level, leading to a DIP inferred quality (dipIQ) index.
Extensive experiments on four benchmark IQA databases demonstrate that dipIQ
outperforms state-of-the-art OU-BIQA models. The robustness of dipIQ is also
significantly improved as confirmed by the group MAximum Differentiation (gMAD)
competition method. Furthermore, we extend the proposed framework by learning
models with ListNet (a listwise L2R algorithm) on quality-discriminable image
lists (DIL). The resulting DIL Inferred Quality (dilIQ) index achieves an additional performance gain.
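A compact sketch of the pairwise learning-to-rank step, assuming PyTorch; how the perceptual uncertainty enters as a per-pair weight is an assumption here, not the paper's exact formulation.

```python
import torch

def ranknet_pair_loss(score_better, score_worse, pair_weight):
    """RankNet-style loss for a quality-discriminable image pair (DIP).

    score_better / score_worse: predicted quality scores for the two images,
    where the first is known to have higher quality. pair_weight reflects the
    perceptual uncertainty attached to the pair.
    """
    p = torch.sigmoid(score_better - score_worse)     # P(better ranks above worse)
    return -(pair_weight * torch.log(p + 1e-12)).mean()

# Usage: s_a, s_b = quality_net(img_a), quality_net(img_b)
# loss = ranknet_pair_loss(s_a, s_b, pair_weight); loss.backward()
```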
Habitat: A Platform for Embodied AI Research
We present Habitat, a platform for research in embodied artificial
intelligence (AI). Habitat enables training embodied agents (virtual robots) in
highly efficient photorealistic 3D simulation. Specifically, Habitat consists
of: (i) Habitat-Sim: a flexible, high-performance 3D simulator with
configurable agents, sensors, and generic 3D dataset handling. Habitat-Sim is
fast -- when rendering a scene from Matterport3D, it achieves several thousand
frames per second (fps) running single-threaded, and can reach over 10,000 fps
multi-process on a single GPU. (ii) Habitat-API: a modular high-level library
for end-to-end development of embodied AI algorithms -- defining tasks (e.g.,
navigation, instruction following, question answering), configuring, training,
and benchmarking embodied agents.
These large-scale engineering contributions enable us to answer scientific
questions requiring experiments that were till now impracticable or 'merely'
impractical. Specifically, in the context of point-goal navigation: (1) we
revisit the comparison between learning and SLAM approaches from two recent
works and find evidence for the opposite conclusion -- that learning
outperforms SLAM if scaled to an order of magnitude more experience than
previous investigations, and (2) we conduct the first cross-dataset
generalization experiments {train, test} x {Matterport3D, Gibson} for multiple
sensors {blind, RGB, RGBD, D} and find that only agents with depth (D) sensors
generalize across datasets. We hope that our open-source platform and these findings will advance research in embodied AI.
Comment: ICCV 2019
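For orientation, a minimal Habitat-API agent loop might look like the sketch below; the config path and random action sampling follow the project's public examples, but exact entry points can differ between releases, so treat this as an assumption rather than the canonical usage.

```python
import habitat

# Point-goal navigation task config (the path is an assumption; it varies by release).
config = habitat.get_config("configs/tasks/pointnav.yaml")
env = habitat.Env(config=config)

observations = env.reset()                 # dict of sensor readings (e.g., RGB, depth)
while not env.episode_over:
    action = env.action_space.sample()     # random agent, purely illustrative
    observations = env.step(action)
env.close()
```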
Towards Unsupervised Single-Channel Blind Source Separation using Adversarial Pair Unmix-and-Remix
Blind single-channel source separation is a long-standing signal processing challenge. Many methods have been proposed to solve this task using signal priors such as low rank, sparsity, and temporal continuity. Recent advances in generative adversarial models have presented new opportunities for signal regression tasks. The power of adversarial training, however, has not yet been realized for blind source separation tasks. In this work, we propose a novel method for blind source separation (BSS) using adversarial methods. We rely on the independence of the sources to create adversarial constraints on pairs of approximately separated sources, which ensure good separation. Experiments carried out on image sources validate the good performance of our approach and present our method as a promising direction for solving BSS for general signals.
Comment: ICASSP'19
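A rough sketch of the unmix-and-remix constraint, assuming PyTorch; the separator and discriminator modules and the non-saturating GAN objective are illustrative assumptions meant only to convey the pairing idea.

```python
import torch
import torch.nn.functional as F

def unmix_and_remix_losses(separator, discriminator, mix1, mix2):
    """Separate two mixtures, remix components across the pair, and score realism.
    If the underlying sources are independent, a good separation makes the
    cross-remixed signals indistinguishable from real mixtures."""
    a1, b1 = separator(mix1)                       # approximate sources of mixture 1
    a2, b2 = separator(mix2)                       # approximate sources of mixture 2
    fake = torch.cat([a1 + b2, a2 + b1], dim=0)    # remix across the pair
    real = torch.cat([mix1, mix2], dim=0)

    real_logits = discriminator(real)
    fake_logits = discriminator(fake.detach())
    d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) + \
             F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))

    gen_logits = discriminator(fake)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss
```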
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content
Recent years have witnessed an explosion of user-generated content (UGC)
videos shared and streamed over the Internet, thanks to the evolution of
affordable and reliable consumer capture devices, and the tremendous popularity
of social media platforms. Accordingly, there is a great need for accurate
video quality assessment (VQA) models for UGC/consumer videos to monitor,
control, and optimize this vast content. Blind quality prediction of
in-the-wild videos is quite challenging, since the quality degradations of UGC
content are unpredictable, complicated, and often commingled. Here we
contribute to advancing the UGC-VQA problem by conducting a comprehensive
evaluation of leading no-reference/blind VQA (BVQA) features and models on a
fixed evaluation architecture, yielding new empirical insights on both
subjective video quality studies and VQA model design. By employing a feature
selection strategy on top of leading VQA model features, we are able to extract 60 of the 763 statistical features used by the leading models and create a new fusion-based BVQA model, dubbed the VIDeo quality EVALuator (VIDEVAL), which effectively balances the trade-off between VQA performance and efficiency. Our experimental results show that VIDEVAL
achieves state-of-the-art performance at considerably lower computational cost
than other leading models. Our study protocol also defines a reliable benchmark
for the UGC-VQA problem, which we believe will facilitate further research on
deep learning-based VQA modeling, as well as perceptually-optimized efficient
UGC video processing, transcoding, and streaming. To promote reproducible
research and public evaluation, an implementation of VIDEVAL has been made
available online at https://github.com/tu184044109/VIDEVAL_release.
Comment: 13 pages, 11 figures, 11 tables
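The fusion recipe described above (pick a small subset of the pooled statistical features, then regress quality) can be sketched as follows, assuming scikit-learn; the univariate selector and SVR are illustrative stand-ins for the paper's actual selection procedure and regressor.

```python
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def build_fused_bvqa_model(X, y, num_selected=60):
    """X: (num_videos, 763) features pooled from existing BVQA models.
    y: (num_videos,) mean opinion scores from subjective studies."""
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_regression, k=num_selected),  # keep 60 of 763 features
        SVR(kernel="rbf"),                                      # assumed fusion regressor
    )
    return model.fit(X, y)

# quality_scores = build_fused_bvqa_model(X_train, y_train).predict(X_test)
```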
On the Relation between Color Image Denoising and Classification
A large amount of the image denoising literature focuses on single-channel images and often validates the proposed methods experimentally on at most tens of images. In this paper, we investigate the interaction between denoising and classification on a large-scale dataset. Inspired by classification models, we propose a novel deep learning architecture for color (multichannel) image denoising and report results on thousands of images from the ImageNet dataset as well as commonly used imagery. We study the importance of (sufficient) training data and how semantic class information can be traded for improved denoising results. As a result, our method greatly improves PSNR performance, by 0.34 - 0.51 dB on average over state-of-the-art methods on a large-scale dataset. We conclude that it is beneficial to incorporate semantic class information into denoising models. On the other hand, we also study how noise affects classification performance. In the end, we come to a number of interesting conclusions, some of them counter-intuitive.
Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles
Many practical perception systems exist within larger processes that include
interactions with users or additional components capable of evaluating the
quality of predicted solutions. In these contexts, it is beneficial to provide
these oracle mechanisms with multiple highly likely hypotheses rather than a
single prediction. In this work, we pose the task of producing multiple outputs
as a learning problem over an ensemble of deep networks -- introducing a novel
stochastic gradient descent based approach to minimize the loss with respect to
an oracle. Our method is simple to implement, agnostic to both architecture and
loss function, and parameter-free. Our approach achieves lower oracle error
compared to existing methods on a wide range of tasks and deep architectures.
We also show qualitatively that the diverse solutions produced often provide interpretable representations of task ambiguity.
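A minimal sketch of the oracle ("winner-take-gradient") update the abstract describes, assuming PyTorch; the ensemble container and loss function are placeholders, since the approach is agnostic to both architecture and loss.

```python
import torch

def smcl_step(ensemble, loss_fn, x, y, optimizer):
    """Stochastic multiple choice learning step.

    loss_fn must return per-example losses (e.g., reduction='none'); for each
    example only the lowest-loss ensemble member receives gradient, which pushes
    the members toward diverse, specialized hypotheses."""
    losses = torch.stack([loss_fn(member(x), y) for member in ensemble], dim=0)
    oracle_loss = losses.min(dim=0).values.mean()   # oracle picks the best member per example
    optimizer.zero_grad()
    oracle_loss.backward()                          # gradients flow only to the winners
    optimizer.step()
    return oracle_loss.item()
```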
Deep Demosaicing for Edge Implementation
Most digital cameras use sensors coated with a Color Filter Array (CFA) that captures a single color component at every pixel location, resulting in a mosaic image that does not contain values for all channels at each pixel. Current research on
reconstructing these missing channels, also known as demosaicing, introduces
many artifacts, such as zipper effect and false color. Many deep learning
demosaicing techniques outperform other classical techniques in reducing the
impact of artifacts. However, most of these models tend to be
over-parametrized. Consequently, edge implementation of the state-of-the-art
deep learning-based demosaicing algorithms on low-end edge devices is a major
challenge. We provide an exhaustive search of deep neural network architectures and obtain a Pareto front of Color Peak Signal-to-Noise Ratio (CPSNR), as the performance criterion, versus the number of parameters, as the model complexity, that beats the state of the art. Architectures on the Pareto front can then be used to choose the best architecture under a variety of resource constraints.
Simple architecture search methods such as exhaustive search and grid search require certain conditions on the loss function in order to converge to the optimum. We clarify these conditions in a brief theoretical study.
Comment: Accepted at the 16th International Conference on Image Analysis and Recognition (ICIAR 2019)
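Once the exhaustive search has scored every candidate, the Pareto front over (parameter count, CPSNR) can be extracted with a few lines; the tuple format below is an assumption for illustration.

```python
def pareto_front(candidates):
    """candidates: list of (num_params, cpsnr) pairs, one per searched architecture.
    Keep only non-dominated points: no other architecture has both fewer (or equal)
    parameters and higher (or equal) CPSNR."""
    front = []
    for params, cpsnr in sorted(candidates, key=lambda c: (c[0], -c[1])):
        if not front or cpsnr > front[-1][1]:
            front.append((params, cpsnr))
    return front

# Example: pick the best architecture under a hypothetical 100k-parameter budget.
# best = max((c for c in pareto_front(results) if c[0] <= 100_000), key=lambda c: c[1])
```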
Video Description: A Survey of Methods, Datasets and Evaluation Metrics
Video description is the automatic generation of natural language sentences
that describe the contents of a given video. It has applications in human-robot interaction, helping the visually impaired, and video subtitling. The past few
years have seen a surge of research in this area due to the unprecedented
success of deep learning in computer vision and natural language processing.
Numerous methods, datasets, and evaluation metrics have been proposed in the literature, calling for a comprehensive survey to focus research efforts in this flourishing new direction. This paper fills the gap by surveying state-of-the-art approaches with a focus on deep learning models;
comparing benchmark datasets in terms of their domains, number of classes, and
repository size; and identifying the pros and cons of various evaluation
metrics like SPICE, CIDEr, ROUGE, BLEU, METEOR, and WMD. Classical video
description approaches combined subject, object and verb detection with
template based language models to generate sentences. However, the release of
large datasets revealed that these methods cannot cope with the diversity of unconstrained open-domain videos. Classical approaches were followed by a very
short era of statistical methods which were soon replaced with deep learning,
the current state of the art in video description. Our survey shows that
despite the fast-paced developments, video description research is still in its
infancy due to the following reasons. Analysis of video description models is
challenging because it is difficult to ascertain the contributions, towards
accuracy or errors, of the visual features and the adopted language model in
the final description. Existing datasets contain neither adequate visual diversity nor sufficiently complex linguistic structures. Finally, current evaluation metrics ...
Comment: Accepted by ACM Computing Surveys
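Since the survey weighs caption metrics such as BLEU and METEOR, a tiny scoring example may help, assuming NLTK is installed; the captions are invented for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical generated caption and human reference captions for one video clip.
references = [
    "a man is slicing a tomato on a cutting board".split(),
    "someone cuts a tomato in a kitchen".split(),
]
candidate = "a man is cutting a tomato".split()

# Sentence-level BLEU with smoothing, since short captions rarely match all 4-gram orders.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```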