Towards a Semantic Perceptual Image Metric
We present a full reference, perceptual image metric based on VGG-16, an
artificial neural network trained on object classification. We fit the metric
to a new database based on 140k unique images annotated with ground truth by
human raters who received minimal instruction. The resulting metric shows
competitive performance on TID2013, a database widely used to benchmark image
quality assessment methods. More interestingly, it shows strong responses to
objects potentially carrying semantic relevance such as faces and text, which
we demonstrate using a visualization technique and ablation experiments. In
effect, the metric appears to model a higher influence of semantic context on
judgments, which we observe particularly in untrained raters. As the vast
majority of users of image processing systems are unfamiliar with Image Quality
Assessment (IQA) tasks, these findings may have significant impact on
real-world applications of perceptual metrics.
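As a rough illustration of the core idea, the sketch below compares VGG-16 activations of a reference and a distorted image (PyTorch/torchvision assumed). The layer selection and the plain MSE pooling are placeholders for illustration only, not the paper's fitted, human-calibrated metric.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen VGG-16 feature extractor with ImageNet weights.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

def vgg_distance(ref, dist, layers=(3, 8, 15, 22)):
    """Sum of MSEs between VGG-16 activations at relu1_2/2_2/3_3/4_3.
    `ref`, `dist`: (1, 3, H, W) tensors, ImageNet-normalized."""
    total, x, y = 0.0, ref, dist
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x, y = layer(x), layer(y)
            if i in layers:
                total += F.mse_loss(x, y).item()
    return total
```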
Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model
Omnidirectional video provides spherical stimuli spanning the full 360°x180° viewing range. Meanwhile, only the viewport region of omnidirectional
video can be seen by the observer through head movement (HM), and an even
smaller region within the viewport can be clearly perceived through eye
movement (EM). Thus, the subjective quality of omnidirectional video may be
correlated with the HM and EM behavior of viewers. To bridge the gap between
subjective quality and human behavior, this paper proposes a large-scale visual
quality assessment (VQA) dataset of omnidirectional video, called VQA-OV, which
collects 60 reference sequences and 540 impaired sequences. Our VQA-OV dataset
provides not only the subjective quality scores of sequences but also the HM
and EM data of subjects. By mining our dataset, we find that the subjective
quality of omnidirectional video is indeed related to HM and EM. Hence, we
develop a deep learning model, which embeds HM and EM, for objective VQA on
omnidirectional video. Experimental results show that our model significantly
improves the state-of-the-art performance of VQA on omnidirectional video.
Comment: Accepted by ACM MM 2018
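A minimal sketch of the underlying intuition, assuming per-patch quality scores and an HM/EM fixation density are already available; the names and shapes here are hypothetical, and the paper's actual model is a deep network that embeds HM and EM rather than a weighted average.

```python
import numpy as np

def attention_weighted_quality(patch_quality, fixation_map):
    """patch_quality: (H, W) local quality scores on the sphere projection.
    fixation_map: (H, W) HM/EM fixation density over the same grid."""
    w = fixation_map / (fixation_map.sum() + 1e-8)  # normalize to a distribution
    return float((patch_quality * w).sum())         # attention-weighted score
```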
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
While it is nearly effortless for humans to quickly assess the perceptual
similarity between two images, the underlying processes are thought to be quite
complex. Despite this, the most widely used perceptual metrics today, such as
PSNR and SSIM, are simple, shallow functions, and fail to account for many
nuances of human perception. Recently, the deep learning community has found
that features of the VGG network trained on ImageNet classification have proven
remarkably useful as a training loss for image synthesis. But how perceptual
are these so-called "perceptual losses"? What elements are critical for their
success? To answer these questions, we introduce a new dataset of human
perceptual similarity judgments. We systematically evaluate deep features
across different architectures and tasks and compare them with classic metrics.
We find that deep features outperform all previous metrics by large margins on
our dataset. More surprisingly, this result is not restricted to
ImageNet-trained VGG features, but holds across different deep architectures
and levels of supervision (supervised, self-supervised, or even unsupervised).
Our results suggest that perceptual similarity is an emergent property shared
across deep visual representations.
Comment: Accepted to CVPR 2018; code and data available at
https://www.github.com/richzhang/PerceptualSimilarity
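The released metric can be used directly; the snippet below follows the API of the `lpips` pip package distributed from the linked repository. Inputs are RGB tensors scaled to [-1, 1] with shape (N, 3, H, W).

```python
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='alex')        # also supports net='vgg' or 'squeeze'
img0 = torch.rand(1, 3, 64, 64) * 2 - 1  # stand-in images in [-1, 1]
img1 = torch.rand(1, 3, 64, 64) * 2 - 1
d = loss_fn(img0, img1)                  # perceptual distance, shape (1, 1, 1, 1)
print(d.item())
```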
Explainable Image Quality Assessments in Teledermatological Photography
Image quality is a crucial factor in the effectiveness and efficiency of
teledermatological consultations. However, up to 50% of images sent by patients
have quality issues, thus increasing the time to diagnosis and treatment. An
automated, easily deployable, explainable method for assessing image quality is
necessary to improve the current teledermatological consultation flow. We
introduce ImageQX, a convolutional neural network for image quality assessment
with a learning mechanism for identifying the most common poor image quality
explanations: bad framing, bad lighting, blur, low resolution, and distance
issues. ImageQX was trained on 26,635 photographs and validated on 9,874
photographs, each annotated with image quality labels and poor image quality
explanations by up to 12 board-certified dermatologists. The photographic
images were taken between 2017 and 2019 using a mobile skin disease tracking
application accessible worldwide. Our method achieves expert-level performance
for both image quality assessment and poor image quality explanation. For image
quality assessment, ImageQX obtains a macro F1-score of 0.73 ± 0.01, which
places it within one standard deviation of the pairwise inter-rater F1-score of
0.77 ± 0.07. For poor image quality explanations, our method obtains F1-scores
between 0.37 ± 0.01 and 0.70 ± 0.01, similar to the inter-rater pairwise
F1-scores, which range from 0.24 ± 0.15 to 0.83 ± 0.06. Moreover, with a size of
only 15 MB, ImageQX is easily deployable on mobile devices. With an image
quality detection performance similar to that of dermatologists, incorporating
ImageQX into the teledermatology flow can enable a better, faster flow for
remote consultations.
Comment: Accepted at the Telemedicine and eHealth Journal
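For illustration only, here is a minimal multi-task head in the spirit of ImageQX: one output for overall quality and a multi-label head for the five explanation tags. The backbone choice and layer sizes are assumptions; the abstract does not specify the architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class QualityExplainer(nn.Module):
    """Hypothetical sketch: joint quality score + explanation tags
    (bad framing, bad lighting, blur, low resolution, distance issues)."""
    def __init__(self, n_explanations=5):
        super().__init__()
        backbone = models.mobilenet_v3_small(weights=None)  # small, mobile-friendly
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        dim = 576  # final feature channels of mobilenet_v3_small
        self.quality = nn.Linear(dim, 1)               # good/poor quality logit
        self.explain = nn.Linear(dim, n_explanations)  # multi-label explanation logits

    def forward(self, x):
        h = self.pool(self.features(x)).flatten(1)
        return torch.sigmoid(self.quality(h)), torch.sigmoid(self.explain(h))
```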