36,667 research outputs found
A similarity-based approach to perceptual feature validation
Which object properties matter most in human perception may well vary according to sensory modality, an important consideration for the design of multimodal interfaces. In this study, we present a similarity-based method for comparing the perceptual importance of object properties across modalities and show how it can also be used to perceptually validate computational measures of object properties. Similarity measures for a set of three-dimensional (3D) objects varying in shape and texture were gathered from humans in two modalities (vision and touch) and derived from a set of standard 2D and 3D computational measures (image and mesh subtraction, object perimeter, curvature, Gabor jet filter responses, and the Visual Difference Predictor (VDP)). Multidimensional scaling (MDS) was then performed on the similarity data to recover configurations of the stimuli in 2D perceptual/computational spaces. The two recovered dimensions corresponded to the two dimensions of variation in the stimulus set: shape and texture. In the human visual space, shape strongly dominated texture. In the human haptic space, shape and texture were weighted roughly equally. Weights varied considerably across subjects in the haptic experiment, indicating that different strategies were used. Maps derived from shape-dominated computational measures provided good fits to the human visual map. No single computational measure provided a satisfactory fit to the map derived from mean human haptic data, though good fits were found for individual subjects; a combination of measures with individually adjusted weights may be required to model the human haptic similarity judgments. Our method provides a high-level approach to perceptual validation, which can be applied in both unimodal and multimodal interface design.
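The core step the abstract describes, recovering a low-dimensional stimulus configuration from pairwise (dis)similarity data, can be sketched with classical (Torgerson) MDS. This is a minimal illustration, not the authors' exact procedure (they do not specify which MDS variant was used), and the function name is ours:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n items in k dimensions
    from an n x n matrix D of pairwise dissimilarities."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # pick the top-k components
    scale = np.sqrt(np.maximum(vals[idx], 0.0))
    return vecs[:, idx] * scale              # n x k stimulus configuration
```

With dissimilarities that are exact Euclidean distances in 2D, this recovers the original configuration up to rotation and reflection; with human similarity judgments, the relative spread along each axis plays the role of the shape/texture weights discussed above.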
Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment
We present a deep neural network-based approach to image quality assessment
(IQA). The network is trained end-to-end and comprises ten convolutional layers
and five pooling layers for feature extraction, and two fully connected layers
for regression, which makes it significantly deeper than related IQA models.
Unique features of the proposed architecture are that: 1) with slight
adaptations it can be used in a no-reference (NR) as well as in a
full-reference (FR) IQA setting and 2) it allows for joint learning of local
quality and local weights, i.e., relative importance of local quality to the
global quality estimate, in a unified framework. Our approach is purely
data-driven and does not rely on hand-crafted features or other types of prior
domain knowledge about the human visual system or image statistics. We evaluate
the proposed approach on the LIVE, CISQ, and TID2013 databases as well as the
LIVE In the wild image quality challenge database and show superior performance
to state-of-the-art NR and FR IQA methods. Finally, cross-database evaluation
shows a high ability to generalize between different databases, indicating a
high robustness of the learned features.
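The joint learning of local quality and local weights described above amounts, at inference time, to a normalized weighted average of per-patch quality scores. A minimal numpy sketch of that pooling step (the function name and the positivity handling via a small epsilon are our illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def weighted_quality(local_q, local_w, eps=1e-8):
    """Pool per-patch quality scores into one global score using
    learned per-patch weights (normalized weighted average)."""
    w = np.maximum(local_w, 0.0) + eps   # keep weights positive, avoid /0
    return float(np.sum(w * local_q) / np.sum(w))
```

With uniform weights this reduces to the plain mean; a patch with a dominant weight (e.g. a salient, heavily distorted region) pulls the global estimate toward its local score.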
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
While it is nearly effortless for humans to quickly assess the perceptual
similarity between two images, the underlying processes are thought to be quite
complex. Despite this, the most widely used perceptual metrics today, such as
PSNR and SSIM, are simple, shallow functions, and fail to account for many
nuances of human perception. Recently, the deep learning community has found
that features of the VGG network trained on ImageNet classification have been
remarkably useful as a training loss for image synthesis. But how perceptual
are these so-called "perceptual losses"? What elements are critical for their
success? To answer these questions, we introduce a new dataset of human
perceptual similarity judgments. We systematically evaluate deep features
across different architectures and tasks and compare them with classic metrics.
We find that deep features outperform all previous metrics by large margins on
our dataset. More surprisingly, this result is not restricted to
ImageNet-trained VGG features, but holds across different deep architectures
and levels of supervision (supervised, self-supervised, or even unsupervised).
Our results suggest that perceptual similarity is an emergent property shared
across deep visual representations.
Comment: Accepted to CVPR 2018; code and data available at
https://www.github.com/richzhang/PerceptualSimilarit
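The deep-feature perceptual distance evaluated in this paper can be sketched as follows: unit-normalize each layer's feature maps along the channel axis, take per-channel weighted squared differences, average spatially, and sum over layers. This numpy sketch assumes feature maps are already extracted (the random arrays in the usage stand in for real network activations, and the function names are ours):

```python
import numpy as np

def unit_normalize(f, eps=1e-10):
    """L2-normalize a C x H x W feature map along the channel axis."""
    return f / (np.linalg.norm(f, axis=0, keepdims=True) + eps)

def deep_feature_distance(feats_a, feats_b, weights=None):
    """Perceptual distance between two images from their per-layer
    feature maps: spatially averaged, channel-weighted squared
    differences of unit-normalized features, summed over layers."""
    total = 0.0
    for layer, (fa, fb) in enumerate(zip(feats_a, feats_b)):
        diff = (unit_normalize(fa) - unit_normalize(fb)) ** 2  # C x H x W
        w = np.ones(fa.shape[0]) if weights is None else weights[layer]
        total += float((w[:, None, None] * diff).mean(axis=(1, 2)).sum())
    return total
```

Uniform channel weights give the "unlearned" variant; the paper's learned variant fits the per-channel weights to human similarity judgments.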