A domain adaptive deep learning solution for scanpath prediction of paintings
Cultural heritage understanding and preservation is an important issue for
society as it represents a fundamental aspect of its identity. Paintings
represent a significant part of cultural heritage and are continuously the
subject of study. However, the way viewers perceive paintings is strictly related
to the behaviour of the HVS (Human Visual System). This paper focuses on the
eye-movement analysis of viewers during the visual experience of a certain
number of paintings. In more detail, we introduce a new approach to
predicting human visual attention, which impacts several cognitive functions
for humans, including the fundamental understanding of a scene, and then extend
it to painting images. The proposed new architecture ingests images and returns
scanpaths, a sequence of points featuring a high likelihood of catching
viewers' attention. We use an FCNN (Fully Convolutional Neural Network), in
which we exploit a differentiable channel-wise selection and Soft-Argmax
modules. We also incorporate learnable Gaussian distributions in the network
bottleneck to simulate the visual attention bias observed in natural scene images.
Furthermore, to reduce the effect of shifts between different domains (i.e.
natural images and paintings), we encourage the model to learn general features
from other domains in an unsupervised manner using a gradient reversal classifier. The results
obtained by our model outperform existing state-of-the-art ones in terms of
accuracy and efficiency.
Comment: Accepted at CBMI2022, Graz, Austria
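The Soft-Argmax module mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `soft_argmax_2d`, the temperature parameter `beta`, and the toy heatmap are assumptions made for illustration. The idea is that a softmax-weighted average of pixel coordinates gives a differentiable stand-in for the location of the maximum.

```python
import math

def soft_argmax_2d(heatmap, beta=10.0):
    """Return the softmax-weighted expected (row, col) of a 2D score map.

    `beta` is a hypothetical temperature: larger values push the result
    towards the hard argmax.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(v for row in heatmap for v in row)
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    # Expected coordinates under the softmax distribution.
    exp_row = sum(r * w for r, row in enumerate(weights) for w in row) / total
    exp_col = sum(c * w for row in weights for c, w in enumerate(row)) / total
    return exp_row, exp_col

# Toy score map with a clear peak at row 1, column 2 (illustrative data only).
heatmap = [
    [0.0, 0.1, 0.0],
    [0.1, 0.2, 3.0],
    [0.0, 0.1, 0.0],
]
row, col = soft_argmax_2d(heatmap, beta=10.0)
flat_row, flat_col = soft_argmax_2d([[1.0, 1.0], [1.0, 1.0]])
```

With a sharp peak the result lands close to the peak's coordinates, while a flat map returns the centre of the grid; in a scanpath model, one such coordinate pair would presumably be produced per selected channel.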
End-to-end deep multi-score model for No-reference stereoscopic image quality assessment
Deep learning-based quality metrics have recently brought significant
improvements in Image Quality Assessment (IQA). In the field of stereoscopic
vision, information is evenly distributed with slight disparity to the left and
right eyes. However, due to asymmetric distortion, the objective quality
ratings for the left and right images would differ, necessitating the learning
of unique quality indicators for each view. Unlike existing stereoscopic IQA
measures which focus mainly on estimating a global human score, we suggest
incorporating left, right, and stereoscopic objective scores to extract the
corresponding properties of each view, and thereby estimating stereoscopic
image quality without reference. To this end, we use a deep multi-score
Convolutional Neural Network (CNN). Our model has been trained to perform four
tasks: First, predict the left view's quality. Second, predict the quality of
the right view. Third and fourth, predict the quality of the stereo view and
global quality, respectively, with the global score serving as the ultimate
quality. Experiments are conducted on Waterloo IVC 3D Phase 1 and Phase 2
databases. The results obtained show the superiority of our method
compared with the state of the art. The implementation code can be
found at: https://github.com/o-messai/multi-score-SIQ
Perceptual Quality Evaluation of 3D Triangle Mesh: A Technical Review
© 2018 IEEE. During mesh processing operations (e.g. simplification, compression, and watermarking), a 3D triangle mesh is subject to various visible distortions on the mesh surface, which creates a need to estimate visual quality. The necessity of perceptual quality evaluation is already established, as in most cases human beings are the end users of 3D meshes. Metrics that measure such distortions by combining geometric measures with properties of the human visual system (HVS) are called perceptual quality metrics. In this paper, we conduct an extensive study of 3D mesh quality evaluation, focusing mostly on recently proposed perceptual metrics. We limit our study to greyscale static mesh evaluation and attempt to identify the most workable method for real-time evaluation by making a quantitative comparison. This paper also discusses in detail how to evaluate an objective metric's performance with existing subjective databases. We also investigate the use of the psychometric function to remove the non-linearity between subjective and objective values. Finally, we compare some selected quality metrics, and the comparison shows that curvature tensor based quality metrics predict consistent results in terms of correlation.
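The role of the psychometric function in the abstract above can be sketched as follows. This is a hedged illustration, not the procedure of the paper: the two-parameter logistic form, the grid-search fit, the helper names, and the synthetic scores are all assumptions. The sketch maps objective metric values through a fitted logistic curve before computing the Pearson correlation with subjective scores, so that a monotonic but non-linear relationship is not penalised.

```python
import math

def logistic(x, slope, midpoint):
    """Two-parameter logistic psychometric function with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_logistic(objective, subjective, slopes, midpoints):
    """Pick the (slope, midpoint) on a grid with the lowest squared error."""
    best = None
    for s in slopes:
        for m in midpoints:
            err = sum((logistic(x, s, m) - y) ** 2
                      for x, y in zip(objective, subjective))
            if best is None or err < best[0]:
                best = (err, s, m)
    return best[1], best[2]

# Synthetic data for illustration only: subjective scores (assumed to be
# normalised to [0, 1]) follow a logistic curve of the objective values.
objective = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
subjective = [logistic(x, 1.5, 3.0) for x in objective]

slope, midpoint = fit_logistic(
    objective, subjective,
    slopes=[0.5 * k for k in range(1, 7)],
    midpoints=[0.5 * k for k in range(0, 13)],
)
raw_plcc = pearson(objective, subjective)
mapped = [logistic(x, slope, midpoint) for x in objective]
mapped_plcc = pearson(mapped, subjective)
```

In practice a proper non-linear least-squares fit would replace the grid search; the grid is used here only to keep the sketch dependency-free.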
Machine Learning for Multimedia Communications
Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, some impressive efficiency/accuracy improvements have been made all across the transmission pipeline. For example, the high model capacity of the learning-based architectures enables us to accurately model the image and video behavior such that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategy or even user perception modeling have widely benefited from the recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions
Image retargeting aims to alter the size of the image with attention to the
contents. One of the main obstacles to training deep learning models for image
retargeting is the need for a vast labeled dataset. Labeled datasets are
unavailable for training deep learning models in the image retargeting tasks.
As a result, we present a new supervised approach for training deep learning
models. We use the original images as ground truth and create inputs for the
model by resizing and cropping the original images. A second challenge is
generating different image sizes at inference time, since regular
convolutional neural networks cannot generate images of different sizes than
the input image. To address this issue, we introduce a new method for
supervised learning. In our approach, a mask is generated to show the desired
size and location of the object. Then the mask and the input image are fed to
the network. Comparing image retargeting methods and our proposed method
demonstrates the model's ability to produce high-quality retargeted images.
Afterward, we compute the image quality assessment score for each output image
based on different techniques and illustrate the effectiveness of our approach.
Comment: 18 pages, 5 figures
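The training-pair idea in the abstract above (original image as ground truth, a cropped version as input, plus a mask marking the desired size and location) can be sketched roughly as below. The abstract does not specify how the mask is built, so the rectangular binary mask, the helper names `crop` and `box_mask`, and the toy 4x4 image are purely hypothetical.

```python
def crop(image, top, left, height, width):
    """Return the height x width sub-image starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def box_mask(canvas_h, canvas_w, top, left, height, width):
    """Binary mask of the canvas size with ones inside the given rectangle."""
    return [
        [1 if top <= r < top + height and left <= c < left + width else 0
         for c in range(canvas_w)]
        for r in range(canvas_h)
    ]

# Toy 4x4 "image" whose pixel values are just running indices.
original = [[r * 4 + c for c in range(4)] for r in range(4)]

# Assumed pairing: the crop is the network input, the original is the
# target, and the mask marks where the crop sits in the target canvas.
model_input = crop(original, top=1, left=1, height=2, width=2)
mask = box_mask(4, 4, top=1, left=1, height=2, width=2)
target = original
```

Because the pair is derived from the original image alone, no manual labelling is needed, which is the point the abstract makes about the missing labeled datasets.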
An Image-based model for 3D shape quality measure
In light of increased research on 3D shapes and the increased processing capability of GPUs, there has been a significant
increase in available 3D applications. In many applications, assessment of perceptual quality of 3D shapes is required. Due
to the nature of 3D representation, this quality assessment may take various forms. While it is straightforward to measure
geometric distortions directly on the 3D shape geometry, such measures are often inconsistent with human perception of quality.
In most cases, human viewers tend to perceive 3D shapes from their 2D renderings. It is therefore plausible to measure shape
quality using their 2D renderings. In this paper, we present an image-based quality metric for evaluating 3D shape quality
given the original and distorted shapes. To provide a good coverage of 3D geometry from different views, we render each shape
from 12 equally spaced views, along with a variety of rendering styles to capture different aspects of visual characteristics.
Image-based metrics such as SSIM (Structural Similarity Index Measure) are then used to measure the quality of 3D shapes. Our
experiments show that by effectively selecting a suitable combination of rendering styles and building a neural network based
model, we achieve significantly better prediction for subjective perceptual quality than existing methods.
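The image-based scoring described in the abstract above can be sketched with a simplified SSIM. Two simplifications are assumed here and are not taken from the paper: SSIM is computed once over the whole image rather than over local windows, and the per-view scores are combined by a plain average, whereas the abstract states that a neural network based model is built on top of the rendered views. The function names and the toy images are hypothetical.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized greyscale images."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / n
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    # Standard stabilising constants, with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))

def mean_ssim_over_views(reference_views, distorted_views):
    """Average the per-view SSIM scores (an assumed, simplistic pooling)."""
    scores = [global_ssim(r, d)
              for r, d in zip(reference_views, distorted_views)]
    return sum(scores) / len(scores)

# Toy stand-ins for renderings of one shape from 12 views.
reference = [[0, 255], [255, 0]]
distorted = [[0, 200], [255, 50]]
reference_views = [reference] * 12
distorted_views = [distorted] * 12

same_score = mean_ssim_over_views(reference_views, reference_views)
distorted_score = mean_ssim_over_views(reference_views, distorted_views)
```

Identical renderings score 1 up to rounding, and any distortion lowers the score; real renderings would come from a renderer with the chosen rendering styles.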