Blind Stereo Image Quality Assessment Inspired by Brain Sensory-Motor Fusion
The use of 3D and stereo imaging is rapidly increasing. Compression,
transmission, and processing could degrade the quality of stereo images.
Quality assessment of such images differs from that of their 2D counterparts.
Metrics that represent 3D perception by human visual system (HVS) are expected
to assess stereoscopic quality more accurately. In this paper, inspired by
the brain's sensory-motor fusion process, the two stereo images are fused. Then
from every fused image two synthesized images are extracted. Effects of
different distortions on statistical distributions of the synthesized images
are shown. Based on the observed statistical changes, features are extracted
from these synthesized images. These features can reveal type and severity of
distortions. Then, a stacked neural network model is proposed, which learns the
extracted features and accurately evaluates the quality of stereo images. This
model is tested on 3D images of popular databases. Experimental results show
the superiority of this method over state-of-the-art stereo image quality
assessment approaches.
Comment: 11 pages, 13 figures, 3 tables
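The fusion-then-statistics pipeline described in the abstract above could be sketched as follows. The simple averaging fusion, the global (rather than local) contrast normalization, and the particular moment features are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def fuse_views(left, right):
    # Stand-in for the sensory-motor fusion step: a plain average of the
    # two views (the paper's fusion is more elaborate).
    return 0.5 * (left.astype(float) + right.astype(float))

def mscn(img, eps=1e-6):
    # Mean-subtracted, contrast-normalized coefficients; a common basis for
    # no-reference statistical quality features. Global normalization is a
    # simplification of the usual local windowed version.
    return (img - img.mean()) / (img.std() + eps)

def stat_features(img):
    # Illustrative features: variance, skewness, and kurtosis of the MSCN
    # coefficients, whose distributions shift under distortions such as
    # blur or compression.
    c = mscn(img)
    m2 = (c ** 2).mean()
    m3 = (c ** 3).mean()
    m4 = (c ** 4).mean()
    return np.array([m2, m3 / (m2 ** 1.5 + 1e-12), m4 / (m2 ** 2 + 1e-12)])
```

Features like these would then be fed to the stacked neural network the paper proposes.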
Prediction of the Influence of Navigation Scan-path on Perceived Quality of Free-Viewpoint Videos
Free-Viewpoint Video (FVV) systems allow the viewers to freely change the
viewpoints of the scene. In such systems, view synthesis and compression are
the two main sources of artifacts influencing the perceived quality. To assess
this influence, quality evaluation studies are often carried out using
conventional displays and generating predefined navigation trajectories
mimicking the possible movement of the viewers when exploring the content.
Nevertheless, as different trajectories may lead to different conclusions in
terms of visual quality when benchmarking the performance of the systems,
methods to identify critical trajectories are needed. This paper aims at
exploring the impact of exploration trajectories (defined as Hypothetical
Rendering Trajectories: HRT) on perceived quality of FVV subjectively and
objectively, providing two main contributions. Firstly, a subjective assessment
test including different HRTs was carried out and analyzed. The results
demonstrate and quantify the influence of HRT in the perceived quality.
Secondly, we propose a new objective video quality assessment measure to
objectively predict the impact of HRT. This measure, based on Sketch-Token
representation, models how the categories of the contours change spatially and
temporally from a higher semantic level. Comparisons with existing quality
metrics for FVV highlight promising results for the automatic detection of the
most critical HRTs when benchmarking immersive systems.
Comment: 11 pages, 7 figures
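A toy stand-in for the contour-change idea in the abstract above: given per-pixel contour-category label maps for two consecutive rendered frames, temporal instability can be measured as the fraction of labels that change. The real measure operates on Sketch-Token class representations, so this is only a sketch.

```python
import numpy as np

def contour_change_rate(cat_prev, cat_next):
    # Fraction of pixels whose contour-category label differs between two
    # consecutive frames; high values flag temporally unstable contours,
    # which synthesis artifacts along an HRT tend to produce.
    cat_prev = np.asarray(cat_prev)
    cat_next = np.asarray(cat_next)
    return float((cat_prev != cat_next).mean())
```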
An Efficient Human Visual System Based Quality Metric for 3D Video
Stereoscopic video technologies have been introduced to the consumer market
in the past few years. A key factor in designing a 3D system is to understand
how different visual cues and distortions affect the perceptual quality of
stereoscopic video. The ultimate way to assess 3D video quality is through
subjective tests. However, subjective evaluation is time consuming, expensive,
and in some cases not possible. The other solution is developing objective
quality metrics, which attempt to model the Human Visual System (HVS) in order
to assess perceptual quality. Although several 2D quality metrics have been
proposed for still images and videos, in the case of 3D efforts are only at the
initial stages. In this paper, we propose a new full-reference quality metric
for 3D content. Our method mimics HVS by fusing information of both the left
and right views to construct the cyclopean view, as well as taking into account
the sensitivity of HVS to contrast and the disparity of the views. In addition,
a temporal pooling strategy is utilized to address the effect of temporal
variations of the quality in the video. Performance evaluations showed that our
3D quality metric quantifies quality degradation caused by several
representative types of distortions very accurately, with a Pearson
correlation coefficient of 90.8%, a performance competitive with
state-of-the-art 3D quality metrics.
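The temporal pooling step mentioned in the abstract above could look like the following sketch: averaging the worst fraction of per-frame scores, reflecting the HVS bias toward low-quality episodes. The worst-fraction strategy and its parameter are assumptions, not necessarily the paper's exact pooling.

```python
import numpy as np

def temporal_pool(frame_scores, worst_fraction=0.2):
    # Percentile-style temporal pooling: average the lowest-scoring fraction
    # of frames, so brief quality drops dominate the final video score.
    scores = np.sort(np.asarray(frame_scores, dtype=float))
    k = max(1, int(len(scores) * worst_fraction))
    return scores[:k].mean()
```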
Survey on Error Concealment Strategies and Subjective Testing of 3D Videos
Over the last decade, different technologies to visualize 3D scenes have been
introduced and improved. These technologies include stereoscopic, multi-view,
integral imaging and holographic types. Despite increasing consumer interest,
poor image quality, crosstalk and other side effects of 3D displays, and the
lack of defined broadcast standards have hampered the advancement of 3D
displays to the mass consumer market. Moreover, real-time transmission of 3DTV
sequences over packet-based networks may result in visual quality degradation
due to packet loss and related impairments. In conventional 2D video,
different extrapolation and directional interpolation strategies have been
used to conceal missing blocks, but in 3D this is still an emerging field of research. Few
studies have been carried out to define the assessment methods of stereoscopic
images and videos. From an industrial and commercial perspective, however,
subjective quality evaluation is the most direct way to evaluate human
perception of 3DTV systems. This paper reviews the state-of-the-art error
concealment strategies and the subjective evaluation of 3D videos and proposes
a low complexity frame loss concealment method for the video decoder.
Subjective testing on videos from prominent datasets, together with
comparisons against existing concealment methods, shows that the proposed
method efficiently conceals errors in stereoscopic videos in terms of
computation time, comfort, and distortion.
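A low-complexity frame loss concealment scheme of the kind surveyed above might, for a lost frame in one view, blend the temporally previous frame of that view with the co-located frame of the intact view. Both the blending and the weight are hypothetical, for illustration only; the paper's actual method may differ.

```python
import numpy as np

def conceal_lost_frame(prev_frame, other_view_frame, alpha=0.5):
    # Hypothetical concealment: weighted blend of the previous frame of the
    # lost view (temporal prediction) and the co-located frame of the other
    # view (inter-view prediction). alpha is an assumed mixing weight.
    return (alpha * prev_frame.astype(float)
            + (1.0 - alpha) * other_view_frame.astype(float))
```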
Causes of discomfort in stereoscopic content: a review
This paper reviews the causes of discomfort in viewing stereoscopic content.
These include objective factors, such as misaligned images, as well as
subjective factors, such as excessive disparity. Different approaches to the
measurement of visual discomfort are also reviewed, in relation to the
underlying physiological and psychophysical processes. The importance of
understanding these issues, in the context of new display technologies, is
emphasized.
Binocular Rivalry - Psychovisual Challenge in Stereoscopic Video Error Concealment
During Stereoscopic 3D (S3D) video transmission, one or both views can be
affected by bit errors and packet losses caused by adverse channel conditions,
delay or jitter. Typically, the Human Visual System (HVS) cannot align and
fuse stereoscopic content when one view is affected by artefacts caused by
compression, transmission, or rendering. The distorted patterns are perceived
as alterations of the original, producing a shimmering effect known as
binocular rivalry that is detrimental to a user's Quality of Experience
(QoE). This study attempts to quantify the effects of binocular rivalry for
stereoscopic videos. Existing error concealment approaches are applied to
sequences in which one or more frames are lost in one or both views. Then,
subjective testing is carried out on the error-concealed 3D video sequences.
The subjective evaluations were then combined and analysed using a standard
Student's t-test, quantifying the impact of binocular rivalry and allowing it
to be compared with that of monocular viewing.
The main focus is implementing error-resilient video communication, avoiding
the detrimental effects of binocular rivalry and improving the overall QoE of
viewers.
Comment: 11 pages, 9 figures
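The Student-type analysis described in the abstract above can be sketched with a two-sample Welch t statistic on subjective scores (e.g., concealed 3D vs. monocular viewing). This is a generic formulation, not the paper's exact statistical setup.

```python
import numpy as np

def welch_t(a, b):
    # Two-sample Welch t statistic with Welch-Satterthwaite degrees of
    # freedom; compares the mean opinion scores of two conditions without
    # assuming equal variances.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```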
Investigating Simulation-Based Metrics for Characterizing Linear Iterative Reconstruction in Digital Breast Tomosynthesis
Simulation-based image quality metrics are adapted and investigated for
characterizing the parameter dependences of linear iterative image
reconstruction for DBT. Three metrics based on 2D DBT simulation are
investigated: (1) a root-mean-square-error (RMSE) between the test phantom and
reconstructed image, (2) a gradient RMSE where the comparison is made after
taking a spatial gradient of both image and phantom, and (3) a
region-of-interest (ROI) Hotelling observer (HO) for
signal-known-exactly/background-known-exactly (SKE/BKE) and
signal-known-exactly/background-known-statistically (SKE/BKS) detection tasks.
Two simulation studies are performed using the aforementioned metrics, varying
voxel aspect ratio and regularization strength for two types of Tikhonov
regularized least-squares optimization. The RMSE metrics are applied to a 2D
test phantom and the ROI-HO metric is applied to two tasks relevant to DBT:
large, low contrast lesion detection and small, high contrast
microcalcification detection. The RMSE metric trends are compared with visual
assessment of the reconstructed test phantom. The ROI-HO metric trends are
compared with 3D reconstructed images from ACR phantom data acquired with a
Hologic Selenia Dimensions DBT system. Sensitivity of image RMSE to mean pixel
value is found to limit its applicability to the assessment of DBT image
reconstruction. Image gradient RMSE is insensitive to mean pixel value and
appears to track better with subjective visualization of the reconstructed
bar-pattern phantom. The ROI-HO metric shows an increasing trend with
regularization strength for both forms of Tikhonov-regularized least-squares;
however, this metric saturates at intermediate regularization strength
indicating a point of diminishing returns for signal detection. Visualization
with reconstructed ACR phantom images appears to show a similar dependence with
regularization strength.
Comment: The manuscript has been submitted to Medical Physics
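The first two metrics in the abstract above are straightforward to state in code: an image RMSE between test phantom and reconstruction, and a gradient RMSE computed after taking a spatial gradient of both, which removes the mean-pixel-value sensitivity the study reports. This is a direct, minimal reading of those definitions.

```python
import numpy as np

def rmse(recon, phantom):
    # Root-mean-square error between reconstructed image and test phantom.
    d = recon.astype(float) - phantom.astype(float)
    return np.sqrt((d ** 2).mean())

def gradient_rmse(recon, phantom):
    # RMSE after taking a spatial gradient of both images; a constant
    # intensity offset between recon and phantom no longer contributes.
    gr = np.stack(np.gradient(recon.astype(float)))
    gp = np.stack(np.gradient(phantom.astype(float)))
    return np.sqrt(((gr - gp) ** 2).mean())
```

A constant-offset reconstruction illustrates the difference: image RMSE equals the offset, while gradient RMSE is zero.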
Benchmark 3D eye-tracking dataset for visual saliency prediction on stereoscopic 3D video
Visual Attention Models (VAMs) predict the location of an image or video
regions that are most likely to attract human attention. Although saliency
detection is well explored for 2D image and video content, there are only few
attempts made to design 3D saliency prediction models. Newly proposed 3D visual
attention models have to be validated over large-scale video saliency
prediction datasets, which also contain results of eye-tracking information.
There are several publicly available eye-tracking datasets for 2D image and
video content. In the case of 3D, however, there is still a need for
large-scale video saliency datasets for the research community for validating
different 3D-VAMs. In this paper, we introduce a large-scale dataset
containing eye-tracking data collected from 24 subjects who watched 61
stereoscopic 3D videos (and their 2D versions) in a free-viewing test. We
evaluate the performance of the existing saliency detection methods over the
proposed dataset. In addition, we created an online benchmark for validating
the performance of existing 2D and 3D visual attention models and for
facilitating the addition of new VAMs. The benchmark currently contains 50
different VAMs.
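One of the standard scores for validating a VAM against eye-tracking data, as in the benchmark above, is the linear correlation coefficient (CC) between the predicted saliency map and the fixation density map; the benchmark's exact metric set is not stated here, so this is a representative example.

```python
import numpy as np

def saliency_cc(pred, fixation_density):
    # Linear correlation coefficient between a predicted saliency map and a
    # fixation density map; 1.0 means a perfect linear match.
    p = np.asarray(pred, dtype=float).ravel()
    f = np.asarray(fixation_density, dtype=float).ravel()
    p = (p - p.mean()) / (p.std() + 1e-12)
    f = (f - f.mean()) / (f.std() + 1e-12)
    return float((p * f).mean())
```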
Human Pose Forecasting via Deep Markov Models
Human pose forecasting is an important problem in computer vision with
applications to human-robot interaction, visual surveillance, and autonomous
driving. Usually, forecasting algorithms use 3D skeleton sequences and are
trained to forecast for a few milliseconds into the future. Long-range
forecasting is challenging due to the difficulty of estimating how long a
person continues an activity. To this end, our contributions are threefold: (i)
we propose a generative framework for poses using variational autoencoders
based on Deep Markov Models (DMMs); (ii) we evaluate our pose forecasts using a
pose-based action classifier, which we argue better reflects the subjective
quality of pose forecasts than distance in coordinate space; (iii) last, for
evaluation of the new model, we introduce a 480,000-frame video dataset called
Ikea Furniture Assembly (Ikea FA), which depicts humans repeatedly assembling
and disassembling furniture. We demonstrate promising results for our approach
on both Ikea FA and the existing NTU RGB+D dataset.
Comment: Accepted to DICTA'1
Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos
Omnidirectional images (also referred to as static 360° panoramas)
impose viewing conditions much different from those of regular 2D images. How
humans perceive image distortions in immersive virtual reality (VR)
environments is an important problem that has received little attention. We argue
that, apart from the distorted panorama itself, two types of VR viewing
conditions are crucial in determining the viewing behaviors of users and the
perceived quality of the panorama: the starting point and the exploration time.
We first carry out a psychophysical experiment to investigate the interplay
among the VR viewing conditions, the user viewing behaviors, and the perceived
quality of 360° images. Then, we provide a thorough analysis of the
collected human data, leading to several interesting findings. Moreover, we
propose a computational framework for objective quality assessment of 360°
images, embodying viewing conditions and behaviors in a delightful way.
Specifically, we first transform an omnidirectional image to several video
representations using different user viewing behaviors under different viewing
conditions. We then leverage advanced 2D full-reference video quality models to
compute the perceived quality. We construct a set of specific quality measures
within the proposed framework, and demonstrate their promises on three VR
quality databases.
Comment: 11 pages, 11 figures, 9 tables. This paper has been accepted by IEEE
Transactions on Visualization and Computer Graphics
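The panorama-to-video transformation described in the abstract above can be crudely sketched by sliding a viewport along an assumed yaw scanpath over the equirectangular image, with horizontal wraparound. Real viewport rendering uses a gnomonic projection per head orientation; this flat horizontal crop is only a simplified illustration.

```python
import numpy as np

def panorama_to_video(equirect, yaw_path_deg, fov_deg=90):
    # For each yaw angle on a (hypothetical) scanpath, crop a horizontal
    # viewport from the equirectangular panorama, wrapping around at the
    # image edge, and stack the crops into a moving-camera "video".
    h, w = equirect.shape[:2]
    view_w = int(w * fov_deg / 360.0)
    frames = []
    for yaw in yaw_path_deg:
        start = int((yaw % 360) / 360.0 * w)
        cols = np.arange(start, start + view_w) % w
        frames.append(equirect[:, cols])
    return np.stack(frames)
```

The resulting frame sequence is what a 2D full-reference video quality model would then score.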