Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM
Over the past few years, deep neural networks (DNNs) have exhibited great
success in predicting the saliency of images. However, there are few works that
apply DNNs to predict the saliency of generic videos. In this paper, we propose
a novel DNN-based video saliency prediction method. Specifically, we establish
a large-scale eye-tracking database of videos (LEDOV), which provides
sufficient data to train the DNN models for predicting video saliency. Through
the statistical analysis of our LEDOV database, we find that human attention is
normally attracted by objects, particularly moving objects or the moving parts
of objects. Accordingly, we propose an object-to-motion convolutional neural
network (OM-CNN) to learn spatio-temporal features for predicting the
intra-frame saliency by exploring information about both objectness and
object motion. We further find from our database that there exists a temporal
correlation of human attention with a smooth saliency transition across video
frames. Therefore, we develop a two-layer convolutional long short-term memory
(2C-LSTM) network in our DNN-based method, using the extracted features of
OM-CNN as the input. This network generates inter-frame saliency maps that
account for the transition of attention across video frames.
Finally, the experimental results show that our method advances the
state-of-the-art in video saliency prediction.
Comment: Jiang, Lai; Xu, Mai; Liu, Tie; Qiao, Minglang; Wang, Zulin. DeepVS:
A Deep Learning Based Video Saliency Prediction Approach. The European
Conference on Computer Vision (ECCV), September 201
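The recurrence described above, in which per-frame OM-CNN features drive a two-layer convolutional LSTM so that saliency evolves smoothly across frames, can be sketched minimally. The single-channel cell below, with random kernels and random feature maps standing in for the actual OM-CNN outputs, is an illustrative assumption only; the paper's real shapes, channels, biases, and training are not reproduced here.

```python
import numpy as np

def conv2d_same(x, k):
    # naive single-channel 2D convolution with zero padding ("same" output size)
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    return np.array([[np.sum(xp[i:i + kh, j:j + kw] * k)
                      for j in range(x.shape[1])] for i in range(x.shape[0])])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Single-channel ConvLSTM cell (biases and peephole terms omitted)."""
    def __init__(self, rng, ksize=3):
        self.Wx = rng.standard_normal((4, ksize, ksize)) * 0.1  # input-to-gate kernels
        self.Wh = rng.standard_normal((4, ksize, ksize)) * 0.1  # hidden-to-gate kernels

    def step(self, x, h, c):
        i = sigmoid(conv2d_same(x, self.Wx[0]) + conv2d_same(h, self.Wh[0]))  # input gate
        f = sigmoid(conv2d_same(x, self.Wx[1]) + conv2d_same(h, self.Wh[1]))  # forget gate
        o = sigmoid(conv2d_same(x, self.Wx[2]) + conv2d_same(h, self.Wh[2]))  # output gate
        g = np.tanh(conv2d_same(x, self.Wx[3]) + conv2d_same(h, self.Wh[3]))  # candidate state
        c_new = f * c + i * g
        return o * np.tanh(c_new), c_new

rng = np.random.default_rng(0)
layer1, layer2 = ConvLSTMCell(rng), ConvLSTMCell(rng)
H, W, T = 8, 8, 5
h1, c1 = np.zeros((H, W)), np.zeros((H, W))
h2, c2 = np.zeros((H, W)), np.zeros((H, W))
frames = rng.standard_normal((T, H, W))  # stand-in for per-frame OM-CNN features

saliency_maps = []
for x in frames:                          # recurrence carries attention across frames
    h1, c1 = layer1.step(x, h1, c1)
    h2, c2 = layer2.step(h1, h2, c2)
    saliency_maps.append(sigmoid(h2))     # inter-frame saliency map for this frame
```

Because the hidden and cell states persist across the loop, each frame's map depends on earlier frames, which is what gives the smooth saliency transition the abstract describes.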
Benchmark 3D eye-tracking dataset for visual saliency prediction on stereoscopic 3D video
Visual Attention Models (VAMs) predict the regions of an image or video that
are most likely to attract human attention. Although saliency
detection is well explored for 2D image and video content, there have been
only a few attempts to design 3D saliency prediction models. Newly proposed 3D visual
attention models must be validated on large-scale video saliency
prediction datasets that include ground-truth eye-tracking data.
There are several publicly available eye-tracking datasets for 2D image and
video content. For 3D, however, the research community still lacks
large-scale video saliency datasets for validating
different 3D-VAMs. In this paper, we introduce a large-scale dataset containing
eye-tracking data from 24 subjects who watched 61 stereoscopic 3D videos
(and their 2D versions) in a free-viewing test. We
evaluate the performance of existing saliency detection methods on the
proposed dataset. In addition, we have created an online benchmark for
validating the performance of existing 2D and 3D visual attention models
and for facilitating the addition of new VAMs. The benchmark currently
contains 50 different VAMs.
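Benchmarks like this typically score a model's predicted saliency map against eye-tracking ground truth using standard measures such as the linear correlation coefficient (CC) and normalized scanpath saliency (NSS). The sketch below shows those two common metrics on fabricated toy maps; the maps, noise levels, and fixation threshold are illustrative assumptions, and the dataset's actual evaluation protocol may differ.

```python
import numpy as np

def cc(pred, gt):
    # Pearson linear correlation coefficient between two saliency maps
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def nss(pred, fixation_map):
    # Normalized Scanpath Saliency: mean of the z-scored prediction at fixated pixels
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixation_map > 0].mean())

rng = np.random.default_rng(1)
gt = rng.random((32, 32))                      # stand-in ground-truth saliency map
pred_good = gt + 0.05 * rng.random((32, 32))   # prediction close to ground truth
pred_bad = rng.random((32, 32))                # unrelated prediction
fixations = (gt > 0.9).astype(int)             # toy binary fixation map
```

A model that tracks the ground truth scores near 1 on CC and positive on NSS, while an unrelated prediction scores near 0, which is how a benchmark ranks competing VAMs.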
3D Video Quality Assessment
A key factor in designing 3D systems is to understand how different visual
cues and distortions affect the perceptual quality of 3D video. The ultimate
way to assess video quality is through subjective tests. However, subjective
evaluation is time consuming, expensive, and in most cases not even possible.
An alternative solution is objective quality metrics, which attempt to model
the Human Visual System (HVS) in order to assess the perceptual quality. The
potential of 3D technology to significantly improve the immersiveness of video
content has been hampered by the difficulty of objectively assessing Quality of
Experience (QoE). A no-reference (NR) objective 3D quality metric, which could
help determine capturing parameters and improve playback perceptual quality,
would be welcomed by camera and display manufacturers. Network providers would
embrace a full-reference (FR) 3D quality metric, as they could use it to ensure
efficient QoE-based resource management during compression and Quality of
Service (QoS) during transmission.
Comment: PhD Thesis, UBC, 201
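The FR/NR distinction above comes down to whether the pristine original is available for comparison. As a minimal illustration of the full-reference idea only (this is not a metric from the thesis), the sketch below averages per-view PSNR over the left and right views of a stereo pair; real 3D QoE metrics additionally model depth perception and binocular effects.

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    # full-reference PSNR in dB between a reference and a distorted frame
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(peak ** 2 / mse)

def stereo_psnr(ref_left, dist_left, ref_right, dist_right):
    # naive FR stereo score: average of per-view PSNR
    return 0.5 * (psnr(ref_left, dist_left) + psnr(ref_right, dist_right))

rng = np.random.default_rng(0)
ref_l = rng.integers(0, 256, (64, 64)).astype(np.float64)   # toy left view
ref_r = rng.integers(0, 256, (64, 64)).astype(np.float64)   # toy right view
noise = rng.normal(0, 5, (64, 64))                          # toy compression noise
score = stereo_psnr(ref_l, ref_l + noise, ref_r, ref_r + noise)
```

An NR metric, by contrast, would have to estimate quality from the distorted views alone, which is why it is the harder problem the thesis highlights for capture-side use.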