6,334 research outputs found
Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos
Omnidirectional images (also referred to as static 360° panoramas)
impose viewing conditions quite different from those of regular 2D images. How
humans perceive image distortions in immersive virtual reality (VR)
environments is an important problem that has received relatively little attention. We argue
that, apart from the distorted panorama itself, two types of VR viewing
conditions are crucial in determining the viewing behaviors of users and the
perceived quality of the panorama: the starting point and the exploration time.
We first carry out a psychophysical experiment to investigate the interplay
among the VR viewing conditions, the user viewing behaviors, and the perceived
quality of 360° images. Then, we provide a thorough analysis of the
collected human data, leading to several interesting findings. Moreover, we
propose a computational framework for objective quality assessment of 360°
images that naturally embodies viewing conditions and behaviors.
Specifically, we first transform an omnidirectional image to several video
representations using different user viewing behaviors under different viewing
conditions. We then leverage advanced 2D full-reference video quality models to
compute the perceived quality. We construct a set of specific quality measures
within the proposed framework, and demonstrate their promise on three VR
quality databases.
Comment: 11 pages, 11 figures, 9 tables. This paper has been accepted by IEEE
Transactions on Visualization and Computer Graphics.
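The panorama-to-video transformation described above can be sketched in miniature. Everything below is a hypothetical illustration (the function names and the crop-based projection are ours, not the paper's): a real system would render viewports with a proper gnomonic projection, but a simple latitude/longitude crop already shows how a starting point and a scanpath turn a static panorama into a moving-camera video.

```python
# Hypothetical sketch: turning a 360° panorama into a "moving camera" video
# by cropping viewports along a scanpath.  A faithful implementation would
# use a gnomonic (rectilinear) projection; the crop below ignores projection
# distortion and only illustrates the panorama -> video-representation idea.

def viewport(pano, lat_deg, lon_deg, fov_deg=90):
    """Crop an approximate viewport from an equirectangular panorama.

    pano: 2D list (H rows x W cols); rows span +90..-90 deg latitude,
    columns span -180..+180 deg longitude.
    """
    h, w = len(pano), len(pano[0])
    vp_h = int(h * fov_deg / 180)                 # viewport height in pixels
    vp_w = int(w * fov_deg / 360)                 # viewport width in pixels
    row0 = int((90 - lat_deg) / 180 * h) - vp_h // 2
    col0 = int((lon_deg + 180) / 360 * w) - vp_w // 2
    rows = [min(max(row0 + r, 0), h - 1) for r in range(vp_h)]  # clamp at poles
    cols = [(col0 + c) % w for c in range(vp_w)]                # wrap longitude
    return [[pano[r][c] for c in cols] for r in rows]

def scanpath_to_video(pano, scanpath, fov_deg=90):
    """One video representation: the sequence of viewports along a scanpath,
    e.g. a given starting point explored for a fixed time."""
    return [viewport(pano, lat, lon, fov_deg) for lat, lon in scanpath]

# A starting point at the equator, then panning east:
pano = [[(r, c) for c in range(360)] for r in range(180)]
video = scanpath_to_video(pano, [(0, 0), (0, 30), (0, 60)])
```

Feeding such viewport sequences to an off-the-shelf 2D full-reference video quality model is then straightforward.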
Non-contact hemodynamic imaging reveals the jugular venous pulse waveform
Cardiovascular monitoring is important to prevent diseases from progressing.
The jugular venous pulse (JVP) waveform offers important clinical information
about cardiac health, but is not routinely examined due to its invasive
catheterisation procedure. Here, we demonstrate for the first time that the JVP
can be consistently observed in a non-contact manner using a novel light-based
photoplethysmographic imaging system, coded hemodynamic imaging (CHI). While
traditional monitoring methods measure the JVP at a single location, CHI's
wide-field imaging capabilities were able to observe the jugular venous pulse's
spatial flow profile for the first time. The important inflection points in the
JVP were observed, meaning that cardiac abnormalities can be assessed through
JVP distortions. CHI provides a new way to assess cardiac health through
non-contact light-based JVP monitoring, and can be used in non-surgical
environments for cardiac assessment.
Comment: 10 pages, 8 figures.
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content
Recent years have witnessed an explosion of user-generated content (UGC)
videos shared and streamed over the Internet, thanks to the evolution of
affordable and reliable consumer capture devices, and the tremendous popularity
of social media platforms. Accordingly, there is a great need for accurate
video quality assessment (VQA) models for UGC/consumer videos to monitor,
control, and optimize this vast content. Blind quality prediction of
in-the-wild videos is quite challenging, since the quality degradations of UGC
content are unpredictable, complicated, and often commingled. Here we
contribute to advancing the UGC-VQA problem by conducting a comprehensive
evaluation of leading no-reference/blind VQA (BVQA) features and models on a
fixed evaluation architecture, yielding new empirical insights on both
subjective video quality studies and VQA model design. By employing a feature
selection strategy on top of leading VQA model features, we are able to extract
60 of the 763 statistical features used by the leading models to create a new
fusion-based BVQA model, which we dub the \textbf{VID}eo quality
\textbf{EVAL}uator (VIDEVAL), that effectively balances the trade-off between
VQA performance and efficiency. Our experimental results show that VIDEVAL
achieves state-of-the-art performance at considerably lower computational cost
than other leading models. Our study protocol also defines a reliable benchmark
for the UGC-VQA problem, which we believe will facilitate further research on
deep learning-based VQA modeling, as well as perceptually-optimized efficient
UGC video processing, transcoding, and streaming. To promote reproducible
research and public evaluation, an implementation of VIDEVAL has been made
available online: \url{https://github.com/tu184044109/VIDEVAL_release}.
Comment: 13 pages, 11 figures, 11 tables.
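As a rough illustration of the feature-selection idea (not the actual VIDEVAL procedure, whose selection strategy and 763-feature pool are specific to the paper), one can rank candidate features by the magnitude of their correlation with subjective scores and keep the top performers:

```python
# Hedged sketch of correlation-based feature selection for a fusion BVQA
# model.  Feature names and values are toy data, not VIDEVAL's features.

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def select_features(feature_table, mos, k):
    """feature_table: dict name -> per-video feature values.
    Returns the k feature names most correlated (in magnitude) with MOS."""
    ranked = sorted(feature_table,
                    key=lambda name: abs(pearson(feature_table[name], mos)),
                    reverse=True)
    return ranked[:k]

# Toy example: 'sharpness' tracks MOS, 'noise' anti-correlates, 'rand' does not.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
table = {
    "sharpness": [1.1, 2.0, 2.9, 4.2, 5.0],
    "noise":     [5.0, 4.1, 3.0, 2.0, 0.9],
    "rand":      [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(select_features(table, mos, 2))   # keeps the two informative features
```

The selected subset would then feed a regressor (the paper trains a fusion model on 60 of 763 features) to balance accuracy against compute.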
Benchmark 3D eye-tracking dataset for visual saliency prediction on stereoscopic 3D video
Visual Attention Models (VAMs) predict the regions of an image or video that
are most likely to attract human attention. Although saliency detection is
well explored for 2D image and video content, only a few attempts have been
made to design 3D saliency prediction models. Newly proposed 3D visual
attention models have to be validated over large-scale video saliency
prediction datasets that also contain eye-tracking information.
There are several publicly available eye-tracking datasets for 2D image and
video content. In the case of 3D, however, there is still a need for
large-scale video saliency datasets for the research community for validating
different 3D-VAMs. In this paper, we introduce a large-scale dataset of
eye-tracking data collected from 24 subjects who viewed 61 stereoscopic 3D
videos (and their 2D versions) in a free-viewing test. We evaluate the
performance of existing saliency detection methods on the proposed dataset.
In addition, we have created an online benchmark for validating the
performance of existing 2D and 3D visual attention models and for
facilitating the addition of new VAMs. Our benchmark currently contains 50
different VAMs.
Memory In Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity from Spatiotemporal Dynamics
Natural spatiotemporal processes can be highly non-stationary in many ways,
e.g. the low-level non-stationarity such as spatial correlations or temporal
dependencies of local pixel values; and the high-level variations such as the
accumulation, deformation or dissipation of radar echoes in precipitation
forecasting. By Cramér's decomposition, any non-stationary process can be
decomposed into deterministic, time-variant polynomials plus a zero-mean
stochastic term. By applying differencing operations appropriately, we can
turn the time-variant polynomials into constants, making the deterministic
component predictable. However, most previous recurrent neural networks for
spatiotemporal prediction do not use these differential signals effectively,
and their relatively simple state transition functions prevent them from
learning highly complex variations in spacetime. We propose the Memory In Memory (MIM)
networks and corresponding recurrent blocks for this purpose. The MIM blocks
exploit the differential signals between adjacent recurrent states to model the
non-stationary and approximately stationary properties in spatiotemporal
dynamics with two cascaded, self-renewed memory modules. By stacking multiple
MIM blocks, we could potentially handle higher-order non-stationarity. The MIM
networks achieve the state-of-the-art results on four spatiotemporal prediction
tasks across both synthetic and real-world datasets. We believe that the
general idea of this work can be potentially applied to other time-series
forecasting tasks.
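The differencing argument above can be checked numerically: a deterministic polynomial trend collapses to a constant after enough difference operations, which is exactly what makes the deterministic component predictable. A minimal illustration:

```python
# Cramér's-decomposition intuition in miniature: a degree-2 time-variant
# polynomial becomes constant after two temporal differencing operations.

def diff(seq):
    """First-order temporal difference of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]

x = [t ** 2 for t in range(8)]   # deterministic quadratic trend
d1 = diff(x)                     # linear: 1, 3, 5, ...
d2 = diff(d1)                    # constant: 2, 2, 2, ...
print(d2)
```

MIM's recurrent blocks operate on exactly such differential signals between adjacent hidden states, rather than on explicit polynomial fits.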
Learning to Predict Streaming Video QoE: Distortions, Rebuffering and Memory
Mobile streaming video data accounts for a large and increasing percentage of
wireless network traffic. The available bandwidths of modern wireless networks
are often unstable, leading to difficulties in delivering smooth, high-quality
video. Streaming service providers such as Netflix and YouTube attempt to adapt
their systems to adjust in response to these bandwidth limitations by changing
the video bitrate or, failing that, allowing playback interruptions
(rebuffering). Being able to predict end users' quality of experience (QoE)
resulting from these adjustments could lead to perceptually-driven network
resource allocation strategies that would deliver streaming content of higher
quality to clients, while being cost effective for providers. Existing
objective QoE models only consider the effects on user QoE of video quality
changes or playback interruptions. For streaming applications, adaptive network
strategies may involve a combination of dynamic bitrate allocation along with
playback interruptions when the available bandwidth reaches a very low value.
Towards effectively predicting user QoE, we propose Video Assessment of
TemporaL Artifacts and Stalls (Video ATLAS): a machine learning framework where
we combine a number of QoE-related features, including objective quality
features, rebuffering-aware features and memory-driven features to make QoE
predictions. We evaluated our learning-based QoE prediction model on the
recently designed LIVE-Netflix Video QoE Database which consists of practical
playout patterns, where the videos are afflicted by both quality changes and
rebuffering events, and found that it provides improved performance over
state-of-the-art video quality metrics while generalizing well on different
datasets. The proposed algorithm is made publicly available at
http://live.ece.utexas.edu/research/Quality/VideoATLAS release_v2.rar.
Comment: under review in Transactions on Image Processing.
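To make the three feature families concrete, here is a hypothetical sketch (the feature definitions below are ours, not necessarily those of Video ATLAS) of objective-quality, rebuffering-aware, and memory-driven features extracted from a playout trace:

```python
# Hypothetical QoE feature extraction from a playout trace.  The specific
# features are illustrative stand-ins for the three families named in the
# abstract: objective quality, rebuffering-aware, and memory-driven.

def qoe_features(per_frame_quality, stall_flags):
    """per_frame_quality: objective quality of each displayed frame.
    stall_flags: 1 where playback was stalled (rebuffering), else 0."""
    n = len(per_frame_quality)
    mean_quality = sum(per_frame_quality) / n          # objective-quality feature
    stall_ratio = sum(stall_flags) / n                 # rebuffering-aware feature
    # Memory-driven feature: time since the last stall, capturing the
    # recency effect of impairments on viewers.
    last_stall = max((i for i, s in enumerate(stall_flags) if s), default=-1)
    time_since_stall = n - 1 - last_stall if last_stall >= 0 else n
    return {"mean_quality": mean_quality,
            "stall_ratio": stall_ratio,
            "time_since_stall": time_since_stall}

feats = qoe_features([40, 42, 41, 30, 35, 38], [0, 0, 0, 1, 1, 0])
print(feats)
```

A learned regressor would map such feature vectors to subjective QoE scores; the framework's contribution is combining all three families rather than quality changes or stalls alone.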
Subjective Assessment of H.264 Compressed Stereoscopic Video
The tremendous growth in 3D (stereo) imaging and display technologies has led
to stereoscopic content (video and image) becoming increasingly popular.
However, both the subjective and the objective evaluation of stereoscopic video
content have not kept pace with the rapid growth of such content. Further, the
availability of standard stereoscopic video databases is also quite limited. In
this work, we attempt to alleviate these shortcomings. We present a
stereoscopic video database and its subjective evaluation. We have created a
database containing a set of 144 distorted videos. We limit our attention to
H.264 compression artifacts. The distorted videos were generated using 6
uncompressed pristine videos of left and right views originally created by
Goldmann et al. at EPFL [1]. Further, 19 subjects participated in the
subjective assessment task. Based on the subjective study, we have formulated a
relation between the 2D and stereoscopic subjective scores as a function of
compression rate and depth range. We have also evaluated the performance of
popular 2D and 3D image/video quality assessment (I/VQA) algorithms on our
database.
Comment: 5 pages, 4 figures.
What Can Spatiotemporal Characteristics of Movements in RAMIS Tell Us?
Quantitative characterization of surgical movements can improve the quality
of patient care by informing the development of new training protocols for
surgeons, and the design and control of surgical robots. Here, we present a
novel characterization of open and teleoperated suturing movements that is
based on principles from computational motor control. We focus on the
extensively-studied relationship between the speed of movement and its
geometry. In three-dimensional movements, this relationship is defined by the
one-sixth power law, which relates the speed, the curvature, and the torsion
of movement trajectories. We fitted the parameters of the one-sixth
power law to suturing movements of participants with different levels of
surgical experience in open (using sensorized forceps) and teleoperated (using
the da Vinci Research Kit / da Vinci Surgical System) conditions from two
different datasets. We found that teleoperation significantly affected the
parameters of the power law, and that there were large differences between
different stages of movement. These results open a new avenue for studying the
effect of teleoperation on the spatiotemporal characteristics of the movements
of surgeons, and lay the foundation for the development of new algorithms for
automatic segmentation of surgical tasks.
Comment: Preprint of an article submitted for consideration in Journal of
Medical Robotics Research, © 2017 World Scientific Publishing Company,
http://www.worldscientific.com/worldscinet/jmr
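For reference, the one-sixth power law mentioned above is commonly written as follows (our notation; the paper's exact parameterization may differ):

```latex
% One-sixth power law for 3D movements: speed v as a function of
% curvature \kappa and torsion \tau (g is a movement-specific gain).
v(t) = g\,\kappa(t)^{-1/3}\,\lvert\tau(t)\rvert^{-1/6}
% Fitting is typically done in log space, where the gain and the
% exponents become linear-regression parameters:
\log v = \log g + \beta_{\kappa}\log\kappa + \beta_{\tau}\log\lvert\tau\rvert,
\qquad \beta_{\kappa}\approx -\tfrac{1}{3},\quad \beta_{\tau}\approx -\tfrac{1}{6}.
```

Comparing fitted $(\beta_{\kappa}, \beta_{\tau}, g)$ across open and teleoperated conditions is what reveals the effect of teleoperation on movement kinematics.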
Spatiotemporal Video Quality Assessment Method via Multiple Feature Mappings
Modern video quality assessment (VQA) methods aim to evaluate the perceptual quality of videos in many applications, but often at the cost of increased computational complexity. The difficulty stems from the complexity of the distorted videos that are of significant concern in the communication industry, as well as the two-fold (spatial and temporal) nature of video distortion. Our findings indicate that the information in spatiotemporal slice (STS) images is useful for measuring video distortion. This paper focuses on developing a full-reference video quality assessment algorithm that integrates several features of spatiotemporal slices of frames into a high-performance quality estimate. We evaluate video quality on several VQA databases by the following steps: (1) we first arrange the reference and test video sequences into a spatiotemporal slice representation and compute a collection of spatiotemporal feature maps on each reference-test pair; these feature responses are then processed using the Structural Similarity (SSIM) index to form local frame quality estimates. (2) To further enhance the quality assessment, we combine the spatial feature maps with the spatiotemporal feature maps and propose the VQA model named multiple map similarity feature deviation (MMSFD-STS). (3) We apply a sequential pooling strategy to assemble the per-frame quality indices into a video quality score. (4) Extensive evaluations on video quality databases show that the proposed VQA algorithm achieves better/competitive performance compared with other state-of-the-art methods.
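The SSIM step in (1) can be illustrated with a simplified, single-window version of the index (real SSIM uses local windows with per-window pooling, and the STS construction here is schematic):

```python
# Sketch: global-statistics SSIM applied to a pair of spatiotemporal-slice
# (STS) "images".  The constants follow the usual SSIM defaults for 8-bit
# data; everything else is a toy illustration, not the MMSFD-STS pipeline.

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM over two equal-size 2D slices (lists of lists)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / n
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

# An STS image stacks one row (or column) per frame; identical reference
# and test slices give SSIM = 1, and distortion lowers the score.
ref = [[10 * r + c for c in range(8)] for r in range(8)]
dst = [[v + 20 for v in row] for row in ref]          # brightness shift
print(ssim_global(ref, ref), ssim_global(ref, dst))
```

Per-slice scores like these would then be pooled sequentially over the video, as in step (3).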
Can high-density human collective motion be forecasted by spatiotemporal fluctuations?
Concerts, protests, and sporting events are occurring with increasing
frequency and magnitude. The extreme physical conditions common to these events
are known to cause injuries and loss-of-life due to the emergence of collective
motion such as crowd crush, turbulence, and density waves. Mathematical models
of human crowds aimed at enhancing crowd safety by understanding these
phenomena are developed with input from a variety of disciplines. However,
model validation is challenged by a lack of high-quality empirical data and
ethical constraints surrounding human crowd research. Consequently, a
generalized model-based approach for real-time monitoring/risk-assessment of
crowd collective motion remains an open problem. Here, we take a model-free approach
to crowd analysis and show that emergent collective motion can be forecasted
directly from video data. We use mode analysis methods from material science
and concepts from non-equilibrium physics to study footage of a human crowd at
an Oasis rock concert. We analyze the attendees' positional fluctuations during
a period of crowd turbulence to predict the spatial patterns of an emergent
human density wave. In addition to predicting spatial patterns of collective
motion, we also identify and measure temporal patterns that precede the density
wave and forecast its appearance by 1 s. Looking ahead, widening this
forecasting window beyond 1 s will enable new computer vision technologies for
real-time risk-assessment of emergent human collective motion.
Comment: Main Text and Supplementary Information (combined 20 pages, 12
figures).
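The mode-analysis step borrowed from materials science can be summarized as follows (our notation, under an effective-equilibrium assumption that the paper may refine):

```latex
% Positional fluctuations \delta r_i(t) of each attendee about their mean
% position define a displacement covariance matrix
C_{ij} = \bigl\langle \delta r_i(t)\,\delta r_j(t) \bigr\rangle_t .
% Its eigenvectors are the collective modes of the crowd; treating the
% crowd as an effective thermal system maps eigenvalue \lambda_k to a mode
% frequency
\omega_k \propto 1/\sqrt{\lambda_k},
% so large-\lambda (soft, low-frequency) modes identify the emergent
% collective motion, e.g. the density wave.
```

Tracking the growth of such soft modes over time is what provides the roughly 1 s forecasting window reported above.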