Spatiotemporal Video Quality Assessment Method via Multiple Feature Mappings
Advanced video quality assessment (VQA) methods aim to evaluate the perceptual quality of videos in many applications, but often at the cost of increased computational complexity. The difficulty stems from the distorted videos of significant concern in the communication industry, whose degradations are two-fold, affecting both spatial and temporal content. The findings of this study indicate that the information in spatiotemporal slice (STS) images is useful for measuring video distortion. This paper focuses on developing a full-reference VQA estimator that integrates several features of spatiotemporal slices of frames into a high-performance video quality model. Video quality is evaluated on several VQA databases through the following steps: (1) the reference and test video sequences are first arranged into a spatiotemporal slice representation; a collection of spatiotemporal feature maps is computed for each reference-test pair, and these response features are processed with the Structural Similarity (SSIM) index to form a local frame quality score. (2) To further enhance the quality assessment, the spatial feature maps are combined with the spatiotemporal feature maps in the proposed VQA model, named multiple map similarity feature deviation (MMSFD-STS). (3) A sequential pooling strategy assembles the quality indices of the frames into the video quality score. (4) Extensive evaluations on video quality databases show that the proposed VQA algorithm achieves better or competitive performance compared with other state-of-the-art methods.
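To make the slice-and-compare idea concrete, here is a minimal sketch, not the authors' implementation, of extracting spatiotemporal slices from a grayscale video volume and scoring them with SSIM; the centre-slice choice and the plain mean (rather than the paper's sequential pooling) are illustrative assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def sts_images(video, row=None, col=None):
    """Extract horizontal (t-x) and vertical (t-y) spatiotemporal slices.

    `video` is a (T, H, W) uint8 array with T >= 7 (SSIM's default window).
    """
    t, h, w = video.shape
    row = h // 2 if row is None else row   # centre slices: an assumed choice
    col = w // 2 if col is None else col
    return video[:, row, :], video[:, :, col]   # shapes (T, W) and (T, H)

def sts_ssim_score(ref, dist):
    """Average SSIM over the two slice orientations of a reference-test pair."""
    scores = [ssim(r, d, data_range=255)
              for r, d in zip(sts_images(ref), sts_images(dist))]
    return float(np.mean(scores))   # simple mean; the paper pools sequentially
```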
Perceptual models for high-refresh-rate rendering
Rendering realistic images requires substantial computational power. With new high-refresh-rate displays as well as the renaissance of virtual reality (VR) and augmented reality (AR), one cannot expect that GPU performance will scale fast enough to meet the requirements of immersive photo-realistic rendering with current rendering techniques.
In this dissertation, I follow the dual of the well-known computer vision approach: vision is inverse graphics: to improve graphical algorithms, I consider the operation of the human visual system. I propose to model and exploit the limitations of the visual system in the context of novel high-refresh-rate displays; specifically, I focus on spatio-temporal perception, a topic that has received remarkably less attention than spatial-only perception so far.
I present three main contributions. First, I demonstrate the validity of the perceptual approach by presenting a conceptually simple rendering technique, motivated by our eyes' limited sensitivity to rapid spatio-temporal change, which reduces the rendering load and transmission requirements of current-generation VR headsets without introducing perceivable visual artefacts. Second, I present two visual models related to motion perception: (a) a metric for detecting flicker; and (b) a comprehensive visual model to predict perceived motion quality on monitors with arbitrary refresh rates and resolutions. Third, I propose an adaptive rendering algorithm that utilises the proposed models. All algorithms operate on physical colorimetric units (instead of display-referenced pixel values), for which I provide the appropriate display measurements and models. All proposed algorithms and visual models are calibrated and validated with psychophysical experiments.
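Since the abstract stresses working in physical colorimetric units rather than display-referenced pixel values, a minimal sketch of that conversion may help; it assumes a standard gain-gamma-offset display model, with the peak luminance, black level, and gamma below given as example measurements rather than values from the dissertation.

```python
import numpy as np

def display_luminance(v, l_peak=250.0, l_black=0.2, gamma=2.2):
    """Map normalised pixel values v in [0, 1] to luminance in cd/m^2.

    Gain-gamma-offset model; l_peak, l_black, and gamma would come from
    photometric measurements of the target display.
    """
    v = np.clip(np.asarray(v, dtype=np.float64), 0.0, 1.0)
    return (l_peak - l_black) * v ** gamma + l_black
```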
Detecting changes in auditory events
Change deafness is defined as the failure to detect the source of an above-threshold change in an auditory scene. A recent paradigm demonstrated the phenomenon under conditions analogous to its visual counterpart, change blindness (Hall, Peck, Gaston, & Dickerson, 2015). This investigation examined the use of that paradigm through two experiments, each involving the same four simultaneously presented events. Experiment 1 distributed the events across a virtual 120º span on the azimuth while the target event oscillated across a 60º space throughout each trial. Listeners were instructed to identify the target as soon as possible. The target's rate of change was manipulated across four velocities (80º/s, 40º/s, 24º/s, 8º/s). Results confirmed that error rates in all conditions differed from those in an isolated control task. The 8º/s condition displayed the highest error rates, providing strong evidence of change deafness, whereas error rates in the 80º/s, 40º/s, and 24º/s conditions did not significantly differ, providing inconclusive evidence. Response times did not vary across conditions. Experiment 2 compared these findings to a frequency-based filter manipulation and evaluated change deafness by comparing flickered (one-second and three-second initial presentation) and continuously changing target events, which oscillated between wide- and narrow-band filters. All conditions resulted in error rates that did not differ from the control task. The continuous condition produced increased response times, providing explicit evidence of change deafness. Rapid response times in the flicker conditions indicated the elimination of change deafness. The three-second presentation time in one flicker condition further reduced response times, demonstrating the impact of encoding. Both experiments support the assessed paradigm as an appropriate method for analyzing the occurrence of change deafness.
Full-reference stereoscopic video quality assessment using a motion sensitive HVS model
Stereoscopic video quality assessment has become a major research topic in recent years. Existing stereoscopic video quality metrics are predominantly stereoscopic image quality metrics extended to the time domain via, for example, temporal pooling. These approaches do not explicitly consider the motion sensitivity of the Human Visual System (HVS). To address this limitation, this paper introduces a novel HVS model inspired by physiological findings characterising the motion-sensitive response of complex cells in the primary visual cortex (V1 area). The proposed HVS model generalises previous HVS models, which characterised the behaviour of simple and complex cells but ignored motion sensitivity, by estimating optical flow to measure scene velocity at different scales and orientations. The local motion characteristics (direction and amplitude) are used to modulate the output of the complex cells. The model is applied to develop a new type of full-reference stereoscopic video quality metric that uniquely combines non-motion-sensitive and motion-sensitive energy terms to mimic the response of the HVS. A tailored two-stage multi-variate stepwise regression algorithm is introduced to determine the optimal contribution of each energy term. The two proposed stereoscopic video quality metrics are evaluated on three stereoscopic video datasets. Results indicate that they achieve average correlations with subjective scores of 0.9257 (PLCC), 0.9338 and 0.9120 (SRCC), 0.8622 and 0.8306 (KRCC), and outperform previous stereoscopic video quality metrics, including other recent HVS-based metrics.
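As a rough illustration of the motion-modulated energy idea described above, and not the paper's implementation, the sketch below computes a phase-invariant complex-cell energy from a quadrature Gabor pair and scales it by local optical-flow amplitude; the tanh gain curve and the Gabor parameters are assumed placeholders.

```python
import cv2
import numpy as np

def complex_cell_energy(frame, theta, lam=8.0, sigma=4.0):
    """Quadrature Gabor pair -> phase-invariant energy (complex-cell model)."""
    even_k = cv2.getGaborKernel((31, 31), sigma, theta, lam, 0.5, psi=0.0)
    odd_k = cv2.getGaborKernel((31, 31), sigma, theta, lam, 0.5, psi=np.pi / 2)
    f = frame.astype(np.float32)
    even = cv2.filter2D(f, -1, even_k)
    odd = cv2.filter2D(f, -1, odd_k)
    return even ** 2 + odd ** 2

def motion_modulated_energy(prev, curr, theta):
    """Scale the energy response by local motion amplitude from optical flow.

    `prev` and `curr` are consecutive 8-bit grayscale frames.
    """
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    speed = np.linalg.norm(flow, axis=2)   # per-pixel motion amplitude
    gain = 1.0 + np.tanh(speed)            # assumed modulation curve
    return gain * complex_cell_energy(curr, theta)
```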
Subjective and objective quality assessment for advanced videos
The surge of video streaming services, particularly for high-motion content such as sporting events, necessitates advanced techniques to maintain video quality in the face of challenges such as capture artifacts and distortions introduced during coding and transmission. The advent of High Dynamic Range (HDR) content, offering a broader and more accurate representation of brightness and color, poses additional complexities due to increased data volume. The critical need for robust Video Quality Assessment (VQA) models arises from these challenges. To meet this need, we conducted three substantial subjective quality studies and constructed corresponding databases. The Laboratory for Image and Video Engineering (LIVE) Livestream Database comprises 315 videos of 45 source sequences from 33 original contents, impaired by six types of distortions. This database facilitated the gathering of over 12,000 human opinions from 40 subjects. The LIVE HDR Database, the first of its kind dedicated to HDR10 videos, includes 310 videos from 31 distinct source sequences, processed with ten different compression and resolution combinations. This resource was instrumental in amassing over 20,000 human quality judgments under two different illumination conditions. An additional LIVE HDR AQ database was developed with 400 videos from 40 unique source sequences, processed using varied combinations of compression, resolution, and AQ-mode settings to study the effects of adaptive quantization (AQ) and rate-distortion optimization techniques on HDR video perceptual quality. Building on these databases, we developed two objective quality models: HDRMAX and HDRGREED. HDRMAX, a pioneering framework designed to create HDR quality-sensitive features, augments the widely deployed Video Multimethod Assessment Fusion (VMAF) model, yielding significantly improved performance on both HDR and SDR videos. HDRGREED, a novel model leveraging localized histogram equalization and Difference of Gaussian filters, employs the Generalized Gaussian Distribution to model bandpass responses and measure entropy variations between reference and distorted videos; it is particularly sensitive to the banding and blocking artifacts introduced by inappropriate AQ settings. In conclusion, the comprehensive subjective quality studies and databases, along with the state-of-the-art objective quality models HDRMAX and HDRGREED, significantly contribute to the advancement of future VQA models. These tools cater specifically to the challenges posed by live streaming and HDR content, providing critical resources for the development, testing, and comparison of future VQA models. The databases, publicly available for research purposes, and the models offer valuable insights for improving and controlling the perceptual quality of streamed videos.
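To ground the bandpass-plus-GGD idea mentioned for HDRGREED, here is a hedged sketch, not the model's actual code: a Difference-of-Gaussians bandpass response followed by a Generalized Gaussian fit whose entropy could then be compared between reference and distorted frames; the filter scales are assumed values.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import gennorm

def dog_response(frame, sigma=1.0, k=1.6):
    """Difference-of-Gaussians bandpass response of a luminance frame."""
    f = np.asarray(frame, dtype=np.float64)
    return ndimage.gaussian_filter(f, sigma) - ndimage.gaussian_filter(f, k * sigma)

def ggd_entropy(response):
    """Differential entropy of a GGD fitted to the bandpass coefficients."""
    beta, loc, scale = gennorm.fit(response.ravel())   # shape, location, scale
    return gennorm.entropy(beta, loc=loc, scale=scale)

# Entropy variation between reference and distorted frames (illustrative):
# delta_h = ggd_entropy(dog_response(ref)) - ggd_entropy(dog_response(dist))
```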
Quality Assessment of In-the-Wild Videos
Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and the presence of shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show that two eminent effects of the human visual system, namely content-dependency and temporal-memory effects, can be used for this purpose. We propose an objective no-reference video quality assessment method that integrates both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin, with 12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the second-best method VBLIINDS in terms of SROCC, KROCC, PLCC, and RMSE, respectively. Moreover, an ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.
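The subjectively-inspired temporal pooling mentioned above models temporal hysteresis: viewers remember bad moments and react sluggishly to quality improvements. Below is a simplified, hedged restatement of that pooling idea in plain NumPy (the authors' exact formulation is in the released code); the window length tau and blend weight gamma are assumed hyperparameters.

```python
import numpy as np

def softmin_weights(x):
    """Weights that emphasise the lowest-quality frames in a window."""
    w = np.exp(-np.asarray(x, dtype=np.float64))
    return w / w.sum()

def hysteresis_pool(q, tau=12, gamma=0.5):
    """Blend a pessimistic memory of the past with a softmin-weighted future."""
    q = np.asarray(q, dtype=np.float64)
    t_len = len(q)
    pooled = np.empty(t_len)
    for t in range(t_len):
        memory = q[max(0, t - tau):t + 1].min()   # worst recent frame lingers
        future = q[t:min(t_len, t + tau)]
        current = float(softmin_weights(future) @ future)
        pooled[t] = gamma * memory + (1.0 - gamma) * current
    return pooled.mean()   # video-level score from per-frame scores
```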
The role of temporal frequency in continuous flash suppression: A case for a unified framework
In continuous flash suppression (CFS), a rapidly changing Mondrian sequence is presented to one eye in order to suppress a static target presented to the other eye. Targets generally remain suppressed for several seconds at a time, contributing to the widespread use of CFS in studies of unconscious visual processes. Nevertheless, the mechanisms underlying CFS suppression remain unclear, complicating its use and the interpretation of results obtained with the technique. As a starting point, this thesis examined the role of temporal frequency in CFS suppression using carefully controlled stimuli generated with Fourier transform techniques. As a low-level stimulus attribute, temporal frequency allowed us to evaluate the contributions of early visual processes and to test the general assumption that fast update rates drive CFS effectiveness. Three psychophysical studies are described in this thesis, addressing the temporal frequency tuning of CFS (Chapter 2), the relationship between the Mondrian pattern and temporal frequency content (Chapter 3), and the role of temporal frequency selectivity in CFS (Chapter 4). Contrary to conventional wisdom, the results showed that the suppression of static targets is largely driven by high spatial frequencies and low temporal frequencies; faster masker rates, on the other hand, worked best with transient targets. Indicative of early, feature-selective processes, these findings are reminiscent of binocular rivalry suppression and demonstrate the possible use of a unified framework.
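As an illustration of the Fourier-based stimulus control described above, and not the thesis code, the sketch below band-pass filters a noise movie along its time axis so that the masker's temporal frequency content is set explicitly; the frame rate and band edges are example values.

```python
import numpy as np

def temporal_bandpass(movie, fps=60.0, lo=1.0, hi=4.0):
    """Keep only temporal frequencies in [lo, hi] Hz (axis 0 = time)."""
    spectrum = np.fft.rfft(movie, axis=0)
    freqs = np.fft.rfftfreq(movie.shape[0], d=1.0 / fps)
    keep = (freqs >= lo) & (freqs <= hi)
    spectrum[~keep] = 0.0
    return np.fft.irfft(spectrum, n=movie.shape[0], axis=0)

# e.g. a band-limited noise masker, 2 s at 60 fps:
# masker = temporal_bandpass(np.random.randn(120, 64, 64))
```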