Full-Reference Video Quality Assessment Using Deep 3D Convolutional Neural Networks
We present a novel framework called Deep Video QUality Evaluator (DeepVQUE) for full-reference video quality assessment (FRVQA) using deep 3D convolutional neural networks (3D ConvNets). DeepVQUE is a complementary framework to traditional handcrafted-feature-based methods in that it uses deep 3D ConvNet models for feature extraction. 3D ConvNets are capable of extracting spatio-temporal features of a video, which are vital for video quality assessment (VQA). Most existing FRVQA approaches operate on the spatial and temporal domains independently followed by pooling, and often ignore the crucial spatio-temporal relationship of intensities in natural videos. In this work, we pay special attention to the contribution of spatio-temporal dependencies in natural videos to quality assessment. Specifically, the proposed approach estimates the spatio-temporal quality of a video with respect to its pristine version by applying commonly used distance measures such as the l1 or the l2 norm to the volume-wise pristine and distorted 3D ConvNet features. Spatial quality is estimated using off-the-shelf full-reference image quality assessment (FRIQA) methods. Overall video quality is estimated using support vector regression (SVR) applied to the spatio-temporal and spatial quality estimates. Additionally, we illustrate the ability of the proposed approach to localize distortions in space and time.
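As a rough illustration, the volume-wise distance step described in the abstract can be sketched as follows. The feature shapes, the toy random volumes, and the mean-reduced form of the norms are assumptions for the sketch; the abstract does not fix these details:

```python
import numpy as np

def spatiotemporal_distance(feat_pristine, feat_distorted, norm="l2"):
    """Distance between volume-wise 3D ConvNet feature volumes.

    norm="l1" gives the mean absolute difference; norm="l2" gives the
    root-mean-square difference (a normalized form of the l2 norm).
    """
    diff = feat_pristine - feat_distorted
    if norm == "l1":
        return float(np.abs(diff).mean())
    return float(np.sqrt((diff ** 2).mean()))

# Toy feature volumes of shape (channels, time, height, width); real
# volumes would come from a pretrained 3D ConvNet.
rng = np.random.default_rng(0)
f_ref = rng.standard_normal((8, 4, 6, 6))
f_dis = f_ref + 0.1 * rng.standard_normal((8, 4, 6, 6))

print(spatiotemporal_distance(f_ref, f_dis, "l1"))
print(spatiotemporal_distance(f_ref, f_dis, "l2"))
```

Identical volumes give a distance of zero, and the distance grows with the severity of the feature-space distortion.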
Multimedia Quality Assessment
Globally, almost 17 exabytes of mobile data are used every month, and almost 10 exabytes of this
data are used to view or download videos. This is possible due to the explosive increase in
smartphone users around the globe. Most smartphone manufacturers claim the quality of images and
videos captured by their devices to be a differentiating factor compared to their competition. Hence,
quality measurement becomes crucial for these manufacturers to meet customers' expectations. In
another scenario, the amount of video streamed online has also exploded in recent years. The video
resolution is altered to meet the available bandwidth, and these videos are viewed on different
display devices. If both the display-device makers and the service providers have access to the
perceptual quality of the video, they can optimize their respective technologies to provide
satisfactory service to the end
user. We present a novel framework called Deep Video Quality Evaluator (DeepVQUE) for
full-reference video quality assessment using deep 3D convolutional neural networks (3D ConvNets).
DeepVQUE is a complementary framework to traditional handcrafted feature based methods in
that it uses deep 3D ConvNet models for feature extraction. 3D ConvNets are capable of extracting
spatio-temporal features of the video which are vital for video quality assessment (VQA). Most of
the existing approaches operate on the spatial and temporal domains separately, ignoring the
complex relationship between them. In this thesis, we study spatial quality using state-of-the-art
full-reference image quality assessment (FRIQA) metrics and spatio-temporal quality using
3D ConvNets. Specifically, the proposed approach measures the spatio-temporal quality of a video
with respect to its pristine version by applying widely used distance measures such as the l1 or the
l2 norm to the volume-wise pristine and distorted features. The overall video quality is then
pooled using support vector regression (SVR) on the spatial and spatio-temporal quality scores.
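The SVR pooling stage might look like the sketch below. The two-feature input (one spatial and one spatio-temporal score per video), the synthetic subjective scores, and the RBF-kernel hyperparameters are all assumptions for illustration, not the thesis's actual setup:

```python
import numpy as np
from sklearn.svm import SVR

# Toy per-video quality features: [spatial_score, spatiotemporal_score].
# These stand in for FRIQA outputs and 3D ConvNet feature distances.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(50, 2))

# Synthetic "subjective" scores: a noisy blend of the two estimates.
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + 0.02 * rng.standard_normal(50)

# Fit SVR to map the two quality estimates to one overall score.
model = SVR(kernel="rbf", C=1.0, epsilon=0.01)
model.fit(X, y)

# Predict overall quality for a new video's pair of estimates.
pred = model.predict([[0.8, 0.7]])
print(pred[0])
```

In practice the regressor would be trained against human mean opinion scores rather than the synthetic targets used here.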
In this thesis, we also study a no-reference image quality assessment method which uses a binary
classifier to estimate the quality of an image. Subjective quality assessment is the most reliable
source of assessment for images or videos. Subjects score images with very low or very high
distortion with much greater confidence than images at intermediate levels of distortion. In
some cases, the distortion may not be uniformly distributed throughout the image. This motivated
us to train a classifier on image patches, where each patch is labeled either zero or one based on
the level of distortion. To determine the quality of an image, it is divided into patches and passed
through the pre-trained classifier. The patch-wise classification captures local distortion, and the
overall quality score for the image is given by the ratio of the number of patches classified as
zero to the total number of patches.
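The patch-wise scoring rule can be sketched as below. The variance-threshold "classifier" is a toy stand-in for the trained binary classifier described above, and the patch size is an arbitrary choice for the sketch:

```python
import numpy as np

def patch_quality_score(image, patch, classifier):
    """Quality = fraction of patches labeled 0 (low distortion).

    `classifier` maps a patch to 0 (clean) or 1 (distorted); here it
    stands in for the trained binary classifier.
    """
    h, w = image.shape[:2]
    labels = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            labels.append(classifier(image[i:i + patch, j:j + patch]))
    labels = np.array(labels)
    return float((labels == 0).mean())

def toy_clf(p):
    # Toy stand-in: flag a patch as distorted (1) when its variance is high.
    return int(p.var() > 0.5)

# Toy image: clean (all-zero) top half, noisy "distorted" bottom half.
img = np.zeros((32, 32))
img[16:, :] = np.random.default_rng(2).standard_normal((16, 32))

print(patch_quality_score(img, 8, toy_clf))
```

With half the patches clean and half noisy, the score comes out to 0.5, matching the ratio described in the text.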