2 research outputs found

    Full-Reference Video Quality Assessment Using Deep 3D Convolutional Neural Networks

    No full text
    We present a novel framework called Deep Video Quality Evaluator (DeepVQUE) for full-reference video quality assessment (FRVQA) using deep 3D convolutional neural networks (3D ConvNets). DeepVQUE is complementary to traditional handcrafted-feature-based methods in that it uses deep 3D ConvNet models for feature extraction. 3D ConvNets are capable of extracting the spatio-temporal features of a video, which are vital for video quality assessment (VQA). Most existing FRVQA approaches operate on the spatial and temporal domains independently, followed by pooling, and often ignore the crucial spatio-temporal relationship of intensities in natural videos. In this work, we pay special attention to the contribution of spatio-temporal dependencies in natural videos to quality assessment. Specifically, the proposed approach estimates the spatio-temporal quality of a video with respect to its pristine version by applying commonly used distance measures such as the l1 or the l2 norm to the volume-wise pristine and distorted 3D ConvNet features. Spatial quality is estimated using off-the-shelf full-reference image quality assessment (FRIQA) methods. Overall video quality is estimated using support vector regression (SVR) applied to the spatio-temporal and spatial quality estimates. Additionally, we illustrate the ability of the proposed approach to localize distortions in space and time.
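
    A rough sketch of this pipeline in Python is given below. It computes a volume-wise l1 or l2 distance between pristine and distorted feature vectors and pools the result with a spatial quality estimate via support vector regression. The feature extractor is omitted, and the feature values, subjective scores, and function names are hypothetical placeholders rather than the paper's actual implementation.

    import numpy as np
    from sklearn.svm import SVR

    def spatiotemporal_distance(feat_ref, feat_dist, norm=1):
        # Volume-wise l1 or l2 distance between pristine and distorted
        # 3D ConvNet feature volumes of shape (num_volumes, feat_dim).
        diff = feat_ref - feat_dist
        if norm == 1:
            per_volume = np.abs(diff).sum(axis=1)          # l1 per volume
        else:
            per_volume = np.sqrt((diff ** 2).sum(axis=1))  # l2 per volume
        return per_volume.mean()                           # pool over volumes

    # Hypothetical per-video quality features: a spatial quality score from
    # an off-the-shelf FRIQA method and the spatio-temporal distance above.
    X_train = np.array([[0.92, 1.3],
                        [0.55, 4.8],
                        [0.78, 2.1]])    # [spatial, spatio-temporal]
    y_train = np.array([4.2, 1.9, 3.4])  # subjective (mean opinion) scores

    svr = SVR(kernel="rbf")              # SVR pools the two estimates
    svr.fit(X_train, y_train)
    score = svr.predict(np.array([[0.81, 1.9]]))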

    Multimedia Quality Assessment

    No full text
    Globally, almost 17 exabytes of mobile data are used every month, and almost 10 exabytes of this are used to view or download videos. This growth is driven by the explosive increase in smartphone users around the globe. Most smartphone manufacturers claim that the quality of the images and videos captured by their devices is a differentiating factor against the competition, so quality measurement becomes crucial for them to meet customers' expectations. In another scenario, the amount of video streamed online has also exploded in recent years: video resolution is altered to meet the available bandwidth, and the videos are then viewed on different display devices. If both the display device makers and the service providers have access to the perceptual quality of the video, they can optimize their respective technologies to provide satisfactory service to the end user.

    We present a novel framework called Deep Video Quality Evaluator (DeepVQUE) for full-reference video quality assessment using deep 3D convolutional neural networks (3D ConvNets). DeepVQUE is complementary to traditional handcrafted-feature-based methods in that it uses deep 3D ConvNet models for feature extraction. 3D ConvNets are capable of extracting the spatio-temporal features of a video, which are vital for video quality assessment (VQA). Most existing approaches operate on the spatial and temporal domains separately, ignoring the complex relationship between them. In this thesis, we estimate spatial quality using state-of-the-art full-reference image quality assessment (FRIQA) metrics and spatio-temporal quality using 3D ConvNets. Specifically, the proposed approach measures the spatio-temporal quality of a video with respect to its pristine version by applying widely used distance measures such as the l1 or the l2 norm to the volume-wise pristine and distorted features. The overall quality of the video is pooled using support vector regression (SVR) on the spatial and spatio-temporal quality scores.

    In this thesis, we also study a no-reference image quality assessment method that uses a binary classifier to estimate the quality of an image. Subjective assessment is the most reliable source of quality judgments for images and videos, and subjects score images with low distortion or high distortion far more confidently than images at intermediate levels of distortion. In some cases, the distortion may not be uniformly distributed throughout the image. This motivated us to train a classifier on image patches, where each patch is labeled either zero or one based on its level of distortion. To determine the quality of an image, the image is divided into patches that are passed through the pre-trained classifier. The patch-wise classification captures local distortion, and the overall quality score for the image is given by the ratio of the number of patches classified as zero to the total number of patches.
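
    A minimal sketch of this patch-based scoring rule is shown below. The patch size, the non-overlapping tiling, and the classify_patch callable are assumptions for illustration and stand in for the thesis's actual pre-trained classifier.

    import numpy as np

    def image_quality_score(image, classify_patch, patch_size=32):
        # Split the image into non-overlapping patches, classify each with
        # the pre-trained binary classifier, and return the fraction of
        # patches labeled zero (low distortion).
        h, w = image.shape[:2]
        labels = []
        for y in range(0, h - patch_size + 1, patch_size):
            for x in range(0, w - patch_size + 1, patch_size):
                labels.append(classify_patch(image[y:y + patch_size,
                                                   x:x + patch_size]))
        labels = np.array(labels)
        return (labels == 0).mean()  # zero-labeled patches / total patches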