Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment
In recent years, deep learning has achieved promising success for multimedia
quality assessment, especially for image quality assessment (IQA). However,
since there exist more complex temporal characteristics in videos, very little
work has been done on video quality assessment (VQA) by exploiting powerful
deep convolutional neural networks (DCNNs). In this paper, we propose an
efficient VQA method named Deep SpatioTemporal video Quality assessor (DeepSTQ)
to predict the perceptual quality of various distorted videos in a no-reference
manner. In the proposed DeepSTQ, we first extract local and global
spatiotemporal features by pre-trained deep learning models without fine-tuning
or training from scratch. The composite features cover distorted video
frames as well as frame difference maps from both global and local views. Then,
a regression model aggregates the features to predict the perceptual video
quality. Finally, experimental results demonstrate that our
proposed DeepSTQ outperforms state-of-the-art quality assessment algorithms.
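The pipeline described in this abstract (frame difference maps, global and local views, regression on the aggregated features) can be sketched roughly as follows. This is a minimal illustration, not the DeepSTQ implementation: the abstract uses pre-trained deep models as feature extractors, whereas the hypothetical `toy_features` below substitutes simple mean and standard deviation statistics, and the ridge regressor, the central-patch "local view", and all function names are assumptions made for the example.

```python
import numpy as np

def frame_difference_maps(video):
    """video: (T, H, W) float array -> (T-1, H, W) absolute frame differences."""
    return np.abs(np.diff(video, axis=0))

def toy_features(frames, patch=None):
    """Stand-in for a pre-trained feature extractor (an assumption).

    Returns [mean, std] over the whole frames (global view) or over a
    central patch of side `patch` (local view).
    """
    if patch is not None:
        _, h, w = frames.shape
        top, left = (h - patch) // 2, (w - patch) // 2
        frames = frames[:, top:top + patch, left:left + patch]
    return np.array([frames.mean(), frames.std()])

def clip_descriptor(video, patch=8):
    """Concatenate global/local features of the frames and of their differences."""
    diffs = frame_difference_maps(video)
    return np.concatenate([
        toy_features(video), toy_features(video, patch),
        toy_features(diffs), toy_features(diffs, patch),
    ])

def fit_ridge(X, y, lam=1e-3):
    """Ridge regression with a bias column (the bias is regularized too, for brevity)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    """Predict quality scores for descriptors X with fitted weights w."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w
```

In use, one descriptor per training video would be stacked into `X`, subjective quality scores would form `y`, and `predict(fit_ridge(X, y), X_new)` would give scores for unseen videos.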
Towards a Video Quality Assessment based Framework for Enhancement of Laparoscopic Videos
Laparoscopic videos can be affected by different distortions which may impact
the performance of surgery and introduce surgical errors. In this work, we
propose a framework for automatically detecting and identifying such
distortions and their severity using video quality assessment. This work makes
three major contributions: (i) a proposal for a novel video
enhancement framework for laparoscopic surgery; (ii) a publicly available
database for quality assessment of laparoscopic videos evaluated by expert as
well as non-expert observers; and (iii) objective video quality assessment of
laparoscopic videos, including their correlations with expert and non-expert
scores.
Comment: SPIE Medical Imaging 2020 (Draft version)
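Correlation between an objective quality metric and subjective scores is commonly reported with the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC). The abstract does not say which coefficients the authors used, so the sketch below is only an assumption about what such an evaluation might look like. The helper names are hypothetical, and the ranking helper assumes there are no tied values.

```python
import numpy as np

def plcc(a, b):
    """Pearson linear correlation between two score vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def ranks(v):
    """Ranks 0..n-1 of the entries of v; assumes no ties."""
    v = np.asarray(v)
    order = np.argsort(v)
    r = np.empty(len(v), dtype=np.float64)
    r[order] = np.arange(len(v))
    return r

def srocc(a, b):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(ranks(a), ranks(b))
```

PLCC measures how linear the relation is, while SROCC only measures whether the metric orders the videos the same way the observers do, so the two are usually reported together.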
Preserving low-quality video through deep learning
Lossy compression of video streams reduces bandwidth and storage requirements, and image compression is needed in many circumstances as well. Older archives are often stored at low resolution and at a compression rate suited to the technology available when the video was created. Unfortunately, lossy compression algorithms introduce artifacts, which usually damage high-frequency details and add noise or novel image patterns. This causes several problems: low-quality images are less pleasant to viewers, and object detection algorithms may perform worse on them. Given a compressed version of an image, we therefore aim to remove such artifacts and recover the original. Doing so requires reversing the compression process through a complicated non-linear image transformation. We propose a deep neural network able to improve image quality. We show that this model can be optimized either traditionally, by directly optimizing an image similarity loss (SSIM), or with a generative adversarial approach (GAN). Our restored images have more photorealistic details than those produced by traditional image enhancement networks. Our training procedure, based on sub-patches, is novel. Moreover, we propose a novel testing protocol to evaluate restored images quantitatively. Unlike previously proposed approaches, we are able to remove artifacts generated at any quality level by inferring the image quality directly from the data. Human evaluation and quantitative experiments in object detection show that our GAN generates images with finer, consistent details, and that these details make a difference for both machines and humans.
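To make the "image similarity loss (SSIM)" mentioned in this abstract concrete, here is a minimal sketch of a loss built from the standard SSIM formula. It is a simplification and not the authors' training code: SSIM is normally computed over local windows and averaged, whereas this version uses a single global window, and the function names and the `data_range` parameter are assumptions for the example.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM computed over the whole image as one window (a simplification)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def ssim_loss(restored, target, data_range=1.0):
    """Loss that is 0 for identical images and grows as similarity drops."""
    return 1.0 - global_ssim(restored, target, data_range)
```

Training a restoration network against this kind of loss would require a differentiable implementation in a deep learning framework; the NumPy version only illustrates the quantity being minimized.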
Streaming Video QoE Modeling and Prediction: A Long Short-Term Memory Approach
HTTP based adaptive video streaming has become a popular choice of streaming
due to the reliable transmission and the flexibility offered to adapt to
varying network conditions. However, due to rate adaptation in adaptive
streaming, the quality of the videos at the client keeps varying with time
depending on the end-to-end network conditions. Further, varying network
conditions can lead to the video client running out of playback content
resulting in rebuffering events. These factors affect the user satisfaction and
cause degradation of the user quality of experience (QoE). It is important to
quantify the perceptual QoE of the streaming video users and monitor the same
in a continuous manner so that the QoE degradation can be minimized. However,
the continuous evaluation of QoE is challenging as it is determined by complex
dynamic interactions among the QoE influencing factors. Towards this end, we
present LSTM-QoE, a recurrent neural network based QoE prediction model using a
Long Short-Term Memory (LSTM) network. The LSTM-QoE is a network of cascaded
LSTM blocks to capture the nonlinearities and the complex temporal dependencies
involved in the time varying QoE. Based on an evaluation over several publicly
available continuous QoE databases, we demonstrate that the LSTM-QoE has the
capability to model the QoE dynamics effectively. We compare the proposed model
with the state-of-the-art QoE prediction models and show that it provides
superior performance across these databases. Further, we discuss the state
space perspective for the LSTM-QoE and show the efficacy of the state space
modeling approaches for QoE prediction.
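The idea of cascaded LSTM blocks producing a continuous QoE estimate can be illustrated with a plain NumPy forward pass. This is a sketch under assumptions, not the LSTM-QoE model itself: the weights are random and untrained, the layer sizes and the linear read-out are arbitrary, and the per-time-step input features (for example a short-term video quality score and a rebuffering indicator) are hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMLayer:
    """A single LSTM layer using the standard gate equations."""

    def __init__(self, n_in, n_hidden, rng):
        # One stacked weight matrix for the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def forward(self, xs):
        """xs: (T, n_in) input sequence -> (T, n_hidden) hidden states."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        out = []
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
            h = sigmoid(o) * np.tanh(c)                   # emit hidden state
            out.append(h)
        return np.stack(out)

def predict_qoe(features, layers, w_out, b_out):
    """Pass the sequence through cascaded layers, then a linear read-out per step."""
    hs = features
    for layer in layers:
        hs = layer.forward(hs)
    return hs @ w_out + b_out
```

Because each output depends only on the current and earlier inputs, the prediction at time t can be produced while the stream is still playing, which is what continuous QoE monitoring needs.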
NAViDAd: A No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder
The development of models for quality prediction of both audio and video
signals is a fairly mature field. However, although several multimodal models
have been proposed, audio-visual quality prediction is still an emerging
area. In fact, despite the reasonable performance obtained by combination and
parametric metrics, currently there is no reliable pixel-based audio-visual
quality metric. The approach presented in this work is based on the assumption
that autoencoders, fed with descriptive audio and video features, might produce
a set of features that is able to describe the complex audio and video
interactions. Based on this hypothesis, we propose a No-Reference Audio-Visual
Quality Metric Based on a Deep Autoencoder (NAViDAd). The model's visual features
are natural scene statistics (NSS) and spatial-temporal measures of the video
component. Meanwhile, the audio features are obtained by computing the
spectrogram representation of the audio component. The model is formed by a
2-layer framework that includes a deep autoencoder layer and a classification
layer. These two layers are stacked and trained to build the deep neural
network model. The model is trained and tested using a large set of stimuli,
containing representative audio and video artifacts. The model performed well
when tested against the UnB-AV and the LiveNetflix-II databases. Results show
that this type of approach produces quality scores that are highly correlated
with subjective quality scores.
Comment: 5 pages
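The audio features in this abstract come from a spectrogram representation of the audio component. A minimal sketch of such a representation, assuming a Hann-windowed short-time Fourier transform, is shown below. The window length, hop size, and function name are assumptions for the example; the abstract does not specify the authors' parameters.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Power spectrogram via a Hann-windowed short-time Fourier transform.

    signal: 1-D array with at least n_fft samples.
    Returns an array of shape (n_frames, n_fft // 2 + 1).
    """
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([
        signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)
    ])
    # Squared magnitude of the one-sided FFT of each windowed frame.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2
```

Each row is the spectrum of one short time slice, so the array can be flattened or pooled before being fed, together with the visual features, into an autoencoder.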