Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment
In recent years, deep learning has achieved promising success for multimedia
quality assessment, especially for image quality assessment (IQA). However,
since there exist more complex temporal characteristics in videos, very little
work has been done on video quality assessment (VQA) by exploiting powerful
deep convolutional neural networks (DCNNs). In this paper, we propose an
efficient VQA method named Deep SpatioTemporal video Quality assessor (DeepSTQ)
to predict the perceptual quality of various distorted videos in a no-reference
manner. In the proposed DeepSTQ, we first extract local and global
spatiotemporal features by pre-trained deep learning models without fine-tuning
or training from scratch. The composite features consider distorted video
frames as well as frame difference maps from both global and local views. Then,
the feature aggregation is conducted by the regression model to predict the
perceptual video quality. Finally, experimental results demonstrate that our
proposed DeepSTQ outperforms state-of-the-art quality assessment algorithms
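
The pipeline described in this abstract (frames plus frame difference maps, each seen from a global and a local view) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `describe` is a hypothetical stand-in that uses simple statistics where the paper uses pre-trained deep models, and the patch size and the choice of statistics do not come from the source.

```python
from statistics import fmean, pstdev

def frame_difference_maps(frames):
    """Absolute difference between consecutive frames (each frame is a list of rows)."""
    return [
        [[abs(b - a) for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]

def patch_means(frame, patch):
    """Mean of every non-overlapping patch x patch block (the local view)."""
    height, width = len(frame), len(frame[0])
    means = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            block = [frame[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            means.append(fmean(block))
    return means

def describe(maps, patch):
    """Hypothetical stand-in for a pre-trained extractor (assumption, not the paper's model).

    Returns two global statistics (over all pixels) and two local ones (over patch means).
    """
    pixels = [v for frame in maps for row in frame for v in row]
    local = [m for frame in maps for m in patch_means(frame, patch)]
    return [fmean(pixels), pstdev(pixels), max(local), pstdev(local)]

def video_features(frames, patch=2):
    """Concatenate descriptors of the frames and of their difference maps."""
    return describe(frames, patch) + describe(frame_difference_maps(frames), patch)

# Three 4x4 frames: black, white, black.
frames = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
frames[1] = [[1.0] * 4 for _ in range(4)]
features = video_features(frames)
```

In a full system, a vector like `features` would be the input to the regression model that predicts the quality score; that regressor is left out of this sketch.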
Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank
For many applications the collection of labeled data is expensive and laborious.
Exploitation of unlabeled data during training is thus a long pursued objective
of machine learning. Self-supervised learning addresses this by positing an
auxiliary task (different, but related to the supervised task) for which data
is abundantly available. In this paper, we show how ranking can be used as a
proxy task for some regression problems. As another contribution, we propose an
efficient backpropagation technique for Siamese networks which prevents the
redundant computation introduced by the multi-branch network architecture. We
apply our framework to two regression problems: Image Quality Assessment (IQA)
and Crowd Counting. For both we show how to automatically generate ranked image
sets from unlabeled data. Our results show that networks trained to regress to
the ground truth targets for labeled data and to simultaneously learn to rank
unlabeled data obtain significantly better, state-of-the-art results for both
IQA and crowd counting. In addition, we show that measuring network uncertainty
on the self-supervised proxy task is a good measure of informativeness of
unlabeled data. This can be used to drive an algorithm for active learning and
we show that this reduces labeling effort by up to 50%.
Comment: Accepted at TPAMI. (Keywords: Learning from rankings, image quality
assessment, crowd counting, active learning). arXiv admin note: text overlap
with arXiv:1803.0309
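
The idea of training on a regression target for labeled data while learning to rank unlabeled data can be sketched as a combined loss. This is a minimal illustration under stated assumptions: the pairwise hinge form, the `margin`, and the `rank_weight` parameter are illustrative choices, not details taken from the abstract, and the scores are plain numbers rather than network outputs.

```python
def ranking_hinge_loss(scores_higher, scores_lower, margin=1.0):
    """Pairwise hinge loss: penalize pairs where the item that should rank
    higher does not beat the other item by at least `margin`."""
    losses = [max(0.0, margin - (hi - lo))
              for hi, lo in zip(scores_higher, scores_lower)]
    return sum(losses) / len(losses)

def regression_loss(predictions, targets):
    """Mean squared error on the labeled examples."""
    errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(errors) / len(errors)

def combined_loss(predictions, targets, scores_higher, scores_lower,
                  rank_weight=1.0, margin=1.0):
    """Supervised regression term plus a weighted self-supervised ranking term.

    `rank_weight` is a hypothetical trade-off parameter (assumption).
    """
    ranking = ranking_hinge_loss(scores_higher, scores_lower, margin)
    return regression_loss(predictions, targets) + rank_weight * ranking

# Two ranked pairs: the first is ordered with enough margin, the second is tied.
pair_loss = ranking_hinge_loss([2.0, 0.0], [0.0, 0.0])
total = combined_loss([1.0, 2.0], [1.0, 2.0], [2.0, 0.0], [0.0, 0.0], rank_weight=2.0)
```

The ranked pairs are only as good as the rule that generates them, so how the ordering is produced from unlabeled data matters more than the exact loss form used here.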
Multispectral and Hyperspectral Image Fusion by MS/HS Fusion Net
Hyperspectral imaging can help better understand the characteristics of
different materials, compared with traditional image systems. However, only
high-resolution multispectral (HrMS) and low-resolution hyperspectral (LrHS)
images can generally be captured at video rate in practice. In this paper, we
propose a model-based deep learning approach for merging an HrMS and an LrHS
image to generate a high-resolution hyperspectral (HrHS) image. Specifically,
we construct a novel MS/HS fusion model which takes the observation models of
low-resolution images and the low-rankness knowledge along the spectral mode of
the HrHS image into consideration. Then we design an iterative algorithm to solve
the model by exploiting the proximal gradient method. Finally, by unfolding
the designed algorithm, we construct a deep network, called MS/HS Fusion Net,
in which the proximal operators and model parameters are learned by convolutional
neural networks. Experimental results on simulated and real data substantiate
the superiority of our method both visually and quantitatively as compared with
state-of-the-art methods along this line of research.
Comment: 10 pages, 7 figure
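
The proximal gradient method mentioned in this abstract alternates a gradient step on a data-fidelity term with a proximal step on a regularizer. The sketch below is a generic instance, not the paper's fusion model: it assumes a least-squares data term and an L1 regularizer, whose proximal operator is soft-thresholding. In an unfolded network of the kind the abstract describes, each loop iteration would become a layer and the hand-written proximal operator would be replaced by a learned one.

```python
import math

def soft_threshold(values, threshold):
    """Proximal operator of threshold * ||x||_1 (a swapped-in example regularizer)."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in values]

def matvec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def proximal_gradient(A, y, lam, step, n_iter):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by proximal gradient steps."""
    x = [0.0] * len(A[0])
    At = transpose(A)
    for _ in range(n_iter):
        residual = [p - t for p, t in zip(matvec(A, x), y)]
        gradient = matvec(At, residual)           # gradient of the data term
        stepped = [xi - step * g for xi, g in zip(x, gradient)]
        x = soft_threshold(stepped, step * lam)   # proximal step
    return x

# With A equal to the identity, the minimizer is the soft-thresholded observation.
A = [[1.0, 0.0], [0.0, 1.0]]
solution = proximal_gradient(A, y=[3.0, 0.5], lam=1.0, step=1.0, n_iter=10)
```

The step size must be small enough for the iteration to converge (at most the reciprocal of the largest eigenvalue of `A^T A`); here that bound is 1 because `A` is the identity.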