898 research outputs found
Multi-Frame Quality Enhancement for Compressed Video
The past few years have witnessed great success in applying deep learning to
enhance the quality of compressed image/video. The existing approaches mainly
focus on enhancing the quality of a single frame, ignoring the similarity
between consecutive frames. In this paper, we investigate that heavy quality
fluctuation exists across compressed video frames, and thus low quality frames
can be enhanced using the neighboring high quality frames, seen as Multi-Frame
Quality Enhancement (MFQE). Accordingly, this paper proposes an MFQE approach
for compressed video, as a first attempt in this direction. In our approach, we
firstly develop a Support Vector Machine (SVM) based detector to locate Peak
Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame
Convolutional Neural Network (MF-CNN) is designed to enhance the quality of
compressed video, in which the non-PQF and its nearest two PQFs are as the
input. The MF-CNN compensates motion between the non-PQF and PQFs through the
Motion Compensation subnet (MC-subnet). Subsequently, the Quality Enhancement
subnet (QE-subnet) reduces compression artifacts of the non-PQF with the help
of its nearest PQFs. Finally, the experiments validate the effectiveness and
generality of our MFQE approach in advancing the state-of-the-art quality
enhancement of compressed video. The code of our MFQE approach is available at
https://github.com/ryangBUAA/MFQE.gitComment: to appear in CVPR 201
Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks
Advanced video classification systems decode video frames to derive the
necessary texture and motion representations for ingestion and analysis by
spatio-temporal deep convolutional neural networks (CNNs). However, when
considering visual Internet-of-Things applications, surveillance systems and
semantic crawlers of large video repositories, the video capture and the
CNN-based semantic analysis parts do not tend to be co-located. This
necessitates the transport of compressed video over networks and incurs
significant overhead in bandwidth and energy consumption, thereby significantly
undermining the deployment potential of such systems. In this paper, we
investigate the trade-off between the encoding bitrate and the achievable
accuracy of CNN-based video classification models that directly ingest
AVC/H.264 and HEVC encoded videos. Instead of retaining entire compressed video
bitstreams and applying complex optical flow calculations prior to CNN
processing, we only retain motion vector and select texture information at
significantly-reduced bitrates and apply no additional processing prior to CNN
ingestion. Based on three CNN architectures and two action recognition
datasets, we achieve 11%-94% saving in bitrate with marginal effect on
classification accuracy. A model-based selection between multiple CNNs
increases these savings further, to the point where, if up to 7% loss of
accuracy can be tolerated, video classification can take place with as little
as 3 kbps for the transport of the required compressed video information to the
system implementing the CNN models
Optimized Data Representation for Interactive Multiview Navigation
In contrary to traditional media streaming services where a unique media
content is delivered to different users, interactive multiview navigation
applications enable users to choose their own viewpoints and freely navigate in
a 3-D scene. The interactivity brings new challenges in addition to the
classical rate-distortion trade-off, which considers only the compression
performance and viewing quality. On the one hand, interactivity necessitates
sufficient viewpoints for richer navigation; on the other hand, it requires to
provide low bandwidth and delay costs for smooth navigation during view
transitions. In this paper, we formally describe the novel trade-offs posed by
the navigation interactivity and classical rate-distortion criterion. Based on
an original formulation, we look for the optimal design of the data
representation by introducing novel rate and distortion models and practical
solving algorithms. Experiments show that the proposed data representation
method outperforms the baseline solution by providing lower resource
consumptions and higher visual quality in all navigation configurations, which
certainly confirms the potential of the proposed data representation in
practical interactive navigation systems
- …