A Convolutional Neural Network Approach for Half-Pel Interpolation in Video Coding
Motion compensation is a fundamental technology in video coding to remove the
temporal redundancy between video frames. To further improve the coding
efficiency, sub-pel motion compensation has been utilized, which requires
interpolation of fractional samples. The video coding standards usually adopt
fixed interpolation filters that are derived from signal processing theory.
However, because video signals are not stationary, fixed interpolation filters may
be less efficient. Inspired by the great success of convolutional neural
networks (CNNs) in computer vision, we propose a CNN-based
interpolation filter (CNNIF) for video coding. Unlike in previous studies,
one difficulty in training CNNIF is the lack of ground truth, since the
fractional samples are not actually available. Our solution to this problem is
to derive the "ground-truth" of fractional samples by smoothing high-resolution
images, which is verified to be effective by the conducted experiments.
Compared to the fixed half-pel interpolation filter for luma in High Efficiency
Video Coding (HEVC), our proposed CNNIF achieves up to 3.2% and on average 0.9%
BD-rate reduction under low-delay P configuration.
Comment: International Symposium on Circuits and Systems (ISCAS) 201
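For context on the fixed filter that this abstract uses as its baseline, the following is a minimal one-dimensional sketch of half-pel luma interpolation with the 8-tap HEVC coefficients. The function name `half_pel_row`, the edge-repetition border handling, and the single-stage rounding are assumptions made for illustration; the standard's two-stage (horizontal then vertical) process keeps higher intermediate precision, and nothing here reproduces the CNNIF itself.

```python
# 8-tap half-pel luma filter taps used in HEVC (they sum to 64).
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_pel_row(samples, bit_depth=8):
    """Interpolate the half-pel positions of one row of integer samples.

    Output element i is the half-pel sample between samples[i] and
    samples[i + 1]. Borders repeat the edge sample (an assumption
    made for this sketch).
    """
    n = len(samples)
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            # The taps cover integer positions i-3 .. i+4, clamped to the row.
            j = min(max(i - 3 + k, 0), n - 1)
            acc += tap * samples[j]
        # Round, normalise by 64, and clip to the valid sample range.
        out.append(min(max((acc + 32) >> 6, 0), max_val))
    return out
```

On a linear ramp the filter returns the midpoint of the two neighbouring samples, which is a quick sanity check that the taps are symmetric and normalised.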
Statistical framework for video decoding complexity modeling and prediction
Video decoding complexity modeling and prediction is an increasingly important issue for efficient resource utilization in a variety of applications, including task scheduling, receiver-driven complexity shaping, and adaptive dynamic voltage scaling. In this paper we present a novel view of this problem based on a statistical framework. We explore the statistical structure (clustering) of the execution time required by each video decoder module (entropy decoding, motion compensation, etc.) in conjunction with complexity features that are easily extractable at encoding time (representing the properties of each module's input source data). For this purpose, we employ Gaussian mixture models (GMMs) and an expectation-maximization algorithm to estimate the joint probability density function (PDF) of execution time and features. A training set of typical video sequences is used in an offline estimation process. The obtained GMM representation is used in conjunction with the complexity features of new video sequences to predict the execution time required for the decoding of these sequences. Several prediction approaches are discussed and compared. The potential mismatch between the training set and new video content is addressed by adaptive online joint-PDF re-estimation. An experimental comparison is performed to evaluate the different approaches and compare the proposed prediction scheme with related resource prediction schemes from the literature. The usefulness of the proposed complexity-prediction approaches is demonstrated in an application of rate-distortion-complexity optimized decoding.
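One standard way to turn a fitted joint GMM into a predictor is the conditional mean of execution time given the observed feature. The sketch below illustrates that idea for a single scalar feature; it is not necessarily one of the prediction approaches compared in the paper. The function names and the dictionary keys (`w`, `mx`, `mt`, `sxx`, `sxt`) are hypothetical, and the mixture parameters are assumed to have been estimated beforehand, for example by expectation-maximization.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predict_time(feature, components):
    """Conditional mean E[time | feature] under a 2-D Gaussian mixture.

    Each component is a dict with hypothetical keys:
      w   : mixture weight
      mx  : mean of the feature
      mt  : mean of the execution time
      sxx : variance of the feature
      sxt : covariance between feature and execution time
    """
    # Unnormalised responsibility of each component for the observed feature.
    resp = [c["w"] * gaussian_pdf(feature, c["mx"], c["sxx"]) for c in components]
    total = sum(resp)
    # Responsibility-weighted sum of the per-component linear regressions.
    return sum(
        r / total * (c["mt"] + c["sxt"] / c["sxx"] * (feature - c["mx"]))
        for r, c in zip(resp, components)
    )
```

With a single component this reduces to ordinary linear regression; with several, each cluster contributes its own regression line, weighted by how well it explains the observed feature.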
Neural View-Interpolation for Sparse Light Field Video
We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, a NN LF will likely have lower quality than a same-sized pixel basis representation. Second, only little training data, e.g., 9 exemplars per frame, is available for sparse LF videos. Third, there is no generalization across LFs, but across view and time instead. Consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: Unlike the linear pixel basis, a NN has to come up with a compact, non-linear, i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NNs, however, this representation is now interpolatable: if the image output for sparse view coordinates is plausible, it is for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution.
Applications of MATLAB in Science and Engineering
The book consists of 24 chapters illustrating a wide range of areas where MATLAB tools are applied. These areas include mathematics, physics, chemistry and chemical engineering, mechanical engineering, biological (molecular biology) and medical sciences, communication and control systems, digital signal, image and video processing, and system modeling and simulation. Many interesting problems have been included throughout the book, and its contents will be beneficial for students and professionals in a wide range of fields.