A Convolutional Neural Network Approach for Half-Pel Interpolation in Video Coding
Motion compensation is a fundamental technology in video coding to remove the
temporal redundancy between video frames. To further improve the coding
efficiency, sub-pel motion compensation has been utilized, which requires
interpolation of fractional samples. The video coding standards usually adopt
fixed interpolation filters that are derived from the signal processing theory.
However, as video signals are not stationary, the fixed interpolation filters
may prove less efficient. Inspired by the great success of the convolutional neural
network (CNN) in computer vision, we propose to design a CNN-based
interpolation filter (CNNIF) for video coding. Unlike in previous studies, one
difficulty in training the CNNIF is the lack of ground truth, since the
fractional samples are not actually available. Our solution to this problem is
to derive the "ground-truth" of fractional samples by smoothing high-resolution
images, which is verified to be effective by the conducted experiments.
Compared to the fixed half-pel interpolation filter for luma in High Efficiency
Video Coding (HEVC), our proposed CNNIF achieves up to 3.2% and on average 0.9%
BD-rate reduction under the low-delay P configuration.
Comment: International Symposium on Circuits and Systems (ISCAS) 201
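The training-data trick described above can be illustrated with a short sketch: smooth a high-resolution image, then treat its even-phase samples as the integer-pel input and its odd-phase samples as half-pel "ground truth". The 3x3 averaging kernel and the function name below are illustrative assumptions, not the paper's actual smoothing filter:

```python
import numpy as np

def make_halfpel_pairs(hr):
    """Derive (integer-pel input, half-pel 'ground truth') pairs from a
    high-resolution image by smoothing it before sampling.  The 3x3
    averaging kernel is a placeholder assumption."""
    # Smooth the HR image so its spectrum resembles a blurred LR capture.
    k = np.ones((3, 3)) / 9.0
    pad = np.pad(hr, 1, mode="edge")
    sm = np.zeros_like(hr, dtype=float)
    for dy in range(3):
        for dx in range(3):
            sm += k[dy, dx] * pad[dy:dy + hr.shape[0], dx:dx + hr.shape[1]]
    # Even positions act as integer-pel samples (the CNN input) ...
    lr = sm[0::2, 0::2]
    # ... and the three odd-phase subsamplings act as half-pel targets.
    half_h = sm[0::2, 1::2]   # horizontal half-pel
    half_v = sm[1::2, 0::2]   # vertical half-pel
    half_d = sm[1::2, 1::2]   # diagonal half-pel
    return lr, (half_h, half_v, half_d)
```

A CNN would then be trained to map `lr` patches to the three half-pel targets.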
Motion-Compensated Coding and Frame-Rate Up-Conversion: Models and Analysis
Block-based motion estimation (ME) and compensation (MC) techniques are
widely used in modern video processing algorithms and compression systems. The
great variety of video applications and devices results in numerous compression
specifications. Specifically, there is a diversity of frame-rates and
bit-rates. In this paper, we study the effect of frame-rate and compression
bit-rate on block-based ME and MC as commonly utilized in inter-frame coding
and frame-rate up conversion (FRUC). This joint examination yields a
comprehensive foundation for comparing MC procedures in coding and FRUC. First,
the video signal is modeled as a noisy translational motion of an image. Then,
we theoretically model the motion-compensated prediction of available and
absent frames, as in coding and FRUC applications, respectively. The theoretical
MC-prediction error is further analyzed and its autocorrelation function is
calculated for coding and FRUC applications. We show a linear relation between
the variance of the MC-prediction error and the temporal distance. While the
affecting distance in MC-coding is between the predicted and reference frames,
MC-FRUC is affected by the distance between the available frames used for the
interpolation. Moreover, the dependence on temporal distance implies an
inverse effect of the frame-rate. The FRUC performance analysis considers the
prediction error variance, since it equals the mean-squared error of the
interpolation.
However, MC-coding analysis requires the entire autocorrelation function of the
error; hence, analytic simplicity is beneficial. Therefore, we propose two
constructions of a separable autocorrelation function for prediction error in
MC-coding. We conclude by comparing our estimates with experimental results.
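The linear relation between MC-prediction error variance and temporal distance can be checked numerically under a simplified model. The accumulated-innovation model below is an assumed stand-in for the paper's noisy-translation analysis, not its derivation:

```python
import numpy as np

# Toy check: each frame equals the previous one plus i.i.d. noise of unit
# variance, so Var(f_t - f_0) should grow linearly as t * sigma^2.
rng = np.random.default_rng(0)
base = rng.normal(size=200_000)          # "image" samples of frame 0

frames = [base]
for t in range(5):                       # add one innovation per frame step
    frames.append(frames[-1] + rng.normal(size=base.size))

# Empirical MC-prediction error variance vs. temporal distance t.
var_err = [float(np.var(f - base)) for f in frames]
# var_err[t] is close to t (sigma^2 = 1), i.e. linear in temporal distance.
```

Halving the frame-rate doubles the temporal distance between coded frames, which is the inverse frame-rate effect noted above.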
IBVC: Interpolation-driven B-frame Video Compression
Learned B-frame video compression aims to adopt bi-directional motion
estimation and motion compensation (MEMC) coding for middle frame
reconstruction. However, previous learned approaches often directly extend
neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow
estimation or video frame interpolation. They suffer from inaccurate quantized
motions and inefficient motion compensation. To address these issues, we
propose a simple yet effective structure called Interpolation-driven B-frame
Video Compression (IBVC). Our approach only involves two major operations:
video frame interpolation and artifact reduction compression. IBVC introduces a
bit-rate free MEMC based on interpolation, which avoids optical-flow
quantization and additional compression distortions. Then, to reduce redundant
bit-rate consumption and focus on unaligned artifacts, a residual-guided
masking encoder is deployed to adaptively select the meaningful contexts with
interpolated multi-scale dependencies. In addition, a conditional
spatio-temporal decoder is proposed to eliminate location errors and artifacts
instead of the MEMC coding used in other methods. The experimental results on
B-frame coding demonstrate that IBVC has significant improvements compared to
the relevant state-of-the-art methods. Meanwhile, our approach can save bit
rates compared with the random access (RA) configuration of H.266 (VTM). The
code will be available at https://github.com/ruhig6/IBVC.
Comment: Submitted to IEEE TCSV
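The overall IBVC idea can be sketched in a few lines: predict the middle frame by interpolation without spending bits on motion, then use the residual to mask which regions still need coding. Plain averaging stands in for the learned interpolator, and the function name and threshold are illustrative assumptions, not the paper's actual modules:

```python
import numpy as np

def ibvc_step(ref0, ref1, target, thresh=4.0):
    """Minimal sketch: bit-rate-free prediction of the middle frame via
    interpolation, then a residual-guided mask selecting the unaligned
    regions whose artifacts still need bits."""
    pred = 0.5 * (ref0 + ref1)             # stand-in for learned frame interpolation
    residual = target - pred
    mask = np.abs(residual) > thresh       # keep only poorly aligned regions
    coded = np.where(mask, residual, 0.0)  # only this part costs bit-rate
    recon = pred + coded
    return recon, float(mask.mean())       # reconstruction, coded fraction
```

In the paper the masked regions are handled by a learned encoder/decoder pair rather than by sending the raw residual, but the masking principle is the same.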
Reuse of motion processing for camera stabilization and video coding
The low bit rate of existing video encoders relies heavily on accurately estimating the actual motion in the input video sequence. In this paper, we propose a video stabilization and encoding (ViSE) system that achieves a higher coding efficiency through a motion-processing stage preceding the compression, whose stabilization part compensates for vibrating camera motion. The improved motion prediction is obtained by differentiating between the temporally coherent motion and a noisier motion component orthogonal to the coherent one. The system compensates for the latter, undesirable motion so that it is eliminated prior to video encoding. To reduce the computational complexity of integrating a digital stabilization algorithm with video encoding, we propose a system that reuses the motion vectors already evaluated in the stabilization stage during compression. Compared to H.264, our system shows a 14% reduction in bit rate while obtaining an increase of about 0.5 dB in SNR.
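Splitting camera motion into a coherent trajectory and residual jitter can be sketched with a simple low-pass filter over the per-frame global motion vectors. The moving-average filter and window size are illustrative assumptions; the paper's decomposition into coherent and orthogonal noisy components is more elaborate:

```python
import numpy as np

def split_motion(global_mv, win=5):
    """Sketch of the ViSE idea: split the camera's global motion (one
    component shown) into a temporally coherent part and a noisy jitter
    part that stabilization removes before encoding."""
    mv = np.asarray(global_mv, dtype=float)
    pad = win // 2
    padded = np.pad(mv, (pad, pad), mode="edge")
    kernel = np.ones(win) / win
    coherent = np.convolve(padded, kernel, mode="valid")  # low-pass trajectory
    jitter = mv - coherent   # compensated (removed) prior to encoding
    return coherent, jitter
```

The already-computed `coherent` vectors can then seed the encoder's motion search, which is the reuse that saves complexity.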
Flexible distribution of complexity by hybrid predictive-distributed video coding
There is currently limited flexibility for distributing complexity in a video coding system. While rate-distortion-complexity (RDC) optimization techniques have been proposed for conventional predictive video coding with encoder-side motion estimation, they fail to offer true flexible distribution of complexity between encoder and decoder since the encoder is assumed to have always more computational resources available than the decoder. On the other hand, distributed video coding solutions with decoder-side motion estimation have been proposed, but hardly any RDC optimized systems have been developed.
To offer more flexibility for video applications involving multi-tasking or battery-constrained devices, in this paper we propose a codec that combines predictive video coding concepts with techniques from distributed video coding, and we show the flexibility of this method in distributing complexity. We propose several modes to code frames and provide a complexity analysis illustrating the encoder and decoder computational complexity of each mode. Rate-distortion results for each mode indicate that the coding efficiency is similar. We describe a method to choose which mode to use for coding each inter frame, taking into account encoder and decoder complexity constraints, and illustrate how complexity can be distributed more flexibly.
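Since the abstract states that coding efficiency is similar across modes, per-frame mode selection reduces to fitting complexity budgets. The sketch below is an assumed illustration of that selection rule; the mode names and cost numbers are made up, not taken from the paper:

```python
def pick_mode(modes, enc_budget, dec_budget):
    """Toy mode selection for a hybrid predictive/distributed codec:
    among modes whose encoder and decoder costs fit the given budgets,
    pick the one with the lowest total complexity."""
    feasible = [m for m in modes
                if m["enc"] <= enc_budget and m["dec"] <= dec_budget]
    if not feasible:
        raise ValueError("no mode fits the complexity budgets")
    return min(feasible, key=lambda m: m["enc"] + m["dec"])["name"]

# Illustrative costs: predictive coding loads the encoder (motion search),
# distributed coding shifts motion estimation to the decoder.
modes = [
    {"name": "predictive",  "enc": 10, "dec": 1},
    {"name": "distributed", "enc": 1,  "dec": 10},
    {"name": "hybrid",      "enc": 5,  "dec": 5},
]
```

A battery-constrained camera (small `enc_budget`) would thus land on a distributed mode, while a weak playback device would push selection toward predictive modes.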
On the Efficiency of Angular Intraprediction
Angular intraprediction (AIP) is a coding tool that has been incorporated into the video coding standards H.264/AVC (Advanced Video Coding) and High Efficiency Video Coding (HEVC). In this paper, we study how the efficiency of AIP depends on its prediction parameters. To this end, we first theoretically analyze the variance of the error incurred when a perfectly directional signal is predicted in a certain direction. The results of this analysis are then used to study the efficiency of AIP when it is applied to a distribution of directions. To facilitate the mathematical derivations, we make several assumptions about the signal and the prediction process, and we use some approximations. This allows us to obtain simple expressions for the variance of the AIP prediction error as a function of signal and prediction parameters. Finally, we compare our theoretical results with those obtained from the prediction of images containing rectilinear edges. This comparison shows that our theoretical expressions follow the main trends of the experimental results except when AIP is performed with very high accuracy.
Prades Nebot, J. (2014). On the Efficiency of Angular Intraprediction. IEEE Transactions on Image Processing, 23(12), 5733-5742. doi:10.1109/TIP.2014.2369954
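The prediction operation analyzed above can be sketched as follows: each block sample copies a linearly interpolated value from the top reference row, projected along the prediction direction. This is a simplified stand-in for HEVC's angular modes (which use a finer 1/32-sample projection and both left and top references); the parameter names are illustrative:

```python
import numpy as np

def aip_predict(top_refs, slope, block=4):
    """Toy angular intra predictor: sample (x, y) is projected up to the
    reference row with a horizontal displacement of `slope` per row, and
    the reference is linearly interpolated at the fractional position."""
    pred = np.empty((block, block))
    for y in range(block):
        for x in range(block):
            s = (x + 1) + (y + 1) * slope   # projected reference position
            i = int(np.floor(s))
            f = s - i                        # fractional part -> interpolate
            pred[y, x] = (1 - f) * top_refs[i] + f * top_refs[i + 1]
    return pred
```

For a perfectly directional signal whose direction matches `slope`, this predictor is exact (zero error variance), which is the baseline case of the paper's analysis.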