220 research outputs found
3-D Motion Estimation and Wireframe Adaptation Including Photometric Effects for Model-Based Coding of Facial Image Sequences
Cataloged from PDF version of article. We propose a novel formulation where 3-D global and local motion estimation and the adaptation of a generic wireframe model to a particular speaker are considered simultaneously within an optical flow based framework including the photometric effects of the motion. We use a flexible wireframe model whose local structure is characterized by the normal vectors of the patches, which are related to the coordinates of the nodes. Geometrical constraints that describe the propagation of the movement of the nodes are introduced, which are then efficiently utilized to reduce the number of independent structure parameters. A stochastic relaxation algorithm has been used to determine optimum global motion estimates and the parameters describing the structure of the wireframe model. Results with both simulated and real facial image sequences are provided.
DCT Coding of nonrectangularly sampled images
Discrete cosine transform (DCT) coding is widely used for compression of rectangularly sampled images. In this letter, we address efficient DCT coding of nonrectangularly sampled images. To this end, we discuss an efficient method for computing the DCT on nonrectangular sampling grids using the Smith-normal decomposition. Simulation results are provided.
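For orientation, the conventional rectangular-grid case that the abstract takes as its starting point can be sketched as a separable orthonormal DCT-II. This is not the nonrectangular method of the letter (the Smith-normal decomposition step is not reproduced here); the function names `dct_1d` and `dct_2d` are illustrative assumptions.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (illustrative helper)."""
    n = len(x)
    out = []
    for k in range(n):
        # Project the signal onto the k-th cosine basis function.
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        # Scale factors that make the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT on a rectangular block: rows first, then columns."""
    rows = [dct_1d(list(r)) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so the result is indexed [vertical freq][horizontal freq].
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    flat = [[1.0] * 4 for _ in range(4)]
    coeffs = dct_2d(flat)
    # A constant block has all of its energy in the DC coefficient.
    print(coeffs[0][0])
```

Separability is what makes the rectangular case cheap: the 2-D transform reduces to 1-D transforms along rows and columns, which is the property a nonrectangular sampling grid does not offer directly.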
An improvement to MBASIC algorithm for 3D motion and depth estimation
In model-based coding of facial images, the accuracy of motion and depth parameter estimates strongly affects the coding efficiency. MBASIC is a simple and effective iterative algorithm (recently proposed by Aizawa et al.) for 3-D motion and depth estimation when the initial depth estimates are relatively accurate. In this correspondence, we analyze its performance in the presence of errors in the initial depth estimates and propose a modification to the MBASIC algorithm that significantly improves its robustness to random errors with only a small increase in the computational load.
Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction
We analyze the performance of feedforward vs. recurrent neural network (RNN)
architectures and associated training methods for learned frame prediction. To
this effect, we trained a residual fully convolutional neural network (FCNN), a
convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM)
network for next frame prediction using the mean square loss. We performed both
stateless and stateful training for recurrent networks. Experimental results
show that the residual FCNN architecture performs the best in terms of peak
signal-to-noise ratio (PSNR) at the expense of higher training and test
(inference) computational complexity. The CRNN can be trained stably and very
efficiently using the stateful truncated backpropagation through time
procedure, and it requires an order of magnitude less inference runtime to
achieve near real-time frame prediction with an acceptable performance. Comment: Accepted for publication at IEEE ICIP 201
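The PSNR figure used to rank the predictors above follows directly from the mean square error between a predicted frame and the true frame. A minimal sketch, assuming 8-bit pixels (peak value 255) and frames flattened to plain lists; the function name `psnr` and its parameters are illustrative and not taken from the paper:

```python
import math


def psnr(reference, prediction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are given as flat sequences of pixel values (an assumption made
    here for simplicity); `peak` is the maximum possible pixel value.
    """
    if len(reference) != len(prediction) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean square error, the same quantity used as the training loss.
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    truth = [10, 20, 30, 40]
    guess = [11, 21, 31, 41]  # every pixel off by one, so MSE = 1
    print(round(psnr(truth, guess), 2))
```

Because PSNR is a monotone function of the MSE, a network trained with the mean square loss is optimizing the reported metric directly.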
MMSR: Multiple-Model Learned Image Super-Resolution Benefiting From Class-Specific Image Priors
Assuming a known degradation model, the performance of a learned image
super-resolution (SR) model depends on how well the variety of image
characteristics within the training set matches those in the test set. As a
result, the performance of an SR model varies noticeably from image to image
over a test set depending on whether characteristics of specific images are
similar to those in the training set or not. Hence, in general, a single SR
model cannot generalize well enough for all types of image content. In this
work, we show that training multiple SR models for different classes of images
(e.g., for text, texture, etc.) to exploit class-specific image priors and
employing a post-processing network that learns how to best fuse the outputs
produced by these multiple SR models surpasses the performance of
state-of-the-art generic SR models. Experimental results clearly demonstrate
that the proposed multiple-model SR (MMSR) approach significantly outperforms a
single pre-trained state-of-the-art SR model both quantitatively and visually.
It even exceeds the performance of the best single class-specific SR model
trained on similar text or texture images. Comment: 5 pages, 4 figures, accepted for publication in IEEE ICIP 2022
Conferenc
Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression
The lack of ability to adapt the motion compensation model to video content
is an important limitation of current end-to-end learned video compression
models. This paper advances the state-of-the-art by proposing an adaptive
motion-compensation model for end-to-end rate-distortion optimized hierarchical
bi-directional video compression. In particular, we propose two novelties: i) a
multi-scale deformable alignment scheme at the feature level combined with
multi-scale conditional coding, ii) motion-content adaptive inference. In
addition, we employ a gain unit, which enables a single model to operate at
multiple rate-distortion operating points. We also exploit the gain unit to
control bit allocation among intra-coded vs. bi-directionally coded frames by
fine tuning corresponding models for truly flexible-rate learned video coding.
Experimental results demonstrate state-of-the-art rate-distortion performance
exceeding that of all prior art in learned video coding. Comment: Accepted for publication in IEEE International Conference on Image
Processing (ICIP) 202
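The abstract does not spell out how the gain unit works. One common formulation in the learned-compression literature, assumed here rather than taken from this paper, scales each latent channel by a learned gain before quantization and undoes the scaling with an inverse gain at the decoder. The names `apply_gain` and `apply_inverse_gain` and the numeric values are hypothetical.

```python
def apply_gain(latent, gain):
    """Scale each latent channel by its gain, then quantize by rounding.

    `latent` is a list of channels, each a list of floats; `gain` holds one
    positive scalar per channel. A larger gain means finer quantization and
    therefore a higher bit rate.
    """
    return [[round(v * g) for v in channel] for channel, g in zip(latent, gain)]


def apply_inverse_gain(quantized, inverse_gain):
    """Rescale quantized channels back toward the original latent range."""
    return [[v * g for v in channel] for channel, g in zip(quantized, inverse_gain)]


if __name__ == "__main__":
    latent = [[1.2, -0.7], [3.3, 0.4]]
    gain = [2.0, 0.5]          # hypothetical per-channel gains for one rate point
    inverse_gain = [0.5, 2.0]  # here simply the reciprocals of the gains
    symbols = apply_gain(latent, gain)
    print(symbols)
    print(apply_inverse_gain(symbols, inverse_gain))
```

Under this assumption, switching between rate-distortion operating points only means switching the gain vectors, while the encoder and decoder networks stay fixed.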