123 research outputs found
3-D Motion Estimation and Wireframe Adaptation Including Photometric Effects for Model-Based Coding of Facial Image Sequences
We propose a novel formulation in which 3-D global and local motion estimation and the adaptation of a generic wireframe model to a particular speaker are considered simultaneously within an optical-flow-based framework that includes the photometric effects of the motion. We use a flexible wireframe model whose local structure is characterized by the normal vectors of the patches, which are related to the coordinates of the nodes. Geometrical constraints that describe the propagation of the movement of the nodes are introduced and then efficiently utilized to reduce the number of independent structure parameters. A stochastic relaxation algorithm is used to determine the optimum global motion estimates and the parameters describing the structure of the wireframe model. Results with both simulated and real facial image sequences are provided.
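For illustration, below is a minimal sketch of a stochastic relaxation (simulated-annealing-style) search over a parameter vector, the kind of optimizer the abstract refers to. The objective function, parameter vector, step size, and cooling schedule are hypothetical placeholders, not the formulation used in the paper.

```python
import numpy as np

def stochastic_relaxation(objective, theta0, n_iters=5000, step=0.05,
                          t0=1.0, cooling=0.999, rng=None):
    """Generic simulated-annealing-style minimization of a scalar cost
    (e.g., an optical-flow residual) over a parameter vector theta."""
    rng = rng or np.random.default_rng(0)
    theta, cost = theta0.copy(), objective(theta0)
    best_theta, best_cost = theta.copy(), cost
    temp = t0
    for _ in range(n_iters):
        cand = theta + step * rng.standard_normal(theta.shape)
        cand_cost = objective(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if cand_cost < cost or rng.random() < np.exp((cost - cand_cost) / temp):
            theta, cost = cand, cand_cost
            if cost < best_cost:
                best_theta, best_cost = theta.copy(), cost
        temp *= cooling
    return best_theta, best_cost

# Toy usage: recover a 6-parameter global motion vector from a quadratic cost.
true_params = np.array([0.1, -0.05, 0.02, 1.0, -2.0, 0.5])
objective = lambda p: np.sum((p - true_params) ** 2)
est, err = stochastic_relaxation(objective, np.zeros(6))
```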
Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction
We analyze the performance of feedforward versus recurrent neural network (RNN) architectures and the associated training methods for learned frame prediction. To this end, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next-frame prediction using the mean square loss. We performed both stateless and stateful training for the recurrent networks. Experimental results show that the residual FCNN architecture performs best in terms of peak signal-to-noise ratio (PSNR), at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation through time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with acceptable performance.
Comment: Accepted for publication at IEEE ICIP 201
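As a minimal PyTorch sketch of residual next-frame prediction in the spirit of this abstract: the network predicts a residual that is added to the most recent input frame and is trained with the mean square loss. The layer count, channel width, and history length are illustrative assumptions, not the architectures benchmarked in the paper.

```python
import torch
import torch.nn as nn

class ResidualFramePredictor(nn.Module):
    """Predicts frame t+1 as (last input frame + learned residual)."""

    def __init__(self, history=4, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(history, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, frames):            # frames: (N, history, H, W), grayscale
        residual = self.net(frames)
        return frames[:, -1:] + residual  # residual connection to the last frame

model = ResidualFramePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()                  # mean square loss, as in the abstract

frames = torch.rand(2, 4, 64, 64)         # dummy batch of 4-frame histories
target = torch.rand(2, 1, 64, 64)         # dummy next frames
loss = criterion(model(frames), target)
loss.backward()
optimizer.step()
```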
An improvement to MBASIC algorithm for 3D motion and depth estimation
In model-based coding of facial images, the accuracy of motion and depth parameter estimates strongly affects the coding efficiency. MBASIC is a simple and effective iterative algorithm (recently proposed by Aizawa et al.) for 3-D motion and depth estimation when the initial depth estimates are relatively accurate. In this correspondence, we analyze its performance in the presence of errors in the initial depth estimates and propose a modification to the MBASIC algorithm that significantly improves its robustness to random errors with only a small increase in the computational load.
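To illustrate the general two-step structure of such iterative motion/depth estimators, the sketch below alternates between a least-squares motion estimate for fixed depths and a per-point depth update for fixed motion, under an orthographic-projection, small-rotation model. This is an illustrative simplification, not the MBASIC algorithm or the proposed modification.

```python
import numpy as np

def estimate_motion(pts, pts_next, Z):
    """Least-squares motion (wx, wy, wz, Tx, Ty) for fixed depths Z under an
    orthographic, small-rotation rigid motion model (illustrative only)."""
    x, y = pts[:, 0], pts[:, 1]
    dx, dy = pts_next[:, 0] - x, pts_next[:, 1] - y
    zeros, ones = np.zeros_like(x), np.ones_like(x)
    rows_x = np.stack([zeros,  Z, -y, ones, zeros], axis=1)   # x' - x equation
    rows_y = np.stack([-Z, zeros,  x, zeros, ones], axis=1)   # y' - y equation
    A = np.concatenate([rows_x, rows_y])
    b = np.concatenate([dx, dy])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta                                              # wx, wy, wz, Tx, Ty

def update_depth(pts, pts_next, theta, Z_prev):
    """Per-point least-squares depth update for fixed motion parameters."""
    wx, wy, wz, Tx, Ty = theta
    x, y = pts[:, 0], pts[:, 1]
    a = pts_next[:, 0] - x - Tx + wz * y
    b = pts_next[:, 1] - y - Ty - wz * x
    denom = wx**2 + wy**2
    if denom < 1e-12:       # depth is unobservable without rotation about x or y
        return Z_prev
    return (wy * a - wx * b) / denom

def alternate(pts, pts_next, Z_init, n_iters=10):
    Z = Z_init.copy()
    for _ in range(n_iters):
        theta = estimate_motion(pts, pts_next, Z)
        Z = update_depth(pts, pts_next, theta, Z)
    return theta, Z

# Toy usage: synthetic correspondences with noisy initial depth estimates.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (50, 2))
Z_true = rng.uniform(2, 4, 50)
wx_t, wy_t, wz_t, tx_t, ty_t = 0.02, -0.03, 0.01, 0.1, -0.05
x, y = pts[:, 0], pts[:, 1]
pts_next = np.stack([x - wz_t * y + wy_t * Z_true + tx_t,
                     wz_t * x + y - wx_t * Z_true + ty_t], axis=1)
theta_hat, Z_hat = alternate(pts, pts_next, Z_true + rng.normal(0, 0.3, 50))
```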
DCT Coding of nonrectangularly sampled images
Discrete cosine transform (DCT) coding is widely used for compression of rectangularly sampled images. In this letter, we address efficient DCT coding of nonrectangularly sampled images. To this end, we discuss an efficient method for the computation of the DCT on nonrectangular sampling grids using the Smith normal decomposition. Simulation results are provided.
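As a rough illustration of the underlying idea, the sketch below rearranges a quincunx-sampled image onto a rectangular grid with a simple unimodular (shear-like) index map and then applies an ordinary separable block DCT. The quincunx pattern, block size, and index map are illustrative assumptions; this does not reproduce the Smith-normal-form construction described in the letter.

```python
import numpy as np
from scipy.fft import dctn

# Quincunx sampling: keep pixels (m, n) with m + n even.
img = np.random.rand(64, 64)
M, N = img.shape

# Index map (m, n) -> (m, (n - m mod 2) // 2) sends the quincunx lattice
# onto an ordinary rectangular grid of half the width.
rect = np.empty((M, N // 2))
for m in range(M):
    cols = np.arange(m % 2, N, 2)            # columns with m + n even
    rect[m] = img[m, cols]

# Ordinary separable 8x8 block DCT on the rectangularized samples.
B = 8
coeffs = np.empty_like(rect)
for i in range(0, M, B):
    for j in range(0, rect.shape[1], B):
        coeffs[i:i + B, j:j + B] = dctn(rect[i:i + B, j:j + B], norm='ortho')
```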
Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression
The lack of ability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state of the art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: i) a multi-scale deformable alignment scheme at the feature level combined with multi-scale conditional coding, and ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation between intra-coded and bi-directionally coded frames by fine-tuning the corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate state-of-the-art rate-distortion performance, exceeding that of all prior art in learned video coding.
Comment: Accepted for publication in IEEE International Conference on Image Processing (ICIP) 202
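Below is a minimal sketch of feature-level deformable alignment at a single scale, using torchvision's deformable convolution: offsets are predicted jointly from the reference and current feature maps and used to warp the reference features. The offset-prediction layer, channel counts, and single-scale setup are illustrative assumptions rather than the paper's multi-scale architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FeatureAligner(nn.Module):
    """Aligns reference features to the current frame's features with a
    deformable convolution whose offsets are predicted from both feature maps."""

    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # Two offset values (dx, dy) per kernel tap.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=1)

    def forward(self, feat_ref, feat_cur):
        offsets = self.offset_pred(torch.cat([feat_ref, feat_cur], dim=1))
        return self.deform(feat_ref, offsets)

aligner = FeatureAligner()
feat_ref = torch.rand(1, 64, 32, 32)   # features of a (past or future) reference
feat_cur = torch.rand(1, 64, 32, 32)   # features of the frame being coded
aligned = aligner(feat_ref, feat_cur)  # (1, 64, 32, 32)
```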
Report of the AHG on MPEG-7 Semantic Information Representation
This AHG was created at the last MPEG meeting in Maui to work in parallel with the CE on the Semantic DS, so as to continue the refinement, in terms of significance, usage, and syntax, of the DSs that were proposed during the Maui meeting [2]. Following the discussions on the email reflector, the results of a meeting of the US delegation in February, and the discussion during the AHG meeting on March 19th, 2000, in Noordwijkerhout, some clarifications were made, though total convergence has not yet been reached. During the US delegation meeting, an alternative syntax was also proposed for consideration in the continuation of the CE; it is likely that this will lead to the formulation of competing solutions, from which the best syntax and elementary components of the Semantic DS will be selected during the CE process following the 51st MPEG meeting. Below are summaries of the discussions that took place, in reference to the individual mandates of this AHG.
Fast Outlier Rejection by Using Parallax-Based Rigidity Constraint for Epipolar Geometry Estimation
A novel approach is presented for rejecting correspondence outliers between frames using the parallax-based rigidity constraint for epipolar geometry estimation. In this approach, the invariance of the 3-D relative projective structure of a stationary scene over different views is exploited to eliminate outliers, which are mostly due to independently moving objects in a typical scene. The proposed approach is compared against a well-known RANSAC-based algorithm with the help of a test bed. The results show that using the proposed technique as a preprocessing step before the RANSAC-based approach significantly decreases the overall execution time of outlier rejection.
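The sketch below shows only the shape of such a pre-filter-then-RANSAC pipeline. `parallax_rigidity_filter` is a hypothetical placeholder for the parallax-based rigidity test described in the paper, and the OpenCV call merely illustrates the subsequent RANSAC-based epipolar geometry estimation.

```python
import numpy as np
import cv2

def parallax_rigidity_filter(pts1, pts2):
    """Hypothetical placeholder for the parallax-based rigidity test: it should
    return a boolean mask of correspondences consistent with a single rigid
    (stationary) scene.  Here it simply keeps everything."""
    return np.ones(len(pts1), dtype=bool)

def estimate_epipolar_geometry(pts1, pts2):
    # Cheap pre-filtering step removes gross outliers (e.g., points on
    # independently moving objects) before the more expensive RANSAC stage.
    keep = parallax_rigidity_filter(pts1, pts2)
    pts1, pts2 = pts1[keep], pts2[keep]

    # Standard RANSAC-based fundamental matrix estimation on the survivors.
    F, inlier_mask = cv2.findFundamentalMat(
        pts1, pts2, cv2.FM_RANSAC, ransacReprojThreshold=1.0, confidence=0.99)
    return F, inlier_mask
```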
3D human action recognition in multiple view scenarios
This paper presents a novel view-independent approach to the recognition of human gestures performed by several people in low-resolution sequences from multiple calibrated cameras. In contrast to other multi-ocular gesture recognition systems, which classify a fusion of features coming from different views, our system first performs data fusion (a 3D representation of the scene) and then feature extraction and classification. Motion descriptors introduced by Bobick et al. for 2D data are extended to 3D, and a set of features based on 3D invariant statistical moments is computed. Finally, a Bayesian classifier is employed to perform recognition over a small set of actions. Results are provided showing the effectiveness of the proposed algorithm in a SmartRoom scenario.
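Below is a rough sketch of such a pipeline: a 3D motion history volume accumulated from voxelized occupancy grids, a few 3D central-moment features, and a Gaussian naive Bayes classifier. The voxel resolution, decay parameter, moment set, and classifier are illustrative assumptions, not the exact descriptors or classifier from the paper.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def motion_history_volume(occupancy_seq, tau=10):
    """Accumulate a 3D motion history volume from binary voxel grids,
    extending the 2D motion-history idea to volumetric data."""
    mhv = np.zeros(occupancy_seq[0].shape, dtype=float)
    prev = occupancy_seq[0]
    for occ in occupancy_seq[1:]:
        motion = occ ^ prev                          # voxels that changed state
        mhv = np.where(motion, tau, np.maximum(mhv - 1, 0))
        prev = occ
    return mhv / tau                                 # normalize to [0, 1]

def central_moment_features(volume,
                            orders=((2, 0, 0), (0, 2, 0), (0, 0, 2),
                                    (1, 1, 0), (1, 0, 1), (0, 1, 1))):
    """A few scale-normalized 3D central moments as a simple shape descriptor."""
    zs, ys, xs = np.nonzero(volume > 0)
    w = volume[zs, ys, xs]
    m000 = w.sum()
    cz, cy, cx = (w * zs).sum() / m000, (w * ys).sum() / m000, (w * xs).sum() / m000
    feats = []
    for p, q, r in orders:
        mu = (w * (zs - cz)**p * (ys - cy)**q * (xs - cx)**r).sum()
        feats.append(mu / m000 ** (1 + (p + q + r) / 3))
    return np.array(feats)

# Toy usage: random voxel sequences standing in for two hypothetical action classes.
rng = np.random.default_rng(0)
X, y = [], []
for label in (0, 1):
    for _ in range(20):
        seq = [rng.random((16, 16, 16)) < (0.05 + 0.05 * label) for _ in range(12)]
        X.append(central_moment_features(motion_history_volume(seq)))
        y.append(label)
clf = GaussianNB().fit(np.array(X), np.array(y))
```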