123 research outputs found

    3-D Motion Estimation and Wireframe Adaptation Including Photometric Effects for Model-Based Coding of Facial Image Sequences

    We propose a novel formulation where 3-D global and local motion estimation and the adaptation of a generic wireframe model to a particular speaker are considered simultaneously within an optical flow based framework including the photometric effects of the motion. We use a flexible wireframe model whose local structure is characterized by the normal vectors of the patches, which are related to the coordinates of the nodes. Geometrical constraints that describe the propagation of the movement of the nodes are introduced, which are then efficiently utilized to reduce the number of independent structure parameters. A stochastic relaxation algorithm has been used to determine optimum global motion estimates and the parameters describing the structure of the wireframe model. Results with both simulated and real facial image sequences are provided.

    Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction

    We analyze the performance of feedforward vs. recurrent neural network (RNN) architectures and associated training methods for learned frame prediction. To this effect, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next frame prediction using the mean square loss. We performed both stateless and stateful training for recurrent networks. Experimental results show that the residual FCNN architecture performs the best in terms of peak signal to noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation through time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with an acceptable performance. Comment: Accepted for publication at IEEE ICIP 201

    An improvement to MBASIC algorithm for 3D motion and depth estimation

    In model-based coding of facial images, the accuracy of motion and depth parameter estimates strongly affects the coding efficiency. MBASIC is a simple and effective iterative algorithm (recently proposed by Aizawa et al.) for 3-D motion and depth estimation when the initial depth estimates are relatively accurate. In this correspondence, we analyze its performance in the presence of errors in the initial depth estimates and propose a modification to the MBASIC algorithm that significantly improves its robustness to random errors with only a small increase in the computational load.

    DCT Coding of nonrectangularly sampled images

    Discrete cosine transform (DCT) coding is widely used for compression of rectangularly sampled images. In this letter, we address efficient DCT coding of nonrectangularly sampled images. To this effect, we discuss an efficient method for the computation of the DCT on nonrectangular sampling grids using the Smith-normal decomposition. Simulation results are provided.

    Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression

    The inability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state-of-the-art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: i) a multi-scale deformable alignment scheme at the feature level combined with multi-scale conditional coding, ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine tuning corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate state-of-the-art rate-distortion performance exceeding that of all prior art in learned video coding. Comment: Accepted for publication in IEEE International Conference on Image Processing (ICIP) 202

    Report of the AHG on MPEG-7 Semantic Information Representation

    This AHG was created at the last MPEG meeting in Maui to work in parallel with the CE experiment on the Semantic DS, so as to continue the refinement, in terms of significance, usage and syntax, of the DS’s that were proposed during the Maui meeting [2]. Following the discussions on the email reflector, the results of a meeting of the US delegation in February, and the discussion during the AHG meeting on Mar. 19th, 2000, in Noordwijkerhout, some clarifications were made, though total convergence has not yet been reached. During the US delegate meeting, an alternative syntax was also proposed for consideration in the continuation of the CE; it is likely that this will lead to the formulation of competitive solutions to select the best syntax and elementary components of the Semantic DS during the CE process to follow after the 51st MPEG meeting. Below are some listings of the discussions that took place, in reference to the individual mandates of this AHG.

    Fast Outlier Rejection by Using Parallax-Based Rigidity Constraint for Epipolar Geometry Estimation

    A novel approach is presented to reject correspondence outliers between frames using the parallax-based rigidity constraint for epipolar geometry estimation. In this approach, the invariance of the 3-D relative projective structure of a stationary scene over different views is exploited to eliminate outliers, which are mostly due to independently moving objects in a typical scene. The proposed approach is compared against a well-known RANSAC-based algorithm with the help of a test bed. The results show that using the proposed technique as a preprocessing step before the RANSAC-based approach significantly decreases the execution time of the overall outlier rejection.

    3D human action recognition in multiple view scenarios

    This paper presents a novel view-independent approach to the recognition of human gestures of several people in low resolution sequences from multiple calibrated cameras. In contrast to other multi-ocular gesture recognition systems, which base classification on a fusion of features coming from different views, our system first performs a data fusion (3D representation of the scene) and then feature extraction and classification. Motion descriptors introduced by Bobick et al. for 2D data are extended to 3D, and a set of features based on 3D invariant statistical moments is computed. Finally, a Bayesian classifier is employed to perform recognition over a small set of actions. Results are provided showing the effectiveness of the proposed algorithm in a SmartRoom scenario. Peer reviewed. Postprint (published version).