13 research outputs found

    Model-aided coding: a new approach to incorporate facial animation into motion-compensated video coding


    3D Motion Estimation of Human Head by Using Optical Flow

    The paper presents a new algorithm for estimating large 3D motion of the human head using optical flow and the Candide model. The algorithm predicts the 3D motion parameters in a feedback loop with multiple iterations. The prediction does not require creating synthesized frames but works directly on the frames of the input video sequence, and it does not need to extract feature points from the frames because these are given by the vertices of the calibrated Candide model. The experimental results show that the iterative prediction of the 3D motion parameters increases the estimation accuracy, above all for large 3D motion, so the estimation error is reduced without accumulating over a long video sequence. Finally, the results show that saturation is reached after 3 iterations: further increasing the number of iterations yields practically no significant gain in the accuracy of the estimated 3D motion parameters.
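The iterative prediction loop described in the abstract can be illustrated with a minimal sketch: model vertices (standing in for the calibrated Candide vertices) are displaced by a small rotation and translation, the 2D displacement is linearized, and the linear system is re-solved on the residual flow over several iterations. The orthographic projection, the five observable motion parameters, and all function names are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def skew(w):
    """Cross-product matrix: skew(w) @ v == np.cross(w, v)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def rodrigues(w):
    """Rotation matrix for the axis-angle vector w."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = skew(w / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)

def estimate_motion(verts, targets, n_iter=3):
    """Iteratively estimate head rotation/translation from vertex flow.

    verts   : (N, 3) model vertices (e.g. Candide vertices), camera frame.
    targets : (N, 2) observed vertex positions in the next frame.
    Orthographic projection is assumed, so tz is unobservable and fixed to 0.
    """
    cur = verts.astype(float).copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(n_iter):
        flow = targets - cur[:, :2]            # residual 2D flow at vertices
        rows = []
        for x, y, z in cur:
            # Linearized displacement: dx = wy*z - wz*y + tx,
            #                          dy = wz*x - wx*z + ty
            rows.append([0.0,   z,  -y, 1.0, 0.0])
            rows.append([ -z, 0.0,   x, 0.0, 1.0])
        J = np.array(rows)
        p, *_ = np.linalg.lstsq(J, flow.ravel(), rcond=None)
        w, t = p[:3], np.array([p[3], p[4], 0.0])
        R = rodrigues(w)                       # re-nonlinearize the update
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Consistent with the saturation behaviour the abstract reports, the reprojection residual in such a toy setup stops improving noticeably after roughly three iterations.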

    Video coding for compression and content-based functionality

    The lifetime of this research project has seen two dramatic developments in the area of digital video coding. The first has been the progress of compression research, leading to a factor of two improvement over existing standards, much wider deployment possibilities and the development of the new international ITU-T Recommendation H.263. The second has been a radical change in the approach to video content production with the introduction of the content-based coding concept and the addition of scene composition information to the encoded bit-stream. Content-based coding is central to the latest international standards efforts from the ISO/IEC MPEG working group. This thesis reports on extensions to existing compression techniques exploiting a priori knowledge about scene content. Existing, standardised, block-based compression coding techniques were extended with work on arithmetic entropy coding and intra-block prediction, which now form part of the H.263 and MPEG-4 specifications respectively. Object-based coding techniques were developed within a collaborative simulation model, known as SIMOC, and then extended with ideas on grid motion vector modelling and vector accuracy confidence estimation. An improved confidence measure for encouraging motion smoothness is proposed. Object-based coding ideas, together with those from other model- and layer-based coding approaches, influenced the development of content-based coding within MPEG-4. This standard made considerable progress in the newly adopted content-based video coding field, defining normative techniques for arbitrary shape and texture coding. The means to generate this information for the content to be coded (the analysis problem) was intentionally not specified. Further research in this area concentrated on video segmentation and analysis techniques to exploit the benefits of content-based coding for generic frame-based video.
The work reported here introduces the use of a clustering algorithm on raw data features to provide an initial segmentation of video data and the subsequent tracking of those image regions through video sequences. Collaborative video analysis frameworks from COST 211quat and MPEG-4, combining results from many other segmentation schemes, are also introduced.
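The idea of clustering raw data features into an initial segmentation can be sketched with a plain k-means over per-pixel colour and position features. The feature scaling, the deterministic initialisation and the function name are illustrative assumptions, not the thesis's actual clustering algorithm.

```python
import numpy as np

def kmeans_segment(image, k=2, iters=20):
    """Initial segmentation by k-means clustering of raw per-pixel features.

    Features are colour plus (scaled) pixel position, so clusters tend
    to form spatially coherent image regions.
    """
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        image.reshape(-1, 3).astype(float),
        ys.ravel() * (255.0 / h),        # scale position into colour range
        xs.ravel() * (255.0 / w),
    ])
    # Deterministic, spread-out initialisation (illustrative choice).
    centers = feats[np.linspace(0, len(feats) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(feats[:, None, :] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels.reshape(h, w)
```

The returned label map is only a starting point; tracking those regions through the sequence, as the thesis describes, is a separate step.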

    Recognition of facial action units from video streams with recurrent neural networks: a new paradigm for facial expression recognition

    Philosophiae Doctor - PhD. This research investigated the application of recurrent neural networks (RNNs) to the recognition of facial expressions based on the facial action coding system (FACS). Support vector machines (SVMs) were used to validate the results obtained by the RNNs. In this approach, instead of recognizing whole facial expressions, the focus was on the recognition of the action units (AUs) defined in FACS. Recurrent neural networks are capable of gaining knowledge from temporal data, while SVMs, which are time-invariant, are known to be very good classifiers. Thus, the research consists of four important components: a comparison of the use of image sequences against single static images, benchmarking of feature selection and network optimization approaches, a study of inter-AU correlations by implementing multiple-output RNNs, and a study of difference images as an approach for performance improvement. In the comparative studies, image sequences were classified using a combination of Gabor filters and RNNs, while single static images were classified using Gabor filters and SVMs. Sets of 11 FACS AUs were classified by both approaches, where a single RNN/SVM classifier was used for classifying each AU. Results indicated that classifying FACS AUs using image sequences yielded better results than using static images. The average recognition rate (RR) and false alarm rate (FAR) using image sequences were 82.75% and 7.61%, respectively, while the classification using single static images yielded an RR and FAR of 79.47% and 9.22%, respectively. The better performance with image sequences can be attributed to the RNNs' ability, as stated above, to extract knowledge from time-series data. Subsequent research then benchmarked dimensionality reduction, feature selection and network optimization techniques, in order to improve on the performance provided by the use of image sequences.
Results showed that an optimized network, using weight decay, gave the best RR and FAR of 85.38% and 6.24%, respectively. The next study examined the inter-AU correlations present in the Cohn-Kanade database and their effect on classification models. To accomplish this, a model was developed for the classification of a set of AUs by a single multiple-output RNN. Results indicated that high inter-AU correlations do in fact help classification models to gain more knowledge and, thus, perform better. However, this was limited to AUs that start and reach apex at almost the same time. This suggests the need for a larger database of AUs, which could provide both individual AUs and AU combinations for further investigation. The final part of this research investigated the use of difference images to track the motion of image pixels. Difference images provide both noise and feature reduction, an aspect that was also studied. Results showed that the use of difference-image sequences provided the best results, with an RR and FAR of 87.95% and 3.45%, respectively, which is shown to be a significant improvement over normal image sequences classified using RNNs. In conclusion, the research demonstrates that the use of RNNs for the classification of image sequences is a new and improved paradigm for facial expression recognition.
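The difference-image idea in the final part can be sketched directly: subtracting consecutive gray-level frames cancels static background, and a small threshold suppresses sensor noise, which is the noise and feature reduction the abstract refers to. The function name and threshold value are illustrative assumptions.

```python
import numpy as np

def difference_sequence(frames, threshold=10):
    """Thresholded absolute differences of consecutive gray-level frames.

    Static pixels cancel out, leaving only moving (e.g. facial) regions;
    small differences are treated as sensor noise and zeroed.
    """
    frames = np.asarray(frames, dtype=np.int16)   # avoid uint8 wrap-around
    diffs = np.abs(frames[1:] - frames[:-1])
    diffs[diffs < threshold] = 0
    return diffs.astype(np.uint8)
```

A sequence of T frames yields T-1 difference images, which would then be fed to the feature extraction and RNN classification stages.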

    Supporting real time video over ATM networks

    Includes bibliographical references. In this project, we propose and evaluate an approach to delimit and tag independent video slices at the ATM layer for early discard. This involves the use of a tag cell, differentiated from the rest of the data by its PTI value, and a modified tag switch to facilitate the selective discarding of affected cells within each video slice, as opposed to dropping cells at random from multiple video frames.
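The slice-level discard policy can be sketched in a few lines: once a cell of a tagged slice is dropped because the buffer is full, every later cell of that slice is discarded too, so no downstream bandwidth is wasted on an undecodable partial slice. The queue model, function name and cell representation are illustrative assumptions; a real ATM switch would, of course, also be draining the buffer.

```python
def forward_cells(cells, capacity):
    """Slice-aware selective discard (a sketch of the tagging idea above).

    `cells` is a sequence of (slice_id, payload) pairs, where slice_id
    plays the role of the delimiter carried by the PTI-marked tag cell.
    """
    buffer, doomed = [], set()
    for slice_id, payload in cells:
        if slice_id in doomed:        # remainder of an already-damaged slice
            continue
        if len(buffer) < capacity:
            buffer.append((slice_id, payload))
        else:
            doomed.add(slice_id)      # one loss makes the whole slice useless
    return buffer, doomed
```

Compared with random cell drops, every surviving slice in the buffer is complete and therefore decodable.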

    MPEG-4's BIFS-Anim protocol: using MPEG-4 for streaming of 3D animations

    This thesis explores issues related to the generation and animation of synthetic objects within the context of MPEG-4. MPEG-4 was designed to provide a standard that will deliver rich multimedia content on many different platforms and networks. MPEG-4 should be viewed as a toolbox rather than as a monolithic standard, as each implementer of the standard will pick the tools adequate to their needs, likely to be a small subset of the available tools. The subset of MPEG-4 examined here comprises the tools relating to the generation of 3D scenes and to the animation of those scenes. A comparison with the most popular 3D standard, the Virtual Reality Modeling Language (VRML), is included. An overview of the MPEG-4 standard is given, describing the basic concepts. MPEG-4 uses a scene description language called Binary Format for Scene (BIFS) for the composition of scenes; this description language is also described. The potential for the technology used in BIFS to provide low-bitrate streaming 3D animations is analysed, and some examples of the possible uses of this technology are given. A tool for the encoding of streaming 3D animations is described, and results show that MPEG-4 provides a more efficient way of encoding 3D data than VRML. Finally, a look is taken at the future of 3D content on the Internet.

    Proceedings of the 5th international conference on disability, virtual reality and associated technologies (ICDVRAT 2004)

    The proceedings of the conference.

    Marker-free human motion capture in dynamic cluttered environments from a single view-point

    Human Motion Capture is a widely used technique to obtain motion data for the animation of virtual characters. Commercial optical motion capture systems are marker-based; this thesis is about marker-free motion capture. The pose and motion of an observed person are estimated in an optimization framework for articulated objects. The motion function is formulated with kinematic chains consisting of rotations around arbitrary axes in 3D space. This formulation leads to a Nonlinear Least Squares problem, which is solved with gradient-based methods. With the formulation in this thesis, the necessary derivatives can be derived analytically, which speeds up processing and increases accuracy. Different gradient-based methods are compared for solving the Nonlinear Least Squares problem, which also allows the integration of second-order motion derivatives. The pose estimation requires correspondences between a known model of the person and the observed data. To obtain this model, a new method is developed that fits a template model to a specific person from 6 posture images taken by a single camera. Various types of correspondences are integrated into the optimization simultaneously, without approximating the motion or optimization function, namely 3D-3D correspondences from stereo algorithms and 3D-2D correspondences from image silhouettes and 2D point tracking. Of major importance for the developed methods are the processing time and robustness to cluttered and dynamic backgrounds. Experiments show that complex motion with 24 degrees of freedom is trackable from a single stereo view until body parts become totally occluded. Further methods are developed to estimate pose from a single camera view with a cluttered, dynamic background. As in other work on 2D-3D pose estimation, correspondences between the model and the image silhouette of the person are established by analyzing the gray-value gradient near the predicted model silhouette.
To increase the accuracy of the silhouette correspondences, color histograms for each body part are combined with the image gradient search. The combination of 3D depth data and 2D image data is tested with depth data from a PMD camera (Photonic Mixer Device), which measures the depth to scene points by the time of flight of light.
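A toy version of the described optimization: a two-joint kinematic chain with rotations about arbitrary axes, fitted to 3D-3D correspondences by Gauss-Newton using analytic derivatives of the Rodrigues rotation. The chain layout, axis choices and function names are assumptions for illustration; the thesis's actual model has many more degrees of freedom and mixes 3D-3D with 3D-2D correspondences.

```python
import numpy as np

def rot(n, th, v):
    """Rotate v about the unit axis n by angle th (Rodrigues formula)."""
    c, s = np.cos(th), np.sin(th)
    return v * c + np.cross(n, v) * s + n * np.dot(n, v) * (1.0 - c)

def drot(n, th, v):
    """Analytic derivative of rot(n, th, v) with respect to th."""
    c, s = np.cos(th), np.sin(th)
    return -v * s + np.cross(n, v) * c + n * np.dot(n, v) * s

def fit_chain(points, targets, n1, n2, theta=(0.0, 0.0), iters=30):
    """Gauss-Newton fit of the joint angles in p = R1(t1) R2(t2) x.

    points/targets are matching 3D-3D correspondences; the analytic
    derivatives above give the Jacobian without numeric differencing.
    """
    t1, t2 = theta
    for _ in range(iters):
        J_rows, res = [], []
        for x, y in zip(points, targets):
            inner = rot(n2, t2, x)
            res.append(rot(n1, t1, inner) - y)
            J_rows.append(np.stack([drot(n1, t1, inner),
                                    rot(n1, t1, drot(n2, t2, x))], axis=1))
        J, r = np.concatenate(J_rows), np.concatenate(res)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)   # least-squares step
        t1, t2 = t1 + step[0], t2 + step[1]
    return t1, t2
```

Because the derivative of each rotation is available in closed form, no finite-difference evaluations are needed, which is the speed and accuracy advantage the abstract attributes to the analytic formulation.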

    Metadata assisted image segmentation

    Doctoral thesis. Electrical and Computer Engineering. 2006. Faculdade de Engenharia, Universidade do Porto.