51 research outputs found
Three-dimensional facial motion and structure estimation in video coding
Ankara: Department of Electrical and Electronics Engineering and the Institute of Engineering and Science of Bilkent Univ., 1994. Thesis (Ph.D.) -- Bilkent University, 1994. Includes bibliographical references (leaves 81-89).
We propose a novel formulation where 3-D global and local motion estimation and the adaptation of a generic wire-frame model to a particular speaker are considered simultaneously within an optical flow based framework including the photometric effects of the motion. We use a flexible wire-frame model whose local structure is characterized by the normal vectors of the patches, which are related to the coordinates of the nodes. Geometric constraints that describe the propagation of the movement of the nodes are introduced, which are then efficiently utilized to reduce the number of independent structure parameters. A stochastic relaxation algorithm has been used to determine optimum global motion estimates and the parameters describing the structure of the wire-frame model. For the initialization of the motion and structure parameters, a modified feature based algorithm is used whose performance has also been compared with the existing methods. Results with both simulated and real facial image sequences are provided.
Bozdağı, Gözde. Ph.D.
Video coding for compression and content-based functionality
The lifetime of this research project has seen two dramatic developments in the area of digital video coding. The first has been the progress of compression research leading to a factor of two improvement over existing standards, much wider deployment possibilities and the development of the new international ITU-T Recommendation H.263. The second has been a radical change in the approach to video content production with the introduction of the content-based coding concept and the addition of scene composition information to the encoded bit-stream. Content-based coding is central to the latest international standards efforts from the ISO/IEC MPEG working group.
This thesis reports on extensions to existing compression techniques exploiting a priori knowledge about scene content. Existing, standardised, block-based compression coding techniques were extended with work on arithmetic entropy coding and intra-block prediction, which form part of the H.263 and MPEG-4 specifications respectively. Object-based coding techniques were developed within a collaborative simulation model, known as SIMOC, then extended with ideas on grid motion vector modelling and vector accuracy confidence estimation. An improved confidence measure for encouraging motion smoothness is proposed.
Object-based coding ideas, with those from other model and layer-based coding approaches, influenced the development of content-based coding within MPEG-4. This standard made considerable progress in the newly adopted field of content-based video coding, defining normative techniques for arbitrary shape and texture coding. The means of generating this information for the content to be coded (the analysis problem) was intentionally not specified. Further research work in this area concentrated on video segmentation and analysis techniques to exploit the benefits of content-based coding for generic frame-based video. The work reported here introduces the use of a clustering algorithm on raw data features for providing initial segmentation of video data and subsequent tracking of those image regions through video sequences. Collaborative video analysis frameworks from COST 211quat and MPEG-4, combining results from many other segmentation schemes, are also introduced.
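The abstract does not say which clustering algorithm or which raw features were used. As a rough illustration of clustering-based initial segmentation only, the sketch below assumes plain k-means on per-pixel luminance, with a deterministic farthest-point initialisation; all function names are hypothetical, and the tracking step is not covered.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centres(points, k):
    """Farthest-point initialisation (an assumption, chosen for determinism)."""
    centres = [points[0]]
    while len(centres) < k:
        # Pick the point that is farthest from its nearest existing centre.
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, iters=10):
    """Plain k-means on a list of equal-length feature tuples."""
    centres = init_centres(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centres


def segment_frame(frame, k):
    """Label every pixel of a 2-D luminance frame with a cluster index."""
    width = len(frame[0])
    points = [(float(v),) for row in frame for v in row]
    labels, _ = kmeans(points, k)
    return [labels[i:i + width] for i in range(0, len(labels), width)]


frame = [[10, 12, 200, 205],
         [11, 9, 198, 202]]
label_map = segment_frame(frame, 2)
```

Because `kmeans` accepts tuples of any length, richer features such as colour or pixel position could be appended to each tuple, although they would need scaling so that no single feature dominates the distance.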
Object-based video representations: shape compression and object segmentation
Object-based video representations are considered to be useful for easing the process of multimedia content production and enhancing user interactivity in multimedia productions. Object-based video presents several new technical challenges, however.
Firstly, as with conventional video representations, compression of the video data is a requirement. For object-based representations, it is necessary to compress the shape of each video object as it moves in time. This amounts to the compression of moving binary images, which is achieved by the use of a technique called context-based arithmetic encoding. The technique is applied to rectangular pixel blocks and as such is consistent with the standard tools of video compression. The block-based application also lends itself well to exploiting temporal redundancy in the sequence of binary shapes. For the first time, context-based arithmetic encoding is used in conjunction with motion compensation to provide inter-frame compression. The method described in this thesis has been thoroughly tested throughout the MPEG-4 core experiment process and, due to favourable results, has been adopted as part of the MPEG-4 video standard.
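To make the idea of context-based coding of binary shapes concrete, here is a minimal sketch under stated assumptions. It uses a hypothetical three-pixel causal template (left, above, above-left), optionally extended with the co-located pixel of a reference block to mimic inter-frame coding; the templates standardised in MPEG-4 are larger and laid out differently. Instead of a full arithmetic coder, it sums the ideal code length, -log2(p), under adaptive per-context counts, which is the cost an arithmetic coder would approach.

```python
import math


def context_of(cur, y, x, ref=None):
    """Context index built from already-coded neighbours of pixel (y, x).

    Assumed template: left, above, above-left in the current block, plus the
    co-located pixel of `ref` (e.g. a motion-compensated block) when given.
    Pixels outside the block are treated as 0.
    """
    def px(img, r, c):
        inside = 0 <= r < len(img) and 0 <= c < len(img[0])
        return img[r][c] if inside else 0

    ctx = (px(cur, y, x - 1) << 2) | (px(cur, y - 1, x) << 1) | px(cur, y - 1, x - 1)
    if ref is not None:
        ctx = (ctx << 1) | px(ref, y, x)
    return ctx


def adaptive_code_length(cur, ref=None):
    """Ideal code length in bits for a binary block, scanned in raster order."""
    counts = {}  # context -> [zeros seen + 1, ones seen + 1]
    bits = 0.0
    for y, row in enumerate(cur):
        for x, bit in enumerate(row):
            c = counts.setdefault(context_of(cur, y, x, ref), [1, 1])
            bits += -math.log2(c[bit] / (c[0] + c[1]))  # cost of this symbol
            c[bit] += 1  # adapt the model after coding the symbol
    return bits


blank = [[0] * 8 for _ in range(8)]
shape = [[1 if 2 <= y <= 5 and 2 <= x <= 5 else 0 for x in range(8)] for y in range(8)]
intra_bits = adaptive_code_length(shape)
inter_bits = adaptive_code_length(shape, ref=shape)
```

In this toy model a blank 8x8 block costs about 6 bits (log2 65) rather than 64 raw bits, and supplying a perfectly predicted reference block lowers the cost of the square shape further, which illustrates why combining contexts with motion compensation helps.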
The second challenge lies in the acquisition of the video objects. Under normal conditions, a video sequence is captured as a sequence of frames and there is no inherent information about what objects are in the sequence, let alone information about the shape of each object. Some means of segmenting semantic objects from general video sequences is required. For this purpose, several image analysis tools may be of help; in particular, it is believed that video object tracking algorithms will be important. A new tracking algorithm is developed based on piecewise polynomial motion representations and statistical estimation tools, e.g. the expectation-maximisation method and the minimum description length principle.
The linguistic repertoire of deaf cuers: an ethnographic query on practice
Taking an anthropological perspective, this dissertation focuses on a small segment of the American deaf community that uses Cued Speech by examining the nature of the cuers' linguistic repertoire. Multimodality is at issue for this dissertation. It can affect the ways of speaking or, more appropriately, ways of communicating (specifically, signing or cueing). Speech and Cued Speech rely on different modalities by using different sets of articulators. Hearing adults do not learn Cued Speech the same way deaf children do. English-speaking, hearing adult learners can base their articulation of Cued Speech on existing knowledge of their spoken language. However, because deaf children do not have natural aural access to spoken language phonology, they tend to learn Cued Speech communicatively through day-to-day interactions with family members and deaf cueing peers. I am interested in examining the construct of cuers' linguistic repertoire. Which parts of their linguistic repertoire are modeled after signed languages? Which parts are modeled after spoken languages? Cuers' phonological, syntactic and lexical repertoire largely depends on several factors including social class, geography, and the repertoire of the hearing cuers with whom they interacted on a daily basis. For most deaf cuers, hearing cuers including parents, transliterators and educators serve as a model for the English language. Hearing cuers play a role as unwitting gatekeepers for the maintenance of 'proper' cueing among deaf users. For this dissertation, I seek to study the effects of modality on how cuers manage their linguistic repertoire. The statement of the problem is this: Cued Speech is visual and made with the hands like ASL but is ultimately a code for the English language.
The research questions to be examined in this dissertation include how cuers adapt an invented system for their purposes, what adjustments they make to Cued Speech, how Cued Speech interacts with gesture, and what language play in Cued Speech looks like. Anthropology
Three-dimensional representations of video using knowledge based estimation
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Architecture, 1991. Includes bibliographical references (leaves 63-66). By Henry Neil Holtzman. M.S.
Improved facial feature fitting for model based coding and animation
EThOS - Electronic Theses Online Service. GB. United Kingdom
Energy efficient enabling technologies for semantic video processing on mobile devices
Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before the applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.