11 research outputs found

    A system for advanced facial animation

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (leaves 35-36). By Kenneth D. Miller, III. M.Eng.

    Hand gesture recognition, prediction, and coding using hidden Markov models

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (leaves 56-57). By Katerina H. Nguyen. M.Eng.

    Image recognition using the Eigenpicture Technique (with specific applications in face recognition and optical character recognition)

    Includes bibliographical references. In the first part of this dissertation, we present a detailed description of the eigenface technique first proposed by Sirovich and Kirby and subsequently developed by several groups, most notably the Media Lab at MIT. Other significant contributions have been made by Rockefeller University, whose ideas have culminated in a commercial system known as Faceit. For different techniques (i.e. not eigenfaces) and a detailed comparison of some of them, the reader is referred to [5]. Although we followed ideas in the open literature (we believe that there is a large body of advanced proprietary knowledge, which remains inaccessible), the implementation is our own. In addition, we believe that the method for updating the eigenfaces to deal with badly represented images, presented in section 2.7, is our own. The next stage of this work would be to develop an experimental system that can be extensively tested. At this point, however, another, non-scientific difficulty arises: that of developing an adequately large database. The basic problem is that one needs a training set representative of all faces to be encountered in the future. Note that this does not mean that one can only deal with faces in the database; the whole idea is to be able to work with any facial image. However, a database is only representative if it contains images similar to anything that can be encountered in the future. For this reason a representative database may be very large and is not easy to build. In addition, for testing purposes, one needs multiple images of a large number of people, acquired over a period of time under different physical conditions representing the typical variations encountered in practice. Obviously this is a very slow process. Potentially the variation between the faces in the database can be large, suggesting that the representation of all these different images in terms of eigenfaces may not be particularly efficient. One idea is to separate all the facial images into different, more or less homogeneous classes. Again, this can only be done with access to a sufficiently large database, probably consisting of several thousand faces.
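    As an illustration of the technique this abstract describes, here is a minimal sketch of computing eigenfaces by principal component analysis over flattened face images, together with a reconstruction error that could flag a badly represented image; the array shapes and function names are assumptions for illustration, not the dissertation's implementation:

```python
import numpy as np

def compute_eigenfaces(images, k):
    """Top-k eigenfaces from a training set.

    images: (n, h*w) array, one flattened grayscale face per row.
    Returns the mean face and the k principal components ("eigenfaces").
    """
    mean_face = images.mean(axis=0)
    centered = images - mean_face
    # SVD of the centered data; the rows of vt are the eigenfaces.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:k]

def project(face, mean_face, eigenfaces):
    """Coordinates of a face in eigenface space."""
    return eigenfaces @ (face - mean_face)

def reconstruct(coords, mean_face, eigenfaces):
    """Approximate face image rebuilt from its eigenface coordinates."""
    return mean_face + coords @ eigenfaces

def representation_error(face, mean_face, eigenfaces):
    """Reconstruction error; a large value marks a face that the current
    basis represents badly (a candidate for updating the eigenfaces)."""
    coords = project(face, mean_face, eigenfaces)
    return np.linalg.norm(face - reconstruct(coords, mean_face, eigenfaces))
```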

    Fitting and tracking of a scene model in very low bit rate video coding


    Facial feature processing using artificial neural networks

    Describing a human face is a natural ability used in everyday life. To the police, a witness's description of a suspect is key evidence in the identification of the suspect. However, the process of examining "mug shots" to find a match to the description is tedious and often unfruitful. If a description could be stored with each photograph and used as a searchable index, this would provide a much more effective means of using "mug shots" for identification purposes. A set of descriptive measures has been defined by Shepherd [73] which seeks to describe faces in a manner that may be used for just this purpose. This work investigates methods of automatically determining these descriptive measures from digitised images. Analysis is performed on the images to establish the potential for distinguishing between different categories in these descriptions. This reveals that while some of the classifications are relatively linear, others are very non-linear. Artificial neural networks (ANNs), often used as non-linear classifiers, are considered as a means of automatically performing the classification of the images. As a comparison, simple linear classifiers are also applied to the same problems.
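    The linear-versus-non-linear comparison the abstract describes could be set up along the following lines; the feature vectors and category labels below are synthetic placeholders for Shepherd's descriptive measures, and the particular classifiers are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Placeholder data: feature vectors measured from face images and a
# binary descriptive category (e.g. "round" vs "long" face shape).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 2, size=200)

# A simple linear classifier against a small ANN on the same problem.
classifiers = {
    "linear": LogisticRegression(max_iter=1000),
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                         random_state=0),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```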

    Machine learning techniques in pain recognition.

    No abstract available. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b131711

    Adaptive techniques with polynomial models for segmentation, approximation and analysis of faces in video sequences


    Situated face detection

    In the last twenty years, important advances have been made in the field of automatic face processing, given the importance of human faces for personal identification, emotional expression and verbal and non-verbal communication. The very first step in a face processing algorithm is the detection of faces; while this is a trivial problem in controlled environments, the detection of faces in real environments is still a challenging task. Until now, the most successful approaches for face detection represent the face as a grey-level pattern, and the problem itself is considered as the classification between "face" and "non-face" patterns. Satisfactory results have been achieved in this area. The main disadvantage is that an exhaustive search has to be done on each image in order to locate the faces. This search normally involves testing every single position on the image at different scales, and although this does not represent an important drawback in off-line face processing systems, in those cases where a real-time response is needed it is still a problem. In the different proposed methods for face detection, the "observer" is a disembodied entity, which holds no relationship with the observed scene. This thesis presents a framework for the efficient location of faces in real scenes in which, by considering the observer to be situated in the world and the relationships that hold between the two, a set of constraints on the search space can be defined. The constraints rely on two main assumptions: first, the observer can purposively interact with the world (i.e. change its position relative to the observed scene) and second, the camera is fully calibrated. The first source of constraint is the structural information about the observer's environment, represented as a depth map of the scene in front of the camera. From this representation the search space can be constrained in terms of the range of scales where a face might be found at different positions in the image. The second source of constraint is the geometrical relationship between the camera and the scene, which allows us to project a model of the subject into the scene in order to eliminate those areas where faces are unlikely to be found. In order to test the proposed framework, a system based on the premises stated above was constructed. It is based on three different modules: a face/non-face classifier, a depth estimation module and a search module. The classifier is composed of a set of convolutional neural networks (CNNs) that were trained to differentiate between face and non-face patterns; the depth estimation module uses a multilevel algorithm to compute the scene depth map from a sequence of captured images; and the search module projects the depth information and the subject model into the image where the search will be performed, in order to constrain the search space. Finally, the proposed system was validated by running a set of experiments on the individual modules and then on the whole system.
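    The depth-map constraint described above reduces to pinhole-camera arithmetic: an object of physical width W at depth z projects to roughly f*W/z pixels for a camera with focal length f in pixels. A minimal sketch of turning a depth map into a per-position range of search scales follows; the constants and the tolerance band are assumptions for illustration, not values from the thesis:

```python
import numpy as np

# Assumed constants: a typical face width in metres and the camera's
# focal length in pixels, known from calibration.
FACE_WIDTH_M = 0.16
FOCAL_PX = 800.0

def expected_face_width_px(depth_map):
    """Expected face width in pixels at each image position.

    Under a pinhole model, an object of width W at depth z projects to
    about f * W / z pixels, so the depth map bounds the face sizes
    worth searching at each location."""
    z = np.clip(depth_map, 1e-3, None)  # guard against zero depth
    return FOCAL_PX * FACE_WIDTH_M / z

def scale_bounds(depth_map, window=20, tolerance=0.25):
    """Per-position scale range for a fixed-size detector window,
    widened by +-25% to absorb depth-estimation error."""
    scale = expected_face_width_px(depth_map) / window
    return scale * (1 - tolerance), scale * (1 + tolerance)
```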

    Development of the components of a low cost, distributed facial virtual conferencing system

    This thesis investigates the development of a low-cost, component-based facial virtual conferencing system. The design is decomposed into an encoding phase and a decoding phase, which communicate with each other via a network connection. The encoding phase is composed of three components: model acquisition (which handles avatar generation), pose estimation and expression analysis. Audio is not considered part of the encoding and decoding process, and as such is not evaluated. The model acquisition component is implemented using a visual hull reconstruction algorithm that is able to reconstruct real-world objects using only sets of images of the object as input. The object to be reconstructed is assumed to lie in a bounding volume of voxels. The reconstruction process involves the following stages:
    - Space carving for basic shape extraction;
    - Isosurface extraction to remove voxels not part of the surface of the reconstruction;
    - Mesh connection to generate a closed, connected polyhedral mesh;
    - Texture generation, achieved by Gouraud shading the reconstruction with a vertex colour map;
    - Mesh decimation to simplify the object.
    The original algorithm has complexity O(n), but suffers from an inability to reconstruct concave surfaces that do not form part of the visual hull of the object. A novel extension to this algorithm based on Normalised Cross Correlation (NCC) is proposed to overcome this problem, along with an extension that speeds up traditional NCC evaluations by reducing the NCC search space from a 2D search problem down to a single evaluation. Pose estimation and expression analysis are performed by tracking six fiducial points on the face of a subject. A tracking algorithm is developed that uses Normalised Cross Correlation to facilitate robust tracking that is invariant to changing lighting conditions, rotations and scaling. Pose estimation involves the recovery of the head position and orientation through the tracking of the triangle formed by the subject's eyebrows and nose tip. A rule-based evaluation of points tracked around the subject's mouth forms the basis of the expression analysis. A user-assisted feedback loop and caching mechanism are used to overcome tracking errors due to fast motion or occlusions. The NCC tracker is shown to achieve a tracking performance of 10 fps when tracking the six fiducial points. The decoding phase is divided into three tasks: avatar movement, expression generation and expression management. Avatar movement is implemented using the base VR system. Expression generation is facilitated using a Vertex Interpolation Deformation method, which allows real-time deformation of the avatar representation, achieving 16 fps when applied to a model consisting of 7500 vertices. A weighting system is proposed for expression management; its function is to transform gradually from one expression to the next. An Expression Parameter Lookup Table (EPLT) facilitates an independent mapping between the two phases. It defines a list of generic expressions that are known to the system and associates an Expression ID with each one. For each generic expression, it relates the expression analysis rules for any subject with the expression generation parameters for any avatar model. The result is that facial expression replication between any subject and avatar combination can be performed by transferring only the Expression ID from the encoder application to the decoder application.
    The ideas developed in the thesis are demonstrated in an implementation using the CoRgi Virtual Reality system. It is shown that the virtual-conferencing application based on this design requires only a bandwidth of 2 Kbps.
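    Normalised Cross Correlation, which the thesis applies both in the reconstruction extension and in fiducial tracking, is straightforward to state in code. Below is a minimal sketch of NCC matching over a small search window; the function names, window logic and parameters are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def ncc(patch, template):
    """Normalised cross correlation between an image patch and a template.

    Subtracting the means and dividing by the norms makes the score
    invariant to brightness and contrast changes, which is what gives
    NCC-based tracking its robustness to varying lighting."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def track_point(frame, template, prev_xy, radius=8):
    """Search a small window around a fiducial point's previous position
    and return the best-matching location with its NCC score."""
    h, w = template.shape
    px, py = prev_xy
    best_score, best_xy = -1.0, prev_xy
    for y in range(max(py - radius, 0), py + radius + 1):
        for x in range(max(px - radius, 0), px + radius + 1):
            patch = frame[y:y + h, x:x + w]
            if patch.shape != template.shape:
                continue  # search window fell off the image edge
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy, best_score
```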