
    Pose-invariant, model-based object recognition, using linear combination of views and Bayesian statistics

    This thesis presents an in-depth study of the problem of object recognition, and in particular the detection of 3-D objects in 2-D intensity images taken from a variety of angles. A general solution to this problem remains elusive, since it involves dealing with variations in geometry, photometry and viewing angle, as well as noise, occlusions and incomplete data. This work restricts its scope to a particular kind of extrinsic variation: variation of the image due to changes in the viewpoint from which the object is seen. A technique is proposed and developed to address this problem which falls into the category of view-based approaches, that is, methods in which an object is represented as a collection of a small number of 2-D views, as opposed to the generation of a full 3-D model. The technique is based on the theoretical observation that the geometry of the set of possible images of an object undergoing 3-D rigid transformations and scaling may, under most imaging conditions, be represented by a linear combination of a small number of 2-D views of that object. It is therefore possible to synthesise a novel image of an object given at least two existing and dissimilar views of the object, together with a set of linear coefficients that determine how these views are to be combined. The method works in conjunction with a powerful optimisation algorithm to search for and recover the optimal linear combination coefficients, those that synthesise a novel image as similar as possible to the target scene view. If the similarity between the synthesised and target images is above some threshold, the object is determined to be present in the scene and its location and pose are defined, in part, by the coefficients. The key benefit of this technique is that, because it works directly with pixel values, it avoids the need for problematic low-level feature extraction and for solving the correspondence problem. As a result, a linear combination of views (LCV) model is easy to construct and use: it requires only a small number of stored 2-D views of the object in question and the selection of a few landmark points on the object, a process easily carried out during the offline, model-building stage. In addition, the method is general enough to be applied across a variety of recognition problems and different types of objects. The development and application of the method are first explored on two-dimensional problems and then extended to 3-D. The method is also evaluated on synthetic and real-image datasets containing variations in the objects' identity and pose. Finally, possible future extensions to incorporate a foreground/background model and lighting variations of the pixels are examined.
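
    At the landmark level, the linear combination can be written out directly. The following sketch (Python/NumPy; the function names are illustrative, and unlike the thesis, which recovers the coefficients by optimising pixel-level similarity, it fits them from known landmark correspondences) shows how two basis views combine linearly to synthesise the landmarks of a novel view:

    ```python
    import numpy as np

    def lcv_design_matrix(view1, view2):
        """Build the LCV design matrix [x1, y1, x2, y2, 1] per landmark.
        Each view is an (N, 2) array of corresponding landmark (x, y)
        coordinates for N points."""
        n = view1.shape[0]
        return np.column_stack([view1[:, 0], view1[:, 1],
                                view2[:, 0], view2[:, 1],
                                np.ones(n)])

    def fit_lcv_coefficients(view1, view2, target):
        """Least-squares fit of the linear coefficients mapping two
        basis views onto a target view's landmarks."""
        A = lcv_design_matrix(view1, view2)
        ax, *_ = np.linalg.lstsq(A, target[:, 0], rcond=None)
        ay, *_ = np.linalg.lstsq(A, target[:, 1], rcond=None)
        return ax, ay

    def synthesise_view(view1, view2, ax, ay):
        """Synthesise novel landmark positions from the two basis views."""
        A = lcv_design_matrix(view1, view2)
        return np.column_stack([A @ ax, A @ ay])
    ```

    In the recognition setting the target landmarks are not known in advance, so the thesis instead searches the coefficient space with its optimisation algorithm and scores each candidate synthesis by its pixel-level similarity to the scene view.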

    Study of Video Assisted BSS for Convolutive Mixtures

    In this paper we present an overview of recent research in the area of audio-visual blind source separation (BSS), together with new results from our work that highlight the advantage of incorporating visual information into a BSS algorithm. In our work, visual information is combined with audio information to form joint audio-visual feature vectors, and the audio-visual coherence is then modelled statistically. The outputs of these models are used within a frequency-domain BSS algorithm to control the step size. Experimental results verify the improvement of the audio-visual method over audio-only BSS. We also discuss visual feature extraction techniques, along with several recently published methods for audio-visual BSS, and conclude with suggestions for future research.
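
    As a rough illustration of the step-size control, one might model the audio-visual coherence with a Gaussian mixture and scale the adaptation step by the model's confidence in the current frame. The feature dimensions, component count and likelihood-to-step mapping below are assumptions, since the abstract does not specify them:

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Hypothetical joint audio-visual feature vectors (e.g. audio
    # spectral features concatenated with lip-region parameters),
    # one row per frame; random data stands in for real training data.
    train_av = np.random.randn(5000, 16)

    # Model the audio-visual coherence with a Gaussian mixture.
    gmm = GaussianMixture(n_components=8).fit(train_av)

    def adapted_step_size(av_frame, base_mu=0.01, floor=-40.0, ceil=0.0):
        """Scale the BSS update step by how well the current frame fits
        the learned audio-visual model. The mapping from log-likelihood
        to step size is a sketch, not the paper's actual rule."""
        ll = gmm.score_samples(av_frame[None, :])[0]    # log-likelihood
        ll = np.clip(ll, floor, ceil)
        return base_mu * (ll - floor) / (ceil - floor)  # in [0, base_mu]
    ```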

    Expressive Modulation of Neutral Visual Speech

    The need for animated graphical models of the human face is commonplace in the movie, video game and television industries, appearing in everything from low-budget advertisements and free mobile apps to Hollywood blockbusters costing hundreds of millions of dollars. Generative statistical models of animation attempt to address some of the drawbacks of industry-standard practices, such as labour intensity and creative inflexibility. This work describes one such method for transforming speech animation curves between different expressive styles. Beginning with the assumption that expressive speech animation is a mix of two components, a high-frequency speech component (the content) and a much lower-frequency expressive component (the style), we use Independent Component Analysis (ICA) to identify and manipulate these components independently of one another. Next, we learn how the energy for different speaking styles is distributed in terms of the low-dimensional independent components model. Transforming the speaking style involves projecting new animation curves into the low-dimensional ICA space, redistributing the energy in the independent components, and finally reconstructing the animation curves by inverting the projection. We show that a single ICA model can be used to separate multiple expressive styles into their component parts. Subjective evaluations show that viewers can reliably identify the expressive style generated using our approach, and that they have difficulty distinguishing transformed expressive speech animation from the equivalent ground truth.
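
    The project-redistribute-reconstruct procedure maps directly onto an off-the-shelf ICA implementation. A minimal sketch (scikit-learn; the training data, component count and gain values are placeholders rather than the paper's):

    ```python
    import numpy as np
    from sklearn.decomposition import FastICA

    # Rows = animation frames, columns = facial animation parameters;
    # random data stands in for real neutral-style animation curves.
    neutral_curves = np.random.randn(400, 30)

    ica = FastICA(n_components=8, random_state=0)
    sources = ica.fit_transform(neutral_curves)   # independent components

    # Hypothetical per-component gains learned from an expressive style,
    # e.g. the ratio of component energies between two styles.
    style_gains = np.ones(8)
    style_gains[2] *= 1.8   # boost an assumed low-frequency "style" component

    def transform_style(curves):
        """Project curves into ICA space, redistribute the energy across
        components, and reconstruct by inverting the projection."""
        s = ica.transform(curves)
        return ica.inverse_transform(s * style_gains)
    ```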

    Editing faces in videos

    Editing faces in movies is of interest to the special effects industry. We aim to produce effects such as the addition of accessories that interact correctly with the face, or the replacement of a stuntman's face with that of the main actor. The system introduced in this thesis is based on a 3D generative face model. Using a 3D model makes it possible to edit the face in the semantic space of pose, expression and identity instead of pixel space, and its 3D nature allows the interaction of light with the face to be modelled. In our system we first reconstruct, in every frame of a monocular input video, the 3D face (which deforms due to expressions and speech), the lighting, and the camera. The face is then edited by substituting expressions or identities with those of another video sequence, or by adding virtual objects to the scene. The manipulated 3D scene is rendered back into the original video, correctly simulating the interaction of the light with the deformed face and virtual objects. We describe all the steps necessary to build and apply the system: registration of training faces to learn a generative face model, semi-automatic annotation of the input video, fitting of the face model to the input video, editing of the fit, and rendering of the resulting scene. While describing the application we introduce a host of new methods, each of which is of interest in its own right. We start with a new method to register 3D face scans for use as training data for the face model. For video preprocessing, a new interest-point tracking and 2D Active Appearance Model fitting technique is proposed. For robust fitting we introduce background modelling, model-based stereo techniques, and a more accurate light model.
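
    Structurally, the per-frame fit-edit-render loop described above might be organised as follows (a sketch only; the data structure and function hooks are illustrative, not the thesis' actual system):

    ```python
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class FaceFit:
        """Per-frame semantic parameters recovered by model fitting
        (field names illustrative, not the thesis' data structures)."""
        pose: np.ndarray         # rigid head pose
        expression: np.ndarray   # non-rigid deformation coefficients
        identity: np.ndarray     # identity coefficients
        lighting: np.ndarray     # light model parameters

    def edit_face_video(frames, fit_fn, render_fn, new_expressions):
        """Skeleton of the pipeline: fit the 3D model in each frame,
        substitute the expression in semantic space, then render the
        manipulated scene back into the original video."""
        edited = []
        for frame, new_expr in zip(frames, new_expressions):
            fit = fit_fn(frame)                    # 3D reconstruction
            fit.expression = new_expr              # edit in semantic space
            edited.append(render_fn(fit, frame))   # composite into frame
        return edited
    ```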

    Face De-Identification for Privacy Protection

    The ability to record, store and analyse images of faces economically, rapidly and on a vast scale has brought privacy to public attention. Current privacy protection approaches for face images work mainly through masking, blurring or blacking-out, which, however, remove data utility along with the identifying information. As a result, these ad hoc methods are of little use for data publishing or further research. De-identification attempts to remove identifying information from a dataset while preserving the data utility as much as possible. Research on de-identifying structured data is well established, but de-identifying unstructured data, such as face data in images and videos, remains a challenge. k-Same face de-identification was the first method to apply an established de-identification theory, k-anonymity, to a face image dataset, and it is the starting point of this thesis. Re-identification risk and data utility are two competing aspects of face de-identification. The focus of this thesis is to improve the privacy protection performance of a face de-identification system while providing utility-preserving solutions for different application scenarios. The thesis first proposes the k-Same-furthest face de-identification method, which adds wrong-map protection to k-Same-M face de-identification: identity loss is maximised by replacing an original face with the face that is least similar to it. Data utility is considered from two aspects: dataset-wise utility, such as the data distribution of the dataset, and individual-wise utility, such as the facial expression in an individual image. With the aim of preserving the diversity of a face image dataset, the k-Diff-furthest face de-identification method is proposed, which extends the k-Same-furthest method and also provides wrong-map protection. With respect to the utility of an individual face image, visual quality and the preservation of facial expression are discussed. A method to merge an isolated de-identified face region with its original image background is presented, which increases the visual quality of a de-identified face image in terms of fidelity and intelligibility. A novel solution to preserving facial expressions in de-identified face images is presented, which preserves not only the category of the facial expression but also the intensity of the facial Action Units. Finally, an integration of the Active Appearance Model (AAM) and the Generative Adversarial Network (GAN) is presented, which achieves the synthesis of realistic face images with shallow neural network architectures.
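
    The furthest-selection idea can be illustrated on vectorised faces (e.g. appearance-model parameters). The sketch below takes the simplest reading of the abstract, averaging the k least-similar faces; a faithful k-Same implementation must additionally ensure that each surrogate is shared by at least k originals so that k-anonymity actually holds:

    ```python
    import numpy as np

    def k_same_furthest(faces, k):
        """Sketch of k-Same-furthest-style de-identification on
        vectorised faces, one row per image. Each face is replaced
        by the average of the k faces *furthest* from it, maximising
        identity loss (a simplification of the thesis' method)."""
        faces = np.asarray(faces, dtype=float)
        out = np.empty_like(faces)
        for i, f in enumerate(faces):
            d = np.linalg.norm(faces - f, axis=1)  # distance to every face
            far_k = np.argsort(d)[-k:]             # indices of k furthest
            out[i] = faces[far_k].mean(axis=0)     # surrogate face
        return out
    ```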

    Infrared face recognition: a comprehensive review of methodologies and databases

    Automatic face recognition is an area with immense practical potential, including a wide range of commercial and law-enforcement applications, so it is unsurprising that it continues to be one of the most active research areas of computer vision. Even after over three decades of intense research, the state of the art in face recognition continues to improve, benefitting from advances in a range of different research fields, including image processing, pattern recognition, computer graphics and physiology. Systems based on visible spectrum images, the most researched face recognition modality, have reached a significant level of maturity with some practical success. However, they continue to face challenges in the presence of illumination, pose and expression changes, as well as facial disguises, all of which can significantly decrease recognition accuracy. Amongst the various approaches proposed to overcome these limitations, infrared (IR) imaging has emerged as a particularly promising research direction. This paper presents a comprehensive and timely review of the literature on this subject. Our key contributions are: (i) a summary of the inherent properties of infrared imaging which make this modality promising in the context of face recognition; (ii) a systematic review of the most influential approaches, with a focus on emerging common trends as well as key differences between alternative methodologies; (iii) a description of the main databases of infrared facial images available to the researcher; and lastly (iv) a discussion of the most promising avenues for future research.

    Face recognition using infrared vision

    Over the course of the last decade, face recognition based on infrared (IR) imaging, and in particular thermal IR imaging, has emerged as a promising complement to conventional, visible-spectrum approaches, which continue to struggle when applied in the real world. While inherently insensitive to illumination changes in the visible spectrum, IR images introduce specific challenges of their own, most notably sensitivity to factors that affect facial heat emission patterns, such as emotional state, ambient temperature and alcohol consumption. In addition, facial expression and pose changes are more difficult to correct in IR images because they are less rich in the high-frequency detail that is an important cue for fitting any deformable model. In this thesis we describe a novel method which addresses these major challenges. Specifically, to normalise for pose and facial expression changes we generate a synthetic frontal image of the face in a canonical, neutral facial expression from an image of the face in an arbitrary pose and facial expression. This is achieved by piecewise affine warping following active appearance model (AAM) fitting. Ours is the first work to explore the use of an AAM on thermal IR images; we propose a pre-processing step which enhances detail in thermal images, making AAM convergence faster and more accurate. To overcome the sensitivity of thermal IR images to the exact pattern of facial temperature emissions, we describe a representation based on reliable anatomical features. In contrast to previous approaches, our representation is not binary; rather, it accounts for the reliability of the extracted features, which makes it much more robust to both pose and scale changes. The effectiveness of the proposed approach is demonstrated on the largest public database of thermal IR face videos, on which it achieves good recognition performance and significantly outperforms previously described methods. The approach has also performed well on subsets of a database gathered in our laboratory, one of the largest of its kind currently available, which will be made publicly available free of charge in the future. The reader should note that, given the anatomical nature of the feature extraction in our system, we anticipate high robustness to challenging factors such as temperature changes; however, we were unable to investigate this in depth owing to the practical limits of gathering realistic databases. Gathering this large video database, which covers several challenging factors, is a further contribution of this research.
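
    The abstract does not specify the detail-enhancement step, so the sketch below stands in with contrast-limited adaptive histogram equalisation (CLAHE), a common choice for restoring high-frequency content in low-contrast thermal imagery before deformable model fitting:

    ```python
    import cv2
    import numpy as np

    def enhance_thermal(image_16bit):
        """Illustrative detail enhancement for a thermal IR frame before
        AAM fitting. CLAHE here is an assumption, not the thesis' exact
        pre-processing step."""
        # Normalise the raw radiometric range down to 8 bits.
        img = cv2.normalize(image_16bit, None, 0, 255,
                            cv2.NORM_MINMAX).astype(np.uint8)
        # Adaptive histogram equalisation boosts the local high-frequency
        # detail that deformable model fitting relies on.
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        return clahe.apply(img)
    ```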

    Exploiting the bimodality of speech in the cocktail party problem

    Get PDF
    The cocktail party problem is that of following a conversation in a crowded room where there are many competing sound sources, such as the voices of other speakers or music. To address this problem using computers, digital signal processing solutions commonly use blind source separation (BSS), which aims to separate all the original sources (voices) from the mixture simultaneously. Traditionally, BSS methods have relied on information derived from the mixture of sources to separate the mixture into its constituent elements. However, the human auditory system is well adapted to handle the cocktail party scenario, using both auditory and visual information to follow (or hold) a conversation in such an environment. This thesis focuses on using visual information about the speakers in a cocktail-party-like scenario to improve the performance of BSS. There are several useful applications of such technology, for example as a pre-processing step for a speech recognition system, or in teleconferencing or security surveillance. The visual information used in this thesis is derived from the speaker's mouth region, as it is the most visible component of speech production. Initial research presented in this thesis considers a joint statistical model of audio and visual features, which is used to assist in controlling the convergence behaviour of a BSS algorithm. The results of using the statistical models are compared to using the raw audio information alone, and it is shown that the inclusion of visual information greatly improves convergence behaviour. Further research focuses on using the speaker's mouth region to identify periods of time when the speaker is silent, through the development of a visual voice activity detector (V-VAD), i.e. voice activity detection using visual information alone. This information can be used in many different ways to simplify the BSS process. To this end, two novel V-VADs were developed and tested within a BSS framework, resulting in significantly improved intelligibility of the separated source associated with the V-VAD output. Thus the research presented in this thesis confirms the viability of using visual information to improve solutions to the cocktail party problem.
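
    The simplest form of the V-VAD idea can be sketched as a frame-difference detector on the mouth region; the thesis' two V-VADs are more sophisticated, so the feature and threshold below are purely illustrative:

    ```python
    import numpy as np

    def visual_vad(mouth_frames, threshold=4.0):
        """A minimal visual voice activity detector: flag a frame as
        'speaking' when the mouth region changes enough between
        consecutive frames. Input is a (T, H, W) stack of greyscale
        mouth-region crops; output is a boolean flag per frame."""
        frames = np.asarray(mouth_frames, dtype=float)
        diff = np.abs(np.diff(frames, axis=0))   # inter-frame change
        activity = diff.mean(axis=(1, 2))        # mean motion energy
        flags = activity > threshold             # True = speech activity
        return np.concatenate([[False], flags])  # align to T frames
    ```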