
    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may dispense directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower-dimensional representations embedded in the higher-dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high-dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone of a new face and gesture framework called Manifold-based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produces a relation that is neither convex nor directly solvable. This thesis studies this problem to introduce a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier.
By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints that K-SVD imposes on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework also delivers robust gesture recognition from Kinect or similar depth cameras.
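The pipeline the abstract combines — project high-dimensional face data into a lower-dimensional space, then classify a test sample by which class's training atoms best reconstruct it under a sparsity constraint — can be illustrated with a generic sparse representation classification (SRC) baseline. The sketch below is not the thesis's MSR or LGE-KSVD method: PCA stands in for the LGE-style projection, the dictionary is simply the projected training set rather than a learned K-SVD dictionary, and all data and parameters are placeholders.

```python
# Minimal sketch: dimensionality reduction followed by sparse-representation
# classification (SRC). PCA is an assumed stand-in for the LGE projection; the
# dictionary is the projected training set, not a learned K-SVD dictionary.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)

# Placeholder "face" data: 200 training samples of dimension 1024, 4 classes.
n_train, dim, n_classes = 200, 1024, 4
X_train = rng.normal(size=(n_train, dim))
y_train = rng.integers(0, n_classes, size=n_train)
x_test = rng.normal(size=dim)

# (1) Dimensionality reduction into a 50-dimensional feature space.
pca = PCA(n_components=50).fit(X_train)
D = pca.transform(X_train).T          # dictionary: columns are projected training samples
D /= np.linalg.norm(D, axis=0)        # unit-norm atoms, as is standard for SRC
z = pca.transform(x_test[None, :]).ravel()

# (2) Sparse coding of the test sample over the whole training dictionary.
alpha = orthogonal_mp(D, z, n_nonzero_coefs=10)

# (3) Classify by the smallest class-wise reconstruction residual.
residuals = [np.linalg.norm(z - D[:, y_train == c] @ alpha[y_train == c])
             for c in range(n_classes)]
print("predicted class:", int(np.argmin(residuals)))
```

Coefficient contamination, as described above, shows up in exactly this residual step: if atoms of the wrong class absorb energy from nuisance factors such as pose, the class-wise residuals become less discriminative, which is what the semi-supervised projection and joint dictionary learning in the thesis aim to mitigate.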

    Face recognition using infrared vision

    Over the course of the last decade, infrared (IR) and particularly thermal IR imaging based face recognition has emerged as a promising complement to conventional, visible spectrum based approaches, which continue to struggle when applied in the real world. While inherently insensitive to visible spectrum illumination changes, IR images introduce specific challenges of their own, most notably sensitivity to factors which affect facial heat emission patterns, e.g., emotional state, ambient temperature, or alcohol consumption.
In addition, facial expression and pose changes are more difficult to correct in IR images because they are less rich in high-frequency detail, which is an important cue for fitting any deformable model. In this thesis we describe a novel method which addresses these major challenges. Specifically, to normalize for pose and facial expression changes we generate a synthetic frontal image of a face in a canonical, neutral facial expression from an image of the face in an arbitrary pose and facial expression. This is achieved by a piecewise affine warp that follows active appearance model (AAM) fitting. This is the first work which explores the use of an AAM on thermal IR images; we propose a pre-processing step which enhances details in thermal images, making AAM convergence faster and more accurate. To overcome the problem of thermal IR image sensitivity to the exact pattern of facial temperature emissions, we describe a representation based on reliable anatomical features. In contrast to previous approaches, our representation is not binary; rather, our method accounts for the reliability of the extracted features. This makes the proposed representation much more robust both to pose and scale changes. The effectiveness of the proposed approach is demonstrated on the largest public database of thermal IR images of faces, on which it achieves satisfactory recognition performance and significantly outperforms previously described methods. The proposed approach has also demonstrated satisfactory performance on subsets of the largest such video database in the world, gathered in our laboratory, which will be made publicly available free of charge in the future. The reader should note that, owing to the anatomically based nature of the feature extraction in our system, we anticipate high robustness to challenging factors such as temperature changes. However, we were not able to investigate this in depth due to the limitations involved in gathering realistic databases. Gathering the largest video database, covering several challenging factors, is another contribution of this research.
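As a rough illustration of the normalization step described above, the sketch below enhances a thermal face image and warps it to a canonical frontal layout with a piecewise affine transform. It is only a sketch under stated assumptions: the AAM fitting itself is not reproduced (the source landmarks are assumed to come from such a fit), the canonical landmark layout is assumed to be given, and CLAHE is used merely as a stand-in for the thesis's detail-enhancing pre-processing.

```python
# Minimal sketch of thermal-face normalization: detail enhancement followed by
# a piecewise affine warp onto a canonical frontal frame. Landmarks are assumed
# to be provided externally (e.g., by an AAM fit); CLAHE is an assumed stand-in
# for the thesis's unspecified pre-processing step.
import numpy as np
from skimage import exposure, transform

def normalize_thermal_face(img, src_landmarks, canonical_landmarks, out_shape=(128, 128)):
    """Enhance a thermal face image and warp it to a canonical frontal layout.

    img                 : 2-D float array in [0, 1] (thermal intensity image)
    src_landmarks       : (N, 2) array of (row, col) points located in `img`
    canonical_landmarks : (N, 2) array of matching (row, col) points in the canonical frame
    """
    # Thermal images are poor in high-frequency content, so boost local
    # contrast before warping (here: adaptive histogram equalization).
    enhanced = exposure.equalize_adapthist(img, clip_limit=0.03)

    # skimage's warp() expects an inverse map (output -> input coordinates),
    # so estimate the transform from canonical points to source points.
    tform = transform.PiecewiseAffineTransform()
    tform.estimate(canonical_landmarks[:, ::-1], src_landmarks[:, ::-1])  # (x, y) order

    frontal = transform.warp(enhanced, tform, output_shape=out_shape)
    return frontal
```

The warped, expression-neutralized frontal image would then feed the feature extraction stage; the reliability-weighted anatomical representation described in the abstract is not reproduced here.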