43 research outputs found
Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression
We present techniques for improving performance driven facial animation,
emotion recognition, and facial key-point or landmark prediction using learned
identity invariant representations. Established approaches to these problems
can work well if sufficient examples and labels for a particular identity are
available and factors of variation are highly controlled. However, labeled
examples of facial expressions, emotions and key-points for new individuals are
difficult and costly to obtain. In this paper we improve the ability of
techniques to generalize to new and unseen individuals by explicitly modeling
previously seen variations related to identity and expression. We use a
weakly-supervised approach in which identity labels are used to learn the
different factors of variation linked to identity separately from factors
related to expression. We show how probabilistic modeling of these sources of
variation allows one to learn identity-invariant representations for
expressions which can then be used to identity-normalize various procedures for
facial expression analysis and animation control. We also show how to extend
the widely used techniques of active appearance models and constrained local
models through replacing the underlying point distribution models which are
typically constructed using principal component analysis with
identity-expression factorized representations. We present a wide variety of
experiments in which we consistently improve performance on emotion
recognition, markerless performance-driven facial animation and facial
key-point tracking.Comment: to appear in Image and Vision Computing Journal (IMAVIS
Face Mining in Wikipedia Biographies
RÉSUMÉ
Cette thèse présente quelques contributions à la recherche liées au thème de la création d’un système automatisé pour l’extraction de visages dans les pages de biographie sur Wikipédia. La première contribution majeure de ce travail est l’élaboration d’une solution au problème basé sur une nouvelle technique de modélisation graphique probabiliste. Nous utilisons l’inférence probabiliste pour faire des prédictions structurées dans les modèles construits dynamiquement afin d’identifier les véritables exemples de visages correspondant à l’objet d’une biographie parmi tous les visages détectés. Notre modèle probabiliste prend en considération l’information provenant de différentes
sources, dont : des résultats de comparaisons entre visages détectés, des métadonnées provenant des images de visage et de leurs détections, des images parentes, des données géospatiales, des noms de fichiers et des sous-titres. Nous croyons que cette recherche est également unique parce que nous sommes les premiers à présenter un système complet et une évaluation expérimentale de la tâche de l’extraction des visages humains dans la nature à une échelle de plus de 50 000 identités.
Une autre contribution majeure de nos travaux est le développement d’une nouvelle catégorie de modèles probabilistes discriminatifs basée sur une fonction logistique Beta-Bernoulli généralisée. À travers notre formulation novatrice, nous fournissons une nouvelle méthode d’approximation lisse de la perte 0-1, ainsi qu’une nouvelle catégorie de classificateurs probabilistes. Nous présentons certaines expériences réalisées à l’aide de cette technique pour : 1) une nouvelle forme de régression logistique que nous nommons la régression logistique Beta-Bernoulli généralisée ; 2) une version de cette même technique ; et enfin pour 3) notre modèle pour l’extraction des visages que l’on pourrait considérer comme une technique de prédiction structurée en combinant plusieurs sources multimédias. À travers ces expériences, nous démontrons que les différentes formes de cette nouvelle formulation Beta-Bernoulli améliorent la performance des méthodes de la régression logistique couramment utilisées ainsi que la performance des machines à vecteurs de support (SVM) linéaires et non linéaires dans le but d’une classification binaire. Pour évaluer notre technique, nous avons procédé à des tests de performance reconnus en utilisant différentes propriétés allant de celles qui sont de relativement petite taille à celles qui sont de relativement grande taille, en plus de se baser sur des problèmes ayant des caractéristiques clairsemées ou denses. Notre analyse montre que le modèle Beta-Bernoulli généralisé améliore les formes analogues de modèles classiques de la régression logistique et les machines à vecteurs de support et que lorsque nos
évaluations sont effectuées sur les ensembles de données à plus grande échelle, les résultats sont statistiquement significatifs. Une autre constatation est que l’approche est aussi robuste lorsqu’il s’agit de valeurs aberrantes. De plus, notre modèle d’extraction de visages atteint sa meilleure performance lorsque le sous-composant consistant d’un modèle discriminant d’entropie maximale est remplacé par notre modèle de Beta-Bernoulli généralisée de la régression logistique. Cela montre l’applicabilité générale de notre approche proposée pour une tâche de prédiction structurée. Autant que nous sachions, c’est la première fois qu’une approximation lisse de la perte 0-1 a été utilisée pour la classification structurée.
Enfin, nous avons exploré plus en profondeur un problème important lié à notre tâche d’extraction des visages – la localisation des points-clés denses sur les visages humains. Nous avons développé un pipeline complet qui résout le problème de localisation des points-clés en utilisant une approche par sous-espace localement linéaire. Notre modèle de localisation des points-clés est d’une efficacité comparable à l’état de l’art.----------ABSTRACT
This thesis presents a number of research contributions related to the theme of creating an automated system for extracting faces from Wikipedia biography pages. The first major contribution of this work is the formulation of a solution to the problem based on a novel probabilistic graphical modeling technique. We use probabilistic inference to make structured predictions in dynamically constructed models so as to identify true examples of faces corresponding to the subject of a biography among all detected faces. Our probabilistic model takes into account information from multiple sources, including: visual comparisons between detected faces, meta-data about facial images and their detections, parent images, image locations, image file names, and caption texts. We believe this research is also unique in that we are the first to present a complete system and an experimental evaluation for the task of mining wild human faces on the scale of over 50,000 identities.
The second major contribution of this work is the development of a new class of discriminative probabilistic models based on a novel generalized Beta-Bernoulli logistic function. Through our generalized Beta-Bernoulli formulation, we provide both a new smooth 0-1 loss approximation method and new class of probabilistic classifiers. We present experiments using this technique for: 1) a new form of Logistic Regression which we call generalized Beta-Bernoulli Logistic Regression, 2) a kernelized version of the aforementioned technique, and 3) our probabilistic face mining model, which can be regarded as a structured prediction technique that combines information from multimedia sources. Through experiments, we show that the different forms of this novel Beta-Bernoulli formulation improve upon the performance of both widely-used Logistic Regression methods and state-of-the-art linear and non-linear Support Vector Machine techniques for binary classification. To evaluate our technique, we have performed tests using a number of widely used benchmarks with different properties ranging from those that are comparatively small to those that are comparatively large in size, as well as problems with both sparse and dense features. Our analysis shows that the generalized Beta-Bernoulli model improves upon the analogous forms of classical Logistic Regression and Support Vector Machine models and that when our evaluations are performed on larger scale datasets, the results are statistically significant. Another finding is that the approach is also robust when dealing with outliers. Furthermore, our face mining model achieves it’s best performance when its sub-component consisting of a discriminative Maximum Entropy Model is replaced with our generalized Beta-Bernoulli Logistic Regression model. This shows the general applicability of our proposed approach for a structured prediction task. To the best of our knowledge, this represents the first time that a smooth approximation to the 0-1 loss has been used for structured predictions.
Finally, we have explored an important problem related to our face extraction task in more depth - the localization of dense keypoints on human faces. Therein, we have developed a complete pipeline that solves the keypoint localization problem using an adaptively estimated, locally linear subspace technique. Our keypoint localization model performs on par with state-of-the-art methods
QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios
The concerns about individuals security have justified the increasing number of surveillance
cameras deployed both in private and public spaces. However, contrary to popular belief,
these devices are in most cases used solely for recording, instead of feeding intelligent analysis
processes capable of extracting information about the observed individuals. Thus, even though
video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant
details about the subjects that took part in a crime depends on the manual inspection
of recordings. As such, the current goal of the research community is the development of
automated surveillance systems capable of monitoring and identifying subjects in surveillance
scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric
recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at
designing a visual surveillance system capable of acquiring biometric data at a distance (e.g.,
face, iris or gait) without requiring human intervention in the process, as well as devising biometric
recognition methods robust to the degradation factors resulting from the unconstrained
acquisition process.
Regarding the first goal, the analysis of the data acquired by typical surveillance systems
shows that large acquisition distances significantly decrease the resolution of biometric samples,
and thus their discriminability is not sufficient for recognition purposes. In the literature,
diverse works point out Pan Tilt Zoom (PTZ) cameras as the most practical way for acquiring
high-resolution imagery at a distance, particularly when using a master-slave configuration. In
the master-slave configuration, the video acquired by a typical surveillance camera is analyzed
for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged
at high-resolution by the PTZ camera. Several methods have already shown that this configuration
can be used for acquiring biometric data at a distance. Nevertheless, these methods
failed at providing effective solutions to the typical challenges of this strategy, restraining its
use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development
of a biometric data acquisition system based on the cooperation of a PTZ camera
with a typical surveillance camera. The first proposal is a camera calibration method capable
of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ
camera. The second proposal is a camera scheduling method for determining - in real-time -
the sequence of acquisitions that maximizes the number of different targets obtained, while
minimizing the cumulative transition time. In order to achieve the first goal of this thesis,
both methods were combined with state-of-the-art approaches of the human monitoring field
to develop a fully automated surveillance capable of acquiring biometric data at a distance and
without human cooperation, designated as QUIS-CAMPI system.
The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis
of the performance of the state-of-the-art biometric recognition approaches shows that these
approaches attain almost ideal recognition rates in unconstrained data. However, this performance
is incongruous with the recognition rates observed in surveillance scenarios. Taking into
account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising
biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a
distance ranging from 5 to 40 meters and without human intervention in the acquisition process.
This set allows to objectively assess the performance of state-of-the-art biometric recognition
methods in data that truly encompass the covariates of surveillance scenarios. As such, this set
was exploited for promoting the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained
by the nine methods specially designed for this competition. In addition, the data acquired by
the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the
development of methods robust to the covariates of surveillance scenarios. The first proposal
regards a method for detecting corrupted features in biometric signatures inferred by a redundancy
analysis algorithm. The second proposal is a caricature-based face recognition approach
capable of enhancing the recognition performance by automatically generating a caricature
from a 2D photo. The experimental evaluation of these methods shows that both approaches
contribute to improve the recognition performance in unconstrained data.A crescente preocupação com a segurança dos indivĂduos tem justificado o crescimento
do nĂşmero de câmaras de vĂdeo-vigilância instaladas tanto em espaços privados como pĂşblicos.
Contudo, ao contrário do que normalmente se pensa, estes dispositivos são, na maior parte dos
casos, usados apenas para gravação, não estando ligados a nenhum tipo de software inteligente
capaz de inferir em tempo real informações sobre os indivĂduos observados. Assim, apesar de a
vĂdeo-vigilância ter provado ser essencial na resolução de diversos crimes, o seu uso está ainda
confinado Ă disponibilização de vĂdeos que tĂŞm que ser manualmente inspecionados para extrair
informações relevantes dos sujeitos envolvidos no crime. Como tal, atualmente, o principal
desafio da comunidade cientĂfica Ă© o desenvolvimento de sistemas automatizados capazes de
monitorizar e identificar indivĂduos em ambientes de vĂdeo-vigilância.
Esta tese tem como principal objetivo estender a aplicabilidade dos sistemas de reconhecimento
biomĂ©trico aos ambientes de vĂdeo-vigilância. De forma mais especifica, pretende-se
1) conceber um sistema de vĂdeo-vigilância que consiga adquirir dados biomĂ©tricos a longas distâncias
(e.g., imagens da cara, Ăris, ou vĂdeos do tipo de passo) sem requerer a cooperação dos
indivĂduos no processo; e 2) desenvolver mĂ©todos de reconhecimento biomĂ©trico robustos aos
fatores de degradação inerentes aos dados adquiridos por este tipo de sistemas.
No que diz respeito ao primeiro objetivo, a análise aos dados adquiridos pelos sistemas tĂpicos
de vĂdeo-vigilância mostra que, devido Ă distância de captura, os traços biomĂ©tricos amostrados
não são suficientemente discriminativos para garantir taxas de reconhecimento aceitáveis.
Na literatura, vários trabalhos advogam o uso de câmaras Pan Tilt Zoom (PTZ) para adquirir
imagens de alta resolução à distância, principalmente o uso destes dispositivos no modo masterslave.
Na configuração master-slave um módulo de análise inteligente seleciona zonas de interesse
(e.g. carros, pessoas) a partir do vĂdeo adquirido por uma câmara de vĂdeo-vigilância
e a câmara PTZ é orientada para adquirir em alta resolução as regiões de interesse. Diversos
métodos já mostraram que esta configuração pode ser usada para adquirir dados biométricos
à distância, ainda assim estes não foram capazes de solucionar alguns problemas relacionados
com esta estratĂ©gia, impedindo assim o seu uso em ambientes de vĂdeo-vigilância. Deste modo,
esta tese propõe dois métodos para permitir a aquisição de dados biométricos em ambientes de
vĂdeo-vigilância usando uma câmara PTZ assistida por uma câmara tĂpica de vĂdeo-vigilância. O
primeiro é um método de calibração capaz de mapear de forma exata as coordenadas da câmara
master para o ângulo da câmara PTZ (slave) sem o auxĂlio de outros dispositivos Ăłticos. O
segundo método determina a ordem pela qual um conjunto de sujeitos vai ser observado pela
câmara PTZ. O método proposto consegue determinar em tempo-real a sequência de observações
que maximiza o nĂşmero de diferentes sujeitos observados e simultaneamente minimiza o
tempo total de transição entre sujeitos. De modo a atingir o primeiro objetivo desta tese, os
dois métodos propostos foram combinados com os avanços alcançados na área da monitorização
de humanos para assim desenvolver o primeiro sistema de vĂdeo-vigilância completamente automatizado
e capaz de adquirir dados biométricos a longas distâncias sem requerer a cooperação
dos indivĂduos no processo, designado por sistema QUIS-CAMPI.
O sistema QUIS-CAMPI representa o ponto de partida para iniciar a investigação relacionada
com o segundo objetivo desta tese. A análise do desempenho dos métodos de reconhecimento
biométrico do estado-da-arte mostra que estes conseguem obter taxas de reconhecimento
quase perfeitas em dados adquiridos sem restrições (e.g., taxas de reconhecimento
maiores do que 99% no conjunto de dados LFW). Contudo, este desempenho nĂŁo Ă© corroborado pelos resultados observados em ambientes de vĂdeo-vigilância, o que sugere que os conjuntos
de dados atuais nĂŁo contĂŞm verdadeiramente os fatores de degradação tĂpicos dos ambientes de
vĂdeo-vigilância. Tendo em conta as vulnerabilidades dos conjuntos de dados biomĂ©tricos atuais,
esta tese introduz um novo conjunto de dados biomĂ©tricos (imagens da face e vĂdeos do tipo de
passo) adquiridos pelo sistema QUIS-CAMPI a uma distância máxima de 40m e sem a cooperação
dos sujeitos no processo de aquisição. Este conjunto permite avaliar de forma objetiva o desempenho
dos mĂ©todos do estado-da-arte no reconhecimento de indivĂduos em imagens/vĂdeos
capturados num ambiente real de vĂdeo-vigilância. Como tal, este conjunto foi utilizado para
promover a primeira competição de reconhecimento biométrico em ambientes não controlados.
Esta tese descreve os protocolos de avaliação usados, assim como os resultados obtidos por 9
métodos especialmente desenhados para esta competição. Para além disso, os dados adquiridos
pelo sistema QUIS-CAMPI foram essenciais para o desenvolvimento de dois métodos para
aumentar a robustez aos fatores de degradação observados em ambientes de vĂdeo-vigilância. O
primeiro Ă© um mĂ©todo para detetar caracterĂsticas corruptas em assinaturas biomĂ©tricas atravĂ©s
da análise da redundância entre subconjuntos de caracterĂsticas. O segundo Ă© um mĂ©todo de
reconhecimento facial baseado em caricaturas automaticamente geradas a partir de uma Ăşnica
foto do sujeito. As experiências realizadas mostram que ambos os métodos conseguem reduzir
as taxas de erro em dados adquiridos de forma nĂŁo controlada
Face recognition in uncontrolled environments
This thesis concerns face recognition in uncontrolled environments in which the images used for training and test are collected from the real world instead of laboratories. Compared with controlled environments, images from uncontrolled environments contain more variation in pose, lighting, expression, occlusion, background, image quality, scale, and makeup. Therefore, face recognition in uncontrolled environments is much more challenging than in controlled conditions. Moreover, many real world applications require good recognition performance in uncontrolled environments. Example applications include social networking, human-computer interaction and electronic entertainment. Therefore, researchers and companies have shifted their interest from controlled environments to uncontrolled environments over the past seven years. In this thesis, we divide the history of face recognition into four stages and list the main problems and algorithms at each stage. We find that face recognition in unconstrained environments is still an unsolved problem although many face recognition algorithms have been proposed in the last decade. Existing approaches have two major limitations. First, many methods do not perform well when tested in uncontrolled databases even when all the faces are close to frontal. Second, most current algorithms cannot handle large pose variation, which has become a bottleneck for improving performance. In this thesis, we investigate Bayesian models for face recognition. Our contributions extend Probabilistic Linear Discriminant Analysis (PLDA) [Prince and Elder 2007]. In PLDA, images are described as a sum of signal and noise components. Each component is a weighted combination of basis functions. We firstly investigate the effect of degree of the localization of these basis functions and find better performance is obtained when the signal is treated more locally and the noise more globally. We call this new algorithm multi-scale PLDA and our experiments show it can handle lighting variation better than PLDA but fails for pose variation. We then analyze three existing Bayesian face recognition algorithms and combine the advantages of PLDA and the Joint Bayesian Face algorithm [Chen et al. 2012] to propose Joint PLDA. We find that our new algorithm improves performance compared to existing Bayesian face recognition algorithms. Finally, we propose Tied Joint Bayesian Face algorithm and Tied Joint PLDA to address large pose variations in the data, which drastically decreases performance in most existing face recognition algorithms. To provide sufficient training images with large pose difference, we introduce a new database called the UCL Multi-pose database. We demonstrate that our Bayesian models improve face recognition performance when the pose of the face images varies
Exploring the frontier of smart video surveillance: novel domains and fine-grain event understanding
Intelligent Sensors for Human Motion Analysis
The book, "Intelligent Sensors for Human Motion Analysis," contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects related to the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems
Advances in Monocular Exemplar-based Human Body Pose Analysis: Modeling, Detection and Tracking
Esta tesis contribuye en el análisis de la postura del cuerpo humano a partir de secuencias de imágenes adquiridas con una sola cámara. Esta temática presenta un amplio rango de potenciales aplicaciones en video-vigilancia, video-juegos o aplicaciones biomédicas. Las técnicas basadas en patrones han tenido éxito, sin embargo, su precisión depende de la similitud del punto de vista de la cámara y de las propiedades de la escena entre las imágenes de entrenamiento y las de prueba. Teniendo en cuenta un conjunto de datos de entrenamiento capturado mediante un número reducido de cámaras fijas, paralelas al suelo, se han identificado y analizado tres escenarios posibles con creciente nivel de dificultad: 1) una cámara estática paralela al suelo, 2) una cámara de vigilancia fija con un ángulo de visión considerablemente diferente, y 3) una secuencia de video capturada con una cámara en movimiento o simplemente una sola imagen estática
Skin texture features for face recognition
Face recognition has been deployed in a wide range of important applications including surveillance and forensic identification. However, it still seems to be a challenging problem as its performance severely degrades under illumination, pose and expression variations, as well as with occlusions, and aging. In this thesis, we have investigated the use of local facial skin data as a source of biometric information to improve human recognition. Skin texture features have been exploited in three major tasks, which include (i) improving the performance of conventional face recognition systems, (ii) building an adaptive skin-based face recognition system, and (iii) dealing with circumstances when a full view of the face may not be avai'lable. Additionally, a fully automated scheme is presented for localizing eyes and mouth and segmenting four facial regions: forehead, right cheek, left cheek and chin. These four regions are divided into nonoverlapping patches with equal size. A novel skin/non-skin classifier is proposed for detecting patches containing only skin texture and therefore detecting the pure-skin regions. Experiments using the XM2VTS database indicate that the forehead region has the most significant biometric information. The use of forehead texture features improves the rank-l identification of Eigenfaces system from 77.63% to 84.07%. The rank-l identification is equal 93.56% when this region is fused with Kernel Direct Discriminant Analysis algorithm