3D face recognition using photometric stereo
Automatic face recognition has been an active research area for the last four decades. This thesis explores innovative bio-inspired concepts aimed at improved face recognition using surface normals. New directions in salient data representation are explored using data captured via a photometric stereo method from the University of the West of England’s “Photoface” device. Accuracy assessments demonstrate the advantage of the capture format and the synergy offered by near infrared light sources in achieving more accurate results than under conventional visible light. Two 3D face databases have been created as part of the thesis – the publicly available Photoface database which contains 3187 images of 453 subjects and the 3DE-VISIR dataset which contains 363 images of 115 people with different expressions captured simultaneously under near infrared and visible light. The Photoface database is believed to be the first to capture naturalistic 3D face models. Subsets of these databases are then used to show the results of experiments inspired by the human visual system. Experimental results show that optimal recognition rates are achieved using a surprisingly low resolution of only 10x10 pixels on surface normal data, which corresponds to the spatial frequency range of optimal human performance. Motivated by the observed increase in recognition speed and accuracy that occurs in humans when faces are caricatured, novel interpretations of caricaturing using outlying data and pixel locations with high variance show that performance remains disproportionately high when up to 90% of the data has been discarded. These direct methods of dimensionality reduction have useful implications for the storage and processing requirements for commercial face recognition systems.
The novel variance approach is extended to recognise positive expressions with 90% accuracy, which has useful implications for human-computer interaction as well as for ensuring that a subject has the correct expression prior to recognition. Furthermore, the subject recognition rate is improved by removing those pixels which encode expression. Finally, preliminary work into feature detection on surface normals by extending Haar-like features is presented, which is also shown to be useful for correcting the pose of the head as part of a fully operational device. The system operates with an accuracy of 98.65% at a false acceptance rate of only 0.01 on front facing heads with neutral expressions. The work has shown how new avenues of enquiry inspired by our observation of the human visual system can offer useful advantages towards achieving more robust autonomous computer-based facial recognition.
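The variance-based caricaturing idea above — keep only the pixel locations whose surface-normal values vary most across subjects and discard the rest — can be sketched in a few lines. This is an illustrative reconstruction with assumed array shapes (`n_samples × n_pixels`), not the thesis's actual implementation:

```python
import numpy as np

def high_variance_mask(samples, keep_fraction=0.1):
    """Return a boolean mask selecting the pixel locations with the
    highest variance across the sample set; all other locations
    (here 90% of them) are discarded before recognition."""
    variances = samples.var(axis=0)                 # per-pixel variance
    k = max(1, int(keep_fraction * samples.shape[1]))
    keep = np.argsort(variances)[-k:]               # top-k pixel indices
    mask = np.zeros(samples.shape[1], dtype=bool)
    mask[keep] = True
    return mask

# Toy usage: 50 "faces", each a flattened 10x10 grid of normal components
rng = np.random.default_rng(0)
faces = rng.normal(size=(50, 100))
mask = high_variance_mask(faces, keep_fraction=0.1)
reduced = faces[:, mask]                            # 90% of the data discarded
```

A matcher then compares only the retained locations, which is where the storage and processing savings mentioned above come from.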
Shape classification: towards a mathematical description of the face
Recent advances in biostereometric techniques have led to the quick and easy
acquisition of 3D data for facial and other biological surfaces. This has led facial
surgeons to express dissatisfaction with landmark-based methods for analysing the
shape of the face which use only a small part of the data available, and to seek a method
for analysing the face which maximizes the use of this extensive data set. Scientists
working in the field of computer vision have developed a variety of methods for the
analysis and description of 2D and 3D shape. These methods are reviewed and an
approach, based on differential geometry, is selected for the description of facial shape.
For each data point, the Gaussian and mean curvatures of the surface are calculated.
The performance of three algorithms for computing these curvatures is evaluated for
mathematically generated standard 3D objects and for 3D data obtained from an optical
surface scanner. Using the signs of these curvatures, the face is classified into eight
'fundamental surface types' - each of which has an intuitive perceptual meaning. The
robustness of the resulting surface type description to errors in the data is determined
together with its repeatability.
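The sign-based classification described above is the standard HK (Gaussian/mean curvature) segmentation, which can be sketched for a height map using the usual Monge-patch curvature formulas. The finite differencing and the zero threshold `eps` below are illustrative choices, not the three algorithms evaluated in the thesis:

```python
import numpy as np

def hk_classify(z, eps=1e-6):
    """Label each point of a height map z(y, x) with one of the eight
    fundamental surface types from the signs of the Gaussian (K) and
    mean (H) curvatures."""
    zy, zx = np.gradient(z)
    zxy, zxx = np.gradient(zx)
    zyy, _ = np.gradient(zy)
    g = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / g**2
    H = ((1 + zx**2) * zyy - 2 * zx * zy * zxy + (1 + zy**2) * zxx) / (2 * g**1.5)
    sk = np.where(np.abs(K) < eps, 0, np.sign(K)).astype(int)
    sh = np.where(np.abs(H) < eps, 0, np.sign(H)).astype(int)
    names = {(-1, 1): "peak", (1, 1): "pit", (-1, 0): "ridge",
             (1, 0): "valley", (-1, -1): "saddle ridge",
             (1, -1): "saddle valley", (0, 0): "flat", (0, -1): "minimal"}
    out = np.empty(z.shape, dtype=object)
    for idx in np.ndindex(z.shape):
        out[idx] = names.get((sh[idx], sk[idx]), "impossible")
    return out

# Toy check: a paraboloid bowl is a "pit" at its bottom
yy, xx = np.mgrid[-1:1:21j, -1:1:21j]
types = hk_classify(xx**2 + yy**2)
```

Note that the combination (H = 0, K > 0) cannot occur, since K ≤ H²; the `"impossible"` label only catches numerical edge cases.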
Three methods for comparing two surface type descriptions are presented and illustrated
for average male and average female faces. Thus a quantitative description of facial
change, or differences between individuals' faces, is achieved. The possible application
of artificial intelligence techniques to automate this comparison is discussed. The
sensitivity of the description to global and local changes to the data, made by
mathematical functions, is investigated.
Examples are given of the application of this method for describing facial changes
made by facial reconstructive surgery and implications for defining a basis for facial
aesthetics using shape are discussed. It is also applied to investigate the role played by
the shape of the surface in facial recognition.
The Impact on Emotion Classification Performance and Gaze Behavior of Foveal versus Extrafoveal Processing of Facial Features
At normal interpersonal distances, not all features of a face can fall within one’s fovea simultaneously. Given that certain facial features are differentially informative of different emotions, does the ability to identify facially expressed emotions vary according to the feature fixated, and do saccades preferentially seek diagnostic features? Previous findings are equivocal. We presented faces for a brief time, insufficient for a saccade, at a spatial position that guaranteed that a given feature – an eye, cheek, the central brow, or mouth – fell at the fovea. Across two experiments, observers were more accurate and faster at discriminating angry expressions when the high spatial-frequency information of the brow was projected to their fovea than when one or other cheek or eye was. Performance in classifying fear and happiness (Experiment 1) was not influenced by whether the most informative features (eyes and mouth, respectively) were projected foveally or extrafoveally. Observers more accurately distinguished between fearful and surprised expressions (Experiment 2) when the mouth was projected to the fovea. Reflexive first saccades tended towards the left and center of the face rather than preferentially targeting emotion-distinguishing features. These results reflect the integration of task-relevant information across the face constrained by the differences between foveal and extrafoveal processing (Peterson & Eckstein, 2012).
QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios
Concerns about individuals' security have justified the increasing number of surveillance
cameras deployed both in private and public spaces. However, contrary to popular belief,
these devices are in most cases used solely for recording, instead of feeding intelligent analysis
processes capable of extracting information about the observed individuals. Thus, even though
video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant
details about the subjects that took part in a crime depends on the manual inspection
of recordings. As such, the current goal of the research community is the development of
automated surveillance systems capable of monitoring and identifying subjects in surveillance
scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric
recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at
designing a visual surveillance system capable of acquiring biometric data at a distance (e.g.,
face, iris or gait) without requiring human intervention in the process, as well as devising biometric
recognition methods robust to the degradation factors resulting from the unconstrained
acquisition process.
Regarding the first goal, the analysis of the data acquired by typical surveillance systems
shows that large acquisition distances significantly decrease the resolution of biometric samples,
and thus their discriminability is not sufficient for recognition purposes. In the literature,
diverse works identify Pan-Tilt-Zoom (PTZ) cameras as the most practical means of acquiring
high-resolution imagery at a distance, particularly when used in a master-slave configuration. In
the master-slave configuration, the video acquired by a typical surveillance camera is analyzed
for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged
at high-resolution by the PTZ camera. Several methods have already shown that this configuration
can be used for acquiring biometric data at a distance. Nevertheless, these methods
failed at providing effective solutions to the typical challenges of this strategy, restraining its
use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development
of a biometric data acquisition system based on the cooperation of a PTZ camera
with a typical surveillance camera. The first proposal is a camera calibration method capable
of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ
camera. The second proposal is a camera scheduling method for determining - in real-time -
the sequence of acquisitions that maximizes the number of different targets obtained, while
minimizing the cumulative transition time. In order to achieve the first goal of this thesis,
both methods were combined with state-of-the-art approaches of the human monitoring field
to develop a fully automated surveillance system capable of acquiring biometric data at a distance and
without human cooperation, designated as QUIS-CAMPI system.
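For intuition, the camera-scheduling objective above (image as many distinct targets as possible while keeping cumulative pan/tilt transition time low) can be approximated by a nearest-target greedy loop. This is a hypothetical sketch for illustration, not the real-time method proposed in the thesis:

```python
def greedy_schedule(current, targets):
    """Hypothetical greedy scheduler: repeatedly steer the PTZ camera to
    the nearest not-yet-imaged target, so every target is visited once
    while the cumulative transition time stays small. `current` and each
    target are (pan, tilt) angle pairs; transition time is modelled as the
    larger of the two axis movements (both axes move simultaneously)."""
    def transition(a, b):
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    remaining = dict(targets)            # target id -> (pan, tilt)
    order, total = [], 0
    while remaining:
        tid = min(remaining, key=lambda t: transition(current, remaining[t]))
        total += transition(current, remaining[tid])
        current = remaining.pop(tid)
        order.append(tid)
    return order, total

# Toy usage: three targets at different pan/tilt positions
order, cost = greedy_schedule((0, 0), {"A": (10, 5), "B": (2, 1), "C": (40, 0)})
# visits B, then A, then C
```

A greedy tour is not optimal in general, which is why the thesis treats the sequencing as an optimization problem in its own right.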
The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis
of the performance of the state-of-the-art biometric recognition approaches shows that these
approaches attain almost ideal recognition rates in unconstrained data. However, this performance
is incongruous with the recognition rates observed in surveillance scenarios. Taking into
account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising
biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a
distance ranging from 5 to 40 meters and without human intervention in the acquisition process.
This set allows an objective assessment of the performance of state-of-the-art biometric recognition
methods in data that truly encompass the covariates of surveillance scenarios. As such, this set
was exploited for promoting the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained
by the nine methods specially designed for this competition. In addition, the data acquired by
the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the
development of methods robust to the covariates of surveillance scenarios. The first proposal
regards a method for detecting corrupted features in biometric signatures inferred by a redundancy
analysis algorithm. The second proposal is a caricature-based face recognition approach
capable of enhancing the recognition performance by automatically generating a caricature
from a 2D photo. The experimental evaluation of these methods shows that both approaches
contribute to improving the recognition performance in unconstrained data.
Change blindness: eradication of gestalt strategies
Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task.
Face age estimation using wrinkle patterns
Face age estimation is a challenging problem due to variations in craniofacial growth,
skin texture, gender and race. With recent growth in face age estimation research, wrinkles
have received attention from a number of researchers, as they are generally perceived as an aging
feature and a soft biometric for person identification. In a face image, a wrinkle is a discontinuous,
arbitrary line pattern that varies across face regions and subjects.
Existing wrinkle detection algorithms and wrinkle-based features are not robust for face
age estimation: they are either weakly represented or not validated against ground
truth. The primary aim of this thesis is to develop a robust wrinkle detection method
and construct novel wrinkle-based methods for face age estimation. First, Hybrid Hessian
Filter (HHF) is proposed to segment the wrinkles using the directional gradient
and a ridge-valley Gaussian kernel. Second, Hessian Line Tracking (HLT) is proposed
for wrinkle detection by exploring the wrinkle connectivity of surrounding pixels using a
cross-sectional profile. Experimental results showed that HLT outperforms other wrinkle
detection algorithms with accuracies of 84% and 79% on the FORERUS
and FORERET datasets, while HHF achieves 77% and 49%, respectively. Third, Multi-scale
Wrinkle Patterns (MWP) is proposed as a novel feature representation for face age
estimation using the wrinkle location, intensity and density. Fourth, Hybrid Aging Patterns
(HAP) is proposed as a hybrid pattern for face age estimation using Facial Appearance
Model (FAM) and MWP. Fifth, Multi-layer Age Regression (MAR) is proposed as
a hierarchical model complementary to FAM and MWP for face age estimation. For
performance assessment of age estimation, four datasets namely FGNET, MORPH,
FERET and PAL with different age ranges and sample sizes are used as benchmarks.
Results showed that MAR achieves the lowest Mean Absolute Error (MAE) of 3.00
(±4.14) on FERET, and HAP scores an MAE of 3.02 (±2.92), comparable to the state of the
art. In conclusion, wrinkles are important features, and the uniqueness of this pattern
should be considered in developing a robust model for face age estimation.
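For readers unfamiliar with Hessian-based line detectors, the core idea behind filters such as HHF and HLT — a dark wrinkle shows up as a large positive eigenvalue of the image Hessian across the line — can be sketched generically. This is a textbook Hessian ridge measure with assumed parameters, not the thesis's HHF or HLT:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_ridge_strength(image, sigma=2.0):
    """Largest eigenvalue of the scale-sigma image Hessian, clipped at 0:
    large across dark, line-like structures (wrinkles) on brighter skin."""
    img = np.asarray(image, dtype=float)
    hyy = gaussian_filter(img, sigma, order=(2, 0))   # d2/dy2
    hxx = gaussian_filter(img, sigma, order=(0, 2))   # d2/dx2
    hxy = gaussian_filter(img, sigma, order=(1, 1))   # d2/dxdy
    tmp = np.sqrt(((hxx - hyy) / 2) ** 2 + hxy ** 2)
    lam_max = (hxx + hyy) / 2 + tmp                   # larger eigenvalue
    return np.clip(lam_max, 0, None)

# Toy usage: one dark horizontal "wrinkle" on a bright background
img = np.full((64, 64), 200.0)
img[32, :] = 50.0
strength = hessian_ridge_strength(img)
```

The response peaks on the dark line and is near zero on flat skin; methods like HLT then add connectivity tracking on top of such a per-pixel measure.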
The Fourier Spectrum of Face Images in Photography and Art and Its Influence on Face Perception
Aesthetic painted images have a slope of -2 in the radially averaged Fourier spectrum (1/f² characteristics), similar to natural scenes. We investigated how artists depict faces, which have a different slope. To this end, 300 aesthetic painted portraits by renowned artists were digitized. The slopes of the portraits and of face photographs were measured and compared. Our first study showed that aesthetic painted portraits have 1/f² characteristics similar to those of natural scenes, and differ clearly from face photographs in this respect. We found evidence that artists adapt their depictions to the coding mechanisms of the visual system rather than reproducing the properties that the objects naturally possess.
By manipulating the slope of face photographs, I was able to alter the relative proportion of coarse and fine structures in the image. We investigated how the learning and recognition of unfamiliar faces was influenced by manipulating the 1/f^p characteristics of the Fourier spectrum. We created two groups of face photographs with altered 1/f^p characteristics: faces with a steeper slope, and faces with a shallower slope and 1/f² characteristics. In a face-learning experiment, behavioural data and EEG correlates of face perception were examined. Photographs with a steep slope were learned less well, with slower reaction times and reduced neurophysiological correlates of face perception. In contrast, face photographs with a shallower slope, similar to painted portraits and natural scenes, were learned more easily and showed larger neurophysiological correlates of face perception.
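The radially averaged spectral slope on which these studies turn can be estimated in a few lines. This is a sketch under simple assumptions (integer-ring averaging, least-squares fit in log-log coordinates), not the authors' analysis pipeline:

```python
import numpy as np

def spectral_slope(image):
    """Slope of the radially averaged power spectrum in log-log
    coordinates; natural scenes cluster near -2 (1/f^2 power)."""
    img = image - image.mean()                       # remove DC component
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    cy, cx = power.shape[0] // 2, power.shape[1] // 2
    y, x = np.indices(power.shape)
    r = np.hypot(y - cy, x - cx).astype(int)         # integer frequency rings
    counts = np.bincount(r.ravel())
    radial = np.bincount(r.ravel(), power.ravel()) / np.maximum(counts, 1)
    freqs = np.arange(1, min(cy, cx))                # skip DC, stay in-band
    slope, _ = np.polyfit(np.log(freqs), np.log(radial[freqs]), 1)
    return slope

# Toy usage: synthesize 1/f noise, whose power spectrum falls as 1/f^2
rng = np.random.default_rng(1)
n = 128
f = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
f[0, 0] = 1.0                                        # avoid division by zero
spec = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / f
noise_1f = np.real(np.fft.ifft2(spec))
slope_est = spectral_slope(noise_1f)                 # close to -2
```

Manipulating the exponent of the synthetic `1/f` amplitude weighting is one way to produce the steeper- and shallower-slope stimulus groups the experiment describes.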
Less than meets the eye: the diagnostic information for visual categorization
Current theories of visual categorization are cast in terms of information processing mechanisms that use mental representations. However, the actual information contents of these representations are rarely characterized, which in turn hinders knowledge of mechanisms that use them. In this thesis, I identified these contents by extracting the information that supports behavior under given tasks - i.e., the task-specific diagnostic information.
In the first study (Chapter 2), I modelled the diagnostic face information for familiar face identification, using a unique generative model of face identity information combined with perceptual judgments and reverse correlation. I then demonstrated the validity of this information using everyday perceptual tasks that generalize face identity and resemblance judgments to new viewpoints, age, and sex with a new group of participants. My results showed that human participants represent only a proportion of the objective identity information available, but what they do represent is both sufficiently detailed and versatile to generalize face identification across diverse tasks successfully.
In the second study (Chapter 3), I modelled the diagnostic facial movement for facial expressions of emotion recognition. I used the models that characterize the mental representations of six facial expressions of emotion (Happy, Surprise, Fear, Anger, Disgust, and Sad) in individual observers. I validated them on a new group of participants. With the validated models, I derived main signal variants for each emotion and their probabilities of occurrence within each emotion. Using these variants and their probability, I trained a Bayesian classifier and showed that the Bayesian classifier mimics human observers’ categorization performance closely. My results demonstrated that such emotion variants and their probabilities of occurrence comprise observers’ mental representations of facial expressions of emotion.
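The classification scheme just described — score an observed pattern against each emotion's signal variants, weighted by the variants' occurrence probabilities, and pick the most likely emotion — can be illustrated with a toy mixture model. The prototypes, probabilities, and Gaussian noise assumption here are made up for illustration; they are not the study's validated models:

```python
import numpy as np

def classify(x, models, sigma=1.0):
    """Assign pattern x to the emotion whose variant mixture gives it the
    highest likelihood: each emotion is a set of variant prototypes with
    occurrence probabilities, compared under an isotropic Gaussian noise
    model (an illustrative assumption)."""
    def likelihood(variants, probs):
        d = ((variants - x) ** 2).sum(axis=1)        # squared distance to each variant
        return (probs * np.exp(-d / (2 * sigma**2))).sum()
    return max(models, key=lambda e: likelihood(*models[e]))

# Two made-up emotions, each with two variants and their probabilities
models = {
    "happy": (np.array([[1.0, 0.0], [0.8, 0.2]]), np.array([0.7, 0.3])),
    "fear":  (np.array([[0.0, 1.0], [0.1, 0.9]]), np.array([0.6, 0.4])),
}
label = classify(np.array([0.9, 0.1]), models)       # -> "happy"
```

The study's point is that a classifier built only from observers' mental-representation variants and their probabilities reproduces human categorization performance.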
In the third study (Chapter 4), I investigated how the brain reduces high dimensional visual input into low dimensional diagnostic representations to support scene categorization. To do so, I used an information theoretic framework called Contentful Brain and Behavior Imaging (CBBI) to tease apart stimulus information that supports behavior (i.e., diagnostic) from that which does not (i.e., nondiagnostic). I then tracked the dynamic representations of both in magneto-encephalographic (MEG) activity. Using CBBI, I demonstrated that a rapid (~170 ms) reduction of nondiagnostic information occurs in the occipital cortex, followed by the progression of diagnostic information into the right fusiform gyrus, where it is constructed to support distinct behaviors. My results highlight how CBBI can be used to investigate information processing in brain activity by considering interactions between three variables (stimulus information, brain activity, behavior), rather than just two, as is the current norm in neuroimaging studies.
I discussed the task-specific diagnostic information as individuals' dynamic and experience-based representations of the physical world, which provide the much-needed information to search and understand the black box of high-dimensional, deep and biological brain networks. I also discussed practical concerns about using the data-driven approach to uncover diagnostic information.
The Role of Physical Image Properties in Facial Expression and Identity Perception
A number of attempts have been made to understand which physical image properties are important for the perception of different facial characteristics. These physical image properties have been broadly split into two categories: facial shape and facial surface. Current accounts of face processing suggest that whilst judgements of facial identity rely approximately equally on facial shape and surface properties, judgements of facial expression are heavily shape-dependent. This thesis presents behavioural experiments and fMRI experiments employing multi-voxel pattern analysis (MVPA) to investigate the extent to which facial shape and surface properties underpin identity and expression perception, and how these image properties are represented neurally. The first empirical chapter presents experiments showing that facial expressions are categorised approximately equally well when either facial shape or surface is the varying image cue. The second empirical chapter shows that neural patterns of response to facial expressions in the Occipital Face Area (OFA) and Superior Temporal Sulcus (STS) are reflected by patterns of perceptual similarity of the different expressions; in turn, these patterns of perceptual similarity can be predicted by both facial shape and surface properties. The third empirical chapter demonstrates that distinct patterns of neural response can be found to shape-based but not surface-based cues to facial identity in the OFA and Fusiform Face Area (FFA). The final experimental chapter demonstrates that the newly discovered contrast chimera effect is heavily dependent on the eye region and on holistic face representations conveying facial identity. Taken together, these findings show the importance of facial surface as well as facial shape in expression perception.
For facial identity, both facial shape and surface cues are important for the contrast chimera effect, although there are more consistent identity-based neural response patterns to facial shape in face-responsive brain regions.