A Reminiscence of "Mastermind": Iris/Periocular Biometrics by "In-Set" CNN Iterative Analysis
Convolutional neural networks (CNNs) have
emerged as the most popular classification models in biometrics
research. Under the discriminative paradigm of pattern
recognition, CNNs are used typically in one of two ways: 1)
verification mode ("are samples from the same person?"), where
pairs of images are provided to the network to distinguish
between genuine and impostor instances; and 2) identification
mode ("whom is this sample from?"), where appropriate feature
representations that map images to identities are found. This
paper postulates a novel mode for using CNNs in biometric
identification, by learning models that answer the question "is
the query's identity among this set?". The insight is reminiscent
of the classical Mastermind game: by iteratively analysing the
network responses when multiple random samples of k gallery
elements are compared to the query, we obtain weakly correlated
matching scores that - altogether - provide solid cues to infer
the most likely identity. In this setting, identification is regarded
as a variable selection and regularization problem, with sparse
linear regression techniques being used to infer the matching
probability with respect to each gallery identity. The main strength
of this strategy is its high robustness to outlier matching scores, which
are known to be a primary error source in biometric recognition.
Our experiments were carried out on the full versions of two
well-known datasets, one of near-infrared iris images (CASIA-IrisV4-Thousand) and
one of visible-wavelength periocular images (UBIRIS.v2), and confirm
that recognition performance can be substantially improved by the
proposed algorithm when compared to the traditional working
modes of CNNs in biometrics.
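The "in-set" idea described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the CNN is replaced by a hypothetical noisy oracle (`simulate_in_set_responses`), the sparse regression is a plain Lasso solved by iterative soft-thresholding (`lasso_ista`), and every parameter value (gallery size, k, number of trials, noise level, `lam`) is an assumption chosen only for illustration.

```python
import numpy as np

def simulate_in_set_responses(true_id, n_gallery, k, n_trials, noise, rng):
    """Hypothetical stand-in for the CNN: a noisy answer to
    'is the query's identity among this set of k gallery elements?'."""
    X = np.zeros((n_trials, n_gallery))
    for t in range(n_trials):
        subset = rng.choice(n_gallery, size=k, replace=False)
        X[t, subset] = 1.0                      # membership indicator of trial t
    y = X[:, true_id] + noise * rng.standard_normal(n_trials)
    return X, y

def lasso_ista(X, y, lam=0.02, n_iter=500):
    """Minimise (1/2T)||y - Xw||^2 + lam*||w||_1 by iterative soft-thresholding."""
    n_trials, n_gallery = X.shape
    w = np.zeros(n_gallery)
    # Step size = 1 / Lipschitz constant of the gradient of the quadratic term.
    step = n_trials / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n_trials
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

rng = np.random.default_rng(0)
X, y = simulate_in_set_responses(true_id=7, n_gallery=50, k=5,
                                 n_trials=400, noise=0.3, rng=rng)
weights = lasso_ista(X, y)
predicted_id = int(np.argmax(weights))          # gallery identity with the largest weight
print(predicted_id)
```

Each row of `X` records which gallery identities took part in one random trial, so the regression weight of an identity reflects how strongly its presence explains positive responses; the L1 penalty drives the weights of the remaining identities towards zero.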
QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios
Concerns about the security of individuals have justified the increasing number of surveillance
cameras deployed both in private and public spaces. However, contrary to popular belief,
these devices are in most cases used solely for recording, instead of feeding intelligent analysis
processes capable of extracting information about the observed individuals. Thus, even though
video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant
details about the subjects that took part in a crime depends on the manual inspection
of recordings. As such, the current goal of the research community is the development of
automated surveillance systems capable of monitoring and identifying subjects in surveillance
scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric
recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at
designing a visual surveillance system capable of acquiring biometric data at a distance (e.g.,
face, iris or gait) without requiring human intervention in the process, as well as devising biometric
recognition methods robust to the degradation factors resulting from the unconstrained
acquisition process.
Regarding the first goal, the analysis of the data acquired by typical surveillance systems
shows that large acquisition distances significantly decrease the resolution of biometric samples,
and thus their discriminability is not sufficient for recognition purposes. In the literature,
diverse works point out Pan Tilt Zoom (PTZ) cameras as the most practical way for acquiring
high-resolution imagery at a distance, particularly when using a master-slave configuration. In
the master-slave configuration, the video acquired by a typical surveillance camera is analyzed
for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged
at high-resolution by the PTZ camera. Several methods have already shown that this configuration
can be used for acquiring biometric data at a distance. Nevertheless, these methods
failed to provide effective solutions to the typical challenges of this strategy, restricting its
use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development
of a biometric data acquisition system based on the cooperation of a PTZ camera
with a typical surveillance camera. The first proposal is a camera calibration method capable
of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ
camera. The second proposal is a camera scheduling method for determining - in real-time -
the sequence of acquisitions that maximizes the number of different targets obtained, while
minimizing the cumulative transition time. In order to achieve the first goal of this thesis,
both methods were combined with state-of-the-art approaches of the human monitoring field
to develop a fully automated surveillance system capable of acquiring biometric data at a distance and
without human cooperation, designated as the QUIS-CAMPI system.
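To make the scheduling objective more concrete, the sketch below orders a set of targets with a simple greedy nearest-neighbour heuristic. This is a stand-in and not the scheduling method proposed in the thesis: it only reduces the cumulative transition time and ignores how long each target stays in view. The function names, the target coordinates and the angular speed are all hypothetical.

```python
def transition_time(a, b, speed_deg_per_s=100.0):
    """Assumed PTZ model: pan and tilt move concurrently at the same
    angular speed, so the axis with the larger displacement dominates."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) / speed_deg_per_s

def greedy_schedule(current, targets):
    """Visit every target once, always moving to the closest remaining one.
    `targets` maps a target name to its (pan, tilt) angles in degrees."""
    remaining = dict(targets)
    order, total, pos = [], 0.0, current
    while remaining:
        name = min(remaining, key=lambda n: transition_time(pos, remaining[n]))
        total += transition_time(pos, remaining[name])
        pos = remaining.pop(name)
        order.append(name)
    return order, total

# Hypothetical targets given as (pan, tilt) angles in degrees.
targets = {"A": (10.0, 0.0), "B": (50.0, 5.0), "C": (12.0, 2.0)}
order, total = greedy_schedule((0.0, 0.0), targets)
print(order, round(total, 2))
```

A greedy ordering is fast enough for real-time use but is not guaranteed to be optimal; it is shown only to clarify what "minimizing the cumulative transition time" means.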
The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis
of the performance of the state-of-the-art biometric recognition approaches shows that these
approaches attain almost ideal recognition rates in unconstrained data. However, this performance
is incongruous with the recognition rates observed in surveillance scenarios. Taking into
account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising
biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a
distance ranging from 5 to 40 meters and without human intervention in the acquisition process.
This set allows an objective assessment of the performance of state-of-the-art biometric recognition
methods in data that truly encompass the covariates of surveillance scenarios. As such, this set
was used to promote the first international challenge on biometric recognition in the wild.
This thesis describes the evaluation protocols adopted, along with the results obtained
by the nine methods specially designed for this competition. In addition, the data acquired by
the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the
development of methods robust to the covariates of surveillance scenarios. The first proposal
is a method that detects corrupted features in biometric signatures by means of a redundancy
analysis algorithm. The second proposal is a caricature-based face recognition approach
capable of enhancing the recognition performance by automatically generating a caricature
from a 2D photo. The experimental evaluation of these methods shows that both approaches
contribute to improving the recognition performance in unconstrained data.
Growing concern about the security of individuals has justified the increase
in the number of video surveillance cameras installed in both private and public spaces.
However, contrary to common belief, these devices are, in most
cases, used only for recording, and are not connected to any kind of intelligent software
capable of inferring, in real time, information about the observed individuals. Thus, although
video surveillance has proved essential in solving several crimes, its use is still
confined to providing videos that have to be manually inspected to extract
relevant information about the subjects involved in the crime. As such, the main
challenge currently facing the scientific community is the development of automated systems capable of
monitoring and identifying individuals in video surveillance environments.
The main goal of this thesis is to extend the applicability of biometric recognition
systems to video surveillance environments. More specifically, we aim to
1) design a video surveillance system able to acquire biometric data at long distances
(e.g., face images, iris images, or gait videos) without requiring the cooperation of
individuals in the process; and 2) develop biometric recognition methods robust to the
degradation factors inherent to the data acquired by this type of system.
Regarding the first goal, the analysis of the data acquired by typical
video surveillance systems shows that, owing to the capture distance, the sampled biometric traits
are not sufficiently discriminative to ensure acceptable recognition rates.
In the literature, several works advocate the use of Pan Tilt Zoom (PTZ) cameras to acquire
high-resolution images at a distance, mainly the use of these devices in master-slave mode.
In the master-slave configuration, an intelligent analysis module selects regions of interest
(e.g., cars, people) from the video acquired by a video surveillance camera,
and the PTZ camera is steered to acquire the regions of interest at high resolution. Several
methods have already shown that this configuration can be used to acquire biometric data
at a distance; even so, they were unable to solve some problems related
to this strategy, thus preventing its use in video surveillance environments. Accordingly,
this thesis proposes two methods to enable the acquisition of biometric data in
video surveillance environments using a PTZ camera assisted by a typical video surveillance camera. The
first is a calibration method capable of accurately mapping the coordinates of the master
camera to the angle of the PTZ (slave) camera without the aid of other optical devices. The
second method determines the order in which a set of subjects will be observed by the
PTZ camera. The proposed method is able to determine, in real time, the sequence of observations
that maximizes the number of different subjects observed and simultaneously minimizes the
total transition time between subjects. In order to achieve the first goal of this thesis, the
two proposed methods were combined with the advances achieved in the field of human
monitoring to develop the first fully automated video surveillance system
capable of acquiring biometric data at long distances without requiring the cooperation
of individuals in the process, designated the QUIS-CAMPI system.
The QUIS-CAMPI system is the starting point for the research related
to the second goal of this thesis. The analysis of the performance of state-of-the-art biometric
recognition methods shows that they attain almost perfect recognition
rates in data acquired without constraints (e.g., recognition rates
above 99% on the LFW dataset). However, this performance is not corroborated by the
results observed in video surveillance environments, which suggests that current datasets
do not truly contain the degradation factors typical of
video surveillance environments. Taking into account the weaknesses of current biometric datasets,
this thesis introduces a new set of biometric data (face images and gait
videos) acquired by the QUIS-CAMPI system at a maximum distance of 40 m and without the cooperation
of the subjects in the acquisition process. This set allows an objective evaluation of the performance
of state-of-the-art methods in recognizing individuals in images/videos
captured in a real video surveillance environment. As such, this set was used to
promote the first competition on biometric recognition in uncontrolled environments.
This thesis describes the evaluation protocols used, as well as the results obtained by 9
methods specially designed for this competition. In addition, the data acquired
by the QUIS-CAMPI system were essential for developing two methods to
increase robustness to the degradation factors observed in video surveillance environments. The
first is a method for detecting corrupted features in biometric signatures through
the analysis of the redundancy between subsets of features. The second is a face
recognition method based on caricatures automatically generated from a single
photo of the subject. The experiments carried out show that both methods are able to reduce
the error rates in data acquired in an uncontrolled manner.
A Survey on Computer Vision based Human Analysis in the COVID-19 Era
The emergence of COVID-19 has had a global and profound impact, not only on
society as a whole, but also on the lives of individuals. Various prevention
measures were introduced around the world to limit the transmission of the
disease, including face masks, mandates for social distancing and regular
disinfection in public spaces, and the use of screening applications. These
developments also triggered the need for novel and improved computer vision
techniques capable of (i) providing support to the prevention measures through
an automated analysis of visual data, on the one hand, and (ii) facilitating
normal operation of existing vision-based services, such as biometric
authentication schemes, on the other. Especially important here are computer
vision techniques that focus on the analysis of people and faces in visual data
and have been affected the most by the partial occlusions introduced by the
mandates for facial masks. Such computer vision based human analysis techniques
include face and face-mask detection approaches, face recognition techniques,
crowd counting solutions, age and expression estimation procedures, models for
detecting face-hand interactions and many others, and have seen considerable
attention over recent years. The goal of this survey is to provide an
introduction to the problems induced by COVID-19 into such research and to
present a comprehensive review of the work done in the computer vision based
human analysis field. Particular attention is paid to the impact of facial
masks on the performance of various methods and recent solutions to mitigate
this problem. Additionally, a detailed review of existing datasets useful for
the development and evaluation of methods for COVID-19 related applications is
also provided. Finally, to help advance the field further, a discussion on the
main open challenges and future research directions is given.
Comment: Submitted to Image and Vision Computing, 44 pages, 7 figures
Chimerical dataset creation protocol based on Doddington Zoo: a biometric application with face, eye, and ECG
Multimodal systems are a workaround to enhance the robustness and effectiveness of biometric systems. A proper multimodal dataset is of the utmost importance to build such systems. The literature presents some multimodal datasets, although, to the best of our knowledge, there are no previous studies combining face, iris/eye, and vital signals such as the Electrocardiogram (ECG). Moreover, there is no methodology to guide the construction and evaluation of a chimeric dataset. Taking that into account, we propose to create a chimeric dataset from three modalities in this work: ECG, eye, and face. Based on the Doddington Zoo criteria, we also propose a generic and systematic protocol imposing constraints for the creation of homogeneous chimeric individuals, which allows us to perform a fair and reproducible benchmark. Moreover, we propose a multimodal approach for these modalities based on state-of-the-art deep representations built by convolutional neural networks. We conduct the experiments in the open-world verification mode and on two different scenarios (intra-session and inter-session), using three modalities from two datasets: CYBHi (ECG) and FRGC (eye and face). Our multimodal approach achieves a decidability of 7.20 ± 0.18, yielding an almost perfect verification system (i.e., an Equal Error Rate (EER) of 0.20% ± 0.06) on the intra-session scenario with unknown data. On the inter-session scenario, we achieve a decidability of 7.78 ± 0.78 and an EER of 0.06% ± 0.06. In summary, these figures represent a gain of over 28% in decidability and a reduction of over 11% in the EER on the intra-session scenario for unknown data compared to the best-known unimodal approach. Besides, we achieve an improvement greater than 22% in decidability and an EER reduction of over 6% in the inter-session scenario.
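The two metrics quoted in this abstract can be computed from lists of genuine and impostor scores, as sketched below. The sketch assumes the usual definition of the decidability index (distance between the two score means, normalised by the pooled standard deviation) and a simple threshold sweep for the EER; the abstract does not state how the authors computed either one, and the scores used here are made up.

```python
import numpy as np

def decidability(genuine, impostor):
    """Decidability index d': separation between genuine and impostor scores."""
    g = np.asarray(genuine, dtype=float)
    i = np.asarray(impostor, dtype=float)
    return abs(g.mean() - i.mean()) / np.sqrt(0.5 * (g.var() + i.var()))

def equal_error_rate(genuine, impostor):
    """EER for similarity scores (higher = more likely genuine): the operating
    point where the false acceptance and false rejection rates are closest."""
    g = np.asarray(genuine, dtype=float)
    i = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([g, i]))
    far = np.array([(i >= t).mean() for t in thresholds])  # impostors accepted
    frr = np.array([(g < t).mean() for t in thresholds])   # genuines rejected
    idx = int(np.argmin(np.abs(far - frr)))
    return 0.5 * (far[idx] + frr[idx])

# Made-up scores, for illustration only.
genuine = [0.60, 0.70, 0.80, 0.90]
impostor = [0.10, 0.20, 0.30, 0.65]
print(decidability(genuine, impostor), equal_error_rate(genuine, impostor))
```

A larger decidability means the two score distributions overlap less, which is why a high d' and a low EER tend to go together.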
One-Shot Learning for Periocular Recognition: Exploring the Effect of Domain Adaptation and Data Bias on Deep Representations
One weakness of machine-learning algorithms is the need to train the models
for a new task. This presents a specific challenge for biometric recognition
due to the dynamic nature of databases and, in some instances, the reliance on
subject collaboration for data collection. In this paper, we investigate the
behavior of deep representations in widely used CNN models under extreme data
scarcity for One-Shot periocular recognition, a biometric recognition task. We
analyze the outputs of CNN layers as identity-representing feature vectors. We
examine the impact of Domain Adaptation on the network layers' output for
unseen data and evaluate the method's robustness concerning data normalization
and generalization of the best-performing layer. We improved state-of-the-art
results that made use of networks trained with biometric datasets with millions
of images and fine-tuned for the target periocular dataset by utilizing
out-of-the-box CNNs trained for the ImageNet Recognition Challenge and standard
computer vision algorithms. For example, for the Cross-Eyed dataset, we could
reduce the EER by 67% and 79% (from 1.70% and 3.41% to 0.56% and 0.71%) in the
Close-World and Open-World protocols, respectively, for the periocular case. We
also demonstrate that traditional algorithms like SIFT can outperform CNNs in
situations with limited data or scenarios where the network has not been
trained with the test classes like the Open-World mode. SIFT alone was able to
reduce the EER by 64% and 71.6% (from 1.7% and 3.41% to 0.6% and 0.97%) for
Cross-Eyed in the Close-World and Open-World protocols, respectively, and a
reduction of 4.6% (from 3.94% to 3.76%) in the PolyU database for the
Open-World and single biometric case.
Comment: Submitted preprint to IEEE Access
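The percentages in this abstract are relative reductions of the EER, which can be checked from the quoted before/after values. The short sketch below does that arithmetic; the helper name `relative_reduction` and the dictionary labels are illustrative only.

```python
def relative_reduction(before, after):
    """Relative reduction, in percent, when an error rate drops from `before` to `after`."""
    return 100.0 * (before - after) / before

# (before, after) EER values in percent, as quoted in the abstract.
reported = {
    "CNN, Cross-Eyed, Close-World": (1.70, 0.56),
    "CNN, Cross-Eyed, Open-World": (3.41, 0.71),
    "SIFT, Cross-Eyed, Close-World": (1.70, 0.60),
    "SIFT, Cross-Eyed, Open-World": (3.41, 0.97),
    "SIFT, PolyU, Open-World": (3.94, 3.76),
}
for name, (before, after) in reported.items():
    print(f"{name}: {relative_reduction(before, after):.1f}% reduction")
```

The SIFT Close-World case evaluates to roughly 64.7%, which the abstract reports as 64%; the other values agree with the quoted figures after rounding.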