123 research outputs found
Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds
Sparsity-based representations have recently led to notable results in
various visual recognition tasks. In a separate line of research, Riemannian
manifolds have been shown useful for dealing with features and models that do
not lie in Euclidean spaces. With the aim of building a bridge between the two
realms, we address the problem of sparse coding and dictionary learning over
the space of linear subspaces, which form Riemannian structures known as
Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into
the space of symmetric matrices by an isometric mapping. This in turn enables
us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we
propose closed-form solutions for learning a Grassmann dictionary, atom by
atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann
sparse coding and dictionary learning algorithms through embedding into Hilbert
spaces.
Experiments on several classification tasks (gender recognition, gesture
classification, scene analysis, face recognition, action recognition and
dynamic texture classification) show that the proposed approaches achieve
considerable improvements in discrimination accuracy, in comparison to
state-of-the-art methods such as kernelized Affine Hull Method and
graph-embedding Grassmann discriminant analysis.Comment: Appearing in International Journal of Computer Visio
The fundamentals of unimodal palmprint authentication based on a biometric system: A review
Biometric system can be defined as the automated method of identifying or authenticating the identity of a living person based on physiological or behavioral traits. Palmprint biometric-based authentication has gained considerable attention in recent years. Globally, enterprises have been exploring biometric authorization for some time, for the purpose of security, payment processing, law enforcement CCTV systems, and even access to offices, buildings, and gyms via the entry doors. Palmprint biometric system can be divided into unimodal and multimodal. This paper will investigate the biometric system and provide a detailed overview of the palmprint technology with existing recognition approaches. Finally, we introduce a review of previous works based on a unimodal palmprint system using different databases
Extrinsic methods for coding and dictionary learning on grassmann manifolds
Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning in Grassmann manifolds, i.e., the space of linear subspaces. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose an algorithm for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into higher dimensional Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis
Homogeneous and Heterogeneous Face Recognition: Enhancing, Encoding and Matching for Practical Applications
Face Recognition is the automatic processing of face images with the purpose to recognize individuals. Recognition task becomes especially challenging in surveillance applications, where images are acquired from a long range in the presence of difficult environments. Short Wave Infrared (SWIR) is an emerging imaging modality that is able to produce clear long range images in difficult environments or during night time. Despite the benefits of the SWIR technology, matching SWIR images against a gallery of visible images presents a challenge, since the photometric properties of the images in the two spectral bands are highly distinct.;In this dissertation, we describe a cross spectral matching method that encodes magnitude and phase of multi-spectral face images filtered with a bank of Gabor filters. The magnitude of filtered images is encoded with Simplified Weber Local Descriptor (SWLD) and Local Binary Pattern (LBP) operators. The phase is encoded with Generalized Local Binary Pattern (GLBP) operator. Encoded multi-spectral images are mapped into a histogram representation and cross matched by applying symmetric Kullback-Leibler distance. Performance of the developed algorithm is demonstrated on TINDERS database that contains long range SWIR and color images acquired at a distance of 2, 50, and 106 meters.;Apart from long acquisition range, other variations and distortions such as pose variation, motion and out of focus blur, and uneven illumination may be observed in multispectral face images. Recognition performance of the face recognition matcher can be greatly affected by these distortions. It is important, therefore, to ensure that matching is performed on high quality images. Poor quality images have to be either enhanced or discarded. This dissertation addresses the problem of selecting good quality samples.;The last chapters of the dissertation suggest a number of modifications applied to the cross spectral matching algorithm for matching low resolution color images in near-real time. We show that the method that encodes the magnitude of Gabor filtered images with the SWLD operator guarantees high recognition rates. The modified method (Gabor-SWLD) is adopted in a camera network set up where cameras acquire several views of the same individual. The designed algorithm and software are fully automated and optimized to perform recognition in near-real time. We evaluate the recognition performance and the processing time of the method on a small dataset collected at WVU
Human gait recognition under neutral and non-neutral gait sequences
Rapid advances in biometrics technology makes their use for person‘s identity more acceptable in a variety of applications, especially in the areas of the interest in security and surveillance. The upsurge in terrorist attacks in the past few years has focused research on biometric systems that have the ability to identify individuals from a distance, and this is spearheading research interest in Gait biometric due to being unobtrusive and less dependent on high image/video quality. Gait biometric is a behavioral trait that aims to identify individuals from image sequences based on their walking style. The growing list of possible civil as well as security applications for various purposes is paralleled by the emergence of a variety of research challenges in dealing with a various external as well as internal factors influencing the performance of Gait Recognition (GR) in unconstrained recording conditions.
This thesis is concerned with Gait Recognition in unconstrained scenarios aims to address research questions covering (1) The selection of sets of features for a gait signature; (2) The effects of gender and/or recoding condition case (neutral, carrying a bag, coat wearing) on the performance of GR schemes; (3) Integrating gender and/or case classifications into GR; and (4) The role of emerging Kinect sensor technology, with its capability of sensing human skeletal features in GR and applications. Accordingly, our objectives will focus on investigating, developing and testing the performance of using a variety of gait sequencefeatures for the various components/tasks and their integration. Our tests are based on large number of experiments based on CASIA B database as well as an in-house database of Kinect sensor recording. In all experiments, we use different dimension reduction and feature selection methods do reduce the dimensions in these proposed feature vectors, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Fisher Score, followed by different classification methods like; k-nearest-neighbour (k-NN), Support Vector Machine (SVM), Naive Bayes and linear discriminant classifier (LDC), to test the performance of the proposed methods.
The initial part is focused on reviewing existing background removal for indoor and outdoor scenarios and developing more efficient versions primarily by adopting the work for wavelet domain rather than the traditional spatial domain based schemes. These include motion detection by frame differencing and Mixture of Gaussians, the latter being more reliable for outdoor scenarios. Subsequently, we investigated a variety of features that can be extractedfrom various subbands of wavelet-decomposed frames of different body parts (partitioned according to the golden ratio). We gradually built sets of features, together with their fused combinations, that can categorized as hybrid of model-based and motion-based models. The first list of features developed to deal with Neutral Gait Recognition (NGR) includes: Spatio-Temporal Model (STM), Legs Motion Detection Feature (LMD), and the Statistical model of the approximation LL-wavelet subband images (AWM). We shall demonstrate that fusing these features achieves accuracy of 97%, which is comparable to the state of the art. These features will be shown to achieve 96% accuracy in gender classification (GC), and we shall establish that the NGR2 scheme that integrates GC into NGR improves the accuracy by a noticeable percentage.
Testing the performance of these NGR schemes in recognising non-neutral cases revealed the challenges of Unrestricted Gait Recognition (UGR). The second part of the thesis is focused on developing UGR schemes.
For this, first a new statistical wavelet feature set extracted from high frequency subbands, called Detail coefficients Wavelet Model (DWM) was added to the previous list. Using different combinations of these schemes, will be shown to significantly improve the performance for non-neutral gait cases, but to less extent in the coat wearing case. We then develop a Gait Sequence Case Detection (GSCD) which has excellent performance. We will show that integrating GSCD and GC together into UGR improves the performance for all cases. We shall also investigate the different UGS scheme that generalizes existing work on Gait Energy and Gait Entropy images (GEI and GEnI) features but in the wavelet domain and in different body parts. Testing these two schemes, and their fusion, post the PCA dimension reduction yield much improved accuracy for the non-neutral cases compared to existing scheme GEI and GEnI schemes, but are significantly outperformed by the last scheme. However, by fusing the UGS scheme with the GSCD+GC+UGR scheme above we will get best accuracy that outperform the state of the art in GR specially in the non-neutral cases.
The thesis ended by conducting a rather limited investigation on the use of the Kinect sensors for GR. We develop two sets of features: Horizontal Distance Features and Vertical Distance Features from small set of skeleton point trajectories. The experimental result on neutral was very successful but for the unrestricted gait recognition (with the 5 case variations) satisfactory but not optimal performance relies on the gallery including balanced number of samples from all cases
QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios
The concerns about individuals security have justified the increasing number of surveillance
cameras deployed both in private and public spaces. However, contrary to popular belief,
these devices are in most cases used solely for recording, instead of feeding intelligent analysis
processes capable of extracting information about the observed individuals. Thus, even though
video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant
details about the subjects that took part in a crime depends on the manual inspection
of recordings. As such, the current goal of the research community is the development of
automated surveillance systems capable of monitoring and identifying subjects in surveillance
scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric
recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at
designing a visual surveillance system capable of acquiring biometric data at a distance (e.g.,
face, iris or gait) without requiring human intervention in the process, as well as devising biometric
recognition methods robust to the degradation factors resulting from the unconstrained
acquisition process.
Regarding the first goal, the analysis of the data acquired by typical surveillance systems
shows that large acquisition distances significantly decrease the resolution of biometric samples,
and thus their discriminability is not sufficient for recognition purposes. In the literature,
diverse works point out Pan Tilt Zoom (PTZ) cameras as the most practical way for acquiring
high-resolution imagery at a distance, particularly when using a master-slave configuration. In
the master-slave configuration, the video acquired by a typical surveillance camera is analyzed
for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged
at high-resolution by the PTZ camera. Several methods have already shown that this configuration
can be used for acquiring biometric data at a distance. Nevertheless, these methods
failed at providing effective solutions to the typical challenges of this strategy, restraining its
use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development
of a biometric data acquisition system based on the cooperation of a PTZ camera
with a typical surveillance camera. The first proposal is a camera calibration method capable
of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ
camera. The second proposal is a camera scheduling method for determining - in real-time -
the sequence of acquisitions that maximizes the number of different targets obtained, while
minimizing the cumulative transition time. In order to achieve the first goal of this thesis,
both methods were combined with state-of-the-art approaches of the human monitoring field
to develop a fully automated surveillance capable of acquiring biometric data at a distance and
without human cooperation, designated as QUIS-CAMPI system.
The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis
of the performance of the state-of-the-art biometric recognition approaches shows that these
approaches attain almost ideal recognition rates in unconstrained data. However, this performance
is incongruous with the recognition rates observed in surveillance scenarios. Taking into
account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising
biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a
distance ranging from 5 to 40 meters and without human intervention in the acquisition process.
This set allows to objectively assess the performance of state-of-the-art biometric recognition
methods in data that truly encompass the covariates of surveillance scenarios. As such, this set
was exploited for promoting the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained
by the nine methods specially designed for this competition. In addition, the data acquired by
the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the
development of methods robust to the covariates of surveillance scenarios. The first proposal
regards a method for detecting corrupted features in biometric signatures inferred by a redundancy
analysis algorithm. The second proposal is a caricature-based face recognition approach
capable of enhancing the recognition performance by automatically generating a caricature
from a 2D photo. The experimental evaluation of these methods shows that both approaches
contribute to improve the recognition performance in unconstrained data.A crescente preocupação com a segurança dos indivíduos tem justificado o crescimento
do número de câmaras de vídeo-vigilância instaladas tanto em espaços privados como públicos.
Contudo, ao contrário do que normalmente se pensa, estes dispositivos são, na maior parte dos
casos, usados apenas para gravação, não estando ligados a nenhum tipo de software inteligente
capaz de inferir em tempo real informações sobre os indivíduos observados. Assim, apesar de a
vídeo-vigilância ter provado ser essencial na resolução de diversos crimes, o seu uso está ainda
confinado à disponibilização de vídeos que têm que ser manualmente inspecionados para extrair
informações relevantes dos sujeitos envolvidos no crime. Como tal, atualmente, o principal
desafio da comunidade científica é o desenvolvimento de sistemas automatizados capazes de
monitorizar e identificar indivíduos em ambientes de vídeo-vigilância.
Esta tese tem como principal objetivo estender a aplicabilidade dos sistemas de reconhecimento
biométrico aos ambientes de vídeo-vigilância. De forma mais especifica, pretende-se
1) conceber um sistema de vídeo-vigilância que consiga adquirir dados biométricos a longas distâncias
(e.g., imagens da cara, íris, ou vídeos do tipo de passo) sem requerer a cooperação dos
indivíduos no processo; e 2) desenvolver métodos de reconhecimento biométrico robustos aos
fatores de degradação inerentes aos dados adquiridos por este tipo de sistemas.
No que diz respeito ao primeiro objetivo, a análise aos dados adquiridos pelos sistemas típicos
de vídeo-vigilância mostra que, devido à distância de captura, os traços biométricos amostrados
não são suficientemente discriminativos para garantir taxas de reconhecimento aceitáveis.
Na literatura, vários trabalhos advogam o uso de câmaras Pan Tilt Zoom (PTZ) para adquirir
imagens de alta resolução à distância, principalmente o uso destes dispositivos no modo masterslave.
Na configuração master-slave um módulo de análise inteligente seleciona zonas de interesse
(e.g. carros, pessoas) a partir do vídeo adquirido por uma câmara de vídeo-vigilância
e a câmara PTZ é orientada para adquirir em alta resolução as regiões de interesse. Diversos
métodos já mostraram que esta configuração pode ser usada para adquirir dados biométricos
à distância, ainda assim estes não foram capazes de solucionar alguns problemas relacionados
com esta estratégia, impedindo assim o seu uso em ambientes de vídeo-vigilância. Deste modo,
esta tese propõe dois métodos para permitir a aquisição de dados biométricos em ambientes de
vídeo-vigilância usando uma câmara PTZ assistida por uma câmara típica de vídeo-vigilância. O
primeiro é um método de calibração capaz de mapear de forma exata as coordenadas da câmara
master para o ângulo da câmara PTZ (slave) sem o auxílio de outros dispositivos óticos. O
segundo método determina a ordem pela qual um conjunto de sujeitos vai ser observado pela
câmara PTZ. O método proposto consegue determinar em tempo-real a sequência de observações
que maximiza o número de diferentes sujeitos observados e simultaneamente minimiza o
tempo total de transição entre sujeitos. De modo a atingir o primeiro objetivo desta tese, os
dois métodos propostos foram combinados com os avanços alcançados na área da monitorização
de humanos para assim desenvolver o primeiro sistema de vídeo-vigilância completamente automatizado
e capaz de adquirir dados biométricos a longas distâncias sem requerer a cooperação
dos indivíduos no processo, designado por sistema QUIS-CAMPI.
O sistema QUIS-CAMPI representa o ponto de partida para iniciar a investigação relacionada
com o segundo objetivo desta tese. A análise do desempenho dos métodos de reconhecimento
biométrico do estado-da-arte mostra que estes conseguem obter taxas de reconhecimento
quase perfeitas em dados adquiridos sem restrições (e.g., taxas de reconhecimento
maiores do que 99% no conjunto de dados LFW). Contudo, este desempenho não é corroborado pelos resultados observados em ambientes de vídeo-vigilância, o que sugere que os conjuntos
de dados atuais não contêm verdadeiramente os fatores de degradação típicos dos ambientes de
vídeo-vigilância. Tendo em conta as vulnerabilidades dos conjuntos de dados biométricos atuais,
esta tese introduz um novo conjunto de dados biométricos (imagens da face e vídeos do tipo de
passo) adquiridos pelo sistema QUIS-CAMPI a uma distância máxima de 40m e sem a cooperação
dos sujeitos no processo de aquisição. Este conjunto permite avaliar de forma objetiva o desempenho
dos métodos do estado-da-arte no reconhecimento de indivíduos em imagens/vídeos
capturados num ambiente real de vídeo-vigilância. Como tal, este conjunto foi utilizado para
promover a primeira competição de reconhecimento biométrico em ambientes não controlados.
Esta tese descreve os protocolos de avaliação usados, assim como os resultados obtidos por 9
métodos especialmente desenhados para esta competição. Para além disso, os dados adquiridos
pelo sistema QUIS-CAMPI foram essenciais para o desenvolvimento de dois métodos para
aumentar a robustez aos fatores de degradação observados em ambientes de vídeo-vigilância. O
primeiro é um método para detetar características corruptas em assinaturas biométricas através
da análise da redundância entre subconjuntos de características. O segundo é um método de
reconhecimento facial baseado em caricaturas automaticamente geradas a partir de uma única
foto do sujeito. As experiências realizadas mostram que ambos os métodos conseguem reduzir
as taxas de erro em dados adquiridos de forma não controlada
Estimation de cartes d'énergie du bruit apériodique de la marche humaine avec une caméra de profondeur pour la détection de pathologies et modèles légers de détection d'objets saillants basés sur l'opposition de couleurs
Cette thèse a pour objectif l’étude de trois problèmes : l’estimation de cartes de saillance de l’énergie du bruit apériodique de la marche humaine par la perception de profondeur pour la détection de pathologies, les modèles de détection d’objets saillants en général et les modèles légers en particulier par l’opposition de couleurs.
Comme première contribution, nous proposons un système basé sur une caméra de profondeur et un tapis roulant, qui analyse les parties du corps du patient ayant un mouvement irrégulier, en termes de périodicité, pendant la marche. Nous supposons que la marche d'un sujet sain présente n'importe où dans son corps, pendant les cycles de marche, un signal de profondeur avec un motif périodique sans bruit. La présence de bruit et son importance peuvent être utilisées pour signaler la présence et l'étendue de pathologies chez le sujet. Notre système estime, à partir de chaque séquence vidéo, une carte couleur de saillance montrant les zones de fortes irrégularités de marche, en termes de périodicité, appelées énergie de bruit apériodique, de chaque sujet. Notre système permet aussi de détecter automatiquement les cartes des individus sains et ceux malades.
Nous présentons ensuite deux approches pour la détection d’objets saillants. Bien qu’ayant fait l’objet de plusieurs travaux de recherche, la détection d'objets saillants reste un défi. La plupart des modèles traitent la couleur et la texture séparément et les considèrent donc implicitement comme des caractéristiques indépendantes, à tort.
Comme deuxième contribution, nous proposons une nouvelle stratégie, à travers un modèle simple, presque sans paramètres internes, générant une carte de saillance robuste pour une image naturelle. Cette stratégie consiste à intégrer la couleur dans les motifs de texture pour caractériser une micro-texture colorée, ceci grâce au motif ternaire local (LTP) (descripteur de texture simple mais puissant) appliqué aux paires de couleurs. La dissemblance entre chaque paire de micro-textures colorées est calculée en tenant compte de la non-linéarité des micro-textures colorées et en préservant leurs distances, donnant une carte de saillance intermédiaire pour chaque espace de couleur. La carte de saillance finale est leur combinaison pour avoir des cartes robustes.
Le développement des réseaux de neurones profonds a récemment permis des performances élevées. Cependant, il reste un défi de développer des modèles de même performance pour des appareils avec des ressources limitées.
Comme troisième contribution, nous proposons une nouvelle approche pour un modèle léger de réseau neuronal profond de détection d'objets saillants, inspiré par les processus de double opposition du cortex visuel primaire, qui lient inextricablement la couleur et la forme dans la perception humaine des couleurs. Notre modèle proposé, CoSOV1net, est entraîné à partir de zéro, sans utiliser de ``backbones'' de classification d'images ou d'autres tâches. Les expériences sur les ensembles de données les plus utilisés et les plus complexes pour la détection d'objets saillants montrent que CoSOV1Net atteint des performances compétitives avec des modèles de l’état-de-l’art, tout en étant un modèle léger de détection d'objets saillants et pouvant être adapté aux environnements mobiles et aux appareils à ressources limitées.The purpose of this thesis is to study three problems: the estimation of saliency maps of the aperiodic noise energy of human gait using depth perception for pathology detection, and to study models for salient objects detection in general and lightweight models in particular by color opposition.
As our first contribution, we propose a system based on a depth camera and a treadmill, which analyzes the parts of the patient's body with irregular movement, in terms of periodicity, during walking. We assume that a healthy subject gait presents anywhere in his (her) body, during gait cycles, a depth signal with a periodic pattern without noise. The presence of noise and its importance can be used to point out presence and extent of the subject’s pathologies. Our system estimates, from each video sequence, a saliency map showing the areas of strong gait irregularities, in terms of periodicity, called aperiodic noise energy, of each subject. Our system also makes it possible to automatically detect the saliency map of healthy and sick subjects.
We then present two approaches for salient objects detection. Although having been the subject of many research works, salient objects detection remains a challenge. Most models treat color and texture separately and therefore implicitly consider them as independent feature, erroneously.
As a second contribution, we propose a new strategy through a simple model, almost without internal parameters, generating a robust saliency map for a natural image. This strategy consists in integrating color in texture patterns to characterize a colored micro-texture thanks to the local ternary pattern (LTP) (simple but powerful texture descriptor) applied to the color pairs. The dissimilarity between each colored micro-textures pair is computed considering non-linearity from colored micro-textures and preserving their distances. This gives an intermediate saliency map for each color space. The final saliency map is their combination to have robust saliency map.
The development of deep neural networks has recently enabled high performance. However, it remains a challenge to develop models of the same performance for devices with limited resources.
As a third contribution, we propose a new approach for a lightweight salient objects detection deep neural network model, inspired by the double opponent process in the primary visual cortex, which inextricably links color and shape in human color perception. Our proposed model, namely CoSOV1net, is trained from scratch, without using any image classification backbones or other tasks. Experiments on the most used and challenging datasets for salient objects detection show that CoSOV1Net achieves competitive performance with state-of-the-art models, yet it is a lightweight detection model and it is a salient objects detection that can be adapted to mobile environments and resource-constrained devices
Human face detection techniques: A comprehensive review and future research directions
Face detection which is an effortless task for humans are complex to perform on machines. Recent veer proliferation of computational resources are paving the way for a frantic advancement of face detection technology. Many astutely developed algorithms have been proposed to detect faces. However, there is a little heed paid in making a comprehensive survey of the available algorithms. This paper aims at providing fourfold discussions on face detection algorithms. At first, we explore a wide variety of available face detection algorithms in five steps including history, working procedure, advantages, limitations, and use in other fields alongside face detection. Secondly, we include a comparative evaluation among different algorithms in each single method. Thirdly, we provide detailed comparisons among the algorithms epitomized to have an all inclusive outlook. Lastly, we conclude this study with several promising research directions to pursue. Earlier survey papers on face detection algorithms are limited to just technical details and popularly used algorithms. In our study, however, we cover detailed technical explanations of face detection algorithms and various recent sub-branches of neural network. We present detailed comparisons among the algorithms in all-inclusive and also under sub-branches. We provide strengths and limitations of these algorithms and a novel literature survey including their use besides face detection
- …