5,740 research outputs found

    Machine Understanding of Human Behavior

    A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next-generation computing, which we will call human computing, should be about anticipatory user interfaces that are human-centered, built for humans on the basis of human models. Such interfaces should transcend the traditional keyboard and mouse to include natural, human-like interactive functions, including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, how far we are from enabling computers to understand human behavior.

    Learning to Find Eye Region Landmarks for Remote Gaze Estimation in Unconstrained Settings

    Conventional feature-based and model-based gaze estimation methods have proven to perform well in settings with controlled illumination and specialized cameras. In unconstrained real-world settings, however, such methods are surpassed by recent appearance-based methods due to difficulties in modeling factors such as illumination changes and other visual artifacts. We present a novel learning-based method for eye region landmark localization that enables conventional methods to be competitive with the latest appearance-based methods. Despite having been trained exclusively on synthetic data, our method exceeds the state of the art for iris localization and eye shape registration on real-world imagery. We then use the detected landmarks as input to iterative model-fitting and lightweight learning-based gaze estimation methods. Our approach outperforms existing model-fitting and appearance-based methods in the context of person-independent and personalized gaze estimation.
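    As a minimal illustration of the feature-based side of such a pipeline, the detected iris centre can be normalised against the detected eye corners to obtain a scale- and translation-invariant gaze feature. The function name and the two-corner parameterisation are assumptions for this sketch, not the paper's method:

```python
import numpy as np

def gaze_features(eye_corner_left, eye_corner_right, iris_center):
    """Normalised iris offset within the eye region.

    The iris centre is expressed relative to the midpoint of the two
    eye corners, in units of eye width, which makes the feature roughly
    invariant to image scale and in-plane translation. All inputs are
    (x, y) pixel coordinates; names are illustrative placeholders.
    """
    c0 = np.asarray(eye_corner_left, dtype=float)
    c1 = np.asarray(eye_corner_right, dtype=float)
    iris = np.asarray(iris_center, dtype=float)
    eye_width = np.linalg.norm(c1 - c0)
    mid = 0.5 * (c0 + c1)
    # Offset of the iris from the eye midpoint, in eye-width units.
    return (iris - mid) / eye_width
```

    An iris centred exactly between the corners yields a zero offset; a positive horizontal component indicates the iris has shifted toward the second corner.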

    QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios

    Concerns about individuals' security have justified the increasing number of surveillance cameras deployed in both private and public spaces. However, contrary to popular belief, these devices are in most cases used solely for recording, instead of feeding intelligent analysis processes capable of extracting information about the observed individuals. Thus, even though video surveillance has already proved essential for solving multiple crimes, obtaining relevant details about the subjects that took part in a crime depends on the manual inspection of recordings. As such, the current goal of the research community is the development of automated surveillance systems capable of monitoring and identifying subjects in surveillance scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric recognition algorithms on data acquired in surveillance scenarios. In particular, we aim at designing a visual surveillance system capable of acquiring biometric data at a distance (e.g., face, iris, or gait) without requiring human intervention in the process, as well as devising biometric recognition methods robust to the degradation factors resulting from the unconstrained acquisition process. Regarding the first goal, the analysis of the data acquired by typical surveillance systems shows that large acquisition distances significantly decrease the resolution of biometric samples, so that their discriminability is not sufficient for recognition purposes. In the literature, several works identify Pan-Tilt-Zoom (PTZ) cameras as the most practical way of acquiring high-resolution imagery at a distance, particularly when used in a master-slave configuration. In the master-slave configuration, the video acquired by a typical surveillance camera is analyzed to obtain regions of interest (e.g., car, person), and these regions are subsequently imaged at high resolution by the PTZ camera.
Several methods have already shown that this configuration can be used for acquiring biometric data at a distance. Nevertheless, these methods failed to provide effective solutions to the typical challenges of this strategy, restricting its use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development of a biometric data acquisition system based on the cooperation of a PTZ camera with a typical surveillance camera. The first proposal is a camera calibration method capable of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ camera without the aid of additional optical devices. The second proposal is a camera scheduling method for determining, in real time, the sequence of acquisitions that maximizes the number of different targets obtained while minimizing the cumulative transition time. In order to achieve the first goal of this thesis, both methods were combined with state-of-the-art approaches from the human monitoring field to develop a fully automated surveillance system capable of acquiring biometric data at a distance and without human cooperation, designated the QUIS-CAMPI system. The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis of the performance of state-of-the-art biometric recognition approaches shows that these approaches attain almost ideal recognition rates in unconstrained data (e.g., recognition rates above 99% on the LFW dataset). However, this performance is incongruous with the recognition rates observed in surveillance scenarios. Taking into account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at distances ranging from 5 to 40 meters and without human intervention in the acquisition process. This set allows an objective assessment of the performance of state-of-the-art biometric recognition methods on data that truly encompass the covariates of surveillance scenarios.
As such, this set was used to promote the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained by the nine methods specially designed for this competition. In addition, the data acquired by the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the development of methods robust to the covariates of surveillance scenarios. The first proposal is a method for detecting corrupted features in biometric signatures through an analysis of the redundancy between feature subsets. The second proposal is a caricature-based face recognition approach capable of enhancing recognition performance by automatically generating a caricature from a single 2D photo. The experimental evaluation of these methods shows that both approaches contribute to improving recognition performance in unconstrained data.
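    The scheduling problem described above (maximising the number of distinct targets acquired while minimising the cumulative transition time) can be approximated with a greedy nearest-target heuristic. The sketch below is illustrative only; the abstract does not give enough detail to reproduce the thesis' actual real-time method:

```python
import math

def greedy_schedule(current_pan_tilt, targets, speed=1.0):
    """Greedy acquisition order for a PTZ camera.

    `targets` maps target ids to (pan, tilt) angles in degrees.
    The camera repeatedly visits the closest unvisited target, which
    keeps the cumulative transition time small. This is an
    illustrative heuristic, not the optimisation used by QUIS-CAMPI.
    Returns the visiting order and the total transition time.
    """
    pos = current_pan_tilt
    remaining = dict(targets)
    order, total_time = [], 0.0
    while remaining:
        # Pick the unvisited target with the smallest angular distance.
        tid, tpos = min(remaining.items(),
                        key=lambda kv: math.dist(pos, kv[1]))
        total_time += math.dist(pos, tpos) / speed  # time ~ angular distance
        order.append(tid)
        pos = tpos
        del remaining[tid]
    return order, total_time
```

    A greedy order is not globally optimal in general, but it is cheap enough to recompute every time a target enters or leaves the scene.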

    On using gait to enhance frontal face extraction

    Visual surveillance finds increasing deployment for monitoring urban environments. Operators need to be able to determine identity from surveillance images and often use face recognition for this purpose. In surveillance environments, it is necessary to handle pose variation of the human head, low frame rates, and low-resolution input images. We describe the first use of gait to enable face acquisition and recognition, by analysis of 3-D head motion and gait trajectory, with super-resolution analysis. We use region- and distance-based refinement of head pose estimation. We develop a direct mapping to relate the 2-D image with a 3-D model. In gait trajectory analysis, we model the looming effect so as to obtain the correct face region. Based on head position and the gait trajectory, we can reconstruct high-quality frontal face images which are demonstrated to be suitable for face recognition. The contributions of this research include the construction of a 3-D model for pose estimation from planar imagery and the first use of gait information to enhance the face extraction process, allowing for deployment in surveillance scenarios.
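    A minimal sketch of the fusion step behind multi-frame super-resolution, assuming the low-resolution face crops have already been registered to a common frame (the paper's actual method additionally models head pose and the looming effect, which this sketch omits):

```python
import numpy as np

def fuse_registered_frames(frames, upscale=2):
    """Naive multi-frame fusion.

    `frames` is a list of equally sized, already-registered grayscale
    face crops (2-D arrays). Each frame is upsampled by pixel
    repetition and the stack is averaged, which suppresses independent
    per-frame noise. Real super-resolution adds sub-pixel registration
    and deconvolution; this shows only the fusion idea.
    """
    up = [np.kron(np.asarray(f, dtype=float),
                  np.ones((upscale, upscale))) for f in frames]
    return np.mean(up, axis=0)
```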

    Improved Hand-Tracking Framework with a Recovery Mechanism

    Hand-tracking is fundamental to translating sign language to a spoken language. Accurate and reliable sign language translation depends on effective and accurate hand-tracking. This paper proposes an improved hand-tracking framework that extends a previous framework with a tracking recovery algorithm to better handle occlusion, improving both the discrimination between the hands and the tracking of the hands. The framework was evaluated on 30 South African Sign Language phrases that use a single hand, both hands without occlusion, and both hands with occlusion. Ten individuals performed the gestures in constrained and unconstrained environments. Overall, the proposed framework achieved an average success rate of 91.8%, compared to an average success rate of 81.1% using the previous framework. The results show improved tracking accuracy across all signs in constrained and unconstrained environments.
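    A tracking recovery mechanism of the general kind described can be sketched as a wrapper that re-detects the hands whenever tracker confidence collapses, for instance after an occlusion. The callables and the threshold below are placeholders, not the paper's actual components:

```python
def track_with_recovery(frames, tracker_step, detector, conf_threshold=0.5):
    """Hand-tracking loop with a simple recovery mechanism.

    `tracker_step(frame, state) -> (state, confidence)` advances the
    tracker one frame; `detector(frame) -> state` re-detects the hands
    from scratch. Whenever tracker confidence falls below
    `conf_threshold`, the loop recovers by re-detecting. Both
    callables are stand-ins for real tracking/detection components.
    """
    state = detector(frames[0])
    states, recoveries = [state], 0
    for frame in frames[1:]:
        state, conf = tracker_step(frame, state)
        if conf < conf_threshold:  # track lost: trigger recovery
            state = detector(frame)
            recoveries += 1
        states.append(state)
    return states, recoveries
```

    Counting recoveries, as done here, also gives a cheap diagnostic of how often the underlying tracker fails on a given sequence.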

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation, and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation, and recognition/verification. A very important technology that has not yet been thoroughly evaluated is deformable face tracking "in-the-wild". Until now, performance has mainly been assessed qualitatively, by visually inspecting the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation; (b) generic model-free tracking plus generic facial landmark localisation; and (c) hybrid approaches using state-of-the-art face detection, model-free tracking, and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
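    The hybrid strategy (c) can be sketched as a per-frame loop in which a model-free tracker propagates the face box, a detector re-initialises it on failure, and landmarks are then localised inside the box. The three callables are stand-ins for whichever detector, tracker, and landmark localiser a concrete pipeline uses:

```python
def hybrid_face_tracking(frames, detect_face, track_box, localise_landmarks):
    """Hybrid deformable face tracking pipeline sketch.

    Per frame: the model-free tracker propagates the face bounding
    box; if it fails (returns None), the face detector re-initialises
    the box; facial landmarks are then localised inside the box.
    Returns one landmark result per frame (None if no face found).
    """
    box = None
    results = []
    for frame in frames:
        if box is not None:
            box = track_box(frame, box)   # model-free tracking
        if box is None:
            box = detect_face(frame)      # (re-)initialisation
        results.append(localise_landmarks(frame, box) if box else None)
    return results
```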

    Deep Neural Network and Data Augmentation Methodology for off-axis iris segmentation in wearable headsets

    A data augmentation methodology is presented and applied to generate a large dataset of off-axis iris regions and to train a low-complexity deep neural network. Although of low complexity, the resulting network achieves a high level of accuracy in iris region segmentation for challenging off-axis eye patches. Interestingly, this network is also shown to achieve high levels of performance for regular, frontal segmentation of iris regions, comparing favorably with state-of-the-art techniques of significantly higher complexity. Due to its lower complexity, this network is well suited for deployment in embedded applications such as augmented and mixed reality headsets.
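    The abstract does not detail the augmentation methodology; one common way to synthesise off-axis views from frontal eye patches is a geometric warp. The row-wise shear below is a deliberately crude stand-in for a full projective warp, shown only to illustrate the idea:

```python
def shear_eye_patch(patch, shear_per_row=1):
    """Synthesise a crude off-axis view of a frontal eye patch.

    `patch` is a 2-D list of pixel values. Each row is shifted
    horizontally in proportion to its row index, approximating the
    foreshortening seen at oblique camera angles; vacated pixels are
    zero-filled. A real pipeline would apply full projective warps
    (or render 3-D eye models); this shear is only an illustration.
    """
    width = len(patch[0])
    out = []
    for r, row in enumerate(patch):
        shift = (r * shear_per_row) % width
        shifted = [0] * shift + list(row[: width - shift])
        out.append(shifted)
    return out
```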

    Single camera pose estimation using Bayesian filtering and Kinect motion priors

    Traditional approaches to upper body pose estimation using monocular vision rely on complex body models and a large variety of geometric constraints. We argue that this is not ideal and somewhat inelegant, as it results in large processing burdens, and instead attempt to incorporate these constraints through priors obtained directly from training data. A prior distribution covering the probability of a human pose occurring is used to incorporate likely human poses. This distribution is obtained offline, by fitting a Gaussian mixture model to a large dataset of recorded human body poses, tracked using a Kinect sensor. We combine this prior information with a random walk transition model to obtain an upper body model, suitable for use within a recursive Bayesian filtering framework. Our model can be viewed as a mixture of discrete Ornstein-Uhlenbeck processes, in that states behave as random walks, but drift towards a set of typically observed poses. This model is combined with measurements of the human head and hand positions, using recursive Bayesian estimation to incorporate temporal information. Measurements are obtained using face detection and a simple skin colour hand detector, trained using the detected face. The suggested model is designed with analytical tractability in mind, and we show that the pose tracking can be Rao-Blackwellised using the mixture Kalman filter, allowing for computational efficiency while still incorporating bio-mechanical properties of the upper body. In addition, the use of the proposed upper body model allows reliable three-dimensional pose estimates to be obtained indirectly for a number of joints that are often difficult to detect using traditional object recognition strategies. Comparisons with Kinect sensor results and the state of the art in 2D pose estimation highlight the efficacy of the proposed approach. Comment: 25 pages, Technical report, related to Burke and Lasenby, AMDO 2014 conference paper. 
Code sample: https://github.com/mgb45/SignerBodyPose Video: https://www.youtube.com/watch?v=dJMTSo7-uF
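    The transition model, a random walk that drifts towards typically observed poses, can be sketched in one dimension. The drift and noise parameters here are illustrative rather than the paper's, and the Gaussian-mixture prior is reduced to a nearest-component-mean rule for brevity:

```python
import random

def ou_step(x, component_means, drift=0.2, noise_std=0.05, rng=None):
    """One transition of a discrete Ornstein-Uhlenbeck-style model.

    The 1-D state `x` takes a random-walk step but drifts towards the
    nearest of a set of typical poses (component means standing in for
    the Gaussian mixture fitted to Kinect pose data). With `drift` in
    (0, 1) and zero noise, repeated steps converge to that mean.
    """
    rng = rng or random.Random()
    mu = min(component_means, key=lambda m: abs(m - x))  # nearest typical pose
    return x + drift * (mu - x) + rng.gauss(0.0, noise_std)
```

    Setting `noise_std=0` exposes the deterministic drift, which is how the "random walk that drifts towards typical poses" behaviour can be checked in isolation.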