11 research outputs found

    Pose-Invariant Face Recognition via RGB-D Images

    Three-dimensional (3D) face models can intrinsically handle the large-pose face recognition problem. In this paper, we propose a novel pose-invariant face recognition method via RGB-D images. By employing depth, our method is able to handle self-occlusion and deformation, both of which are challenging problems in two-dimensional (2D) face recognition. Texture images in the gallery can be rendered to the same view as the probe via depth. Meanwhile, depth is also used for similarity measurement via frontalization and symmetric filling. Finally, both texture and depth contribute to the final identity estimation. Experiments on the Bosphorus, CurtinFaces, Eurecom, and Kiwi databases demonstrate that the additional depth information improves the performance of face recognition with large pose variations and under even more challenging conditions.
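    As a rough illustration of the symmetric-filling step mentioned above, the following Python sketch fills self-occluded depth pixels from their horizontal mirror. It assumes the face has already been aligned so that its symmetry axis is the image's vertical midline and that missing depth is encoded as 0; the function name and these conventions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def symmetric_fill(depth: np.ndarray) -> np.ndarray:
    """Fill self-occluded (zero) depth pixels from their horizontal mirror.

    Assumes the face is roughly aligned so its symmetry axis is the
    vertical midline of the image; 0 marks missing depth.
    """
    filled = depth.copy()
    mirrored = depth[:, ::-1]                  # reflect across the midline
    holes = (filled == 0) & (mirrored > 0)     # fillable only where the mirror is valid
    filled[holes] = mirrored[holes]
    return filled

# Toy example: a 1x4 depth row with a hole at index 3.
row = np.array([[10.0, 12.0, 12.0, 0.0]])
print(symmetric_fill(row))  # hole filled with its mirror value 10.0
```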

    Directed Gaze Trajectories for Biometric Presentation Attack Detection

    Presentation attack artefacts can be used to subvert the operation of biometric systems by being presented to the sensors of such systems. In this work, we propose the use of visual stimuli with randomised trajectories to stimulate eye movements for the detection of such spoofing attacks. The presentation of a moving visual challenge is used to ensure that some pupillary motion is stimulated and then captured with a camera. Various types of challenge trajectories are explored on different planar geometries representing prospective devices where the challenge could be presented to users. To evaluate the system, photo, 2D mask and 3D mask attack artefacts were used, and pupillary movement data were captured from 80 volunteers performing genuine and spoofing attempts. The results support the potential of the proposed features for the detection of biometric presentation attacks.
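    The challenge-response idea can be sketched as follows: generate a randomised stimulus trajectory, then score how well the captured pupil positions track it. A live eye pursuing the target yields a high correlation, while a static photo artefact does not. This is a simplified stand-in for the paper's features and trajectories; all names, parameters and thresholds here are assumptions.

```python
import numpy as np

def random_trajectory(n_points, seed=None):
    """Randomised piecewise-linear 2D stimulus path in the unit square."""
    rng = np.random.default_rng(seed)
    waypoints = rng.random((5, 2))              # random on-screen waypoints
    t = np.linspace(0, 4, n_points)
    idx = np.minimum(t.astype(int), 3)          # segment index, 0..3
    frac = (t - idx)[:, None]                   # position within the segment
    return waypoints[idx] * (1 - frac) + waypoints[idx + 1] * frac

def tracking_score(stimulus, gaze):
    """Mean per-axis Pearson correlation between stimulus and gaze tracks."""
    return float(np.mean([np.corrcoef(stimulus[:, k], gaze[:, k])[0, 1]
                          for k in range(2)]))

rng = np.random.default_rng(1)
stim = random_trajectory(200, seed=0)
genuine = stim + rng.normal(0, 0.02, stim.shape)                     # eye follows the target
static = np.full_like(stim, 0.5) + rng.normal(0, 1e-3, stim.shape)   # photo: no pursuit
print(tracking_score(stim, genuine))   # close to 1 -> accept as live
print(tracking_score(stim, static))    # near 0 -> flag as presentation attack
```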

    Vision-Based 2D and 3D Human Activity Recognition


    Skin texture features for face recognition

    Face recognition has been deployed in a wide range of important applications, including surveillance and forensic identification. However, it remains a challenging problem, as its performance degrades severely under illumination, pose and expression variations, as well as with occlusions and aging. In this thesis, we have investigated the use of local facial skin data as a source of biometric information to improve human recognition. Skin texture features have been exploited in three major tasks: (i) improving the performance of conventional face recognition systems, (ii) building an adaptive skin-based face recognition system, and (iii) dealing with circumstances in which a full view of the face may not be available. Additionally, a fully automated scheme is presented for localizing the eyes and mouth and segmenting four facial regions: forehead, right cheek, left cheek and chin. These four regions are divided into non-overlapping patches of equal size. A novel skin/non-skin classifier is proposed for detecting patches containing only skin texture and thereby detecting the pure-skin regions. Experiments using the XM2VTS database indicate that the forehead region carries the most significant biometric information. The use of forehead texture features improves the rank-1 identification rate of the Eigenfaces system from 77.63% to 84.07%. The rank-1 identification rate reaches 93.56% when this region is fused with the Kernel Direct Discriminant Analysis algorithm.
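    The patch-based pipeline described above can be sketched in a few lines of Python: split a segmented facial region into equal, non-overlapping patches and keep only the pure-skin ones. The thesis's actual skin/non-skin classifier is not reproduced here; a simple intensity-variance threshold stands in for it, and all names and values are illustrative.

```python
import numpy as np

def split_into_patches(region: np.ndarray, patch: int) -> np.ndarray:
    """Split a grayscale facial region into equal, non-overlapping patches.

    Trailing rows/columns that do not fill a whole patch are discarded.
    """
    h, w = region.shape
    rows, cols = h // patch, w // patch
    cropped = region[: rows * patch, : cols * patch]
    return (cropped.reshape(rows, patch, cols, patch)
                   .swapaxes(1, 2)
                   .reshape(rows * cols, patch, patch))

def is_pure_skin(patch_img: np.ndarray, max_std: float = 12.0) -> bool:
    """Placeholder skin test: pure-skin patches are assumed low-texture.

    The thesis proposes a dedicated skin/non-skin classifier; a plain
    intensity-variance threshold stands in for it here.
    """
    return float(patch_img.std()) <= max_std

forehead = np.random.default_rng(0).normal(128, 5, (64, 96))  # toy forehead region
patches = split_into_patches(forehead, 16)
skin = [p for p in patches if is_pure_skin(p)]
print(f"{len(skin)}/{len(patches)} patches kept as pure skin")
```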

    Toward Understanding Human Expression in Human-Robot Interaction

    Intelligent devices are quickly becoming necessities to support our activities during both work and play. We are already bound in a symbiotic relationship with these devices. An unfortunate effect of the pervasiveness of intelligent devices is the substantial investment of our time and effort to communicate intent. Even though our increasing reliance on these intelligent devices is inevitable, the limits of conventional methods for devices to perceive human expression hinder communication efficiency. These constraints restrict the usefulness of intelligent devices to support our activities. Our communication time and effort must be minimized to leverage the benefits of intelligent devices and seamlessly integrate them into society. Minimizing the time and effort needed to communicate our intent will allow us to concentrate on tasks in which we excel, including creative thought and problem solving. An intuitive method to minimize human communication effort with intelligent devices is to take advantage of our existing interpersonal communication experience. Recent advances in speech, hand gesture, and facial expression recognition provide alternate viable modes of communication that are more natural than conventional tactile interfaces. Use of natural human communication eliminates the need to adapt to, and invest time and effort in, the less intuitive techniques required for traditional keyboard- and mouse-based interfaces. Although the state of the art in natural but isolated modes of communication achieves impressive results, significant hurdles must be overcome before communication with devices in our daily lives will feel natural and effortless. Research has shown that combining information between multiple noise-prone modalities improves accuracy. Leveraging this complementary and redundant content will improve communication robustness and relax current unimodal limitations. This research presents and evaluates a novel multimodal framework that helps reduce the total human effort and time required to communicate with intelligent devices. This reduction is realized by determining human intent using a knowledge-based architecture that combines and leverages conflicting information available across multiple natural communication modes and modalities. The effectiveness of this approach is demonstrated using dynamic hand gestures and simple facial expressions characterizing basic emotions. It is important to note that the framework is not restricted to these two forms of communication. The framework presented in this research provides the flexibility necessary to include additional or alternate modalities and channels of information in future research, including improving the robustness of speech understanding. The primary contributions of this research include the leveraging of conflicts in a closed-loop multimodal framework, explicit use of uncertainty in knowledge representation and reasoning across multiple modalities, and a flexible approach for leveraging domain-specific knowledge to help understand multimodal human expression. Experiments using a manually defined knowledge base demonstrate an improved average accuracy of individual concepts and of overall intents when leveraging conflicts, as compared to an open-loop approach.
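    A minimal sketch of one way to fuse conflicting modality outputs while representing uncertainty explicitly: each recogniser emits a distribution over candidate intents, and the fusion weights each modality by its certainty (one minus normalised entropy). This is only an illustration of the general idea, not the knowledge-based architecture of the thesis; the intent labels and distributions are invented.

```python
import numpy as np

INTENTS = ["greet", "stop", "approve"]

def certainty(p: np.ndarray) -> float:
    """1 - normalised entropy: high when a modality is confident."""
    p = np.clip(p, 1e-12, 1.0)
    return 1.0 - float(-(p * np.log(p)).sum() / np.log(len(p)))

def fuse(gesture_p: np.ndarray, face_p: np.ndarray) -> np.ndarray:
    """Certainty-weighted fusion of two (possibly conflicting) distributions."""
    w_g, w_f = certainty(gesture_p), certainty(face_p)
    fused = w_g * gesture_p + w_f * face_p
    return fused / fused.sum()

gesture = np.array([0.7, 0.2, 0.1])   # gesture recogniser favours "greet"
face = np.array([0.3, 0.2, 0.5])      # expression weakly favours "approve"
fused = fuse(gesture, face)
print(INTENTS[int(fused.argmax())], fused.round(3))  # resolved toward the surer modality
```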

    Face recognition by means of advanced contributions in machine learning

    Face recognition (FR) has been extensively studied, due to both fundamental scientific challenges and current and potential applications where human identification is needed. Among their most important benefits, FR systems are non-intrusive, use low-cost equipment and require no user agreement during acquisition. Nevertheless, despite the progress made in recent years and the different solutions proposed, FR performance is not yet satisfactory under more demanding conditions (different viewpoints, occlusions, illumination changes, strong lighting conditions, etc.). In particular, the effect of such uncontrolled lighting conditions on face images leads to one of the strongest distortions in facial appearance. This dissertation addresses the problem of FR when dealing with less constrained illumination situations. To approach the problem, a new multi-session and multi-spectral face database has been acquired in the visible, near-infrared (NIR) and thermal infrared (TIR) spectra, under different lighting conditions. A theoretical analysis using information theory has first been carried out to demonstrate the complementarity between the different spectral bands. The optimal exploitation of the information provided by the set of multispectral images has subsequently been addressed by using multimodal matching-score fusion techniques that efficiently synthesize the complementary, meaningful information among the different spectra. Owing to the peculiarities of thermal images, a specific face segmentation algorithm had to be developed. The final proposed system uses the Discrete Cosine Transform as a dimensionality reduction tool and a fractional distance for matching, significantly reducing the cost in processing time and memory. Prior to this classification task, a selection of the relevant frequency bands is proposed in order to optimize the overall system, based on identifying and maximizing independence relations by means of discriminability criteria. The system has been extensively evaluated on the multispectral face database specifically acquired for this purpose. In this regard, a new visualization procedure has been suggested for combining different bands, establishing valid comparisons and giving statistical information about the significance of the results. This experimental framework has enabled improved robustness against mismatches between training and testing illumination. Additionally, the focus problem in the thermal spectrum has also been addressed, first for the general case of thermal images (or thermograms), and then for the case of facial thermograms, from both a theoretical and a practical point of view. In order to analyze the quality of facial thermograms degraded by blurring, an appropriate algorithm has been successfully developed. Experimental results strongly support the proposed multispectral facial image fusion, which achieves very high performance under several conditions. These results represent a new advance in providing robust matching across changes in illumination, inspiring highly accurate FR approaches for practical scenarios.
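    The matching stage described above (DCT-based dimensionality reduction followed by a fractional distance) can be sketched as follows. The thesis's frequency-band selection criterion is not reproduced; keeping the top-left block of low-frequency DCT coefficients is a common simplification, and the toy images are synthetic.

```python
import numpy as np
from scipy.fftpack import dct

def dct_features(img: np.ndarray, k: int = 8) -> np.ndarray:
    """2D DCT of a face image, keeping the top-left k x k low frequencies."""
    coeffs = dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:k, :k].ravel()

def fractional_distance(a: np.ndarray, b: np.ndarray, p: float = 0.5) -> float:
    """Minkowski distance with p < 1 (a 'fractional' distance)."""
    return float((np.abs(a - b) ** p).sum() ** (1.0 / p))

rng = np.random.default_rng(0)
gallery = rng.normal(128, 30, (112, 92))           # enrolled face image (toy data)
probe = gallery + rng.normal(0, 5, gallery.shape)  # same face, perturbed
d = fractional_distance(dct_features(gallery), dct_features(probe))
print(f"gallery-probe fractional distance: {d:.2f}")
```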

    Machine Learning

    Machine learning can be defined in various ways; broadly, it is a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability of systems to improve automatically through experience.

    3D Gaze Estimation from Remote RGB-D Sensors

    The development of systems able to retrieve and characterise the state of humans is important for many applications and fields of study. In particular, as a display of attention and interest, gaze is a fundamental cue in understanding people's activities, behaviors, intentions, state of mind and personality. Moreover, gaze plays a major role in the communication process, such as showing attention to the speaker, indicating who is addressed or averting gaze to keep the floor. Therefore, many applications within the fields of human-human, human-robot and human-computer interaction could benefit from gaze sensing. However, despite significant advances during more than three decades of research, current gaze estimation technologies cannot address the conditions often required within these fields, such as remote sensing, unconstrained user movements and minimal user calibration. Furthermore, to reduce cost, it is preferable to rely on consumer sensors, but this usually leads to low-resolution and low-contrast images that current techniques can hardly cope with. In this thesis we investigate the problem of automatic gaze estimation under head pose variations, low-resolution sensing and different levels of user calibration, including the uncalibrated case. We propose to build a non-intrusive gaze estimation system based on remote consumer RGB-D sensors. In this context, we propose algorithmic solutions which overcome many of the limitations of previous systems. We thus address the main aspects of this problem: 3D head pose tracking, 3D gaze estimation, and gaze-based application modeling. First, we develop an accurate model-based 3D head pose tracking system which adapts to the participant without requiring explicit actions. Second, to achieve head-pose-invariant gaze estimation, we propose a method to correct the eye image appearance variations due to head pose. We then investigate two different methodologies to infer the 3D gaze direction. The first builds upon machine learning regression techniques; in this context, we propose strategies to improve their generalization, in particular to handle different people. The second methodology is a new paradigm we propose and call geometric generative gaze estimation. This novel approach combines the benefits of geometric eye modeling (normally restricted to high-resolution images due to the difficulty of feature extraction) with a stochastic segmentation process (adapted to low resolution) within a Bayesian model, allowing the decoupling of user-specific geometry and session-specific appearance parameters, along with the introduction of priors, which are appropriate for adaptation relying on small amounts of data. The aforementioned gaze estimation methods are validated through extensive experiments on a comprehensive database which we collected and made publicly available. Finally, we study the problem of automatic gaze coding in natural dyadic and group human interactions. The system builds upon the thesis contributions to handle unconstrained head movements and the lack of user calibration. It further exploits the 3D tracking of participants and their gaze to conduct a 3D geometric analysis within a multi-camera setup. Experiments on real and natural interactions demonstrate that the system is highly accurate. Overall, the methods developed in this dissertation are suitable for many applications involving large diversity in terms of setup configuration, user calibration and mobility.
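    As a sketch of the regression-based methodology, the following fits a closed-form ridge regressor from (already pose-rectified) eye-appearance features to gaze angles. The feature dimensions and synthetic data are assumptions for illustration; the thesis evaluates several regression techniques and a separate geometric generative model not shown here.

```python
import numpy as np

def fit_ridge(X: np.ndarray, Y: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Closed-form ridge regression: W minimising ||XW - Y||^2 + lam ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 64))      # pose-rectified eye-appearance features (toy)
W_true = rng.normal(size=(64, 2))
Y_train = X_train @ W_true + rng.normal(0, 0.05, (500, 2))  # (yaw, pitch) gaze angles

W = fit_ridge(X_train, Y_train)
x_probe = rng.normal(size=(1, 64))        # features from a new frame
print("predicted gaze (yaw, pitch):", (x_probe @ W).ravel())
```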

    E-INVIGILATION OF E-ASSESSMENTS

    E-learning, and particularly distance-based learning, is becoming an increasingly important mechanism for education. A leading Virtual Learning Environment (VLE) reports a user base of 70 million students and 1.2 million teachers across 7.5 million courses. Whilst e-learning has introduced flexibility and remote/distance-based learning, there are still aspects of course delivery that rely upon traditional approaches. The most significant of these is examinations. The inability to provide invigilation remotely has restricted the types of assessment, with exams and in-class tests proving difficult to validate. Students are still required to attend physical testing centres in order to ensure strict examination conditions are applied. Whilst research has begun to propose solutions in this respect, these fundamentally fail to provide the integrity required. This thesis seeks to research and develop an e-invigilator that will provide continuous and transparent invigilation of the individual undertaking an electronic-based exam or test. Analysis of existing e-invigilation solutions shows that the suggested approaches to minimising cheating behaviours during online tests vary widely; they suffer from a wide range of weaknesses and lack an implementation achieving continuous and transparent authentication with appropriate security restrictions. To this end, the most transparent biometric approaches are identified for incorporation in an appropriate solution that maintains security beyond the point-of-entry. Given the existing issues of intrusiveness and point-of-entry user authentication, a complete architecture has been developed based upon maintaining student convenience while providing effective identity verification throughout the test, rather than merely at the beginning. It also provides continuous system-level monitoring to prevent cheating, as well as a variety of management-level functionalities for creating and managing assessments, including a prioritised and usable interface that enables academics to quickly verify and check cases of possible cheating. The research includes a detailed discussion of the architecture requirements, components, and complete design at the core of the system, which captures, processes, and monitors students in a completely controlled e-test environment. To highlight the ease of use and lightweight nature of the system, a prototype was developed, employing student face recognition as the most transparent multimodal (2D and 3D modes) biometric, along with novel security features based on eye tracking, head movements, speech recognition, and multiple-face detection, to enable a robust and flexible e-invigilation approach. An experiment (Experiment 1) was conducted using the developed prototype, involving 51 participants and focusing mainly upon the usability of the system under normal use. The FRR was 0 for every one of the 51 legitimate participants in the 2D mode; in the 3D mode, it was 0 for 45 of them and less than 0.096 for the remaining 6. Consequently, averaged over all 51 participants, the FRR was 0 in 2D facial recognition mode and 0.048 in 3D facial recognition mode. Furthermore, in order to evaluate the robustness of the approach against targeted misuse, 3 participants were tasked with a series of scenarios that map to typical misuse (Experiment 2). The FAR was 0.038 in the 2D mode and 0 in the 3D mode. The results of both experiments support the feasibility, security, and applicability of the suggested system. Finally, a series of scenario-based evaluations involving three separate stakeholder groups, namely experts, academics (qualitative surveys) and students (a quantitative and qualitative survey), has also been used to provide a comprehensive evaluation of the effectiveness of the proposed approach. The vast majority of the interview/feedback outcomes can be considered positive, constructive and valuable. The respondents agree with the idea of continuous and transparent authentication in e-assessments, as it is vital for ensuring solid and convenient security beyond the point-of-entry. The outcomes have also supported the feasibility and practicality of the approach, as well as the efficiency of the system management via well-designed and smart interfaces. The Higher Committee for Education Development in Iraq (HCED).
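    For reference, the two reported error measures can be computed as simple rates over attempt outcomes, as in the sketch below; the counts used here are toy values, not the raw counts of Experiments 1 and 2.

```python
def frr(genuine_rejections: int, genuine_attempts: int) -> float:
    """False Rejection Rate: fraction of legitimate attempts wrongly rejected."""
    return genuine_rejections / genuine_attempts

def far(impostor_accepts: int, impostor_attempts: int) -> float:
    """False Acceptance Rate: fraction of misuse attempts wrongly accepted."""
    return impostor_accepts / impostor_attempts

# Toy numbers only, for illustration of the definitions.
print(f"FRR: {frr(2, 100):.3f}")   # e.g. 2 rejections in 100 genuine checks
print(f"FAR: {far(1, 26):.3f}")    # e.g. 1 acceptance in 26 misuse attempts
```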

    Domain independent strategies in an affective tutoring system

    There have been various attempts to develop an affective tutoring system (ATS) framework that considers and reacts to a student's emotions while learning. However, there is a gap between current systems and the theory underlying human appraisal models: current frameworks rely on a single appraisal and reaction phase, whereas the human appraisal process (Lazarus, 1991) involves two phases of appraisal and reaction (i.e. primary and secondary). This thesis proposes an ATS framework that introduces two phases of appraisal and reaction (i.e. primary and secondary appraisal and reaction phases). The proposed framework has been implemented and evaluated in a system that teaches Data Structures. In addition, the system employs both domain-dependent and domain-independent strategies for coping with students' affective states. This follows the emotion regulation model (Lazarus, 1991) underpinning the ATS framework, which argues that individuals use both kinds of strategies when solving daily-life problems. In comparison, current affective ITS frameworks concentrate on the use of domain-dependent strategies to cope with students' affective states. The evaluation of the system provides some support for the idea that the ATS framework is useful both in improving students' affective states (i.e. during and by the end of a learning session) and in improving their learning performance.
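    A toy sketch of the two-phase appraisal-and-reaction loop the framework introduces: a primary appraisal selects an immediate, domain-dependent reaction, and a secondary appraisal can fall back to a domain-independent coping strategy. The affective states and strategies below are invented placeholders, not those of the thesis.

```python
from dataclasses import dataclass

@dataclass
class Student:
    affect: str          # e.g. "frustrated", "bored", "engaged"
    performance: float   # recent score in [0, 1]

def primary_appraisal(s: Student) -> str:
    """First phase: pick an immediate, domain-dependent reaction."""
    if s.affect == "frustrated":
        return "offer an easier Data Structures exercise"
    if s.affect == "bored":
        return "offer a harder exercise"
    return "continue current exercise"

def secondary_appraisal(s: Student, first_reaction: str) -> str:
    """Second phase: re-appraise; add a domain-independent coping
    strategy if the affective state persists."""
    if s.affect in ("frustrated", "bored"):
        return first_reaction + " and suggest a short break (domain-independent)"
    return first_reaction

student = Student(affect="frustrated", performance=0.4)
reaction = primary_appraisal(student)
print(secondary_appraisal(student, reaction))
```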