
    Fixed FAR Vote Fusion of Regional Facial Classifiers

    Holistic face recognition methods like PCA and LDA have the disadvantage that they are very sensitive to expression, hair, and illumination variations. This is one of the main reasons they are no longer competitive in major benchmarks like FRGC and FRVT. In this paper we present an LDA-based approach that combines many overlapping regional classifiers (experts) using what we call a Fixed FAR Voting Fusion (FFVF) strategy. Combining regional classifiers by voting means that if enough regional classifiers are unaffected by the expression, illumination, or hair variations, the fused classifier will still correctly recognise the face. The FFVF approach has two interesting properties: it allows robust fusion of dependent classifiers, and it requires only a single parameter to be tuned to obtain the fusion weights for the different classifiers. We show the potential of FFVF of regional classifiers using the standard benchmark Experiments 1 and 4 on the FRGC v2 data. The multi-region FFVF classifier has an FRR of 4% at FAR = 0.1% for controlled data and 38% for uncontrolled data, compared to 7% and 56% for the best single-region classifier.
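    The abstract leaves the fusion mechanics implicit, so here is one plausible reading as a sketch: each regional classifier gets a decision threshold placed at a fixed FAR on impostor scores, every region whose score clears its threshold casts a vote, and the fused decision compares the vote count to a vote threshold (the single tuned parameter). All names and the higher-is-better score convention below are our assumptions, not the authors' code.

```python
import numpy as np

def fixed_far_vote_fusion(scores, impostor_scores, target_far=0.001, min_votes=None):
    """Hypothetical sketch of fixed-FAR vote fusion over regional classifiers.

    scores:          (n_regions,) matching scores of one probe/gallery pair.
    impostor_scores: (n_regions, n_impostors) scores from impostor trials,
                     used to place each region's threshold at target_far.
    """
    # Per-region threshold: exceeded by only a target_far fraction of impostors.
    thresholds = np.quantile(impostor_scores, 1.0 - target_far, axis=1)
    votes = int(np.sum(scores > thresholds))  # each clearing region casts one vote
    if min_votes is None:
        min_votes = scores.shape[0] // 2 + 1  # simple majority as a default
    return votes >= min_votes, votes
```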

    Fast and Accurate 3D Face Recognition Using Registration to an Intrinsic Coordinate System and Fusion of Multiple Region Classifiers

    In this paper we present a new robust approach for 3D face registration to an intrinsic coordinate system of the face. The intrinsic coordinate system is defined by the vertical symmetry plane through the nose, the tip of the nose, and the slope of the bridge of the nose. In addition, we propose a 3D face classifier based on the fusion of many dependent region classifiers for overlapping face regions. The region classifiers use PCA-LDA for feature extraction and the likelihood ratio as a matching score. Fusion is realised using straightforward majority voting for the identification scenario. For verification, a voting approach is used as well, and the decision is made by comparing the number of votes to a threshold. Using the proposed registration method combined with a classifier consisting of 60 fused region classifiers, we obtain a 99.0% identification rate on the all-vs-first identification test of the FRGC v2 data. A verification rate of 94.6% at FAR = 0.1% was obtained for the all-vs-all verification test on the FRGC v2 data using fusion of 120 region classifiers. The first is the highest reported performance, and the second is in the top five of best-performing systems on these tests. In addition, our approach is much faster than other methods, taking only 2.5 seconds per image for registration and less than 0.1 ms per comparison. Because we apply feature extraction using PCA and LDA, the resulting template size is also very small: 6 kB for 60 region classifiers.
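    The two fusion rules described above, majority voting for identification and vote counting against a threshold for verification, fit in a few lines. This is our minimal NumPy sketch under an assumed higher-is-better score convention, not the authors' implementation.

```python
import numpy as np

def identify(probe_scores):
    """probe_scores: (n_regions, n_gallery) scores of one probe against the
    gallery. Each regional classifier votes for its best match; the gallery
    identity with the most votes wins."""
    region_votes = np.argmax(probe_scores, axis=1)
    return np.bincount(region_votes, minlength=probe_scores.shape[1]).argmax()

def verify(scores, region_thresholds, min_votes):
    """Accept the claimed identity when enough regions clear their thresholds.

    scores, region_thresholds: (n_regions,) arrays; min_votes: decision level."""
    return int(np.sum(scores > region_thresholds)) >= min_votes
```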

    Setting a world record in 3D face recognition

    Biometrics, the recognition of persons based on how they look or behave, is the main subject of research at the Chair of Biometric Pattern Recognition (BPR) of the Services, Cyber Security and Safety Group (SCS) of the EEMCS Faculty at the University of Twente. Examples are fingerprint, iris, and face recognition. A relatively new field is 3D face recognition, based on the shape of the face rather than its appearance. This paper presents a method for 3D face recognition developed at BPR and published in 2011. The paper also shows that noteworthy performance gains can be obtained by optimisation of an existing method. The method is based on registration to an intrinsic coordinate system using the vertical symmetry plane of the head, the tip of the nose, and the slope of the nose bridge. For feature extraction and classification, multiple regional PCA-LDA-likelihood-ratio based classifiers are fused using a fixed FAR voting strategy. We present solutions for correction of motion artifacts in 3D scans, improved registration, and improved training of the PCA-LDA classifier using automatic outlier removal. These result in a notable improvement of the recognition rates: the all-vs-all verification rate for the FRGC v2 dataset jumps to 99.3%, and the all-vs-first identification rate to 99.4%. Both are, to our knowledge, the best results ever obtained for these benchmarks, by a fairly large margin.
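    The PCA-LDA-likelihood-ratio classifier mentioned here scores a pair of feature vectors with a Gaussian log-likelihood ratio of "same identity" versus "different identity". The sketch below uses the standard closed form under the usual assumptions (features whitened so the within-class variance is 1 per dimension, diagonal between-class variance b estimated on training data); it illustrates the score type, not the authors' exact code.

```python
import numpy as np

def llr_score(x1, x2, b):
    """Gaussian log-likelihood ratio: same identity vs. different identity.

    x1, x2: (d,) PCA-LDA feature vectors, whitened so the within-class
            variance is 1 per dimension.
    b:      (d,) per-dimension between-class variance from training data.
    Larger scores mean the pair is more likely the same person.
    """
    # Under H_same the pair is jointly Gaussian with per-dimension covariance
    # [[1+b, b], [b, 1+b]]; under H_diff the samples are independent N(0, 1+b).
    q_same = ((1 + b) * (x1**2 + x2**2) - 2 * b * x1 * x2) / (1 + 2 * b)
    q_diff = (x1**2 + x2**2) / (1 + b)
    log_det_ratio = np.log((1 + b) ** 2 / (1 + 2 * b))
    return 0.5 * np.sum(q_diff - q_same + log_det_ratio)
```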

    A weighted regional voting based ensemble of multiple classifiers for face recognition.

    Face recognition is heavily studied for its wide range of applications in areas such as information security, law enforcement, surveillance, entertainment, and smart cards. Although competing techniques have been proposed in computer vision conferences and journals, no algorithm has emerged as superior in all cases over the last decade. In this work, we developed a framework which can embed all available algorithms and achieve better results in all cases than the algorithms we have embedded, without a great sacrifice in time complexity. We build on the success of a recently introduced concept, Regional Voting. The new system adds weights to different regions of the human face. Different methods of cooperation among algorithms are also proposed. Extensive experiments, carried out on benchmark face databases, show that the proposed system's joint contribution from multiple algorithms is faster and more accurate than Regional Voting in every case. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b180553
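    The weighted regional vote described above can be expressed compactly. In the sketch below, each region contributes its weight to the identity it predicts and the largest weighted tally wins; tying the weights to per-region validation accuracy is our illustrative assumption, as the abstract does not specify how the weights are obtained.

```python
def weighted_regional_vote(region_predictions, region_weights):
    """region_predictions: identity predicted by each regional classifier;
    region_weights: one weight per region (e.g. its validation accuracy).
    Returns the identity with the largest total weighted vote."""
    tally = {}
    for pred, weight in zip(region_predictions, region_weights):
        tally[pred] = tally.get(pred, 0.0) + weight
    return max(tally, key=tally.get)

# Example: region 3 disagrees but is outvoted by the weighted majority.
print(weighted_regional_vote(["alice", "alice", "bob"], [0.9, 0.8, 0.7]))
```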

    Improving the Generalizability of Speech Emotion Recognition: Methods for Handling Data and Label Variability

    Emotion is an essential component in our interaction with others. It transmits information that helps us interpret the content of what others say. Therefore, detecting emotion from speech is an important step towards enabling machine understanding of human behaviors and intentions. Researchers have demonstrated the potential of emotion recognition in areas such as interactive systems in smart homes and mobile devices, computer games, and computational medical assistants. However, emotion communication is variable: individuals may express emotion in a manner that is uniquely their own; different speech content and environments may shape how emotion is expressed and recorded; and individuals may perceive emotional messages differently. Practically, this variability is reflected in both the audio-visual data and the labels used to create speech emotion recognition (SER) systems. SER systems must be robust and generalizable to handle the variability effectively. The focus of this dissertation is on the development of speech emotion recognition systems that handle variability in emotion communication. We break the dissertation into three parts, according to the type of variability we address: (I) in the data, (II) in the labels, and (III) in both the data and the labels.

    Part I: The first part of this dissertation focuses on handling variability present in data. We approximate variations in environmental properties and expression styles by the corpus and the gender of the speakers. We find that training on multiple corpora and controlling for the variability in gender and corpus using multi-task learning result in more generalizable models, compared to traditional single-task models that do not take corpus and gender variability into account. Another source of variability present in the recordings used in SER is the phonetic modulation of acoustics. On the other hand, phonemes also provide information about the emotion expressed in speech content. We discover that we can make more accurate predictions of emotion by explicitly considering both roles of phonemes.

    Part II: The second part of this dissertation addresses variability present in emotion labels, including the differences between emotion expression and perception, and the variations in emotion perception. We discover that it is beneficial to jointly model both the perception of others and how one perceives one's own expression, compared to focusing on either one. Further, we show that the variability in emotion perception is a modelable signal and can be captured using probability distributions that describe how groups of evaluators perceive emotional messages.

    Part III: The last part of this dissertation presents methods that handle variability in both data and labels. We reduce the data variability due to non-emotional factors using deep metric learning and model the variability in emotion perception using soft labels. We propose a family of loss functions and show that by pairing examples that potentially vary in expression styles and lexical content, and preserving the real-valued emotional similarity between them, we develop systems that generalize better across datasets and are more robust to over-training. These works demonstrate the importance of considering data and label variability in the creation of robust and generalizable emotion recognition systems.

    We conclude this dissertation with the following future directions: (1) the development of real-time SER systems; (2) the personalization of general SER systems.

    PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147639/1/didizbq_1.pd
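    One concrete reading of capturing variability in emotion perception with probability distributions is to replace one-hot labels with the empirical distribution of evaluator judgments and train against these soft targets. The sketch below is our illustration; the class set and annotation counts are invented.

```python
import numpy as np

def soft_labels(annotations, n_classes):
    """Turn per-evaluator categorical annotations for one utterance into a
    soft label distribution over emotion classes."""
    counts = np.bincount(annotations, minlength=n_classes)
    return counts / counts.sum()

# Five evaluators label one utterance over {0: neutral, 1: happy, 2: sad, 3: angry}.
print(soft_labels(np.array([1, 1, 2, 1, 0]), 4))  # -> [0.2 0.6 0.2 0. ]
```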

    Eye center localization and gaze gesture recognition for human-computer interaction

    © 2016 Optical Society of America. This paper introduces an unsupervised modular approach for accurate and real-time eye center localization in images and videos, allowing a coarse-to-fine, global-to-regional scheme. The trajectories of eye centers in consecutive frames, i.e., gaze gestures, are further analyzed, recognized, and employed to improve the human-computer interaction (HCI) experience. The modular approach makes use of isophote and gradient features to estimate the eye center locations. A selective oriented gradient filter has been specifically designed to remove strong gradients from eyebrows, eye corners, and shadows, which sabotage most eye center localization methods. A real-world implementation utilizing these algorithms has been designed in the form of an interactive advertising billboard to demonstrate the effectiveness of our method for HCI. The eye center localization algorithm has been compared with 10 other algorithms on the BioID database and six other algorithms on the GI4E database; it outperforms all of them in terms of localization accuracy. Further tests on the Extended Yale Face Database B and self-collected data have shown the algorithm to be robust against moderate head poses and poor illumination conditions. The interactive advertising billboard has demonstrated outstanding usability and effectiveness in our tests and shows great potential for benefiting a wide range of real-world HCI applications.
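    For flavour, the classic means-of-gradients objective (Timm and Barth) is one well-known way to turn gradient features into an eye center estimate: the center is the point whose displacement vectors best align with the image gradients. The paper's isophote-plus-filtered-gradient method is more elaborate; the sketch below only illustrates the underlying idea and is not the authors' algorithm.

```python
import numpy as np

def eye_center_by_gradients(gray):
    """Brute-force means-of-gradients estimate on a small grayscale eye patch.

    Returns the (row, col) candidate maximizing the mean squared alignment
    between normalized image gradients and displacement vectors from the
    candidate center to the gradient locations."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    keep = mag > np.percentile(mag, 90)        # keep only strong gradients
    ys, xs = np.nonzero(keep)
    gxn, gyn = gx[keep] / mag[keep], gy[keep] / mag[keep]

    best_score, best_center = -1.0, (0, 0)
    for cy in range(gray.shape[0]):
        for cx in range(gray.shape[1]):
            dy, dx = ys - cy, xs - cx
            norm = np.hypot(dx, dy)
            norm[norm == 0] = 1.0              # avoid division by zero at the center
            align = (dx / norm) * gxn + (dy / norm) * gyn
            score = np.mean(np.maximum(align, 0.0) ** 2)
            if score > best_score:
                best_score, best_center = score, (cy, cx)
    return best_center
```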

    Use of Coherent Point Drift in computer vision applications

    This thesis presents the novel use of Coherent Point Drift (CPD) in improving the robustness of a number of computer vision applications. The CPD approach includes two methods for registering two point sets, rigid and non-rigid, based on the transformation model used. The key characteristic of a rigid transformation is that the distance between points is preserved, which means it can be used in the presence of translation, rotation, and scaling. Non-rigid transformations, such as affine transforms, allow registration under non-uniform scaling and skew. The idea is to move one point set coherently to align with the second point set. The CPD method finds both the transformation and the correspondence between the two point sets at the same time, without an a priori declaration of the transformation model used.

    The first part of this thesis focuses on speaker identification in video conferencing. A real-time, audio-coupled, video-based approach is presented, which focuses more on the video analysis side than on the audio analysis, which is known to be prone to errors. CPD is effectively utilised for lip movement detection, and a temporal face detection approach is used to minimise false positives if the face detection algorithm fails to perform.

    The second part of the thesis focuses on multi-exposure and multi-focus image fusion with compensation for camera shake. Scale Invariant Feature Transform (SIFT) keypoints are first detected in the images being fused. Subsequently, this point set is reduced to remove outliers using RANSAC (RANdom SAmple Consensus), and finally the point sets are registered using CPD with non-rigid transformations. The registered images are then fused with a Contourlet-based image fusion algorithm that makes use of a novel alpha blending and filtering technique to minimise artefacts. The thesis evaluates the performance of the algorithm in comparison to a number of state-of-the-art approaches, including the key commercial products available in the market at present, showing significantly improved subjective quality in the fused images.

    The final part of the thesis presents a novel approach to Vehicle Make and Model Recognition (VMMR) in CCTV video footage. CPD is used to effectively remove the skew of detected vehicles, as CCTV cameras are not specifically configured for the VMMR task and may capture vehicles at different approach angles. A LESH (Local Energy Shape Histogram) feature based approach is used for vehicle make and model recognition, with the novelty that temporal processing is used to improve reliability. A number of further algorithms are used to maximise the reliability of the final outcome. Experimental results are provided to show that the proposed system achieves an accuracy in excess of 95% when tested on real CCTV footage with no prior camera calibration.
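    The thesis code is not given in the abstract, but CPD itself is easy to try: the open-source pycpd package (our choice, not necessarily what the thesis used) implements the rigid, affine, and non-rigid variants. A minimal rigid example:

```python
import numpy as np
from pycpd import RigidRegistration  # pip install pycpd

# Toy example: recover the pose of a rotated, shifted copy of a point set.
rng = np.random.default_rng(0)
X = rng.random((100, 2))                     # target point set
theta = np.deg2rad(15)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R.T + 0.05                           # rotated and translated source

reg = RigidRegistration(X=X, Y=Y)            # X: target, Y: source
TY, (s, R_est, t_est) = reg.register()       # aligned points + (scale, R, t)
print("estimated rotation:\n", R_est)
```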