    Benchmarking Quality-Dependent and Cost-Sensitive Score-Level Multimodal Biometric Fusion Algorithms

    Automatically verifying a person's identity by means of biometrics is an important application in day-to-day activities such as accessing banking services and security control in airports. To increase system reliability, several biometric devices are often used; such a combined system is known as a multimodal biometric system. This paper reports a benchmarking study carried out within the framework of the BioSecure DS2 (Access Control) evaluation campaign organized by the University of Surrey, involving face, fingerprint, and iris biometrics for person authentication, targeting the application of physical access control in a medium-size establishment with some 500 persons. While multimodal biometrics is a well-investigated subject, no benchmark exists for comparing fusion algorithms. Working towards this goal, we designed two sets of experiments: quality-dependent and cost-sensitive evaluation. The quality-dependent evaluation assesses how well fusion algorithms perform under changing quality of the raw images, principally due to a change of capture device. The cost-sensitive evaluation, on the other hand, investigates how well a fusion algorithm performs given restricted computational resources and in the presence of software and hardware failures, resulting in errors such as failure-to-acquire and failure-to-match. Since multiple capture devices are available, a fusion algorithm should be able to handle this nonideal but nevertheless realistic scenario. In both evaluations, each fusion algorithm is provided with scores from each biometric comparison subsystem as well as quality measures for both the template and the query data. The response to the call of the campaign proved very encouraging, with 22 fusion systems submitted. To the best of our knowledge, this is the first attempt to benchmark quality-based multimodal fusion algorithms.
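
    To make the evaluation setting concrete, the sketch below shows one simple way a fusion algorithm can combine subsystem scores with quality measures while tolerating failure-to-acquire and failure-to-match errors. It is a generic quality-weighted sum, not one of the 22 submitted systems; the [0, 1] score normalization and the NaN convention for failed subsystems are our illustrative assumptions.

```python
import numpy as np

def quality_weighted_fusion(scores, qualities, weights=None):
    """Fuse normalized match scores, weighting each subsystem by its
    signal quality.

    scores    : per-subsystem match scores, normalized to [0, 1];
                np.nan marks a failure-to-acquire or failure-to-match
    qualities : per-subsystem quality measures in [0, 1], e.g. combined
                from the template and query quality measures
    weights   : optional per-subsystem weights; uniform if omitted
    """
    scores = np.asarray(scores, dtype=float)
    qualities = np.asarray(qualities, dtype=float)
    w = np.ones_like(scores) if weights is None else np.asarray(weights, dtype=float)

    valid = ~np.isnan(scores)              # exclude failed subsystems
    if not valid.any():
        return np.nan                      # nothing to fuse: reject or re-acquire
    w = w[valid] * qualities[valid]
    return float(np.sum(w * scores[valid]) / np.sum(w))

# Three subsystems (face, fingerprint, iris); iris failed to acquire.
print(quality_weighted_fusion([0.82, 0.64, np.nan], [0.9, 0.5, 0.0]))
```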

    Multi-system Biometric Authentication: Optimal Fusion and User-Specific Information

    Verifying a person's identity claim by combining multiple biometric systems (fusion) is a promising solution to identity theft and automatic access control. This thesis contributes to the state of the art of multimodal biometric fusion by improving the understanding of fusion and by enhancing fusion performance using information specific to a user. One problem to deal with in score-level fusion is combining system outputs of different types. Two statistically sound representations of scores are probability and log-likelihood ratio (LLR). While they are equivalent in theory, LLR is much more useful in practice because its distribution can be approximated by a Gaussian distribution, which makes it useful for analyzing the problem of fusion. Furthermore, its score statistics (mean and covariance) conditioned on the claimed user identity can be better exploited. Our first contribution is to estimate the fusion performance given the class-conditional score statistics and a particular fusion operator/classifier. Thanks to the score statistics, we can predict fusion performance with reasonable accuracy, identify conditions which favor a particular fusion operator, study the joint phenomenon of combining system outputs with different degrees of strength and correlation, and possibly correct the adverse effect of bias (due to the score-level mismatch between training and test sets) on fusion. While in practice the class-conditional Gaussian assumption does not always hold, the estimated performance is found to be acceptable. Our second contribution is to exploit user-specific prior knowledge by limiting the class-conditional Gaussian assumption to each user. We exploit this hypothesis in two strategies. In the first strategy, we combine a user-specific fusion classifier with a user-independent fusion classifier by means of two LLR scores, which are then weighted to obtain a single output. We show that combining the user-specific and user-independent LLR outputs always results in better performance than using the better of the two alone. In the second strategy, we propose a statistic called the user-specific F-ratio, which measures the discriminative power of a given user based on the Gaussian assumption. Although similar class-separability measures exist, e.g., the Fisher ratio for a two-class problem and the d-prime statistic, the F-ratio is more suitable because it is related to the Equal Error Rate in closed form. The F-ratio is used in the following applications: a user-specific score normalization procedure, a user-specific criterion to rank users, and a user-specific fusion operator that selectively considers a subset of systems for fusion. The resulting fusion operator leads to a statistically significant increase in performance with respect to state-of-the-art fusion approaches. Even though the applications are different, the proposed methods share the following common advantages. Firstly, they are robust to deviations from the Gaussian assumption. Secondly, they are robust to scarce training data thanks to Bayesian adaptation. Finally, they consider both client and impostor information simultaneously.
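
    The closed-form link between the F-ratio and the Equal Error Rate can be stated compactly in code. The sketch below follows the class-conditional Gaussian assumption described in the abstract, with the F-ratio defined as (mu_client - mu_impostor) / (sigma_client + sigma_impostor) and EER = 1/2 - 1/2 erf(F / sqrt(2)), as in the published F-ratio literature; the function and variable names are ours.

```python
import math

def f_ratio(mu_client, sigma_client, mu_impostor, sigma_impostor):
    """User-specific F-ratio: separation between the client and impostor
    score distributions, both assumed Gaussian."""
    return (mu_client - mu_impostor) / (sigma_client + sigma_impostor)

def theoretical_eer(f):
    """Equal Error Rate implied by an F-ratio in closed form, under the
    class-conditional Gaussian assumption."""
    return 0.5 - 0.5 * math.erf(f / math.sqrt(2.0))

# Example: a well-separated user (high F-ratio -> low EER)
f = f_ratio(mu_client=2.0, sigma_client=0.5, mu_impostor=0.0, sigma_impostor=0.5)
print(f, theoretical_eer(f))   # F-ratio = 2.0, EER ~ 2.3%
```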

    Bi-Modal Face and Speech Authentication: a BioLogin Demonstration System

    This paper presents a bi-modal (face and speech) authentication demonstration system that simulates the login of a user by means of their face and voice. The demonstration is called BioLogin. It runs on both Linux and Windows, and the Windows version is freely available for download. BioLogin is implemented using an open-source machine learning library and its machine vision package.

    Classification with class-independent quality information for biometric verification

    Biometric identity verification systems frequently face the challenges of non-controlled conditions of data acquisition. Under such conditions, biometric signals may suffer from quality degradation due to extraneous, identity-independent factors. It has been demonstrated in numerous reports that degradation of biometric signal quality is a frequent cause of significant deterioration in classification performance, also in multiple-classifier, multimodal systems, which systematically outperform their single-classifier counterparts. Seeking to improve the robustness of classifiers to degraded data quality, researchers have started to introduce measures of signal quality into the classification process. In the existing approaches, the role of class-independent quality information is governed by intuitive rather than mathematical notions, resulting in a clearly drawn distinction between the single-classifier, multiple-classifier, and multimodal approaches. The application of quality measures in a multiple-classifier system has received far more attention, with the dominant intuitive notion being that a classifier that has data of higher quality at its disposal ought to be more credible than a classifier that operates on noisy signals. For single-classifier systems, a quality-based selection of models, classifiers, or thresholds has been proposed. In both cases, quality measures function as meta-information which supervises, but does not intervene in, the actual classifier or classifiers employed to assign class labels to modality-specific and class-selective features. In this thesis we argue that in fact the very same mechanism governs the use of quality measures in single- and multi-classifier systems alike, and we present a quantitative rather than intuitive perspective on the role of quality measures in classification. We observe that, for a given set of classification features and their fixed marginal distributions, the class separation in the joint feature space changes with the statistical dependencies observed between the individual features. The same effect applies to a feature space in which some of the features are class-independent. Consequently, we demonstrate that the class separation can be improved by augmenting the feature space with class-independent quality information, provided that it exhibits statistical dependencies on the class-selective features. We discuss how to construct classifier-quality measure ensembles in which the dependence between classification scores and quality features helps decrease classification errors below those obtained using the classification scores alone. We propose Q-stack, a novel theoretical framework for improving classification with class-independent quality measures, based on the concept of classifier stacking. In the Q-stack scheme, a classifier ensemble is used in which the first classifier layer is made of the baseline unimodal classifiers, and the second, stacked classifier operates on features composed of the normalized similarity scores and the relevant quality measures. We present Q-stack as a generalized framework for classification with quality information, and we argue that previously proposed methods of classification with quality measures are its special cases. Further in this thesis we address the problem of estimating the probability of single classification errors. We propose to employ the subjective Bayesian interpretation of single-event probability as credence in the correctness of single classification decisions.
    We propose to apply the credence-based error predictor as a functional extension of the proposed Q-stack framework, in which a Bayesian stacked classifier is employed. As such, the proposed method of credence estimation and error prediction inherits the benefit of seamlessly incorporating quality information into the process of credence estimation. We propose a set of objective evaluation criteria for credence estimates, and we discuss how the proposed method can be applied together with an appropriate repair strategy to reduce classification errors to a desired target level. Finally, we demonstrate the application of Q-stack and its functional extension to single-error prediction on the task of biometric identity verification using face and fingerprint modalities, and their multimodal combinations, using a real biometric database. We show that the use of the classification and error prediction methods proposed in this thesis allows for a systematic reduction of the error rates below those of the baseline classifiers.
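
    To make the Q-stack construction concrete, the following is a minimal sketch assuming NumPy and scikit-learn: the first layer's output is a normalized similarity score, and the stacked second-layer classifier operates on that score augmented with a class-independent quality measure. The synthetic data, the logistic regression used for the stacked layer, and the particular score-quality dependence are our illustrative assumptions, not the exact experimental setup of the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Q-stack sketch: layer one is a baseline unimodal classifier emitting a
# normalized similarity score; layer two, the stacked classifier, operates
# on [score, quality]. The data below is synthetic and only illustrates
# the feature layout.
rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 2, n)             # 1 = genuine, 0 = impostor
quality = rng.uniform(0.0, 1.0, n)         # marginally class-independent
# Low quality degrades genuine scores; impostor scores are unaffected,
# so the score-quality dependence carries class information.
scores = rng.normal(labels * 2.0 * quality, 1.0)

X = np.column_stack([scores, quality])     # stacked feature space
with_q = LogisticRegression().fit(X, labels).score(X, labels)
score_only = LogisticRegression().fit(X[:, :1], labels).score(X[:, :1], labels)
print(f"score-only accuracy: {score_only:.3f}, with quality: {with_q:.3f}")
```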

    Multimodal Sensing and Data Processing for Speaker and Emotion Recognition using Deep Learning Models with Audio, Video and Biomedical Sensors

    The focus of the thesis is on Deep Learning methods and their applications to multimodal data, with the potential to explore the associations between modalities and to replace missing or corrupt modalities if necessary. We chose two important real-world applications that need to deal with multimodal data: 1) speaker recognition and identification; 2) facial expression recognition and emotion detection. The first part of our work assesses the effectiveness of speech-related sensory data modalities and their combinations in speaker recognition using deep learning models. First, the role of electromyography (EMG) is highlighted as a unique biometric sensor for improving audio-visual speaker recognition, or as a substitute in noisy or poorly lit environments. Secondly, the effectiveness of deep learning is empirically confirmed through its higher robustness to all types of features in comparison to a number of commonly used baseline classifiers. Not only do deep models outperform the baseline methods, but their power also increases when they integrate multiple modalities, as different modalities contain information on different aspects of the data, especially between EMG and audio. Interestingly, our deep learning approach is word-independent. Moreover, the EMG, audio, and visual parts of the samples from each speaker do not need to match, which increases the flexibility of our method in using multimodal data, particularly if one or more modalities are missing. With a dataset of 23 individuals speaking 22 words five times each, we show that EMG can replace the audio/visual modalities and, when combined with them, significantly improves the accuracy of speaker recognition. The second part describes a study on automated emotion recognition using four different modalities: audio, video, electromyography (EMG), and electroencephalography (EEG). We collected a dataset by recording the four modalities as 12 human subjects expressed six different emotions or maintained a neutral expression. Three different aspects of emotion recognition were investigated: model selection, feature selection, and data selection. Both generative models (Deep Belief Networks, DBNs) and discriminative models (Long Short-Term Memory networks, LSTMs) were applied to the four modalities, and from these analyses we conclude that LSTMs are better suited to audio and video, together with their corresponding sophisticated feature extractors (MFCCs and CNNs), whereas DBNs are better suited to both EMG and EEG. By examining these signals at different stages (pre-speech, during-speech, and post-speech) of the current and following trials, we found that the most effective stages for emotion recognition from EEG occur after the emotion has been expressed, suggesting that the neural signals conveying an emotion are long-lasting.
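
    As a sketch of how such a multimodal deep model can be wired up, the following PyTorch snippet encodes each modality's feature sequence with an LSTM and fuses the encodings by concatenation before classification. The layer sizes, the late-fusion-by-concatenation choice, and the feature dimensions (13-dim MFCC frames for audio, 8-channel EMG features) are illustrative assumptions, not the architecture reported in the thesis.

```python
import torch
import torch.nn as nn

class ModalityLSTM(nn.Module):
    """Encode one modality's feature sequence (e.g. MFCC frames for audio,
    per-frame CNN embeddings for video) into a fixed-size vector."""
    def __init__(self, feat_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        _, (h, _) = self.lstm(x)
        return h[-1]                       # final hidden state of last layer

class MultimodalClassifier(nn.Module):
    """Late fusion: concatenate per-modality encodings, then classify."""
    def __init__(self, feat_dims, n_classes, hidden_dim=64):
        super().__init__()
        self.encoders = nn.ModuleList([ModalityLSTM(d, hidden_dim) for d in feat_dims])
        self.head = nn.Linear(hidden_dim * len(feat_dims), n_classes)

    def forward(self, inputs):             # one tensor per modality
        fused = torch.cat([enc(x) for enc, x in zip(self.encoders, inputs)], dim=-1)
        return self.head(fused)

# Example: audio MFCCs (13-dim) and EMG features (8-dim); 7 classes
# (six emotions plus neutral). Sequence lengths may differ per modality.
model = MultimodalClassifier(feat_dims=[13, 8], n_classes=7)
audio = torch.randn(4, 100, 13)            # (batch, frames, mfcc)
emg = torch.randn(4, 200, 8)               # (batch, samples, channels)
logits = model([audio, emg])               # -> shape (4, 7)
print(logits.shape)
```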