73 research outputs found

    VOICE BIOMETRICS UNDER MISMATCHED NOISE CONDITIONS

    This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research because of its applications in areas such as telephone banking, remote access control and surveillance. One of the main challenges in deploying voice biometrics in practice is the undesired variation in speech characteristics caused by environmental noise. Such variation can lead to a mismatch between the test and reference material from the same speaker, which is found to reduce the accuracy of speaker recognition. To address this problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between the reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under these adverse operating conditions. Experimental investigations based on the two main classes of speaker recognition (i.e. verification and open-set identification) show that the proposed approach can significantly improve accuracy under mismatched noise conditions. To further improve recognition accuracy under severe mismatch, an enhancement of this method is proposed. The enhancement, which adjusts the reference speaker models more closely to the noise condition in the test utterance, is shown to considerably increase accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden of using the enhanced approach with open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated.
The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes.
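
As background for the abstract above, the conventional form of T-Norm can be sketched as follows. This is not the new variant the thesis proposes, whose details the abstract does not give. It is the standard formulation, in which the score of a test utterance against the claimed speaker is shifted and scaled by the mean and standard deviation of the same utterance's scores against a cohort of other speaker models. The function name, the cohort values and the use of the sample standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev


def t_norm(raw_score, cohort_scores):
    """Standard T-Norm: (raw - cohort mean) / cohort standard deviation.

    raw_score     -- score of the test utterance against the claimed speaker model
    cohort_scores -- scores of the same test utterance against cohort models
    """
    if len(cohort_scores) < 2:
        raise ValueError("need at least two cohort scores")
    mu = mean(cohort_scores)
    sigma = stdev(cohort_scores)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cohort scores have zero spread")
    return (raw_score - mu) / sigma


# Hypothetical scores, for illustration only.
cohort = [-2.0, -1.0, 0.0, 1.0, 2.0]
normalised = t_norm(3.0, cohort)
```

Because the cohort statistics are computed from the test utterance itself, noise that shifts all scores for that utterance is partly cancelled by the normalisation.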

    Development of machine learning based speaker recognition system

    In this thesis, we describe a biometric authentication system that is capable of recognizing its users' voices using advanced machine learning and digital signal processing tools. The proposed system can both validate a person's identity (i.e. verification) and recognize it from a larger known group of people (i.e. identification). We designed the entire speaker recognition system to be integrated into the Siebel Center's infrastructure, and named it "Biometric Authentication System for the Siebel Center (BASS)". The main idea is to extract discriminative characteristics of an individual's voiceprint, and employ them to train classifiers using binary classification. We formed the training data set by recording 11 speakers' voices in a laboratory environment. The majority of the speakers were from different nations, with different language backgrounds and therefore various accents. They were considered to be a subset of the Siebel Center community. We asked them to speak 13 words including numeric digits (0-9) and proper nouns, and used triplet combinations of these words as passwords. We chose Mel-Frequency Cepstral Coefficients to represent the voice signals for forming frame-based feature vectors. With these we trained Support Vector Machine and Artificial Neural Network classifiers using a "one vs. all" strategy. We tested our recognition models with unseen voice records from different speakers and found that they performed well on criteria such as equal error rate, precision and recall. In the scope of this work, we also assembled the hardware through which the software, including the algorithm and developed models, could operate. The hardware consists of several parts, such as an infrared sensor that senses the presence of users, a PIC microcontroller that communicates with the software and an LCD screen that displays the passwords.
Based on the decision obtained from the software, BASS is also capable of opening the door of the office where it is installed.
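
The equal error rate mentioned in the abstract above is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below approximates it by sweeping a decision threshold over the observed scores. The abstract does not describe the authors' evaluation procedure, so the function name, the acceptance rule (accept when the score is at or above the threshold) and the score values are assumptions for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by testing every observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    Returns the average of FAR and FRR at the threshold where they are closest.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance rate: impostor trials that are accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials that are rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
eer = equal_error_rate(genuine, impostor)
```

With few trials the two error rates may never be exactly equal, which is why the sketch returns their average at the closest point instead of searching for an exact crossing.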

    Automatic Person Verification Using Speech and Face Information

    Interest in biometric-based identification and verification systems has increased considerably over the last decade. As an example, the shortcomings of security systems based on passwords can be addressed through the supplemental use of biometric systems based on speech signals, face images or fingerprints. Biometric recognition can also be applied to other areas, such as passport control (immigration checkpoints), forensic work (to determine whether a biometric sample belongs to a suspect) and law enforcement applications (e.g. surveillance). While biometric systems based on face images and/or speech signals can be useful, their performance can degrade under challenging conditions. In face-based systems, these conditions can take the form of a change in the illumination direction and/or variations in face pose. Multi-modal systems use more than one biometric at the same time, for two main reasons: to achieve better robustness and to increase discrimination power. This thesis reviews relevant background in speech and face processing, as well as information fusion. It reports research aimed at increasing the robustness of single- and multi-modal biometric identity verification systems. In particular, it addresses the illumination and pose variation problems in face recognition, as well as the challenge of effectively fusing information from multiple modalities under non-ideal conditions.

    Side-View Face Recognition

    Side-view face recognition is a challenging problem with many applications. In real-life scenarios where the environment is uncontrolled, coping with pose variations up to side-view positions is an especially important task for face recognition. In this paper we discuss the use of side-view face recognition techniques in house safety applications. Our aim is to recognize people as they pass through a door, and to estimate their location in the house. We compare available databases appropriate for this task, and review current methods for profile face recognition.

    The selective use of gaze in automatic speech recognition

    The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared with laboratory assessments. Acoustic noise is a major source of interference and affects speech intelligibility during the ASR process. It causes two main problems. The first is contamination of the speech signal. The second is the change in speakers' vocal and non-vocal behaviour. These phenomena create a mismatch between the ASR training and recognition conditions, which leads to considerable performance degradation. Popular approaches to improving noise-robustness exploit prior knowledge of the acoustic noise in speech enhancement, feature extraction and recognition models. An alternative approach presented in this thesis is to introduce eye gaze as an extra modality. Eye gaze behaviours have roles in interaction and contain information about cognition and visual attention, but not all behaviours are relevant to speech. Therefore, gaze behaviours are used selectively to improve ASR performance. This is achieved by inference procedures using noise-dependent models of gaze behaviours and their temporal and semantic relationship with speech. 'Selective gaze-contingent ASR' systems are proposed and evaluated on a corpus of eye movement and related speech in various clean and noisy environments. The best performing systems utilise both acoustic and language model adaptation.