298 research outputs found

    Advanced Biometrics with Deep Learning

    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, have become commonplace as a means of identity management for various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction, and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic, handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality: face biometrics, medical electronic signals (EEG and ECG), voice print, and others.

    The Effect Of Acoustic Variability On Automatic Speaker Recognition Systems

    This thesis examines the influence of acoustic variability on automatic speaker recognition (ASR) systems with three aims: (i) to measure ASR performance under five commonly encountered acoustic conditions; (ii) to contribute towards ASR system development through the provision of new research data; (iii) to assess ASR suitability for forensic speaker comparison (FSC) and investigative/pre-forensic use. The thesis begins with a literature review and an explanation of relevant technical terms. Five categories of research experiments then examine ASR performance, reflecting conditions that influence speech quantity (inhibitors) and speech quality (contaminants), acknowledging that quality often influences quantity. The experiments pertain to: net speech duration, signal-to-noise ratio (SNR), reverberation, frequency bandwidth, and transcoding (codecs). The ASR system is placed under scrutiny with examination of settings and optimum conditions (e.g. matched/unmatched test audio and speaker models). Output is examined in relation to baseline performance, and metrics help inform whether ASRs should be applied to suboptimal audio recordings. Results indicate that modern ASRs are relatively resilient to low and moderate levels of the acoustic contaminants and inhibitors examined, whilst remaining sensitive to higher levels. The thesis discusses issues such as the complexity and fragility of the speech signal path, speaker variability, the difficulty of measuring conditions, and mitigation (thresholds and settings). The application of ASRs to casework is discussed with recommendations, acknowledging the different modes of operation (e.g. investigative usage) and current UK limitations regarding presenting ASR output as evidence in criminal trials.
    In summary, and in the context of acoustic variability, the thesis recommends that ASRs could be applied to pre-forensic cases, accepting that extraneous issues endure which require governance, such as validation of method (ASR standardisation) and population data selection. However, ASRs remain unsuitable for broad forensic application, with many acoustic conditions causing irrecoverable speech data loss that contributes to high error rates.
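    One of the contaminant conditions examined is signal-to-noise ratio (SNR). As an illustration of how such a condition can be imposed on test audio, the sketch below mixes noise into a signal at a chosen SNR; the tone, the white noise, and the 16 kHz sampling rate are stand-ins for illustration, not material from the thesis.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the mixture speech + noise has the requested SNR in dB."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain such that p_speech / (gain^2 * p_noise) equals the target linear ratio.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # stand-in "speech": a 220 Hz tone
noise = rng.standard_normal(16000)                            # stand-in contaminant: white noise
noisy = mix_at_snr(speech, noise, snr_db=10)

# Verify the achieved SNR of the mixture.
achieved = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
print(round(achieved, 1))  # → 10.0
```

    The same routine can sweep a range of SNR values to produce the graded test conditions the thesis describes.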

    Subband spectral features for speaker recognition.

    Tam Yuk Yin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references. Abstracts in English and Chinese.
    Chapter 1 -- Introduction
        1.1 Biometrics for User Authentication
        1.2 Voice-based User Authentication
        1.3 Motivation and Focus of This Work
        1.4 Thesis Outline
    Chapter 2 -- Fundamentals of Automatic Speaker Recognition
        2.1 Speech Production
        2.2 Features of Speaker's Voice in Speech Signal
        2.3 Basics of Speaker Recognition
        2.4 Existing Approaches of Speaker Recognition
            2.4.1 Feature Extraction: Overview; Mel-Frequency Cepstral Coefficient (MFCC)
            2.4.2 Speaker Modeling: Overview; Gaussian Mixture Model (GMM)
            2.4.3 Speaker Identification (SID)
    Chapter 3 -- Data Collection and Baseline System
        3.1 Data Collection
        3.2 Baseline System: Experimental Set-up; Results and Analysis
    Chapter 4 -- Subband Spectral Envelope Features
        4.1 Spectral Envelope Features
        4.2 Subband Spectral Envelope Features
        4.3 Feature Extraction Procedures
        4.4 SID Experiments: Experimental Set-up; Results and Analysis
    Chapter 5 -- Fusion of Subband Features
        5.1 Model Level Fusion: Experimental Set-up; Results and Analysis
        5.2 Feature Level Fusion: Experimental Set-up; Results and Analysis
        5.3 Discussion
    Chapter 6 -- Utterance-Level SID with Text-Dependent Weights
        6.1 Motivation
        6.2 Utterance-Level SID
        6.3 Baseline System: Implementation Details; Results and Analysis
        6.4 Text-Dependent Weights: Implementation Details; Results and Analysis
        6.5 Text-Dependent Feature Weights: Implementation Details; Results and Analysis
        6.6 Text-Dependent Weights Applied in Score Combination and Subband Features: Implementation Details; Results and Analysis
        6.7 Discussion
    Chapter 7 -- Conclusions and Suggested Future Work
        7.1 Conclusions
        7.2 Suggested Future Work
    Appendix 1 -- Speech Content for Data Collection
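    The thesis' exact definition of subband spectral envelope features (Chapter 4) is not reproduced in this table of contents, so the sketch below only illustrates the general idea behind subband spectral features: splitting a frame's magnitude spectrum into bands and summarising the energy in each. The frame length, FFT size, sampling rate, and number of subbands are all illustrative assumptions.

```python
import numpy as np

def subband_log_energies(frame, n_subbands=4, n_fft=256):
    """Split the power spectrum of one windowed frame into equal-width
    subbands and return the log energy of each (illustrative only)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft))
    bands = np.array_split(spectrum ** 2, n_subbands)
    return np.array([np.log(b.sum() + 1e-10) for b in bands])

# A 500 Hz tone at 8 kHz: its energy falls in the lowest of 4 subbands
# (band 0 spans roughly 0-1 kHz with these illustrative parameters).
t = np.arange(200) / 8000
frame = np.sin(2 * np.pi * 500 * t)
feats = subband_log_energies(frame)
print(feats.argmax())  # → 0
```

    A real system would compute such features per frame across an utterance and feed them to a speaker model; here the point is only how a subband decomposition localises spectral energy.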

    Nasality in automatic speaker verification


    Automatic speaker recognition: modelling, feature extraction and effects of clinical environment

    Speaker recognition is the task of establishing the identity of an individual from his/her voice. It has significant potential as a convenient biometric method for telephony applications and does not require sophisticated or dedicated hardware. Speaker recognition is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from speech; the features are used to generate statistical models of different speakers. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Current state-of-the-art speaker recognition systems use the Gaussian mixture model (GMM) technique in combination with the expectation-maximization (EM) algorithm to build the speaker models. The most frequently used features are the Mel-frequency cepstral coefficients (MFCC). This thesis investigated areas of possible improvement in the field of speaker recognition. The identified drawbacks of current speaker recognition systems included slow convergence rates of the modelling techniques and the features' sensitivity to changes due to aging of speakers, use of alcohol and drugs, changing health conditions, and mental state. The thesis proposed a new method of deriving the Gaussian mixture model parameters, called the EM-ITVQ algorithm. The EM-ITVQ showed a significant improvement in equal error rates and higher convergence rates when compared to the classical GMM based on the EM method. It was demonstrated that features based on the nonlinear model of speech production (TEO-based features) provided better performance compared to the conventional MFCC features. For the first time, the effect of clinical depression on speaker verification rates was tested, and it was demonstrated that speaker verification results deteriorate if the speakers are clinically depressed. The deterioration was demonstrated using conventional (MFCC) features. The thesis also showed that when the MFCC features are replaced with features based on the nonlinear model of speech production (TEO-based features), the detrimental effect of clinical depression on speaker verification rates can be reduced.
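    The two-stage train/test pipeline described above, with one GMM fitted per enrolled speaker by EM and identification by likelihood comparison, can be sketched as follows. The feature vectors here are random stand-ins for real MFCC frames, and scikit-learn's `GaussianMixture` serves as a generic EM implementation; it is not the thesis' EM-ITVQ algorithm.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-ins for per-frame feature vectors (e.g. 12-dim MFCCs) of two speakers.
train_a = rng.normal(loc=0.0, scale=1.0, size=(500, 12))
train_b = rng.normal(loc=3.0, scale=1.0, size=(500, 12))

# Training: one GMM per enrolled speaker, fitted with EM (sklearn's default).
models = {name: GaussianMixture(n_components=4, random_state=0).fit(data)
          for name, data in [("A", train_a), ("B", train_b)]}

# Testing: classify an unknown sample as the speaker whose model gives
# the highest average log-likelihood over its frames.
test_frames = rng.normal(loc=3.0, scale=1.0, size=(100, 12))
scores = {name: m.score(test_frames) for name, m in models.items()}
best = max(scores, key=scores.get)
print(best)  # → B
```

    Verification (rather than identification) would instead compare the claimed speaker's model score against a threshold or a background model.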

    Individual Differences in Speech Production and Perception

    Inter-individual variation in speech is a topic of increasing interest in both the human sciences and speech technology. It can yield important insights into biological, cognitive, communicative, and social aspects of language. Written by specialists in psycholinguistics, phonetics, speech development, speech perception, and speech technology, this volume presents experimental and modeling studies that provide the reader with a deep understanding of interspeaker variability and its role in speech processing, speech development, and interspeaker interactions. It discusses how theoretical models take individual behavior into account, explains why interspeaker variability enriches speech communication, and summarizes the limitations of the use of speaker information in forensics.