1,547 research outputs found

    Likelihood ratio adjustment for the compensation of model mismatch in speaker verification

    This paper presents a method for adjusting speaker verification thresholds based on a Gaussian model of the distributions of the log-likelihood ratio. It sets out the assumptions under which this model is valid, describes several threshold adjustment methods, and illustrates their benefits and limitations through verification experiments on a database of 20 speakers.
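As a rough illustration of the idea in this abstract (not the paper's own procedure, which the abstract does not spell out), the sketch below assumes that impostor and target log-likelihood ratio scores are each Gaussian and that the target mean lies above the impostor mean. Under those assumptions the equal-error-rate threshold has a closed form. The function names and parameter values are hypothetical.

```python
from statistics import NormalDist


def eer_threshold(mu_imp, sigma_imp, mu_tgt, sigma_tgt):
    """Threshold at which false-accept and false-reject rates are equal,
    assuming Gaussian score distributions with mu_tgt > mu_imp.

    Derived from (theta - mu_imp) / sigma_imp == (mu_tgt - theta) / sigma_tgt.
    """
    return (mu_imp * sigma_tgt + mu_tgt * sigma_imp) / (sigma_imp + sigma_tgt)


def error_rates(threshold, mu_imp, sigma_imp, mu_tgt, sigma_tgt):
    """Return (false_accept, false_reject) for a given decision threshold."""
    # Impostor scores above the threshold are wrongly accepted.
    false_accept = 1.0 - NormalDist(mu_imp, sigma_imp).cdf(threshold)
    # Target scores below the threshold are wrongly rejected.
    false_reject = NormalDist(mu_tgt, sigma_tgt).cdf(threshold)
    return false_accept, false_reject


if __name__ == "__main__":
    # Hypothetical score statistics, chosen only for illustration.
    params = (-3.0, 1.0, 3.0, 2.0)
    theta = eer_threshold(*params)
    print(theta, error_rates(theta, *params))
```

Estimating the four Gaussian parameters from limited enrolment data is the hard part in practice; the sketch takes them as given.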

    VOICE BIOMETRICS UNDER MISMATCHED NOISE CONDITIONS

    This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research due to its applications in areas such as telephone banking, remote access control and surveillance. One of the main challenges in deploying voice biometrics in practice is undesired variation in speech characteristics caused by environmental noise. Such variation can lead to a mismatch between the test and reference material from the same speaker, which is found to reduce the accuracy of speaker recognition. To address this problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under these adverse operating conditions. Experimental investigations based on the two main classes of speaker recognition (verification and open-set identification) show that the proposed approach can significantly improve accuracy under mismatched noise conditions. To further improve recognition accuracy under severe mismatch, an enhancement of this method is proposed. The enhancement, which adjusts the reference speaker models more closely to the noise condition in the test utterance, is shown to considerably increase accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden of using the enhanced approach with open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated.
The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes.
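For orientation, the sketch below shows conventional T-Norm score normalisation, not the new form proposed in the thesis, which the abstract does not describe. It assumes the test utterance has already been scored against the claimed speaker model and against a cohort of impostor models. The function name and the use of the sample standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev


def t_norm(raw_score, cohort_scores):
    """Normalise a verification score with conventional T-Norm.

    raw_score: score of the test utterance against the claimed speaker model.
    cohort_scores: scores of the same test utterance against a cohort of
        impostor models (at least two values are needed for stdev).
    """
    mu = mean(cohort_scores)
    sigma = stdev(cohort_scores)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    # Express the raw score in units of cohort standard deviations.
    return (raw_score - mu) / sigma


if __name__ == "__main__":
    # Hypothetical scores, chosen only for illustration.
    print(t_norm(3.0, [1.0, 2.0, 3.0]))
```

Because the cohort statistics are computed from the test utterance itself, the normalisation tracks test-time conditions such as noise, which is why T-Norm variants are a natural fit for mismatch compensation.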

    The Effect Of Acoustic Variability On Automatic Speaker Recognition Systems

    This thesis examines the influence of acoustic variability on automatic speaker recognition systems (ASRs) with three aims: (i) to measure ASR performance under five commonly encountered acoustic conditions; (ii) to contribute towards ASR system development by providing new research data; and (iii) to assess ASR suitability for forensic speaker comparison (FSC) and for investigative or pre-forensic use. The thesis begins with a literature review and an explanation of relevant technical terms. Five categories of experiments then examine ASR performance under conditions that influence speech quantity (inhibitors) and speech quality (contaminants), acknowledging that quality often influences quantity. The experiments cover net speech duration, signal-to-noise ratio (SNR), reverberation, frequency bandwidth and transcoding (codecs). The ASR system is placed under scrutiny, with examination of settings and optimum conditions (e.g. matched/unmatched test audio and speaker models). Output is examined in relation to baseline performance, and metrics help to inform whether ASRs should be applied to suboptimal audio recordings. Results indicate that modern ASRs are relatively resilient to low and moderate levels of the acoustic contaminants and inhibitors examined, whilst remaining sensitive to higher levels. The thesis discusses issues such as the complexity and fragility of the speech signal path, speaker variability, the difficulty of measuring conditions, and mitigation (thresholds and settings). The application of ASRs to casework is discussed with recommendations, acknowledging the different modes of operation (e.g. investigative usage) and current UK limitations on presenting ASR output as evidence in criminal trials.
In summary, and in the context of acoustic variability, the thesis recommends that ASRs could be applied to pre-forensic cases, accepting that extraneous issues remain which require governance, such as validation of method (ASR standardisation) and population data selection. However, ASRs remain unsuitable for broad forensic application, with many acoustic conditions causing irrecoverable speech data loss that contributes to high error rates.

    Methods for speaking style conversion from normal speech to high vocal effort speech

    This thesis deals with vocal-effort-focused speaking style conversion (SSC). Specifically, we studied two topics on the conversion of normal speech to high vocal effort speech. The first topic involves the conversion of normal speech to shouted speech. We employed this conversion in a speaker recognition system with vocal effort mismatch between test and enrollment utterances (shouted speech vs. normal speech). The mismatch degrades the system's speaker identification performance. As a solution, we proposed an SSC system that included a novel spectral mapping, used along with a statistical mapping technique, to transform the mel-frequency spectral energies of normal speech enrollment utterances towards their counterparts in shouted speech. We evaluated the proposed solution by comparing speaker identification rates for a state-of-the-art i-vector-based speaker recognition system, with and without applying SSC to the enrollment utterances. Our results showed that applying the proposed SSC pre-processing to the enrollment data considerably improves the speaker identification rates. The second topic involves normal-to-Lombard speech conversion. We proposed a vocoder-based parametric SSC system to perform the conversion. This system first extracts speech features using the vocoder. Next, a mapping technique that is robust to data scarcity maps the features. Finally, the vocoder synthesizes speech from the mapped features. For comparison, we used two vocoders in the conversion system: a glottal vocoder and the widely used STRAIGHT. We assessed the converted speech from the two vocoders with two subjective listening tests that measured similarity to Lombard speech and naturalness. The similarity test showed that, with both vocoders, our proposed SSC system was able to convert normal speech to Lombard speech.
The naturalness test showed that the samples converted using the glottal vocoder were clearly more natural than those obtained with STRAIGHT.

    Session variability compensation in automatic speaker and language recognition

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, October 201

    Speech Recognition

    Chapters in the first part of the book cover the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and applications that operate in real-world environments, such as mobile communication services and smart homes.

    On the Use of Speech and Face Information for Identity Verification

    This report first provides a review of important concepts in the field of information fusion, followed by a review of important milestones in audio-visual person identification and verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on speech and face information, are then evaluated in clean and noisy audio conditions on a common database; it is shown that in clean conditions most of the non-adaptive approaches provide similar performance and in noisy conditions most exhibit a severe deterioration in performance; it is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change due to noisy conditions; compared to a previously proposed adaptive approach, the proposed classifiers do not make a direct assumption about the type of noise that causes the mismatch between training and testing conditions. This report is an extended and revised version of IDIAP-RR 02-33.

    Empirical test of the performance of an acoustic-phonetic approach to forensic voice comparison under conditions similar to those of a real case

    In a 2012 case in New South Wales, Australia, the identity of a speaker on several audio recordings was in question. Forensic voice comparison testimony was presented based on an auditory-acoustic-phonetic-spectrographic analysis. No empirical demonstration of the validity and reliability of the analytical methodology was presented. Unlike the admissibility standards in some other jurisdictions (e.g., US Federal Rule of Evidence 702 and the Daubert criteria, or England & Wales Criminal Practice Directions 19A), Australia's Unified Evidence Acts do not require demonstration of the validity and reliability of analytical methods and their implementation before testimony based upon them is presented in court. The present paper reports on empirical tests of the performance of an acoustic-phonetic-statistical forensic voice comparison system which exploited the same features as were the focus of the auditory-acoustic-phonetic-spectrographic analysis in the case, i.e., second-formant (F2) trajectories in /o/ tokens and mean fundamental frequency (f0). The tests were conducted under conditions similar to those in the case. The performance of the acoustic-phonetic-statistical system was very poor compared to that of an automatic system. © 2017 Elsevier B.V.

    Adaptation of reference patterns in word-based speech recognition
