27 research outputs found

    A Generative Model for Score Normalization in Speaker Recognition

    We propose a theoretical framework for thinking about score normalization, which confirms that normalization is not needed under (admittedly fragile) ideal conditions. If, however, these conditions are not met, e.g., under dataset shift between training and runtime, our theory reveals dependencies between scores that could be exploited by strategies such as score normalization. Indeed, it has been demonstrated experimentally, again and again, that various ad hoc score normalization recipes do work. We present a first attempt at using probability theory to design a generative score-space normalization model, which gives improvements similar to ZT-norm on the text-dependent RSR2015 database.
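    For reference, the ZT-norm baseline mentioned above chains two normalizations: Z-norm centres and scales a raw score using the target model's score distribution over a set of impostor utterances, and T-norm does the same using the test utterance's scores against a cohort of impostor models. A minimal sketch in Python, with all function and variable names illustrative rather than taken from the paper:

```python
import numpy as np

def z_norm(score, impostor_scores):
    # Z-norm: centre/scale the raw score with the target model's
    # statistics over a set of impostor utterances.
    mu, sigma = np.mean(impostor_scores), np.std(impostor_scores)
    return (score - mu) / sigma

def t_norm(score, cohort_scores):
    # T-norm: centre/scale with the test utterance's statistics
    # over a cohort of impostor models.
    mu, sigma = np.mean(cohort_scores), np.std(cohort_scores)
    return (score - mu) / sigma

def zt_norm(score, impostor_scores, cohort_scores, cohort_impostor_scores):
    # ZT-norm: Z-normalise the raw score, then T-normalise it against
    # cohort scores that have themselves been Z-normalised (each cohort
    # model uses its own impostor-score statistics).
    z = z_norm(score, impostor_scores)
    z_cohort = [z_norm(c, ci) for c, ci in zip(cohort_scores, cohort_impostor_scores)]
    return t_norm(z, np.asarray(z_cohort))
```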

    Glottal Source Cepstrum Coefficients Applied to NIST SRE 2010

    This paper presents a novel feature set for speaker recognition based on glottal source information. An iterative algorithm is used to derive vocal tract and glottal source estimates from the speech signal. To test the importance of glottal source information in speaker characterization, the novel feature set has been evaluated in the 2010 NIST Speaker Recognition Evaluation (NIST SRE10). The proposed system uses glottal source parameter templates and classical cepstral information to build a model for each speaker involved in the recognition process. The ALIZE [1] open-source software has been used to create the GMM models for both background and target speakers. Compared to using mel-frequency cepstrum coefficients (MFCC) alone, the misclassification rate on NIST SRE 2010 was reduced from 29.43% to 27.15% when glottal source features were used.
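    The GMM back-end described here (a universal background model plus adapted target-speaker models) can be sketched compactly. The snippet below uses scikit-learn's GaussianMixture as a stand-in for ALIZE, with the classical mean-only MAP adaptation of Reynolds et al.; the function names and the relevance factor are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(features, n_components=512):
    # Fit a diagonal-covariance universal background model (UBM)
    # on pooled background features of shape (frames, dims).
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(features)
    return ubm

def map_adapt_means(ubm, spk_features, relevance=16.0):
    # Mean-only MAP adaptation of the UBM towards one speaker's data;
    # weights and covariances are kept from the UBM.
    post = ubm.predict_proba(spk_features)              # responsibilities (frames, C)
    n = post.sum(axis=0)                                # soft counts per component
    ex = post.T @ spk_features / np.maximum(n[:, None], 1e-10)  # first-order stats
    alpha = (n / (n + relevance))[:, None]              # data-dependent adaptation factor
    return alpha * ex + (1.0 - alpha) * ubm.means_

def llr_score(ubm, adapted_means, test_features):
    # Average per-frame log-likelihood ratio: speaker model vs. UBM.
    spk = GaussianMixture(n_components=ubm.n_components, covariance_type="diag")
    spk.weights_ = ubm.weights_
    spk.means_ = adapted_means
    spk.covariances_ = ubm.covariances_
    spk.precisions_cholesky_ = 1.0 / np.sqrt(ubm.covariances_)
    return spk.score(test_features) - ubm.score(test_features)
```

    Scoring a trial then reduces to llr_score(ubm, map_adapt_means(ubm, enrol_frames), test_frames), with the decision threshold tuned on development data.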

    Improving speaker recognition by biometric voice deconstruction

    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have had to be replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from deconstructing the voice into its glottal source and vocal tract estimates, will improve recognition rates compared to classical approaches. A general description of the main hypothesis and of the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on one recorded over a mobile phone network under non-controlled acoustic conditions.
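    The deconstruction itself rests on glottal inverse filtering. As a rough, simplified stand-in for the iterative algorithms used in these papers, a single-pass LPC-based source-filter split can illustrate the idea (assuming librosa and SciPy are available; the function name and LPC order are hypothetical):

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def source_filter_split(frame, lpc_order=18):
    # An LPC fit approximates the vocal-tract filter 1/A(z); inverse
    # filtering the frame with A(z) leaves a residual that serves as a
    # crude glottal-source estimate. The papers' iterative methods refine
    # this split; this one-pass version only conveys the principle.
    a = librosa.lpc(frame.astype(float), order=lpc_order)  # vocal-tract coefficients A(z)
    residual = lfilter(a, [1.0], frame)                    # glottal-source estimate e[n]
    return a, residual
```

    A gender-dependent system in this spirit would fit separate background and speaker models per gender and route each trial through the matching pair.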