22 research outputs found

    Glottal-Source Spectral Biometry for Voice Characterization

    Get PDF
    The biometric signature derived from the estimation of the power spectral density singularities of a speaker’s glottal source is described in the present work. This consists in the collection of peak-trough profiles found in the spectral density, as related to the biomechanics of the vocal folds. Samples of parameter estimations from a set of 100 normophonic (pathology-free) speakers are produced. Mapping the set of speaker’s samples to a manifold defined by Principal Component Analysis and clustering them by k-means in terms of the most relevant principal components shows the separation of speakers by gender. This means that the proposed signature conveys relevant speaker’s metainformation, which may be useful in security and forensic applications for which contextual side information is considered relevant

    A Hybrid Parameterization Technique for Speaker Identification

    Get PDF
    Classical parameterization techniques for Speaker Identification use the codification of the power spectral density of raw speech, not discriminating between articulatory features produced by vocal tract dynamics (acoustic-phonetics) from glottal source biometry. Through the present paper a study is conducted to separate voicing fragments of speech into vocal and glottal components, dominated respectively by the vocal tract transfer function estimated adaptively to track the acoustic-phonetic sequence of the message, and by the glottal characteristics of the speaker and the phonation gesture. The separation methodology is based in Joint Process Estimation under the un-correlation hypothesis between vocal and glottal spectral distributions. Its application on voiced speech is presented in the time and frequency domains. The parameterization methodology is also described. Speaker Identification experiments conducted on 245 speakers are shown comparing different parameterization strategies. The results confirm the better performance of decoupled parameterization compared against approaches based on plain speech parameterization

    Glottal Biometric Features: Are Pathological Voice Studies appliable to Voice Biometry?

    Get PDF
    The purpose of the present paper is to introduce a methodology successfully used already in voice pathology detection for its possible adaptation to biometric speaker characterization as well. For such, the behavior of the same GMM classifiers used in the detection of pathology will be exploited. The work will show specific cases derived from running speech typically used in NIST contests against a Universal Background Model built from the population of normophonic subjects in specific vs general evaluation paradigms. Results are contrasted against a set of impostors derived from the same population of normophonic subjects. The relevance of the parameters used in the study will also be discusse

    Glottal Parameter Estimation by Wavelet Transform for Voice Biometry

    Get PDF
    Voice biometry is classically based on the parameterization and patterning of speech features mainly. The present approach is based on the characterization of phonation features instead (glottal features). The intention is to reduce intra-speaker variability due to the `text'. Through the study of larynx biomechanics it may be seen that the glottal correlates constitute a family of 2-nd order gaussian wavelets. The methodology relies in the extraction of glottal correlates (the glottal source) which are parameterized using wavelet techniques. Classification and pattern matching was carried out using Gaussian Mixture Models. Data of speakers from a balanced database and NIST SRE HASR2 were used in verification experiments. Preliminary results are given and discussed

    Decoupling Vocal Tract from Glottal Source Estimates in Speaker's Identification

    Get PDF
    Classical parameterization techniques in Speaker Identification tasks use the codification of the power spectral density of speech as a whole, not discriminating between articulatory features due to the dynamics of vocal tract (acoustic-phonetics) and those contributed by the glottal source. Through the present paper a study is conducted to separate voicing fragments of speech into vocal and glottal components, dominated respectively by the vocal tract transfer function estimated adaptively to track the acoustic-phonetic sequence of the message, and by the glottal characteristics of the speaker and the phonation gesture. In this way information which is conveyed in both components depending in different degree on message and biometry is estimated and treated differently to be fused at the time of template composition. The methodology to separate both components is based on the decorrelation hypothesis between vocal and glottal information and it is carried out using Joint Process Estimation. This methodology is briefly discussed and its application on vowel-like speech is presented as an example to observe the resulting estimates both in the time as in the frequency domain. The parameterization methodology to produce representative templates of the glottal and vocal components is also described. Speaker Identification experiments conducted on a wide database of 240 speakers is also given with comparative scorings obtained using different parameterization strategies. The results confirm the better performance of de-coupled parameterization techniques compared against approaches based on full speech parameterization

    Relevance of the glottal pulse and the vocal tract in gender detection

    Get PDF
    Gender detection is a very important objective to improve efficiency in tasks as speech or speaker recognition, among others. Traditionally gender detection has been focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here consists in obtaining uncorrelated glottal and vocal tract components which are parameterized as mel-frequency coefficients. K-fold and cross-validation using QDA and GMM classifiers showed that better detection rates are reached when glottal source and vocal tract parameters are used in a gender-balanced database of running speech from 340 speakers

    Improving speaker recognition by biometric voice deconstruction

    Get PDF
    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcedly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved during last years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers combined with the use of a set of features derived from the components, resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description about the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a highly controlled acoustic condition database, and on a mobile phone network recorded under non-controlled acoustic conditions

    Using dysphonic voice to characterize speaker's biometry

    Get PDF
    Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used for biometrical speaker characterization. In the present paper phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, is proposed for dysphonic voice characterization in Speaker Verification tasks. The glottal source derived parameters are matched in a forensic evaluation framework defining a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephonic database of 100 male Spanish native speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57 for a wide threshold interval (62.08-75.04%). An Equal Error Rate of 0.64 % can be granted. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense vs that of the prosecution, thus ofering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading

    A methodology for monitoring emotional stress in phonation

    Get PDF
    Stress in phonation is mainly shown in the signature of the fundamental frequency. The proposed methodology is based on the estimation of the vocal fold biomechanics in terms of the distribution of the dynamic mass and the mechanical tension of the vocal fold structure. These parameters are derived from the reconstruction of the glottal source by inverse filtering. The vocal fold mechanical tension correlates (stress and strain), are used as the bases for tremor estimation. The correlates of tension and tremor are used to characterize the spontaneous speech of a database of 40 speakers of both genders (20 male and 20 female). Spontaneous speech consists in short interviews of 20 s of duration where the speakers have to express opinions on hot issues with which they are in agreement (pro) or in disagreement (con) following Arciuli's methodology. The emotional stress is estimated from the biomechanical correlates expressed above (tension and tremor). The null hypothesis formulated as the insensitivity of the speaker to pro and con situations has to be disregarded in view of the results for both genders. Interesting open questions are to be raised regarding the possibility of speakers consciously hiding their true opinion based on political correctness. The discussion will offer different hypotheses to further exploit the objective of detecting self-congruence in spoken messages
    corecore