477 research outputs found

    Identifiability of multivariate logistic mixture models

    Mixture models are widely used to model continuous observations. Identifiability is a necessary condition for consistently estimating the parameters of a mixture model from observations drawn from the mixture. In this study, we give some results on the identifiability of multivariate logistic mixture models.

    A Novel Windowing Technique for Efficient Computation of MFCC for Speaker Recognition

    In this paper, we propose a novel family of windowing techniques for computing Mel Frequency Cepstral Coefficients (MFCC) for automatic speaker recognition from speech. The proposed method is based on a fundamental property of the discrete time Fourier transform (DTFT) related to differentiation in the frequency domain. A classical windowing scheme such as the Hamming window is modified to obtain derivatives of the discrete time Fourier transform coefficients. It is shown mathematically that the slope and phase of the power spectrum are inherently incorporated in the newly computed cepstrum. Speaker recognition systems based on the proposed family of window functions are shown to attain substantial and consistent performance improvement over the baseline single-tapered Hamming window as well as the recently proposed multitaper windowing technique.
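The DTFT property mentioned in the abstract above is, in its standard form, that multiplying a sequence by its time index corresponds to differentiating its spectrum: DTFT{n·x[n]} = j·dX/dω. The following is a minimal numerical check of that identity only. It is not the paper's window family, which the abstract does not specify; the frame contents, frame length, test frequencies, and the helper names `hamming` and `dtft` are all assumptions made for illustration.

```python
import cmath
import math
import random


def hamming(length):
    """Standard Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]


def dtft(x, omega):
    """DTFT of a finite sequence x[n], n = 0..N-1, evaluated at one frequency."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))


# Hypothetical test frame: Gaussian noise shaped by a Hamming window.
random.seed(0)
N = 64
window = hamming(N)
frame = [random.gauss(0.0, 1.0) * w for w in window]

# Multiplying the frame by n is the same as using the modified window n * w[n].
ramped = [n * xn for n, xn in enumerate(frame)]

step = 1e-6  # finite-difference step in radians per sample
max_err = 0.0
for k in range(1, 30):
    omega = 0.1 * k
    # Left side: DTFT of n * x[n].
    lhs = dtft(ramped, omega)
    # Right side: j * dX/d(omega), approximated by a central difference.
    rhs = 1j * (dtft(frame, omega + step) - dtft(frame, omega - step)) / (2 * step)
    max_err = max(max_err, abs(lhs - rhs))

print("largest deviation between the two sides:", max_err)
```

The deviation printed at the end reflects only finite-difference and rounding error, so a ramp-weighted window yields the spectral derivative without any explicit differentiation in the frequency domain.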

    Lip segmentation using adaptive color space training

    In audio-visual speech recognition (AVSR), it is beneficial to use lip boundary information in addition to texture-dependent features. In this paper, we propose an automatic lip segmentation method that can be used in AVSR systems. The algorithm consists of the following steps: face detection, lip corner extraction, adaptive color space training for lip and non-lip regions using Gaussian mixture models (GMMs), and curve evolution using a level-set formulation based on region and image gradient fields. Region-based fields are obtained using adapted GMM likelihoods. We have tested the proposed algorithm on a database (SU-TAV) of 100 facial images and obtained objective performance results by comparing automatic lip segmentations with hand-marked ground truth segmentations. Experimental results are promising, but much work remains to be done to improve the robustness of the proposed method.

    Glottal Source Cepstrum Coefficients Applied to NIST SRE 2010

    This paper presents a novel feature set for speaker recognition based on glottal estimate information. An iterative algorithm is used to derive the vocal tract and glottal source estimates from the speech signal. To test the importance of glottal source information in speaker characterization, the novel feature set has been tested in the 2010 NIST Speaker Recognition Evaluation (NIST SRE10). The proposed system uses glottal estimate parameter templates and classical cepstral information to build a model for each speaker involved in the recognition process. ALIZE [1] open-source software has been used to create the GMM models for both background and target speakers. Compared to using mel-frequency cepstrum coefficients (MFCC) alone, the misclassification rate for the NIST SRE 2010 is reduced from 29.43% to 27.15% when glottal source features are used.