477 research outputs found
Identifiability of multivariate logistic mixture models
Mixture models have been widely used to model continuous observations.
Identifiability is a necessary condition for consistently estimating the
parameters of a mixture model from observations drawn from the mixture.
In this study, we give some results on the identifiability of
multivariate logistic mixture models.
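For readers unfamiliar with the term, the standard notion of identifiability for finite mixtures can be written as follows. This is the usual textbook definition, not a result taken from the abstract; the symbols (component distribution F, weights \pi_i, parameters \theta_i, component counts K and K') are notation assumed here for illustration. The weights are taken to be positive and to sum to one, and the component parameters within each mixture are taken to be distinct.

```latex
\[
\sum_{i=1}^{K} \pi_i \, F(x;\theta_i) \;=\; \sum_{j=1}^{K'} \pi'_j \, F(x;\theta'_j)
\quad \text{for all } x
\;\;\Longrightarrow\;\;
K = K' \ \text{ and } \
\{(\pi_i,\theta_i)\}_{i=1}^{K} = \{(\pi'_j,\theta'_j)\}_{j=1}^{K'}
\]
```

In words: two mixtures from the family that define the same distribution must have the same components and weights, up to a relabeling of the components.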
A Novel Windowing Technique for Efficient Computation of MFCC for Speaker Recognition
In this paper, we propose a novel family of windowing techniques to compute
Mel Frequency Cepstral Coefficients (MFCC) for automatic speaker recognition
from speech. The proposed method is based on a fundamental property of the
discrete time Fourier transform (DTFT) related to differentiation in the
frequency domain. A classical windowing scheme such as the Hamming window is
modified to obtain derivatives of the discrete time Fourier transform
coefficients. It is shown mathematically that the slope and phase of the power
spectrum are inherently incorporated in the newly computed cepstrum. Speaker
recognition systems based on our proposed family of window functions are shown
to attain substantial and consistent performance improvement over the baseline
single-tapered Hamming window as well as the recently proposed multitaper
windowing technique.
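The DTFT property referred to above is presumably the frequency-differentiation rule: multiplying a sequence by its sample index n corresponds to j times the derivative of its DTFT with respect to frequency. The sketch below checks that rule numerically for a Hamming-windowed frame. It is only an illustration under that assumption; the abstract does not define the proposed window family, so `ramped_window` and `spectrum_derivative` are hypothetical names, and the test frame is synthetic.

```python
import cmath
import math


def hamming(length):
    """Standard Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def dtft(x, omega):
    """Evaluate the DTFT of a finite sequence x at angular frequency omega."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(x))


def ramped_window(window):
    """Hypothetical 'differentiating' window: n * w[n] (an assumption)."""
    return [n * w for n, w in enumerate(window)]


def spectrum_derivative(frame, window, omega):
    """dX/d(omega) of the windowed frame, using DTFT{n x[n]} = j dX/d(omega)."""
    ramped = ramped_window(window)
    return -1j * dtft([s * w for s, w in zip(frame, ramped)], omega)


# Synthetic frame (illustrative data, not from the paper).
frame = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
window = hamming(len(frame))
omega = 0.7

analytic = spectrum_derivative(frame, window, omega)

# Central finite difference of the ordinary Hamming-windowed spectrum.
step = 1e-5
windowed = [s * w for s, w in zip(frame, window)]
numeric = (dtft(windowed, omega + step) - dtft(windowed, omega - step)) / (2 * step)

print(abs(analytic - numeric))  # small: the two estimates agree
```

The point of the check is that a single extra windowing pass with n * w[n] yields the spectral derivative directly, without any finite differencing across frequency bins.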
Lip segmentation using adaptive color space training
In audio-visual speech recognition (AVSR), it is beneficial
to use lip boundary information in addition to texture-dependent
features. In this paper, we propose an automatic lip segmentation
method that can be used in AVSR systems. The algorithm
consists of the following steps: face detection, lip corner extraction,
adaptive color space training for lip and non-lip regions
using Gaussian mixture models (GMMs), and curve evolution
using a level-set formulation based on region and image
gradient fields. Region-based fields are obtained using adapted
GMM likelihoods. We have tested the proposed algorithm on a
database (SU-TAV) of 100 facial images and obtained objective
performance results by comparing automatic lip segmentations
with hand-marked ground truth segmentations. Experimental
results are promising, but much work remains to be done to improve
the robustness of the proposed method.
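One plausible way to turn GMM likelihoods into a region-based field is a per-pixel log-likelihood ratio between the lip and non-lip color models. The sketch below illustrates that idea only; the abstract does not say how the field is actually formed, so the ratio, the diagonal covariances, the two-dimensional color feature, and all parameter values are assumptions made for illustration.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """Log likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def region_field(pixel, lip_gmm, non_lip_gmm):
    """Positive where the lip model explains the pixel color better."""
    return gmm_log_likelihood(pixel, lip_gmm) - gmm_log_likelihood(pixel, non_lip_gmm)


# Hypothetical models over a 2-D color feature (values are made up).
lip_gmm = [(0.6, (0.8, 0.3), (0.01, 0.01)),
           (0.4, (0.7, 0.4), (0.02, 0.02))]
non_lip_gmm = [(1.0, (0.5, 0.5), (0.02, 0.02))]

print(region_field((0.8, 0.3), lip_gmm, non_lip_gmm))  # positive: lip-like color
print(region_field((0.5, 0.5), lip_gmm, non_lip_gmm))  # negative: skin-like color
```

In a level-set setting, a field of this kind would push the contour outward over lip-like pixels and inward elsewhere, with the image gradient term pulling the contour onto edges.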
Glottal Source Cepstrum Coefficients Applied to NIST SRE 2010
In this paper, a novel feature set for speaker recognition based on glottal estimate information is presented. An iterative algorithm is used to derive the vocal tract and glottal source estimates from the speech signal. To test the importance of glottal source information in speaker characterization, the novel feature set has been tested in the 2010 NIST Speaker Recognition Evaluation (NIST SRE10). The proposed system uses glottal estimate parameter templates and classical cepstral information to build a model for each speaker involved in the recognition process. The ALIZE [1] open-source software has been used to create the GMM models for both background and target speakers. Compared to using mel-frequency cepstrum coefficients (MFCC) alone, the misclassification rate for NIST SRE 2010 was reduced from 29.43% to 27.15% when glottal source features are used.
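To put the reported misclassification rates in perspective, the short calculation below converts them into absolute and relative reductions. Only the two percentages come from the abstract; the variable names are illustrative.

```python
baseline_rate = 29.43   # misclassification rate with MFCC, in percent
glottal_rate = 27.15    # misclassification rate with glottal features, in percent

absolute_reduction = baseline_rate - glottal_rate          # percentage points
relative_reduction = absolute_reduction / baseline_rate    # fraction of baseline

print(f"{absolute_reduction:.2f} points, {relative_reduction:.1%} relative")
```

This works out to a drop of 2.28 percentage points, or roughly 7.7% of the baseline error.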