3,987 research outputs found
Probabilistic Linear Discriminant Analysis for Acoustic Modeling
Acoustic models using probabilistic linear discriminant analysis (PLDA)
capture the correlations within feature vectors using subspaces which do not
vastly expand the model. This allows high dimensional and correlated feature
spaces to be used, without requiring the estimation of multiple high-dimensional
covariance matrices. In this letter we extend the recently presented PLDA
mixture model for speech recognition through a tied PLDA approach, which is
better able to control the model size to avoid overfitting. We carried out
experiments using the Switchboard corpus, with both mel frequency cepstral
coefficient features and bottleneck features derived from a deep neural network.
Reductions in word error rate were obtained by using tied PLDA, compared with
the PLDA mixture model, subspace Gaussian mixture models, and deep neural
networks.
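The core idea above, capturing a high-dimensional covariance through a low-rank subspace plus a diagonal residual, can be sketched as a factor-analysis-style model. This is an illustrative construction with made-up dimensions and values, not the paper's tied PLDA mixture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 40, 5                          # feature dimension, subspace dimension
F = rng.normal(size=(d, q)) * 0.3     # low-rank loading matrix (illustrative values)
psi = np.full(d, 0.5)                 # diagonal residual variances
mu = np.zeros(d)

# Full covariance implied by the subspace model: Sigma = F F^T + diag(psi).
# Only d*q + d free parameters govern this d-by-d matrix, which is what
# avoids estimating many full covariance matrices.
Sigma = F @ F.T + np.diag(psi)

def loglik(x):
    """Gaussian log-likelihood under the subspace-structured covariance."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

print(loglik(rng.normal(size=d)))
```

With d = 40 and q = 5 this uses 240 parameters instead of the 820 of a full symmetric covariance, which is the model-size control the abstract refers to.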
Model of the Classification of English Vowels by Spanish Speakers
A number of models of single-language vowel classification based on formant representations have been proposed. We propose a new model that explicitly predicts vowel perception by second language (L2) learners based on the phonological map of their native language (L1). The model represents the vowels using polar coordinates in the F1-F2 formant space. Boundaries bisect the angles made by two adjacent category centroids. An L2 vowel is classified with the closest L1 vowel with a probability based on the angular difference between the L2 vowel and the L1 vowel boundary. The polar coordinate model is compared with other vowel classification models, such as the quadratic discriminant analysis method used by Hillenbrand and Gayvert [J. Speech Hear. Research, 36, 694-700, 1993] and the logistic regression analysis method adopted by Nearey [J. Phonetics, 18, 347-373, 1990]. All models were trained on Spanish vowel data and tested on English vowels. The results were compared with behavioral data obtained by Flege [Q. J. Exp. Psych., 43A(3), 701-731, 1991] for Spanish monolingual speakers identifying English vowels. The polar coordinate model outperformed the other models in matching its predictions most closely with the behavioral data. National Institute on Deafness and Other Communication Disorders (R29 02852); Alfred P. Sloan Foundation
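A minimal sketch of the angular classification rule described above, using hypothetical Spanish vowel centroids and an assumed origin for the polar representation; it applies a deterministic nearest-angle decision rather than the paper's boundary-based probabilities:

```python
import numpy as np

# Hypothetical L1 (Spanish) vowel centroids in the F1-F2 plane (Hz);
# the values are illustrative, not the paper's measured formants.
l1_centroids = {
    "i": (300, 2300),
    "e": (450, 2000),
    "a": (700, 1300),
    "o": (450, 900),
    "u": (300, 750),
}

origin = (500, 1500)  # assumed origin of the polar representation (illustrative)

def angle(f1, f2):
    """Angle of a vowel token around the chosen origin in the F1-F2 plane."""
    return np.arctan2(f2 - origin[1], f1 - origin[0])

def classify(f1, f2):
    """Assign an L2 token to the L1 category with the smallest angular difference."""
    a = angle(f1, f2)
    # np.angle(exp(1j*x)) wraps the difference into (-pi, pi].
    diffs = {v: abs(np.angle(np.exp(1j * (a - angle(*c)))))
             for v, c in l1_centroids.items()}
    return min(diffs, key=diffs.get)

print(classify(320, 2250))  # a high front token
```

In the full model each L2 token would instead receive a probability for each L1 category based on its angular distance to the bisecting boundary, rather than a hard label.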
Discriminative Tandem Features for HMM-based EEG Classification
We investigate the use of discriminative feature extractors in tandem configuration with a generative EEG classification system. Existing studies on dynamic EEG classification typically use hidden Markov models (HMMs), which lack discriminative capability. In this paper, a linear and a non-linear classifier are discriminatively trained to produce complementary input features for the conventional HMM system. Two sets of tandem features are derived from the linear discriminant analysis (LDA) projection output and the multilayer perceptron (MLP) class-posterior probability, before being appended to the standard autoregressive (AR) features. Evaluation on a two-class motor-imagery classification task shows that both proposed tandem features yield consistent gains over the AR baseline, resulting in significant relative improvements of 6.2% and 11.2% for the LDA and MLP features respectively. We also explore the portability of these features across different subjects. Index Terms: artificial neural network-hidden Markov models, EEG classification, brain-computer interface (BCI)
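The tandem construction, appending a discriminative projection and class posteriors to the baseline features, can be sketched on toy data. Here a Fisher LDA projection stands in for the paper's LDA front end, and a shared-covariance Gaussian posterior stands in for the MLP posteriors (an assumption for self-containedness, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class frames standing in for AR-feature vectors from EEG.
X = rng.normal(size=(200, 6))
X[:100, 0] += 2.0                   # shift class-0 frames so the classes differ
y = np.r_[np.zeros(100, int), np.ones(100, int)]

mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)   # within-class scatter

# Fisher LDA projection: one discriminative dimension for two classes.
w = np.linalg.solve(Sw, mu1 - mu0)
lda_feat = (X @ w)[:, None]

def posteriors(X):
    """Class posteriors from a shared-covariance Gaussian model
    (a stand-in for the MLP posteriors used in the paper)."""
    Si = np.linalg.inv(Sw / 2)
    def logp(mu):
        diff = X - mu
        return -0.5 * np.einsum("ij,jk,ik->i", diff, Si, diff)
    a = np.c_[logp(mu0), logp(mu1)]
    e = np.exp(a - a.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)

# Tandem features: AR baseline + LDA projection + class posteriors.
tandem = np.hstack([X, lda_feat, posteriors(X)])
print(tandem.shape)  # (200, 6 + 1 + 2)
```

The HMM would then be trained on these widened frames instead of the raw AR features.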
Exploiting i-vector posterior covariances for short-duration language recognition
Linear models in i-vector space have been shown to be an effective solution not only for speaker identification but also for language recognition. The i-vector extraction process, however, is affected by several factors, such as the noise level, the acoustic content of the utterance, and the duration of the spoken segments. These factors influence both the i-vector estimate and its uncertainty, represented by the i-vector posterior covariance matrix. Modeling i-vector uncertainty with Probabilistic Linear Discriminant Analysis has been shown to be effective for short-duration speaker identification. This paper extends the approach to language recognition, analyzing the effects of i-vector covariances on a state-of-the-art Gaussian classifier, and proposes an effective solution for the reduction of the average detection cost (Cavg) for short segments.
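One simple way to make a Gaussian classifier uncertainty-aware, in the spirit described above, is to add the i-vector posterior covariance to the language model's covariance when scoring, so short (uncertain) segments are evaluated against a broader distribution. A toy sketch with made-up means and covariances, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10                                       # toy i-vector dimension

# Hypothetical language means and a shared within-language covariance.
means = {"en": rng.normal(size=d), "es": rng.normal(size=d)}
Sigma_lang = np.eye(d) * 0.5

def loglik(w, Sigma_post, mu):
    """Gaussian log-likelihood with the i-vector posterior covariance
    folded into the language covariance."""
    S = Sigma_lang + Sigma_post
    diff = w - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(S, diff))

w = means["en"] + 0.3 * rng.normal(size=d)   # an i-vector near the "en" mean
Sigma_post = np.eye(d) * 0.2                 # larger for shorter segments
scores = {lang: loglik(w, Sigma_post, mu) for lang, mu in means.items()}
print(max(scores, key=scores.get))
```

With a shared covariance the uncertainty term mainly flattens all scores, but with per-language or full covariances it changes the decision boundary, which is what the analysis in the paper examines.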
Deep Speaker Feature Learning for Text-independent Speaker Verification
Recently deep neural networks (DNNs) have been used to learn speaker
features. However, the quality of the learned features is not sufficiently
good, so a complex back-end model, either neural or probabilistic, has to be
used to address the residual uncertainty when applied to speaker verification,
just as with raw features. This paper presents a convolutional time-delay deep
neural network structure (CT-DNN) for speaker feature learning. Our
experimental results on the Fisher database demonstrated that this CT-DNN can
produce high-quality speaker features: even with a single feature (0.3 seconds
including the context), the EER can be as low as 7.68%. This effectively
confirmed that the speaker trait is largely a deterministic short-time property
rather than a long-time distributional pattern, and therefore can be extracted
from just dozens of frames.
Comment: deep neural networks, speaker verification, speaker feature
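The equal error rate (EER) quoted above is the operating point where the false-accept and false-reject rates meet; it can be estimated from trial scores as sketched below on synthetic score distributions (illustrative data, not the Fisher results):

```python
import numpy as np

def eer(target_scores, nontarget_scores):
    """Equal error rate: sweep thresholds until the false-reject rate
    catches up with the false-accept rate, then average the two."""
    for t in np.sort(np.concatenate([target_scores, nontarget_scores])):
        far = np.mean(nontarget_scores >= t)   # false accepts at threshold t
        frr = np.mean(target_scores < t)       # false rejects at threshold t
        if frr >= far:
            return (far + frr) / 2
    return 0.5

rng = np.random.default_rng(0)
tgt = rng.normal(2.0, 1.0, 1000)   # same-speaker trial scores (toy data)
non = rng.normal(0.0, 1.0, 1000)   # different-speaker trial scores
print(f"EER ~ {eer(tgt, non):.3f}")
```

For two unit-variance Gaussians with means two standard deviations apart, the EER sits near 16%; a system at 7.68% EER separates its score distributions considerably better than this toy example.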