3,987 research outputs found
Probabilistic Linear Discriminant Analysis for Acoustic Modeling
Acoustic models using probabilistic linear discriminant analysis (PLDA)
capture the correlations within feature vectors using subspaces which do not
vastly expand the model. This allows high dimensional and correlated feature
spaces to be used, without requiring the estimation of multiple high-dimensional
covariance matrices. In this letter we extend the recently presented PLDA
mixture model for speech recognition through a tied PLDA approach, which is
better able to control the model size to avoid overfitting. We carried out
experiments using the Switchboard corpus, with both mel frequency cepstral
coefficient features and bottleneck features derived from a deep neural network.
Reductions in word error rate were obtained by using tied PLDA, compared with
the PLDA mixture model, subspace Gaussian mixture models, and deep neural
networks.
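The core idea above, capturing a high-dimensional covariance through a low-rank subspace plus a diagonal residual, can be sketched as a factor-analysis-style model. This is an illustrative construction with made-up dimensions and values, not the paper's tied PLDA mixture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 40, 5                          # feature dimension, subspace dimension
F = rng.normal(size=(d, q)) * 0.3     # low-rank loading matrix (illustrative values)
psi = np.full(d, 0.5)                 # diagonal residual variances
mu = np.zeros(d)

# Full covariance implied by the subspace model: Sigma = F F^T + diag(psi).
# Only d*q + d free parameters govern this d-by-d matrix, which is what
# avoids estimating many full covariance matrices.
Sigma = F @ F.T + np.diag(psi)

def loglik(x):
    """Gaussian log-likelihood under the subspace-structured covariance."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

print(loglik(rng.normal(size=d)))
```

With d = 40 and q = 5 this uses 240 parameters instead of the 820 of a full symmetric covariance, which is the model-size control the abstract refers to.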
Model of the Classification of English Vowels by Spanish Speakers
A number of models of single-language vowel classification based on formant representations have been proposed. We propose a new model that explicitly predicts vowel perception by second language (L2) learners based on the phonological map of their native language (L1). The model represents the vowels using polar coordinates in the F1-F2 formant space. Boundaries bisect the angles made by two adjacent category centroids. An L2 vowel is classified with the closest L1 vowel with a probability based on the angular difference between the L2 vowel and the L1 vowel boundary. The polar coordinate model is compared with other vowel classification models, such as the quadratic discriminant analysis method used by Hillenbrand and Gayvert [J. Speech Hear. Research, 36, 694-700, 1993] and the logistic regression analysis method adopted by Nearey [J. Phonetics, 18, 347-373, 1990]. All models were trained on Spanish vowel data and tested on English vowels. The results were compared with behavioral data obtained by Flege [Q. J. Exp. Psych., 43A(3), 701-731, 1991] for Spanish monolingual speakers identifying English vowels. The polar coordinate model outperformed the other models in matching its predictions most closely with the behavioral data. National Institute on Deafness and Other Communication Disorders (R29 02852); Alfred P. Sloan Foundation
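A minimal sketch of the angular classification rule described above, using hypothetical Spanish vowel centroids and an assumed origin for the polar representation; it applies a deterministic nearest-angle decision rather than the paper's boundary-based probabilities:

```python
import numpy as np

# Hypothetical L1 (Spanish) vowel centroids in the F1-F2 plane (Hz);
# the values are illustrative, not the paper's measured formants.
l1_centroids = {
    "i": (300, 2300),
    "e": (450, 2000),
    "a": (700, 1300),
    "o": (450, 900),
    "u": (300, 750),
}

origin = (500, 1500)  # assumed origin of the polar representation (illustrative)

def angle(f1, f2):
    """Angle of a vowel token around the chosen origin in the F1-F2 plane."""
    return np.arctan2(f2 - origin[1], f1 - origin[0])

def classify(f1, f2):
    """Assign an L2 token to the L1 category with the smallest angular difference."""
    a = angle(f1, f2)
    # np.angle(exp(1j*x)) wraps the difference into (-pi, pi].
    diffs = {v: abs(np.angle(np.exp(1j * (a - angle(*c)))))
             for v, c in l1_centroids.items()}
    return min(diffs, key=diffs.get)

print(classify(320, 2250))  # a high front token
```

In the full model each L2 token would instead receive a probability for each L1 category based on its angular distance to the bisecting boundary, rather than a hard label.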
Discriminative Tandem Features for HMM-based EEG Classification
We investigate the use of discriminative feature extractors in tandem configuration with a generative EEG classification system. Existing studies on dynamic EEG classification typically use hidden Markov models (HMMs), which lack discriminative capability. In this paper, a linear and a non-linear classifier are discriminatively trained to produce complementary input features for the conventional HMM system. Two sets of tandem features are derived from the linear discriminant analysis (LDA) projection output and the multilayer perceptron (MLP) class-posterior probability, before being appended to the standard autoregressive (AR) features. Evaluation on a two-class motor-imagery classification task shows that both proposed tandem features yield consistent gains over the AR baseline, resulting in significant relative improvements of 6.2% and 11.2% for the LDA and MLP features respectively. We also explore the portability of these features across different subjects. Index Terms: artificial neural network-hidden Markov models, EEG classification, brain-computer interface (BCI)
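The tandem construction, appending a discriminative projection and class posteriors to the baseline features, can be sketched on toy data. Here a Fisher LDA projection stands in for the paper's LDA front end, and a shared-covariance Gaussian posterior stands in for the MLP posteriors (an assumption for self-containedness, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class frames standing in for AR-feature vectors from EEG.
X = rng.normal(size=(200, 6))
X[:100, 0] += 2.0                   # shift class-0 frames so the classes differ
y = np.r_[np.zeros(100, int), np.ones(100, int)]

mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)   # within-class scatter

# Fisher LDA projection: one discriminative dimension for two classes.
w = np.linalg.solve(Sw, mu1 - mu0)
lda_feat = (X @ w)[:, None]

def posteriors(X):
    """Class posteriors from a shared-covariance Gaussian model
    (a stand-in for the MLP posteriors used in the paper)."""
    Si = np.linalg.inv(Sw / 2)
    def logp(mu):
        diff = X - mu
        return -0.5 * np.einsum("ij,jk,ik->i", diff, Si, diff)
    a = np.c_[logp(mu0), logp(mu1)]
    e = np.exp(a - a.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)

# Tandem features: AR baseline + LDA projection + class posteriors.
tandem = np.hstack([X, lda_feat, posteriors(X)])
print(tandem.shape)  # (200, 6 + 1 + 2)
```

The HMM would then be trained on these widened frames instead of the raw AR features.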
Exploiting i-vector posterior covariances for short-duration language recognition
Linear models in i-vector space have been shown to be an effective solution not only for speaker identification but also for language recognition. The i-vector extraction process, however, is affected by several factors, such as the noise level, the acoustic content of the utterance, and the duration of the spoken segments. These factors influence both the i-vector estimate and its uncertainty, represented by the i-vector posterior covariance matrix. Modeling i-vector uncertainty with Probabilistic Linear Discriminant Analysis has been shown to be effective for short-duration speaker identification. This paper extends the approach to language recognition, analyzing the effects of i-vector covariances on a state-of-the-art Gaussian classifier, and proposes an effective solution for the reduction of the average detection cost (Cavg) for short segments.
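One simple way to make a Gaussian classifier uncertainty-aware, in the spirit described above, is to add the i-vector posterior covariance to the language model's covariance when scoring, so short (uncertain) segments are evaluated against a broader distribution. A toy sketch with made-up means and covariances, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10                                       # toy i-vector dimension

# Hypothetical language means and a shared within-language covariance.
means = {"en": rng.normal(size=d), "es": rng.normal(size=d)}
Sigma_lang = np.eye(d) * 0.5

def loglik(w, Sigma_post, mu):
    """Gaussian log-likelihood with the i-vector posterior covariance
    folded into the language covariance."""
    S = Sigma_lang + Sigma_post
    diff = w - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(S, diff))

w = means["en"] + 0.3 * rng.normal(size=d)   # an i-vector near the "en" mean
Sigma_post = np.eye(d) * 0.2                 # larger for shorter segments
scores = {lang: loglik(w, Sigma_post, mu) for lang, mu in means.items()}
print(max(scores, key=scores.get))
```

With a shared covariance the uncertainty term mainly flattens all scores, but with per-language or full covariances it changes the decision boundary, which is what the analysis in the paper examines.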
Deep Speaker Feature Learning for Text-independent Speaker Verification
Recently deep neural networks (DNNs) have been used to learn speaker
features. However, the quality of the learned features is not sufficiently
good, so a complex back-end model, either neural or probabilistic, has to be
used to address the residual uncertainty when applied to speaker verification,
just as with raw features. This paper presents a convolutional time-delay deep
neural network structure (CT-DNN) for speaker feature learning. Our
experimental results on the Fisher database demonstrated that this CT-DNN can
produce high-quality speaker features: even with a single feature (0.3 seconds
including the context), the EER can be as low as 7.68%. This effectively
confirmed that the speaker trait is largely a deterministic short-time property
rather than a long-time distributional pattern, and therefore can be extracted
from just dozens of frames.
Comment: deep neural networks, speaker verification, speaker feature
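The equal error rate (EER) quoted above is the operating point where the false-accept and false-reject rates meet; it can be estimated from trial scores as sketched below on synthetic score distributions (illustrative data, not the Fisher results):

```python
import numpy as np

def eer(target_scores, nontarget_scores):
    """Equal error rate: sweep thresholds until the false-reject rate
    catches up with the false-accept rate, then average the two."""
    for t in np.sort(np.concatenate([target_scores, nontarget_scores])):
        far = np.mean(nontarget_scores >= t)   # false accepts at threshold t
        frr = np.mean(target_scores < t)       # false rejects at threshold t
        if frr >= far:
            return (far + frr) / 2
    return 0.5

rng = np.random.default_rng(0)
tgt = rng.normal(2.0, 1.0, 1000)   # same-speaker trial scores (toy data)
non = rng.normal(0.0, 1.0, 1000)   # different-speaker trial scores
print(f"EER ~ {eer(tgt, non):.3f}")
```

For two unit-variance Gaussians with means two standard deviations apart, the EER sits near 16%; a system at 7.68% EER separates its score distributions considerably better than this toy example.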