13 research outputs found

Phonotactic language recognition using i-vectors and phoneme posteriogram counts

Author: Córdoba Herralde Ricardo de
D'haro Enríquez Luis Fernando
Glembek Ondřej
Matějka Pavel
Plchot Oldřich
Soufifar Mehdi
Černocký Jan
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2012

This paper describes a novel approach to phonotactic LID in which, instead of using soft-counts based on phoneme lattices, we use posteriograms to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional units, for which we adapted the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set, with better results than a system based on soft-counts (Cavg on 30s: 3.15% vs 3.43%) and with very good results when fused with an acoustic i-vector LID system (Cavg on 30s: acoustic 2.4% vs 1.25%). The proposed technique is also compared with another low-dimensional projection system based on PCA. In comparison with the original soft-counts, the proposed technique provides better results, reduces the problems due to sparse counts, and avoids the need for pruning techniques when creating the lattices.

Archivo Digital UPM
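
As a rough illustration of the posteriogram-based n-gram counting mentioned in the abstract above, the sketch below accumulates soft bigram counts directly from frame-level phoneme posteriors. It is not the authors' implementation: the function name `expected_bigram_counts`, the toy data, and the choice to multiply posteriors of consecutive frames are all assumptions made for this example, and the subspace (i-vector) reduction step is not shown.

```python
from itertools import product


def expected_bigram_counts(posteriogram):
    """Accumulate soft bigram counts from frame-level phoneme posteriors.

    posteriogram: list of frames, each a list of posterior probabilities
    (one per phoneme, summing to 1).
    Returns a dict mapping (a, b) phoneme-index pairs to the expected count
    of phoneme a being followed by phoneme b. Consecutive frames are treated
    as independent, which is an assumption of this sketch.
    """
    counts = {}
    for prev, curr in zip(posteriogram, posteriogram[1:]):
        for a, b in product(range(len(prev)), range(len(curr))):
            counts[(a, b)] = counts.get((a, b), 0.0) + prev[a] * curr[b]
    return counts


# Toy posteriogram: 3 frames over 2 phonemes (made-up values).
toy = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
bigrams = expected_bigram_counts(toy)
```

Because every frame contributes a full posterior distribution, no lattice has to be built or pruned, and the counts for one utterance always sum to the number of frame transitions.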

Analysis of BUT-PT Submission for NIST LRE 2017

Author: Burget Lukáš
Cumani Sandro
Diez Mireia
Glembek Ondřej
Grézl František
Kamsali Mounika
Kesiraju Santosh
Lozano-Diez Alicia
Matějka Pavel
Novotný Ondřej
Ondel Lucas
Plchot Oldřich
Rohdin Johan
Silnova Anna
Slavíček Josef
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2018

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Patrol team language identification system for DARPA RATS P1 evaluation

Author: D'haro Enríquez Luis Fernando
Dehak Najim
Glembek Ondřej
Grézl František
Ma Jeff
Matsoukas Spyros
Matějka Pavel
Plchot Oldřich
Soufifar Mehdi
Veselý Karel
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2012

This paper describes the language identification (LID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We show that techniques originally developed for LID on telephone speech (e.g., for the NIST language recognition evaluations) remain effective on the noisy RATS data, provided that careful consideration is applied when designing the training and development sets. In addition, we show significant improvements from the use of Wiener filtering, neural-network-based and language-dependent i-vector modeling, and fusion.

Archivo Digital UPM

The subspace Gaussian mixture model—A structured model for speech recognition

Author: Agarwal Mohit
Akyazi Pinar
Burget Lukáš
Ghoshal Arnab
Glembek Ondřej
Kai Feng
Povey Daniel
Publication venue: 'Elsevier BV'
Publication date: 19/11/2014

We describe a new approach to speech recognition, in which all Hidden Markov Model (HMM) states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state. The model is defined by vectors associated with each state with a dimension of, say, 50, together with a global mapping from this vector space to the space of parameters of the GMM. This model appears to give better results than a conventional model, and the extra structure offers many new opportunities for modeling innovations while maintaining compatibility with most standard techniques.

Infoscience - École polytechnique fédérale de Lausanne
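
To make the idea of a "global mapping" from state vectors to GMM parameters in the abstract above more concrete, here is a minimal sketch. It assumes the basic form in which each Gaussian mean is a shared projection matrix times the state vector and the mixture weights are a softmax over shared weight-projection vectors. The function name `sgmm_state_params`, the tiny dimensions, and the toy values are assumptions for illustration only; covariances and training are omitted.

```python
import math


def sgmm_state_params(v, M, w):
    """Map one state vector to GMM means and mixture weights.

    v: state vector of length S
    M: list of I shared projection matrices, each D x S (list of rows)
    w: list of I shared weight-projection vectors, each of length S
    Returns (means, weights): I mean vectors of length D and I weights.
    """
    # Mean of Gaussian i in this state: M_i times v.
    means = [[sum(m * x for m, x in zip(row, v)) for row in M_i] for M_i in M]
    # Mixture weights: softmax over the dot products w_i . v.
    logits = [sum(a * x for a, x in zip(w_i, v)) for w_i in w]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights


# Toy setup: S = 2, D = 2, I = 2 (the abstract suggests S of about 50).
v = [1.0, 2.0]
M = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
w = [[0.0, 0.0], [0.0, 0.0]]
means, weights = sgmm_state_params(v, M, w)
```

Only `v` is specific to a state here; `M` and `w` are shared by all states, which is what keeps the number of per-state parameters small.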

Optimization of Gaussian Mixture Subspace Models and Related Scoring Algorithms in Speaker Verification

Author: Glembek Ondřej
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2012

This thesis deals with modeling in the subspace of Gaussian mixture model parameters for speaker recognition. The thesis consists of three parts. The first part is devoted to scoring methods when joint factor analysis is used to model the speaker. The studied methods differ mainly in how they deal with the channel variability of the test recordings. The methods are presented in the context of the general form of the likelihood function for joint factor analysis and are compared in terms of both accuracy and speed. It is shown that using a linear approximation of the likelihood function gives results comparable to the standard likelihood evaluation while dramatically simplifying the mathematical formulation and thereby speeding up the evaluation. The second part deals with the extraction of so-called i-vectors, i.e., low-dimensional representations of recordings. The thesis presents two approaches to simplifying the extraction. The motivation for this part was to speed up i-vector extraction, to deploy this successful technique on simple devices such as mobile phones, and to obtain a mathematical simplification that allows numerical optimization methods to be used for discriminative training. The results show that on long recordings the speed-up comes at the cost of reduced recognition accuracy, whereas on short recordings, where recognition accuracy is low, the differences in accuracy vanish. The third part addresses discriminative training in speaker recognition. It summarizes findings from previous work on this topic. The chapter builds on the findings of the previous two parts and deals with discriminative training of the i-vector extractor parameters. The results show that, with classical training of the extractor followed by discriminative retraining, these methods improve accuracy.

National Repository of Grey Literature

iVector-based discriminative adaptation for automatic speech recognition

Author: Burget Lukáš
Glembek Ondřej
Karafiát Martin
Matějka Pavel
Černocký Jan
Publication venue
Publication date: 01/01/2011

We present a novel technique for discriminative feature-level adaptation of an automatic speech recognition system. The concept of iVectors, popular in speaker recognition, is used to extract information about the speaker or acoustic environment from a speech segment. An iVector is a low-dimensional, fixed-length vector representing such information. To utilize iVectors for adaptation, Region Dependent Linear Transforms (RDLT) are discriminatively trained using the MPE criterion on a large amount of annotated data to extract the relevant information from the iVectors and to compensate the speech features. The approach was tested on standard CTS data. We found it to be complementary to common adaptation techniques. On a well-tuned RDLT system with standard CMLLR adaptation, we reached an additional 0.8% absolute WER improvement.

CiteSeerX