Homogenous Ensemble Phonotactic Language Recognition Based on SVM Supervector Reconstruction

Author: Johnson Michael T
Liu Jia
Liu Wei-Wei
Zhang Wei-Qiang
Publication venue: e-Publications@Marquette
Publication date: 01/01/2014

Acoustic and phonotactic spoken language recognition (SLR) systems are currently both widely used. To improve performance, researchers often combine multiple subsystems, and the combined results are often much better than those of any single SLR system. Phonotactic SLR subsystems may differ in their acoustic feature vectors, or may include multiple language-specific phone recognizers and different acoustic models. These methods perform well but usually at a high computational cost. In this paper, a new diversification method for phonotactic language recognition systems is proposed, using vector space models obtained by support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but each models the data in a different vector space produced by the SSR algorithm, without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three different SVM supervector reconstruction algorithms: relative SVM supervector reconstruction, functional SVM supervector reconstruction, and perturbing SVM supervector reconstruction. All of the algorithms are incorporated using a linear discriminant analysis-maximum mutual information (LDA-MMI) backend to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system achieves better performance than a baseline phone recognition-vector space modeling (PR-VSM) system with minimal extra computational cost. The HEPLR system yields equal error rates (EERs) of 1.39%, 3.63%, and 14.79%, representing relative improvements of 6.06%, 10.15%, and 10.53% over the baseline system for the 30-, 10-, and 3-s test conditions, respectively.
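The shared front end described in this abstract (phone decoding followed by N-gram counting) can be sketched as a mapping from a decoded phone string to a fixed-length vector of relative N-gram frequencies, which is the kind of vector a PR-VSM system passes to an SVM. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the toy vocabulary, and the TFLLR-style scaling (dividing by the square root of a background probability) are assumptions, and the three SSR variants themselves are not reproduced here.

```python
import math
from collections import Counter


def supervector(phones, vocab, orders=(1, 2, 3)):
    """Map a decoded phone sequence to relative N-gram frequencies.

    `vocab` is an ordered list of N-gram tuples that fixes the vector space
    (hypothetical here; a real system would derive it from training data).
    Each frequency is normalised by the number of N-grams of the same order.
    """
    counts = Counter()
    totals = Counter()
    for n in orders:
        grams = [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]
        counts.update(grams)
        totals[n] += len(grams)
    return [counts[g] / totals[len(g)] if totals[len(g)] else 0.0 for g in vocab]


def tfllr_weight(vec, background):
    """Scale each dimension by 1/sqrt(background probability) (assumed TFLLR-style weighting)."""
    return [v / math.sqrt(b) if b > 0 else 0.0 for v, b in zip(vec, background)]


def linear_score(vec, weights, bias=0.0):
    """Score of a linear model, such as a trained linear SVM: w . x + b."""
    return bias + sum(w * x for w, x in zip(weights, vec))


if __name__ == "__main__":
    toy_vocab = [("a",), ("b",), ("a", "b"), ("b", "a")]
    x = supervector(["a", "b", "a", "b"], toy_vocab, orders=(1, 2))
    print(x)  # prints [0.5, 0.5, 0.6666666666666666, 0.3333333333333333]
```

Unigram and bigram frequencies are normalised separately, so each N-gram order contributes a probability-like block to the supervector. An ensemble such as the one described above would then build several differently transformed versions of this same vector.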

epublications@Marquette

Springer - Publisher Connector

Combining joint factor analysis and iVectors for robust language recognition

Author: Demuynck Kris
Desplanques Brecht
Martens Jean-Pierre
Publication date: 01/01/2014

Ghent University Academic Bibliography

Unsupervised crosslingual adaptation of tokenisers for spoken language recognition

Author: Raymond W.M. Ng
Mauro Nicolao
Thomas Hain
Publication venue: 'Elsevier BV'
Publication date: 01/11/2017

Phone tokenisers are used in spoken language recognition (SLR) to obtain elementary phonetic information. We present a study on the use of deep neural network tokenisers. Unsupervised crosslingual adaptation was performed to adapt the baseline tokeniser, trained on English conversational telephone speech data, to different languages. Two training and adaptation approaches, namely cross-entropy adaptation and state-level minimum Bayes risk adaptation, were tested in a bottleneck i-vector SLR system and in a phonotactic SLR system. The SLR systems using tokenisers adapted to different languages were combined by score fusion, giving a 7-18% reduction in minimum detection cost function (minDCF) compared with the baseline configurations without adapted tokenisers. Analysis of the results showed that the ensemble of tokenisers gave diverse representations of phonemes, and therefore had complementary effects when SLR systems with different tokenisers were combined. SLR performance was also shown to be related to the quality of the adapted tokenisers.
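Score-level fusion and the minDCF metric mentioned in this abstract can be illustrated with a small sketch. It assumes a simple weighted linear fusion with hand-set weights. In practice, fusion weights are usually trained, for example by logistic regression. The sketch also computes an unnormalised detection cost with assumed defaults of C_miss = C_fa = 1 and P_target = 0.5. The function names and parameter values are hypothetical and are not taken from the paper.

```python
def fuse(system_scores, weights, offset=0.0):
    """Weighted linear fusion of one trial's scores from several SLR systems."""
    if len(system_scores) != len(weights):
        raise ValueError("one weight per system is required")
    return offset + sum(w * s for w, s in zip(weights, system_scores))


def min_dcf(target_scores, nontarget_scores, p_target=0.5, c_miss=1.0, c_fa=1.0):
    """Minimum (unnormalised) detection cost over all decision thresholds.

    A trial is accepted when its score >= threshold. Sweeping every observed
    score, plus +inf (reject everything), covers every distinct operating point.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores)) + [float("inf")]
    best = float("inf")
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        best = min(best, cost)
    return best


if __name__ == "__main__":
    # Hypothetical (system 1, system 2) scores for two target and two non-target trials.
    w = [0.6, 0.4]
    tar = [fuse(s, w) for s in [(2.0, 1.5), (1.2, 2.2)]]
    non = [fuse(s, w) for s in [(-1.0, 0.3), (0.4, -0.5)]]
    print(min_dcf(tar, non))  # prints 0.0 (the fused scores separate perfectly)
```

Fusing before thresholding allows one system's errors to be offset by another system's scores. The abstract attributes this complementary effect to the diversity of the adapted tokenisers.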

Crossref

Biblioteca Digital de la Comunidad de Madrid

White Rose Research Online

PHONOTACTIC AND ACOUSTIC LANGUAGE RECOGNITION

Author: Matějka Pavel
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2009

This thesis deals with phonotactic and acoustic techniques for automatic language recognition (LRE). The first part of the thesis deals with phonotactic language recognition based on co-occurrences of phone sequences in speech. A thorough study of phone recognition as a tokenization technique for LRE is presented, focusing on the amount of training data for the phone recognizer and on the combination of phone recognizers trained on several languages (Parallel Phone Recognition followed by Language Model, PPRLM).
The thesis also introduces a novel anti-model technique for PPRLM and investigates the use of phone lattices instead of strings. The work on the phonotactic approach concludes with a comparison of classical n-gram modeling techniques and binary decision trees. Acoustic LRE is addressed as well, with the main focus on discriminative techniques for training target-language acoustic models and on initial (but successful) experiments with removing channel dependencies. We have also investigated the fusion of phonotactic and acoustic approaches. All experiments were performed on standard data from the NIST 2003, 2005 and 2007 evaluations, so the results are directly comparable to those of other laboratories in the LRE community. With the above-mentioned techniques, the fused systems defined the state of the art in the LRE field and achieved excellent results in two NIST evaluations.
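The PPRLM idea described above uses several phone recognizers, each followed by per-language phone n-gram models. It can be sketched with add-one-smoothed phone bigram models whose log-likelihoods are summed across tokenizers. This is a toy built on assumptions, not the system from the thesis. The class and function names are hypothetical, single characters stand in for phones, and equal-weight summation replaces the trained backend that a real system would use.

```python
import math
from collections import Counter


class BigramPhoneLM:
    """Add-one-smoothed phone bigram model with sentence boundary symbols."""

    def __init__(self, training_strings, phone_set):
        self.vocab_size = len(phone_set) + 1  # possible successors: phones plus "</s>"
        self.bigrams = Counter()
        self.history = Counter()
        for phones in training_strings:
            seq = ["<s>"] + list(phones) + ["</s>"]
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1

    def log_prob(self, phones):
        """Log-likelihood of a phone string: sum of log P(cur | prev)."""
        seq = ["<s>"] + list(phones) + ["</s>"]
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.history[prev] + self.vocab_size))
            for prev, cur in zip(seq, seq[1:])
        )


def pprlm_scores(decodings, models):
    """Sum per-tokenizer LM log-likelihoods for every language.

    decodings: {tokenizer: phone string produced by that tokenizer}
    models:    {tokenizer: {language: BigramPhoneLM trained on that tokenizer's output}}
    """
    languages = next(iter(models.values())).keys()
    return {
        lang: sum(models[tok][lang].log_prob(phones) for tok, phones in decodings.items())
        for lang in languages
    }


if __name__ == "__main__":
    toy_models = {"tok1": {"A": BigramPhoneLM(["abab"], {"a", "b"}),
                           "B": BigramPhoneLM(["aabb"], {"a", "b"})}}
    scores = pprlm_scores({"tok1": "abab"}, toy_models)
    print(max(scores, key=scores.get))  # prints A
```

With more than one entry in `decodings`, each tokenizer contributes its own language-model evidence. This is the parallel combination that PPRLM refers to, although in this sketch the contributions are simply added.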

Digital library of Brno University of Technology

National Repository of Grey Literature

NIST 2007 Language Recognition Evaluation: From the Perspective of IIR

Author: Lee Kong-Aik
Li Haizhou
Ma Bin
Sim Khe-Chai
Sun Hanwu
Tong Rong
You Changhuai
Zhu Donglai
Publication venue: De La Salle University - Dasmarinas
Publication date: 01/01/2008

PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

Waseda University Repository

Using Prosody and Phonotactics in Arabic Dialect Identification

Author: Biadsy Fadi
Hirschberg Julia Bell
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2009

While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the major communication mode for everyday life. Identifying a speaker's dialect is therefore critical to speech processing tasks such as automatic speech recognition, as well as to speaker identification. We examine the role of prosodic features (intonation and rhythm) across four Arabic dialects (Gulf, Iraqi, Levantine, and Egyptian) for the purpose of automatic dialect identification. We show that prosodic features can significantly improve identification over a purely phonotactic approach, with an identification accuracy of 86.33% for 2-minute utterances.

Columbia University Academic Commons

Bayesian Models for Unit Discovery on a Very Low Resource Language

Author: Besacier Laurent
Burget Lukas
Dupoux Emmanuel
Godard Pierre
Hasegawa-Johnson Mark
Khudanpur Sanjeev
Larsen Elin
Ondel Lucas
Scharenborg Odette
Yvon François
Publication date: 20/02/2018

Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among other approaches, Bayesian models have shown promising results on artificial examples, but in situ experiments are still lacking. Our work applies state-of-the-art Bayesian models to unsupervised Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also show that Bayesian models can naturally integrate information from other, better-resourced languages by means of informative priors, leading to more consistent discovered units. Finally, the discovered acoustic units are used, either as the 1-best sequence or as a lattice, to perform word segmentation. Word segmentation results show that this Bayesian approach clearly outperforms a Segmental-DTW baseline on the same corpus. Comment: Accepted to ICASSP 201

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

On the use of high-level information in speaker and language recognition

Author: González Domínguez Javier
González-Rodríguez Joaquín
López Moreno Ignacio
Montero-Asenjo Alberto
Ramos Daniel
Toledano Doroteo T.
Publication venue: Actas de las IV Jornadas de Tecnología del Habla (JTH 2006)
Publication date: 01/01/2006

Automatic speaker recognition systems have been largely dominated by acoustic-spectral systems, which rely on proper modelling of the speaker's short-term vocal tract. However, there is both scientific and intuitive evidence that speaker-specific information is embedded in the speech signal in multiple short- and long-term characteristics. In this work, a multilevel speaker recognition system combining acoustic, phonotactic and prosodic subsystems is presented and assessed using NIST 2005 Speaker Recognition Evaluation data. For language recognition, the NIST 2005 Language Recognition Evaluation was selected to measure the performance of high-level language recognition systems.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo