Search CORE

8 research outputs found

Enhanced Phone Posteriors for Improving Speech Recognition Systems

Authors: Bourlard Hervé; Ketabdar Hamed
Publication venue: IDIAP
Publication date: 11/02/2010
Phone posterior probabilities have been increasingly explored for improving automatic speech recognition (ASR) systems. In this paper, we propose two approaches for hierarchically enhancing these phone posteriors by integrating long acoustic context as well as prior phonetic and lexical knowledge. In the first approach, phone posteriors estimated with a Multi-Layer Perceptron (MLP) are used as emission probabilities in HMM forward-backward recursions. This yields new enhanced posterior estimates integrating HMM topological constraints (encoding specific phonetic and lexical knowledge) and context. In the second approach, a temporal context of the regular MLP posteriors is post-processed by a secondary MLP in order to learn inter- and intra-dependencies between the phone posteriors. These dependencies constitute prior phonetic knowledge. The learned knowledge is integrated into the posterior estimation during inference (the forward pass) of the second MLP, resulting in enhanced phone posteriors. We investigate the use of the enhanced posteriors in hybrid HMM/ANN and Tandem configurations. We propose using the enhanced posteriors as a replacement for, or as complementary evidence to, the regular MLP posteriors. The proposed method has been tested on different small and large vocabulary databases, consistently improving frame, phone and word recognition rates.
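The first approach in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain HMM in which each state corresponds to one MLP output class, and it plugs the MLP posteriors into the forward-backward recursions as emission scores, without dividing by class priors. The function name `enhance_posteriors`, the transition matrix `trans` and the initial distribution `init` are hypothetical names chosen for this illustration.

```python
import numpy as np

def enhance_posteriors(mlp_post, trans, init):
    """Return state posteriors (gamma) from scaled forward-backward recursions.

    mlp_post: (T, S) frame-level MLP posteriors, used here as emission scores
              (an assumption of this sketch).
    trans:    (S, S) row-stochastic transition matrix encoding the HMM topology.
    init:     (S,) initial state distribution.
    """
    T, S = mlp_post.shape
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))

    # Forward recursion, renormalised at every frame to avoid underflow.
    alpha[0] = init * mlp_post[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * mlp_post[t]
        alpha[t] /= alpha[t].sum()

    # Backward recursion, also renormalised per frame.
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (mlp_post[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Per-frame scale factors cancel once each row is normalised.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
```

With a uniform transition matrix the topology adds no information, so the output equals the input posteriors. A constrained `trans` (for example, a left-to-right phone or word model) is what lets the whole utterance influence each frame's estimate.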

Infoscience - École polytechnique fédérale de Lausanne

Kulcsszókeresési kísérletek hangzó híranyagokon beszédhang alapú felismerési technikákkal [Keyword spotting experiments on spoken news recordings using phone-based recognition techniques]

Authors: Gosztolya Gábor; Tóth László
Publication venue: Szegedi Tudományegyetem
Publication date: 01/01/2010
To make speech databases searchable, they must be annotated with textual labels. The obvious solution would be to produce a word-level transcript with a large-vocabulary speech recognizer. Recognizers, however, work with a closed vocabulary, so search terms that matter to us (proper names, named entities) may be impossible to find simply because they are missing from the recognizer's vocabulary. In this paper we compare solutions that perform the preliminary indexing only at the phone level and can therefore search for an arbitrary query term (phone sequence). The hit accuracy of the examined methods promises to be usable in practice, thanks to the already high phone recognition accuracy. In terms of running time, however, even the fastest method proves much slower than would be expected of such an application. Sophisticated indexing techniques will therefore be needed in the future.

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Spoken term detection with Connectionist Temporal Classification: A novel hybrid CTC-DBN decoder

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Kulcsszókeresési kísérletek hangzó híranyagokon beszédhang alapú felismerési technikákkal [Keyword spotting experiments on spoken news recordings using phone-based recognition techniques]

Authors: Gosztolya Gábor; Tóth László
Publication date: 01/01/2010

To make speech databases searchable, they must be annotated with textual labels. The obvious solution would be to produce a word-level transcript with a large-vocabulary speech recognizer. Recognizers, however, work with a closed vocabulary, so search terms that matter to us (proper names, named entities) may be impossible to find simply because they are missing from the recognizer's vocabulary. In this paper we compare solutions that perform the preliminary indexing only at the phone level and can therefore search for an arbitrary query term (phone sequence). The hit accuracy of the examined methods promises to be usable in practice, thanks to the already high phone recognition accuracy. In terms of running time, however, even the fastest method proves much slower than would be expected of such an application. Sophisticated indexing techniques will therefore be needed in the future.

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

University of Szeged

Posterior-Based Multi-Stream Formulation To Combine Multiple Grapheme-to-Phoneme Conversion Techniques

Authors: Magimai.-Doss Mathew; Razavi Marzieh
Publication venue: Idiap
Publication date: 19/11/2015
Infoscience - École polytechnique fédérale de Lausanne

Enhancing posterior based speech recognition systems

Author: Ketabdar Hamed
Publication venue: Lausanne, EPFL
Publication date: 05/09/2008
The use of local phoneme posterior probabilities has been increasingly explored for improving speech recognition systems. Hybrid hidden Markov model / artificial neural network (HMM/ANN) and Tandem systems are the most successful examples. In this thesis, we present a principled framework for enhancing the estimation of local posteriors by integrating phonetic and lexical knowledge as well as long contextual information. This framework allows for hierarchical estimation, integration and use of local posteriors from the phoneme up to the word level. We propose two approaches for enhancing the posteriors. In the first approach, phoneme posteriors estimated with an ANN (in particular a multi-layer perceptron, MLP) are used as emission probabilities in HMM forward-backward recursions. This yields new enhanced posterior estimates integrating HMM topological constraints (encoding specific phonetic and lexical knowledge) and long context. In the second approach, a temporal context of the regular MLP posteriors is post-processed by a secondary MLP in order to learn inter- and intra-dependencies among the phoneme posteriors. The learned knowledge is integrated into the posterior estimation during inference (the forward pass) of the second MLP, resulting in enhanced posteriors. The resulting enhanced local posteriors are investigated in a wide range of posterior-based speech recognition systems (e.g. Tandem and hybrid HMM/ANN), as a replacement for or in combination with the regular MLP posteriors. The enhanced posteriors consistently outperform the regular posteriors in different applications over small and large vocabulary databases.
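The second approach in the abstract above feeds a temporal context of first-stage posteriors to a secondary MLP. The sketch below shows only the input-construction step, under assumptions that are not taken from the thesis: a symmetric window of four frames on each side and edge padding at utterance boundaries. The function name `stack_context` and its parameters are hypothetical.

```python
import numpy as np

def stack_context(post, left=4, right=4):
    """Build secondary-MLP input vectors from first-stage posteriors.

    post: (T, K) posteriors from the first MLP, one row per frame.
    Returns a (T, (left + 1 + right) * K) array in which row t is the
    concatenation of frames t-left .. t+right.
    """
    T, K = post.shape
    # Repeat the first and last frames so that every window is full length.
    padded = np.pad(post, ((left, right), (0, 0)), mode="edge")
    width = left + 1 + right
    return np.stack([padded[t:t + width].ravel() for t in range(T)])
```

Each output row could then be passed to any classifier trained on phone targets. The window length determines how much of the temporal pattern in the posteriors the second stage can observe.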

Infoscience - École polytechnique fédérale de Lausanne

VII. Magyar Számítógépes Nyelvészeti Konferencia

Publication date: 01/01/2010

University of Szeged