Estimating confidence using word lattices

Author: Kemp Thomas
Schaaf Thomas
Publication venue
Publication date: 02/08/2007

KITopen

Recommended from our members

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks

Author: Gales Mark
Kastanos A
Ragni A
Publication venue: 'Organisation for Economic Co-Operation and Development (OECD)'
Publication date: 01/05/2020

Apollo (Cambridge)

System-independent ASR error detection and classification using Recurrent Neural Network

Author: Amaral
Asmaa EL Hannani
Bell
Deena
Errattahi
Errattahi
Fayolle
Fong
Gibson
Hassan Ouahmane
Jiang
Kemp
Korenevsky
Levin
Mangu
Nair
Nguyen
Ogawa
Pellegrini
Rahhal Errattahi
Rahim
Rudnicky
Rueber
Saz
Seigel
Sukkar
Thomas Hain
Wessel
Wessel
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/05/2019

This paper addresses errors in continuous Automatic Speech Recognition (ASR) in two stages: error detection and error type classification. Unlike the majority of research in this field, we propose to handle recognition errors independently of the ASR decoder. We first establish an effective set of generic features derived exclusively from the recognizer output to compensate for the absence of ASR decoder information. Then, we apply variant Recurrent Neural Network (V-RNN) based models for error detection and error type classification. Such models use label dependency to learn additional information for classifying the recognized words. Experiments on the Multi-Genre Broadcast Media corpus show that the proposed generic feature setup achieves competitive performance compared with state-of-the-art systems in both tasks. Furthermore, we show that a V-RNN trained on the proposed feature set appears to be an effective classifier for ASR error detection, with an accuracy of 85.43%.

Crossref

White Rose Research Online

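Error detection and correction using posterior probabilities in an online handwritten sentence recognition system (Détection et correction d'erreurs utilisant les probabilités a posteriori dans un système de reconnaissance de phrases manuscrites en-ligne)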

Author: Anquetil Eric
Quiniou Solen
Publication venue: Cépaduès
Publication date: 01/01/2009

National audience. In this article, we present a complete system for recognizing online handwritten sentences. We focus in particular on detecting potential errors in sentences produced by recognition with a Maximum A Posteriori approach. Word posterior probabilities, obtained from a confusion network representation, are used as confidence measures. Dedicated classifiers (here, SVMs) are then trained to correct these errors by combining these posterior probabilities with other knowledge sources. A rejection mechanism is also introduced to identify the error hypotheses that the proposed approach cannot correct. Experiments were carried out on a set of 425 handwritten sentences written by 17 writers. They showed a relative reduction in the word error rate of 14.6
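The idea of using word posteriors from a confusion network as confidence measures can be illustrated with a minimal Python sketch. It is not the authors' system: the network layout (a list of slots mapping candidate words to non-negative scores), the function names, the example scores, and the 0.5 threshold are all assumptions made for illustration, and a plain threshold stands in for the SVM classifiers and rejection mechanism described in the abstract.

```python
from typing import Dict, List, Tuple


def word_posteriors(slot: Dict[str, float]) -> Dict[str, float]:
    """Normalize the non-negative scores of one slot so they sum to 1."""
    total = sum(slot.values())
    if total <= 0:
        raise ValueError("slot scores must have a positive sum")
    return {word: score / total for word, score in slot.items()}


def flag_potential_errors(
    network: List[Dict[str, float]], threshold: float = 0.5
) -> List[Tuple[str, float, bool]]:
    """Return (best word, posterior, is_potential_error) for each slot.

    A word is flagged when its posterior falls below the threshold
    (hypothetical value; a real system would tune or learn it).
    """
    results = []
    for slot in network:
        posteriors = word_posteriors(slot)
        best = max(posteriors, key=posteriors.get)
        confidence = posteriors[best]
        results.append((best, confidence, confidence < threshold))
    return results


# Toy confusion network with two slots (made-up scores).
network = [
    {"the": 8.0, "a": 2.0},
    {"cat": 3.0, "cut": 3.0, "cot": 4.0},
]
decoded = flag_potential_errors(network)
for word, confidence, suspect in decoded:
    print(word, round(confidence, 2), "potential error" if suspect else "ok")
```

In this toy network the second slot has three close competitors, so its best word gets a low posterior and is flagged for a later correction step.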

HAL Descartes

Hal-Diderot

HAL-Rennes 1

Semi-Supervised Acoustic Model Training by Discriminative Data Selection from Multiple ASR Systems' Hypotheses

Author: Akita Yuya
Kawahara Tatsuya
Li Sheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/05/2016

While the performance of ASR systems depends on the size of the training data, it is very costly to prepare accurate and faithful transcripts. In this paper, we investigate a semi-supervised training scheme that takes advantage of huge quantities of unlabeled video lecture archives, particularly for the deep neural network (DNN) acoustic model. In the proposed method, we obtain ASR hypotheses from complementary GMM- and DNN-based ASR systems. Then, a set of CRF-based classifiers is trained to select the correct hypotheses and verify the selected data. The proposed hypothesis combination shows higher quality than the conventional system combination method (ROVER). Moreover, compared with conventional data selection based on confidence measure scores, our method is shown to be more effective at filtering usable data. A significant improvement in ASR accuracy is achieved over the baseline system and over models trained with the conventional system combination and data selection methods.

Kyoto University Research Information Repository

Deep neural network features and semi-supervised training for low resource speech recognition

Author: Hynek Hermansky
Kenneth Church
Michael L. Seltzer
Samuel Thomas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

We propose a new technique for training deep neural networks (DNNs) as data-driven feature front-ends for large vocabulary continuous speech recognition (LVCSR) in low resource settings. To circumvent the lack of sufficient training data for acoustic modeling in these scenarios, we use transcribed multilingual data and semi-supervised training to build the proposed feature front-ends. In our experiments, the proposed features provide an absolute improvement of 16% in a low-resource LVCSR setting with only one hour of in-domain training data. While close to three-fourths of these gains come from DNN-based features, the remaining are from semi-supervised training. Index Terms: Low resource, speech recognition, deep neural networks, semi-supervised training, bottleneck features

CiteSeerX

Crossref

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

Author: Cui Mingyu
Deng Jiajun
Hu Shujie
Jin Zengrui
Li Guinan
Liu Xunying
Wang Tianzi
Xie Xurong
Xue Boyang
Publication venue
Publication date: 15/02/2023

Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data-intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors. To address these issues, a set of compact and data-efficient speaker-dependent (SD) parameter representations are used to facilitate both speaker adaptive training and test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR systems. The sensitivity to supervision quality is reduced using a confidence score-based selection of the less erroneous subset of speaker-level adaptation data. Two lightweight confidence score estimation modules are proposed to produce more reliable confidence scores. The data sparsity issue, which is exacerbated by data selection, is addressed by modelling the SD parameter uncertainty using Bayesian learning. Experiments on the benchmark 300-hour Switchboard and the 233-hour AMI datasets suggest that the proposed confidence score-based adaptation schemes consistently outperformed the baseline speaker-independent (SI) Conformer model and conventional non-Bayesian, point estimate-based adaptation using no speaker data selection. Similar consistent performance improvements were retained after external Transformer and LSTM language model rescoring. In particular, on the 300-hour Switchboard corpus, statistically significant WER reductions of 1.0%, 1.3%, and 1.4% absolute (9.5%, 10.9%, and 11.3% relative) were obtained over the baseline SI Conformer on the NIST Hub5'00, RT02, and RT03 evaluation sets respectively. Similar WER reductions of 2.7% and 3.3% absolute (8.9% and 10.2% relative) were also obtained on the AMI development and evaluation sets. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processin
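The confidence score-based selection of adaptation data mentioned in this abstract can be pictured with a short Python sketch. This is a simplified illustration under stated assumptions, not the paper's method: the `Utterance` type, the use of a mean of per-token confidences as the utterance score, and the 0.8 threshold are all hypothetical, and the paper's confidence estimation modules and Bayesian learning are not modelled.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    """Hypothetical container: an utterance ID and per-token confidences."""
    utt_id: str
    token_confidences: List[float]


def utterance_confidence(utt: Utterance) -> float:
    """Mean token confidence; an empty hypothesis scores 0.0."""
    if not utt.token_confidences:
        return 0.0
    return sum(utt.token_confidences) / len(utt.token_confidences)


def select_adaptation_data(
    utterances: List[Utterance], threshold: float = 0.8
) -> List[Utterance]:
    """Keep only utterances whose confidence reaches the threshold."""
    return [u for u in utterances if utterance_confidence(u) >= threshold]


# Toy speaker-level data with made-up confidence scores.
speaker_data = [
    Utterance("utt1", [0.95, 0.90, 0.85]),
    Utterance("utt2", [0.60, 0.40]),
    Utterance("utt3", []),
]
selected = select_adaptation_data(speaker_data)
print([u.utt_id for u in selected])
```

Filtering this way shrinks an already small speaker-level data set, which is the data sparsity issue the abstract says is exacerbated by selection.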

arXiv.org e-Print Archive