7 research outputs found

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks

Author: Gales Mark
Kastanos A
Ragni A
Publication venue: 'Organisation for Economic Co-Operation and Development (OECD)'
Publication date: 01/05/2020

Apollo (Cambridge)

Speaker-adapted confidence measures for speech recognition of video lectures

Author: Andrés Ferrer Jesús
Juan Císcar Alfonso
Sanchez-Cortina Isaias
Sanchis Navarro José Alberto
Publication venue: 'Elsevier BV'
Publication date: 01/05/2016

Automatic speech recognition applications can benefit from a confidence measure (CM) to predict the reliability of the output. Previous works showed that a word-dependent naive Bayes (NB) classifier outperforms the conventional word posterior probability as a CM. However, a discriminative formulation usually renders improved performance due to the available training techniques. Taking this into account, we propose a logistic regression (LR) classifier defined with simple input functions to approximate the NB behaviour. Additionally, as a main contribution, we propose to adapt the CM to the speaker in cases in which it is possible to identify the speakers, such as online lecture repositories. The experiments have shown that speaker-adapted models outperform their non-adapted counterparts on two difficult tasks from English (videoLectures.net) and Spanish (poliMedia) educational lectures. They have also shown that the NB model is clearly superseded by the proposed LR classifier.

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Also supported by the Spanish MINECO (iTrans2 TIN2009-14511 and Active2Trans TIN2012-31723) research projects and the FPI Scholarship BES-2010-033005.

Sanchez-Cortina, I.; Andrés Ferrer, J.; Sanchis Navarro, JA.; Juan Císcar, A. (2016). Speaker-adapted confidence measures for speech recognition of video lectures. Computer Speech and Language. 37:11-23. https://doi.org/10.1016/j.csl.2015.10.003
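
As a rough illustration of the idea in the abstract above, a logistic regression confidence measure maps per-word features to a probability that the recognized word is correct. The sketch below is not the authors' model: it assumes a single input feature (the word posterior probability), toy training data, and plain batch gradient descent, and all function and variable names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def confidence(weights, x):
    """Estimated probability that a word with feature vector x is correct."""
    score = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(score)


def train_confidence_lr(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(features, labels):
            err = confidence(weights, x) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - learning_rate * g / len(features)
                   for w, g in zip(weights, grad)]
    return weights


# Toy data (assumed): one feature per word, its posterior probability,
# with label 1 if the recognized word was correct and 0 otherwise.
features = [[0.95], [0.9], [0.8], [0.6], [0.4], [0.3], [0.2], [0.1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = train_confidence_lr(features, labels)
high = confidence(weights, [0.9])
low = confidence(weights, [0.2])
```

Speaker adaptation, the main contribution named in the abstract, would require per-speaker data and is not covered by this sketch.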

RiuNet

Semi-Supervised Acoustic Model Training by Discriminative Data Selection from Multiple ASR Systems' Hypotheses

Author: Akita Yuya
Kawahara Tatsuya
Li Sheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/05/2016

While the performance of ASR systems depends on the size of the training data, it is very costly to prepare accurate and faithful transcripts. In this paper, we investigate a semi-supervised training scheme that takes advantage of large quantities of unlabeled video lecture archives, particularly for the deep neural network (DNN) acoustic model. In the proposed method, we obtain ASR hypotheses from complementary GMM- and DNN-based ASR systems. Then, a set of CRF-based classifiers is trained to select the correct hypotheses and verify the selected data. The proposed hypothesis combination shows higher quality than the conventional system combination method (ROVER). Moreover, compared with conventional data selection based on confidence measure scores, our method is shown to be more effective at filtering usable data. Significant improvement in ASR accuracy is achieved over the baseline system and in comparison with models trained with the conventional system combination and data selection methods.
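
For context, the conventional baseline that the abstract above compares against, data selection by confidence score, can be sketched as a simple threshold filter. This is not the paper's CRF-based method; the record layout, the threshold value, and the function name are assumptions made for illustration.

```python
def select_for_training(utterances, threshold=0.9):
    """Keep automatically transcribed utterances whose confidence meets the threshold."""
    return [u for u in utterances if u["confidence"] >= threshold]


# Hypothetical ASR output for unlabeled lecture audio.
utterances = [
    {"id": "lec1_001", "text": "today we discuss acoustic models", "confidence": 0.97},
    {"id": "lec1_002", "text": "the hid on mark of model", "confidence": 0.42},
    {"id": "lec1_003", "text": "training data is expensive", "confidence": 0.91},
]

selected = select_for_training(utterances)
selected_ids = [u["id"] for u in selected]
```

The selected utterances and their automatic transcripts would then be added to the acoustic model training set.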

Kyoto University Research Information Repository

Combining Information Sources for Confidence Estimation with CRF Models

Author: Assoc ISC
Seigel MS
Woodland PC
Publication date: 01/01/2011

CUED - Cambridge University Engineering Department

Combining information sources for confidence estimation with CRF models

Author: Seigel MS
Woodland PC
Publication date: 01/12/2011

Obtaining accurate confidence measures for automatic speech recognition (ASR) transcriptions is an important task which stands to benefit from the use of multiple information sources. This paper investigates the application of conditional random field (CRF) models as a principled technique for combining multiple features from such sources. A novel method for combining suitably defined features is presented, allowing for confidence annotation using lattice-based features of hypotheses other than the lattice 1-best. The resulting framework is applied to different stages of a state-of-the-art large vocabulary speech recognition pipeline, and consistent improvements are shown over a sophisticated baseline system. Copyright © 2011 ISCA

CUED - Cambridge University Engineering Department