
2 research outputs found

Combination of Multiple Acoustic Models with Multi-scale Features for Myanmar Speech Recognition

Authors: Maung Su Su; Oo Nyein Nyein; Soe Thandar
Publication venue: 'International Journal of Computer Engineering and Applications'
Publication date: 13/02/2018

We proposed an approach to building a robust automatic speech recognizer using deep convolutional neural networks (CNNs). Deep CNNs have achieved great success in acoustic modelling for automatic speech recognition because of their ability to reduce spectral variations and model spectral correlations in the input features. In most CNN-based acoustic modelling, a fixed windowed feature patch corresponding to a target label (e.g., senone or phone) is used as input to the CNN. Because different target labels may correspond to different time scales, multiple acoustic models were trained with different acoustic feature scales. Since auxiliary information learned from different temporal scales could help classification, the multiple CNN acoustic models were combined using the Recognizer Output Voting Error Reduction (ROVER) algorithm for the final speech recognition experiments. The experiments were conducted on a Myanmar large vocabulary continuous speech recognition (LVCSR) task. Our results showed that integrating temporal multi-scale features in model training achieved a 4.32% relative word error rate (WER) reduction over the best individual system trained on a single temporal scale.
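The voting step and the "relative WER reduction" metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: full ROVER first aligns the system outputs into a word transition network by dynamic programming, whereas the sketch below assumes the hypotheses are already aligned slot by slot, with a hypothetical `NULL` token marking an empty slot. The function names and all example values are illustrative assumptions, not figures from the paper.

```python
from collections import Counter

NULL = "@"  # hypothetical marker for "no word" in an aligned slot


def vote(aligned_hyps):
    """Majority vote per slot over pre-aligned hypotheses of equal length."""
    if len({len(h) for h in aligned_hyps}) != 1:
        raise ValueError("hypotheses must be aligned to the same length")
    result = []
    for slot in zip(*aligned_hyps):
        # Ties go to the word seen first, so list the most trusted system first.
        word, _ = Counter(slot).most_common(1)[0]
        if word != NULL:
            result.append(word)
    return result


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Three made-up system outputs, already aligned to three slots.
hyps = [
    ["a", "b", NULL],
    ["a", "c", "d"],
    ["a", "b", "d"],
]
combined = vote(hyps)

# Made-up WER values: 20.0% for the baseline, 19.0% for the combined system.
reduction = relative_wer_reduction(20.0, 19.0)
```

With frequency-only voting as above, a word is kept when it wins the slot; the original ROVER scheme can also weight votes by word confidence scores, which this sketch omits.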

International Journal of Computer (IJC - Global Society of Scientific Research and Researchers, GSSRR)

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

MERAL Portal

Generating complementary systems for speech recognition

Authors: Breslin C; Gales MJF
Publication venue: 'International Speech Communication Association'
Publication date: 21/09/2006

CUED - Cambridge University Engineering Department